
Large Language Model

Dave the human
Homo sapiens in the loop

A Large Language Model (LLM) is a type of artificial intelligence trained on humongous datasets of text to understand, generate, and manipulate human language. LLMs are built on the transformer architecture and work by predicting a probability distribution over the next token (a word or part of a word) in a sequence, choosing a token from it, and repeating.
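To make next-token prediction concrete, here is a minimal Python sketch. The candidate tokens and their scores (logits) are invented for illustration; in a real model the logits come from the network's final layer and cover a vocabulary of tens of thousands of tokens or more. The sketch converts logits to probabilities with a softmax and then applies greedy decoding (always take the most probable token), which is only one of several decoding strategies.

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Turn raw scores into a probability distribution over candidate tokens."""
    # Subtract the largest logit before exponentiating to avoid overflow.
    m = max(logits.values())
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy_next_token(logits: dict[str, float]) -> str:
    """Greedy decoding: pick the single most probable next token."""
    probs = softmax(logits)
    return max(probs, key=probs.get)


# Made-up scores a model might assign after the prompt "The cat sat on the"
toy_logits = {" mat": 4.0, " sofa": 2.5, " roof": 1.0, " moon": -1.0}

probs = softmax(toy_logits)
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok!r}: {p:.3f}")
print("next token:", repr(greedy_next_token(toy_logits)))
```

Generating longer text repeats this step: append the chosen token to the sequence, run the model again, and pick the next token. Many deployments sample from the distribution (for example with a temperature setting) instead of always taking the top token, which makes the output less repetitive.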

The core technical concepts:

  • scale: “Large” refers to both the volume of training data (often trillions of tokens) and the parameter count (billions, up to hundreds of billions)
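  • training pipeline: pre-training on large amounts of raw text to predict the next token, followed by fine-tuning (such as instruction tuning and reinforcement learning from human feedback) so the model follows instructions and behaves helpfully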
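  • context window: the maximum number of tokens the model can “hold in mind” at once, covering both the input (prompt and conversation history) and the generated output; text that does not fit is invisible to the model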
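A chat application has to keep every request inside the context window. The sketch below shows one simple strategy, dropping the oldest messages first. It assumes a crude whitespace word count in place of a real subword tokenizer, and the function names `count_tokens` and `fit_to_context` are hypothetical, not part of any library.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word.

    Real subword tokenizers often produce more tokens than this for the same text.
    """
    return len(text.split())


def fit_to_context(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined size fits in max_tokens.

    Older messages are dropped first, which is one simple way to stay
    inside a model's context window.
    """
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message and stop at the first one that no longer fits.
    for msg in reversed(messages):
        n = count_tokens(msg)
        if used + n > max_tokens:
            break
        kept.append(msg)
        used += n
    kept.reverse()  # restore chronological order
    return kept


conversation = [
    "Hi, can you help me plan a trip?",
    "Sure, where would you like to go?",
    "Somewhere warm in December.",
    "How about the Canary Islands?",
]
print(fit_to_context(conversation, max_tokens=10))
```

In practice some room must also be reserved for the model's reply, since generated tokens count against the same window. Summarising older turns instead of discarding them is a common alternative.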

The main capabilities:

  • Generation: creating coherent essays, poems, code and emails
  • Summarisation: condensing long documents into key bullet points
  • Translation: converting text between natural languages, and between programming languages
  • Reasoning: solving logic puzzles or mathematical equations by breaking them down into steps

Critical limitations:

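  • hallucinations: producing fluent, confident-sounding output that is false or fabricated, such as invented facts or non-existent citations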
  • static knowledge: LLMs do not “learn” in real time; their knowledge stops at the cutoff date of their training data, unless they are connected to a search engine or another source of up-to-date information
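Connecting a model to a search engine is usually done through retrieval-augmented generation: relevant text is fetched at question time and placed in the prompt, so the model can use facts it never saw during training. The toy sketch below assumes a tiny hard-coded document list (made-up example data) and word-overlap scoring as stand-ins for a real search engine, and it stops at building the prompt rather than calling any particular model API. All names here are hypothetical.

```python
import re

# Hypothetical example documents standing in for a real search index.
DOCUMENTS = [
    "The team offsite was moved from March to May.",
    "Transformers process tokens in parallel using self-attention.",
    "The office cafeteria now opens at 8am.",
]


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query (toy scoring)."""
    q = tokenize(query)
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str]) -> str:
    """Place the retrieved text in front of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


print(build_prompt("When is the team offsite?", DOCUMENTS))
```

The model's weights stay unchanged; only the prompt carries the new information, which is why this approach works around a knowledge cutoff without retraining.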
