A Large Language Model (LLM) is a type of artificial intelligence model trained on massive datasets of text to understand, generate, and manipulate human language. LLMs are built on the transformer architecture and work by predicting, one token at a time, the most likely next token (a word or part of a word) in a sequence.
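To make "predicting the next most likely token" concrete, here is a minimal sketch of greedy decoding. It assumes a hypothetical, hand-written probability table (`NEXT_TOKEN_PROBS`) that conditions only on the previous token. A real LLM instead computes the next-token distribution with a transformer over the whole preceding context, so treat this as an illustration of the generation loop, not of the model itself.

```python
# Hypothetical toy "model": for each previous token, a probability distribution
# over the next token. The numbers are invented for illustration only; a real LLM
# derives this distribution from the entire context using a transformer.
NEXT_TOKEN_PROBS: dict[str, dict[str, float]] = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "<end>": 0.2},
    "a": {"dog": 0.7, "cat": 0.3},
    "cat": {"sat": 0.8, "<end>": 0.2},
    "dog": {"barked": 0.9, "<end>": 0.1},
    "sat": {"<end>": 1.0},
    "barked": {"<end>": 1.0},
}


def generate_greedy(max_tokens: int = 10) -> list[str]:
    """Repeatedly append the single most likely next token (greedy decoding)."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        dist = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(dist, key=dist.get)  # pick the highest-probability token
        if next_token == "<end>":
            break  # the model "chose" to stop
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker


print(" ".join(generate_greedy()))  # → the cat sat
```

Each step only looks up a distribution and picks from it; the same loop, with a far more capable model producing the distribution, is what produces an LLM's output.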
The core technical concepts:
- scale: “Large” refers to both the amount of training data (typically trillions of tokens) and the parameter count (billions or more)
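- training pipeline: pre-training on large amounts of raw text to predict the next token, followed by fine-tuning (for example, instruction tuning and reinforcement learning from human feedback) so the model follows instructions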
- context window: the maximum number of tokens (prompt plus generated output) the model can “hold in mind” at once; parts of a long conversation that no longer fit are invisible to the model
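Because the prompt and the reply share one context window, applications typically drop the oldest messages once a conversation grows too long. The sketch below shows one simple policy under stated assumptions: `count_tokens` is a hypothetical stand-in that counts whitespace-separated words (real LLMs use subword tokenizers, so real counts differ), and `fit_to_context` keeps the most recent messages that fit after reserving room for the model's output.

```python
def count_tokens(text: str) -> int:
    # Hypothetical stand-in tokenizer: one token per whitespace-separated word.
    # Real LLMs use subword tokenizers (e.g. BPE), so actual counts will differ.
    return len(text.split())


def fit_to_context(messages: list[str], context_window: int, reserved_for_output: int) -> list[str]:
    """Keep the newest messages whose total token count fits in the prompt budget.

    The budget is the context window minus the tokens reserved for the reply,
    because prompt and output share the same window.
    """
    budget = context_window - reserved_for_output
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break  # this message and everything older falls outside the window
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


conversation = ["hello there", "how are you today", "fine thanks and you", "tell me about LLMs please"]
print(fit_to_context(conversation, context_window=13, reserved_for_output=4))
# → ['fine thanks and you', 'tell me about LLMs please']
```

Dropping the oldest turns is only one design choice; other applications summarise earlier turns instead, trading exact wording for a longer effective memory.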
The main capabilities:
- Generation: creating coherent essays, poems, code and emails
- Summarisation: condensing long documents into key bullet points
- Translation: converting text between many natural languages, and between programming languages
- Reasoning: solving logic puzzles or mathematical equations by breaking them down into steps
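All of these capabilities come from the same token-by-token generation loop. Always taking the single most likely token tends to produce repetitive text, so many systems instead sample from the model's distribution, scaled by a "temperature". The sketch below assumes hypothetical raw scores (`logits`) for three candidate tokens; a real model produces one score per entry in a vocabulary of tens of thousands of tokens.

```python
import math
import random


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_score = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - max_score) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one token at random, weighted by its temperature-scaled probability."""
    probs = softmax_with_temperature(logits, temperature)
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


logits = {"essay": 2.0, "poem": 1.0, "email": 0.5}  # hypothetical scores
rng = random.Random(0)  # fixed seed so runs are repeatable
print(softmax_with_temperature(logits, 0.5))  # sharper: "essay" dominates
print(softmax_with_temperature(logits, 1.5))  # flatter: more varied choices
print(sample_next_token(logits, 1.0, rng))
```

Low temperatures suit tasks with one right answer, such as code or translation; higher temperatures give more varied output, which can help with open-ended writing.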
Critical limitations:
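- hallucinations: generating fluent, confident statements that are false or fabricated (for example, invented citations), because the model produces plausible text rather than verified facts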
- static knowledge: LLMs do not “learn” in real time; their knowledge stops at their training-data cut-off date unless they are connected to an external source such as a search engine
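Connecting a model to a search engine does not change its weights; instead, relevant text is retrieved at question time and placed in the prompt (often called retrieval-augmented generation). The sketch below is a minimal illustration under loud assumptions: the document strings are invented, ranking by shared words is a crude stand-in for a real search engine, and the call that would send `prompt` to an actual model is omitted.

```python
import re


def word_set(text: str) -> set[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the query (stand-in for search)."""
    query_words = word_set(query)
    ranked = sorted(documents, key=lambda doc: len(query_words & word_set(doc)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, documents: list[str], top_k: int = 1) -> str:
    """Prepend the retrieved text so the model can answer from it, not from memory."""
    sources = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
    return f"Answer using only the sources below.\nSources:\n{sources}\nQuestion: {query}"


# Hypothetical documents standing in for fresh search results.
documents = [
    "The city council approved the new tram line on Monday.",
    "Recipe: bake the bread for 40 minutes.",
]
prompt = build_prompt("Did the council approve the tram line?", documents)
print(prompt)
```

The retrieved text still has to fit in the context window alongside the question and the answer, which is why real systems retrieve only a few short passages.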