Large Language Model

A Large Language Model (LLM) is a type of artificial intelligence model trained on very large datasets of text to interpret, generate, and manipulate human language. Most LLMs are built on the transformer architecture and work by repeatedly predicting the most likely next token (a word or part of a word) in a sequence.
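The prediction step can be illustrated with a minimal sketch. The vocabulary and the scores (logits) below are invented for illustration; a real model computes scores over tens of thousands of tokens with a neural network. The sketch also assumes greedy decoding, where the single most probable token is picked, whereas real systems often sample from the distribution instead.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(vocab, logits):
    """Greedy decoding: return the most probable token and all probabilities."""
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs

# Hypothetical scores a model might assign after the prompt "The cat sat on the"
vocab = ["mat", "moon", "car", "idea"]
logits = [3.2, 1.1, 0.3, -1.0]

token, probs = next_token(vocab, logits)
print(token)  # the token with the highest score
```

Generating longer text repeats this step: the chosen token is appended to the sequence and the model scores the next position.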

The core technical concepts:

scale: “large” refers to both the training data (hundreds of billions to trillions of tokens) and the parameter count (billions or more)
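training pipeline: typically pre-training on next-token prediction over a large text corpus, followed by fine-tuning (for example, instruction tuning) to shape the model's behaviour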
context window: the maximum amount of text, measured in tokens, that the model can “hold in mind” at one time; in a long conversation, anything beyond this limit is no longer visible to the model
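One common way to respect the context window is to drop the oldest tokens when a conversation grows too long. The sketch below is illustrative only: `fit_to_context` and its parameters are hypothetical names, and whitespace splitting stands in for a real tokenizer, which would split text into subword tokens.

```python
def fit_to_context(tokens, context_window, reserve_for_output=0):
    """Keep only the most recent tokens that fit in the window.

    reserve_for_output leaves room for the tokens the model will generate.
    """
    budget = context_window - reserve_for_output
    if budget <= 0:
        raise ValueError("context_window must exceed reserve_for_output")
    return tokens[-budget:]

# Assumption: whitespace splitting is a stand-in for a real tokenizer.
history = "the quick brown fox jumps over the lazy dog".split()

# A window of 6 tokens with 2 reserved for the reply leaves room for 4.
kept = fit_to_context(history, context_window=6, reserve_for_output=2)
print(kept)  # ['over', 'the', 'lazy', 'dog']
```

Truncation is only one possible strategy; summarising older turns is another, at the cost of extra model calls.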

The main capabilities:

Generation: creating coherent essays, poems, code and emails
Summarisation: condensing long documents into key bullet points
Translation: converting text between natural languages, or code between programming languages
Reasoning: solving logic puzzles or mathematical problems by breaking them down into steps

Critical limitations:

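hallucinations: the model can produce fluent, confident-sounding text that is factually wrong or entirely fabricated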
static knowledge: LLMs do not “learn” in real time; their knowledge is cut off at the date their training data ends (unless they are connected to an external source such as a search engine)
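Connecting a model to a search engine usually means fetching relevant text first and placing it in the prompt, so the model can draw on information newer than its training cutoff. The sketch below assumes a toy keyword search over an in-memory list; `toy_search`, `build_prompt`, and the documents are all hypothetical, and no real search API or model is called.

```python
def toy_search(query, documents):
    """Hypothetical stand-in for a search engine: match on shared words."""
    words = set(query.lower().split())
    return [d for d in documents if words & set(d.lower().split())]

def build_prompt(question, snippets):
    """Prepend retrieved snippets so the model can use them when answering."""
    if not snippets:
        return f"Question: {question}\nAnswer:"
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Use the context below to answer.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )

# Invented documents, for illustration only.
documents = [
    "Project Zephyr launched in March.",
    "Bananas are rich in potassium.",
]

question = "When did Project Zephyr launch?"
snippets = toy_search(question, documents)
prompt = build_prompt(question, snippets)
print(prompt)
```

The model's weights stay unchanged in this setup; only the prompt carries the fresh information, which therefore has to fit inside the context window.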