tokenizer

A tokenizer is a critical component of the LLM text processing and generation pipeline, even though it is not part of the language model itself. It splits text into tokens and maps them to numerical IDs that the language model ingests (encoding), and it converts the LLM’s output IDs back into human-readable text (decoding).
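
The encode/decode round trip can be illustrated with a toy example. This is a minimal sketch, not how production tokenizers work: it assumes a word-level vocabulary built from a tiny corpus, whereas real LLM tokenizers typically use subword algorithms such as byte-pair encoding. The class name `ToyTokenizer`, its methods, and the `<unk>` placeholder are hypothetical and chosen only for illustration.

```python
import re


class ToyTokenizer:
    """Word-level tokenizer for illustration only (hypothetical, not a library API)."""

    UNK = "<unk>"  # assumed placeholder for tokens missing from the vocabulary

    def __init__(self, corpus):
        # Reserve ID 0 for unknown tokens, then assign IDs in sorted order.
        self.token_to_id = {self.UNK: 0}
        for token in sorted(set(self._split(corpus))):
            self.token_to_id[token] = len(self.token_to_id)
        self.id_to_token = {i: t for t, i in self.token_to_id.items()}

    @staticmethod
    def _split(text):
        # Lowercase, then treat each word and each punctuation mark as a token.
        return re.findall(r"\w+|[^\w\s]", text.lower())

    def encode(self, text):
        # Text -> list of integer IDs; unseen tokens map to the <unk> ID.
        return [self.token_to_id.get(t, 0) for t in self._split(text)]

    def decode(self, ids):
        # List of integer IDs -> text, joining tokens with single spaces.
        return " ".join(self.id_to_token[i] for i in ids)


tok = ToyTokenizer("the cat sat on the mat.")
ids = tok.encode("The cat sat.")
text = tok.decode(ids)
print(ids)   # [6, 2, 5, 1]
print(text)  # the cat sat .
```

Note that decoding here does not restore the original casing or spacing. A word-level scheme like this one is lossy, which is one reason real tokenizers operate on subwords or bytes and keep whitespace information in the tokens themselves.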