
token

Dave the human
Author
Homo sapiens in the loop

A token is not necessarily an entire word: it is a small unit of text that a language model processes.

It can be a full word, part of a word, or even a punctuation mark, depending on how the tokenizer splits the text. For example, the sentence "Sorry Dave, I can't do that" can be broken into tokens like "Sor", "ry", "Dave", ",", "I can", "'t do", "that". Each token is then converted into a numerical ID that the model can ingest.
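
To make the split-then-map idea concrete, here is a minimal Python sketch. It is not how any particular tokenizer works: the vocabulary, the IDs, the choice to attach spaces to the start of a piece, and the greedy longest-match rule are all assumptions made for illustration. Real tokenizers learn their vocabulary from data and will usually split this sentence differently.

```python
# Hypothetical toy vocabulary: the pieces and their IDs are made up for this example.
VOCAB = {
    "Sor": 0, "ry": 1, " Dave": 2, ",": 3,
    " I can": 4, "'t do": 5, " that": 6,
}
UNKNOWN_ID = -1  # assumed placeholder ID for text the toy vocabulary lacks


def tokenize(text, vocab=VOCAB):
    """Split text greedily into the longest pieces found in vocab."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first and shrink until a piece matches.
        for end in range(len(text), pos, -1):
            if text[pos:end] in vocab:
                tokens.append(text[pos:end])
                pos = end
                break
        else:
            # Nothing matched: emit the single character as an unknown token.
            tokens.append(text[pos])
            pos += 1
    return tokens


def encode(text, vocab=VOCAB):
    """Map each token to its numerical ID."""
    return [vocab.get(tok, UNKNOWN_ID) for tok in tokenize(text, vocab)]


sentence = "Sorry Dave, I can't do that"
tokens = tokenize(sentence)
ids = encode(sentence)
print(tokens)
print(ids)
```

Joining the tokens back together reproduces the original sentence, which is why the spaces are kept inside the pieces rather than thrown away.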

