vocabulary

The set of tokens a tokenizer can map text to; the vocabulary size is the number of distinct tokens in that set.

A larger vocabulary in [[LLM]]s:

- increases the model size, because the [[embedding]] and output layers must store one vector per token
- increases the per-token compute cost of producing next-token probabilities, because the output layer computes a score (logit) for every token in the vocabulary
- allows more words to be represented as single tokens rather than being split into subword components; this can shorten sequences, since fewer tokens are needed to represent a sentence
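To make the first two points concrete, here is a rough sketch that counts the vocabulary-dependent parameters and the multiply-add work of the final projection to logits. It assumes an output layer without bias and ignores the softmax itself. The hidden size `d_model = 4096` and the vocabulary sizes 32,000 and 128,000 are illustrative values, not taken from any specific model, and the function names are hypothetical.

```python
def vocab_dependent_params(vocab_size: int, d_model: int, tied: bool = False) -> int:
    """Parameters in the input embedding and output (unembedding) matrices.

    Assumption: the output layer has no bias. If the embedding and output
    weights are tied, the matrix is stored only once.
    """
    embedding = vocab_size * d_model
    output = 0 if tied else vocab_size * d_model
    return embedding + output


def output_layer_flops_per_token(vocab_size: int, d_model: int) -> int:
    """Approximate FLOPs to project one hidden state to vocabulary logits.

    Counts one multiply and one add per weight; the softmax is ignored.
    """
    return 2 * vocab_size * d_model


if __name__ == "__main__":
    d_model = 4096  # illustrative hidden size
    for v in (32_000, 128_000):  # illustrative vocabulary sizes
        params = vocab_dependent_params(v, d_model)
        flops = output_layer_flops_per_token(v, d_model)
        print(f"V={v:>7,}: params={params:,}, FLOPs/token={flops:,}")
```

Both quantities grow linearly with the vocabulary size, so quadrupling the vocabulary quadruples them. Models that tie the embedding and output weights store the matrix only once (`tied=True`), which halves the parameter count but not the per-token output compute.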

So the tradeoff is between a larger vocabulary, with more parameters and a somewhat higher per-token cost, and a smaller vocabulary that often produces longer token sequences.
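The sequence-length side of the tradeoff can be illustrated with a toy tokenizer. This sketch uses greedy longest-match segmentation, a simplification (BPE tokenizers, for instance, apply learned merge rules instead), and two tiny hand-made vocabularies. `greedy_tokenize`, `SMALL_VOCAB`, and `LARGE_VOCAB` are hypothetical names chosen for illustration.

```python
# Toy vocabularies: every lowercase letter and the space as single-character
# tokens, plus a few hand-picked multi-character tokens (illustrative only).
CHARS = set("abcdefghijklmnopqrstuvwxyz ")
SMALL_VOCAB = CHARS | {"to", "ke", "iz", "er", "sp", "li", "wo", "rd"}
LARGE_VOCAB = SMALL_VOCAB | {"tokenizers", " split", " words"}


def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by repeatedly taking the longest vocabulary entry at the
    current position; a character not in the vocabulary becomes its own token."""
    max_len = max(len(token) for token in vocab)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


if __name__ == "__main__":
    text = "tokenizers split words"
    for name, vocab in (("small", SMALL_VOCAB), ("large", LARGE_VOCAB)):
        tokens = greedy_tokenize(text, vocab)
        print(f"{name} vocab ({len(vocab)} entries): {len(tokens)} tokens -> {tokens}")
```

The larger vocabulary contains whole words, so the same text needs 3 tokens instead of 14. Since the output-layer cost of a sequence scales roughly with the number of tokens times the vocabulary size, whether a larger vocabulary is cheaper overall depends on how much it shortens typical inputs.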