
encoding

Author: Dave the human (Homo sapiens in the loop)

When a tokenizer encodes text (through its encode method), it splits the natural-language text into tokens and then maps each token to an integer ID.

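# Even notes can have code
# Assumes `tokenizer` has already been loaded (for example, a Hugging Face tokenizer).
prompt = "Sorry Dave, I can't do that"
input_token_ids_list = tokenizer.encode(prompt)
print(input_token_ids_list)
# Output (the exact IDs depend on the tokenizer's vocabulary):
# [19152, 20238, 11, 358, 646, 944, 653, 429]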

The way back from IDs to natural language is called [[decoding]].
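
To make the two steps (split into tokens, then look up IDs) concrete, here is a minimal sketch with a toy tokenizer. Everything in it is hypothetical and for illustration only: the vocabulary `TOY_VOCAB`, the helpers `toy_tokenize`, `toy_encode`, and `toy_decode`, and the simple word-and-punctuation splitting rule. Real tokenizers usually split text into subword units and use far larger vocabularies, so their IDs will look nothing like these.

```python
import re

# Hypothetical toy vocabulary: token string -> integer ID.
TOY_VOCAB = {
    "<unk>": 0, "Sorry": 1, "Dave": 2, ",": 3,
    "I": 4, "can't": 5, "do": 6, "that": 7,
}
# Reverse mapping, used to go from IDs back to tokens.
ID_TO_TOKEN = {token_id: token for token, token_id in TOY_VOCAB.items()}


def toy_tokenize(text):
    """Step 1: split text into words (keeping apostrophes) and punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def toy_encode(text):
    """Step 2: map each token to its ID; unknown tokens map to <unk>."""
    return [TOY_VOCAB.get(token, TOY_VOCAB["<unk>"]) for token in toy_tokenize(text)]


def toy_decode(token_ids):
    """The way back: map each ID to its token string."""
    return [ID_TO_TOKEN[token_id] for token_id in token_ids]


prompt = "Sorry Dave, I can't do that"
print(toy_tokenize(prompt))  # ['Sorry', 'Dave', ',', 'I', "can't", 'do', 'that']
print(toy_encode(prompt))    # [1, 2, 3, 4, 5, 6, 7]
print(toy_encode("Sorry HAL"))  # [1, 0]
```

A word that is missing from the toy vocabulary, such as "HAL", falls back to the `<unk>` ID. That fallback is a design choice of this sketch, not a claim about how any particular tokenizer handles unseen text.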


