During pre-training, the large language model (LLM) learns general patterns, grammar, and facts from large amounts of internet text and books via self-supervised learning, where the training labels come from the text itself rather than from human annotation. At this stage, the LLM's objective is to predict the next word (or token) in these texts.
We can think of this stage as “raw language prediction”: it gives the LLM the basic ability to produce coherent text.
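
To make the next-token objective more concrete, here is a minimal sketch in Python. It is a toy under stated assumptions, not how a real LLM is built: the tokenizer is a plain whitespace split, and a bigram count table stands in for the neural network. The function names (`make_training_pairs`, `train_bigram`, `next_token_probs`, `next_token_loss`) and the tiny corpus are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def make_training_pairs(tokens):
    """Build (previous token, next token) pairs; the text supplies its own labels."""
    return list(zip(tokens[:-1], tokens[1:]))


def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in make_training_pairs(tokens):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Normalize the counts recorded after `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}


def next_token_loss(counts, tokens):
    """Average negative log-likelihood of each actual next token (cross-entropy)."""
    pairs = make_training_pairs(tokens)
    nll = 0.0
    for prev, nxt in pairs:
        nll += -math.log(next_token_probs(counts, prev)[nxt])
    return nll / len(pairs)


# Toy corpus (assumption: whitespace tokenization stands in for a real tokenizer).
corpus = "the cat sat on the mat . the cat ate .".split()
model = train_bigram(corpus)
probs = next_token_probs(model, "the")
loss = next_token_loss(model, corpus)
print(probs)  # distribution over tokens that followed "the" in the corpus
print(loss)   # average next-token loss on the training text
```

In actual pre-training, the count table is replaced by a neural network with many parameters, and the same kind of loss is minimized by gradient descent over a far larger corpus. The sketch only shows the shape of the objective: every position in the text yields a prediction target for free.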