If you can finish someone’s sentence, you already understand the core idea behind ChatGPT and every other large language model.
When a friend says "The early bird catches the…", your brain quietly suggests "worm". When they say "Never judge a book by its…", you think "cover" without trying hard.
That simple act—predicting what comes next—is the foundation of how large language models (LLMs) operate.
They learn by reading enormous amounts of text: books, articles, scientific papers, code, conversations, and instructions. With enough exposure, the model becomes remarkably good at continuing any piece of text in a coherent, meaningful way.
When you finish a friend’s sentence, you are not searching a database; you are using context to guess which word most likely comes next. An LLM does the same thing, which is why people say LLMs “predict the next token.” It sounds technical, but the intuition is everyday language completion.
At the technical level, an LLM processes text in small units called tokens. A token may be a whole word, part of a word, or even punctuation.
The model looks at the tokens so far and predicts the next one. Repeating that process generates full answers, explanations, or code.
Splitting text into tokens matters because the model’s vocabulary is a fixed list (often tens or hundreds of thousands of tokens). Since a rare word can be assembled from smaller pieces, tokenization keeps it representable without millions of unique entries.
When you use an API, the token counts on your bill are literally the number of these chunks that went in and out.
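To make the idea of a fixed vocabulary concrete, here is a minimal sketch of a tokenizer. It assumes a tiny, made-up vocabulary (`VOCAB`) and uses simple greedy longest-match splitting; real LLM tokenizers typically learn their vocabulary from data with schemes such as byte-pair encoding, so treat this as an illustration rather than how any particular model works.

```python
# Hypothetical toy vocabulary; a real model's list is vastly larger.
VOCAB = ["un", "believ", "able", "token", "ization", "s", " ", "."]


def tokenize(text, vocab):
    """Split text into vocabulary pieces using greedy longest-match."""
    pieces = sorted(vocab, key=len, reverse=True)  # try longer pieces first
    tokens = []
    i = 0
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            # No vocabulary piece matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


tokens = tokenize("unbelievable tokens.", VOCAB)
print(tokens)
print(len(tokens), "tokens")
```

The word "unbelievable" is not in the vocabulary, yet it is still representable as three smaller pieces. The length of the resulting list is the kind of count an API bill is based on.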
Everything an LLM does—summarizing a document, generating a function, explaining a concept—emerges from choosing the next token that best fits the patterns it has learned.
Feed tokens in → score each candidate for the next token → pick one → append → repeat. There is no separate “summarization circuit” inside the model.
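The loop above can be sketched in a few lines. This is a toy stand-in, not a Transformer: it assumes a tiny hand-written training text (`TRAINING_TOKENS`), treats whole words as tokens, and scores the next token by counting which token followed the previous one. A real LLM looks at the whole context and learns its scores with a neural network, but the feed, score, pick, append, repeat cycle has the same shape.

```python
from collections import Counter, defaultdict

# Hypothetical training text, already split into word-level tokens.
TRAINING_TOKENS = (
    "never judge a book by its cover . the early bird catches the worm ."
).split()


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def score_next(counts, context):
    """Score candidate next tokens using only the last token of the context."""
    followers = counts[context[-1]]
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


def generate(counts, prompt, max_new_tokens=10, stop="."):
    """Feed tokens in, score the next token, pick one, append, repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = score_next(counts, tokens)
        if not scores:  # nothing ever followed this token in training
            break
        next_token = max(scores, key=scores.get)  # greedy: take the top score
        tokens.append(next_token)
        if next_token == stop:
            break
    return tokens


MODEL = train_bigram(TRAINING_TOKENS)
print(" ".join(generate(MODEL, ["never"])))
```

Picking the single highest-scoring token every time is only one option; production systems often sample from the scores instead, which is why the same prompt can yield different answers.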
Formally: an LLM is a Transformer-based neural network trained on massive text corpora to predict the next token, and through that process acquires the ability to understand, generate, and reason with human language.