sharpbyte.dev
LLMs · topic 1 of 11

What is an LLM?

If you can finish someone’s sentence, you already understand the core idea behind ChatGPT and every other large language model.

Start with something you already do

When a friend says "The early bird catches the…", your brain quietly suggests worm. When they say "Never judge a book by its…", you think cover without trying hard.

That simple act—predicting what comes next—is the foundation of how large language models (LLMs) operate.

They learn by reading enormous amounts of text: books, articles, scientific papers, code, conversations, and instructions. With enough exposure, the model becomes remarkably good at continuing any piece of text in a coherent, meaningful way.

When you finish a friend's sentence, you are not searching a database; you are using context to guess which word likely comes next. A model does the same thing, which is why people say LLMs “predict the next token.” It sounds technical, but the intuition is everyday language completion.

The model reads your prompt, scores possible next words, and picks one.
Each new token is added to the input, then the model predicts again—loop until the answer is complete.
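
The loop can be sketched in a few lines of Python. This is a toy, not how a real LLM is built: the "model" here is only a table counting which word followed each pair of words in a two-sentence corpus, and the names `CORPUS`, `train`, and `generate` are made up for illustration. What carries over is the shape of the loop: look at the context, score the candidates, pick one, append it, repeat.

```python
from collections import Counter, defaultdict

# Hypothetical two-sentence "training corpus"; a real LLM sees vastly more text.
CORPUS = [
    "the early bird catches the worm",
    "never judge a book by its cover",
]

def train(sentences, context_size=2):
    """Count which word follows each run of `context_size` words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            counts[context][words[i]] += 1
    return counts

def generate(counts, prompt, context_size=2, max_new_tokens=10):
    """Repeatedly pick the most frequent next word and append it."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        context = tuple(tokens[-context_size:])
        candidates = counts.get(context)   # scores for possible next words
        if not candidates:                 # nothing learned for this context
            break
        next_token, _count = candidates.most_common(1)[0]  # pick one
        tokens.append(next_token)          # append, then predict again
    return " ".join(tokens)

counts = train(CORPUS)
print(generate(counts, "the early bird catches the"))
print(generate(counts, "never judge a book by its"))
```

The sketch always takes the top-scoring candidate and stops when it has nothing left to suggest. Real systems usually stop on a special end token or a length limit, and often sample among likely candidates instead of always taking the first.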

What is a token?

At the technical level, an LLM processes text in small units called tokens. A token may be a whole word, part of a word, or even punctuation.

The model looks at the tokens so far and predicts the next one. Repeating that process generates full answers, explanations, or code.

Splitting text this way matters because the model’s vocabulary is a fixed list (often tens or hundreds of thousands of tokens). Breaking rare words into smaller pieces keeps them representable without millions of unique entries.
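
To see how a fixed vocabulary can still cover words it has never stored whole, here is a minimal sketch. It assumes a tiny invented vocabulary (`PIECES` plus single letters) and uses greedy longest-match splitting, which is a simplification chosen for readability; production tokenizers typically use learned schemes such as byte-pair encoding. All names in the block are hypothetical.

```python
import string

# Hypothetical toy vocabulary: a few multi-letter pieces, then single
# lowercase letters and the space as a fallback so any lowercase word
# can still be spelled out.
PIECES = ["the", "token", "ization", "un", "believ", "able"]
MATCHABLE = PIECES + list(string.ascii_lowercase) + [" "]
UNKNOWN = "<unk>"
VOCAB = MATCHABLE + [UNKNOWN]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}
MAX_PIECE_LEN = max(len(piece) for piece in MATCHABLE)

def tokenize(text):
    """Split text into vocabulary pieces, always taking the longest match."""
    tokens = []
    i = 0
    while i < len(text):
        longest = min(MAX_PIECE_LEN, len(text) - i)
        for length in range(longest, 0, -1):
            piece = text[i:i + length]
            if piece in MATCHABLE:
                tokens.append(piece)
                i += length
                break
        else:
            # No piece matched, not even a single character.
            tokens.append(UNKNOWN)
            i += 1
    return tokens

def encode(text):
    """Map text to the integer IDs the model would actually receive."""
    return [TOKEN_TO_ID[token] for token in tokenize(text)]

print(tokenize("tokenization"))   # a word split into two known pieces
print(tokenize("zyx"))            # a rare word spelled letter by letter
print(encode("unbelievable"))
```

Common words cost one or two tokens, while an unusual word falls back to many small pieces. The vocabulary never has to grow to hold it.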

When you use an API, token counts on your bill are literally how many of these chunks went in and out.

Text becomes token IDs, flows through the model, and exits as the next predicted token.
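
That path from text to the next token can be traced end to end in a small sketch. Everything here is an assumption made for illustration: the six-word vocabulary, the whitespace tokenizer, and especially `toy_model`, which returns hard-coded scores where a real network would compute them from the whole input. The softmax step, which turns raw scores into probabilities, is the standard way such scores are normalized.

```python
import math

# Hypothetical six-entry vocabulary; IDs are simply list positions.
VOCAB = ["the", "early", "bird", "catches", "worm", "bus"]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}

def encode(text):
    """Text -> token IDs (whitespace split, for simplicity)."""
    return [TOKEN_TO_ID[word] for word in text.lower().split()]

def toy_model(token_ids):
    """Stand-in for a trained network: one raw score per vocabulary entry.

    The numbers are invented; a real model computes them from the input.
    """
    if token_ids and VOCAB[token_ids[-1]] == "the":
        return [0.1, 1.0, 1.2, 0.2, 4.0, 2.0]  # highest score on "worm"
    return [1.0] * len(VOCAB)

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]

def predict_next(text):
    """Text -> IDs -> scores -> probabilities -> most likely next token."""
    token_ids = encode(text)
    probabilities = softmax(toy_model(token_ids))
    next_id = max(range(len(probabilities)), key=probabilities.__getitem__)
    return VOCAB[next_id], probabilities[next_id]

token, probability = predict_next("The early bird catches the")
print(token, round(probability, 3))
```

The model itself only ever sees and produces integers. The text you read is the result of mapping the chosen ID back through the vocabulary.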

Putting it together

Everything an LLM does—summarizing a document, generating a function, explaining a concept—emerges from choosing the next token that best fits the patterns it has learned.

Feed tokens in → score the next token → pick one → append → repeat. There is no separate “summarization circuit” inside the model.

Formally: an LLM is a Transformer-based neural network trained on massive text corpora to predict the next token, and through that process acquires the ability to understand, generate, and reason with human language.

Three ingredients: Transformer architecture, massive data, next-token training.

Key takeaways

  • An LLM is a next-token predictor—fancy autocomplete trained on internet-scale text.
  • Tokens are the model’s alphabet; they are not always whole words.
  • Impressive features (chat, code, summaries) all come from the same generation loop.