sharpbyte.dev

What is RAG?

Retrieval, augmentation, and generation—grounding LLMs without retraining on every update.

Retrieval-Augmented Generation


Up to this point, we have seen two ways to adapt an LLM to a task:

● Prompt engineering, which steers the model at inference time.

● Fine-tuning, which adjusts the model's internal parameters.

Both approaches are powerful, but they share one fundamental limitation: the model can only use the knowledge it already contains. An LLM does not automatically know new information, private data such as company documents, or anything that appeared after its training cutoff. Retraining it every time that information changes is impractical and expensive. This is where Retrieval-Augmented Generation (RAG) comes in. Let’s break the name down:

● Retrieval: Fetching information relevant to the query from a knowledge source, such as a database or memory.

● Augmented: Enriching the text generation process with that retrieved information as additional context.

● Generation: Producing the output text, in this case the model's answer based on the query plus the retrieved context.
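The three stages above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the `DOCUMENTS` list, the word-overlap scoring in `retrieve`, and the `generate` stub are all assumptions made for the example. A real system would typically use an embedding-based retriever and call an actual LLM in place of the stub.

```python
import re

# Hypothetical knowledge source; in practice this would be a database or document store.
DOCUMENTS = [
    "The refund window for online orders is 30 days from delivery.",
    "Support is available by email on weekdays from 9am to 5pm.",
    "Premium accounts include priority shipping at no extra cost.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=1):
    """Retrieval: rank documents by how many words they share with the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def augment(query, context):
    """Augmentation: place the retrieved text in the prompt next to the question."""
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"

def generate(prompt):
    """Generation: placeholder for an LLM call (hypothetical; swap in a real client)."""
    return f"[LLM response to a {len(prompt)}-character prompt]"

query = "How many days is the refund window?"
context = retrieve(query, DOCUMENTS)   # step 1: retrieve
prompt = augment(query, context)       # step 2: augment
answer = generate(prompt)              # step 3: generate
```

Note that the model's weights never change here: updating what the system knows only requires editing `DOCUMENTS`.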

Retrieval-Augmented Generation: retrieve trusted context, then generate—beyond prompt-only or full retraining.

Key takeaways

  • RAG adds external knowledge at inference time via retrieval, rather than encoding that knowledge into the model's weights through training.
  • Vector databases store embeddings of the source documents so that relevant context can be fetched on demand and passed to the model.
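To make the second takeaway concrete, here is a toy in-memory stand-in for a vector database. Everything in it is an assumption for illustration: `embed` uses plain word counts instead of a learned embedding model, and `ToyVectorStore` is a hypothetical class, not the API of any real product. The idea it shows is the same, though: store one vector per document, then return the documents whose vectors are closest to the query's vector.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a sparse word-count vector (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class ToyVectorStore:
    """Hypothetical minimal store: keeps (embedding, text) pairs in a list."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((embed(text), text))

    def search(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        scored = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in scored[:k]]

store = ToyVectorStore()
store.add("RAG retrieves documents at inference time.")
store.add("Fine-tuning adjusts the model's internal parameters.")
store.add("Prompt engineering steers the model with instructions.")

top = store.search("Which approach adjusts model parameters?")
```

Real vector databases replace the linear scan in `search` with approximate nearest-neighbor indexes so that lookups stay fast over millions of embeddings.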