
Agentic RAG

Agents rewrite queries, choose sources, iterate, and self-check—beyond one-shot retrieve-and-read.

RAG vs Agentic RAG

6) Hybrid RAG

Combines dense vector retrieval with graph-based retrieval in a single pipeline. Useful when the task requires both unstructured text and structured relational data for richer answers.
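As a rough illustration of the idea, the sketch below runs a dense retriever and a graph lookup side by side and returns both kinds of context. Everything in it is hypothetical: the function names, the two-dimensional toy embeddings, and the assumption that the entity to look up in the graph has already been extracted from the query upstream. A real system would use an embedding model, a vector database, and a graph store instead.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_retrieve(query_vec: List[float],
                   index: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the k passages whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda p: cosine(query_vec, index[p]), reverse=True)
    return ranked[:k]


def graph_retrieve(entity: str,
                   graph: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Return the facts stored on the entity's outgoing edges."""
    return [f"{entity} {relation} {target}"
            for relation, target in graph.get(entity, [])]


def hybrid_retrieve(query_vec: List[float], entity: str,
                    index: Dict[str, List[float]],
                    graph: Dict[str, List[Tuple[str, str]]],
                    k: int = 2) -> Dict[str, List[str]]:
    """Run both retrievers and return unstructured and structured context."""
    return {
        "passages": dense_retrieve(query_vec, index, k),
        "facts": graph_retrieve(entity, graph),
    }


# Toy data (assumed): two passages with made-up 2-d embeddings, one graph edge.
toy_index = {"Acme ships anvils worldwide.": [1.0, 0.0],
             "Rockets need oxidizer.": [0.0, 1.0]}
toy_graph = {"Acme": [("is_supplied_by", "IronWorks")]}

context = hybrid_retrieve([0.9, 0.1], "Acme", toy_index, toy_graph, k=1)
print(context)
```

The generator would then receive both the passages and the facts in its prompt, which is where the "richer answers" come from.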

7) Adaptive RAG

Dynamically decides if a query requires a simple direct retrieval or a multi-step reasoning chain. Breaks complex queries into smaller sub-queries for better coverage and accuracy.
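The routing decision described above might look roughly like this. It is only a sketch: `split_into_subqueries` is a hypothetical, deliberately naive stand-in (splitting on "and" or ";") for what would normally be an LLM call, and `retrieve` is any retrieval function you supply.

```python
import re
from typing import Callable, Dict, List


def split_into_subqueries(query: str) -> List[str]:
    """Naive decomposition on ' and ' or ';' (stand-in for an LLM planner)."""
    parts = re.split(r"\s+and\s+|;", query.rstrip("?"))
    return [p.strip() for p in parts if p.strip()]


def adaptive_retrieve(query: str,
                      retrieve: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Use one direct retrieval for simple queries, one per sub-query otherwise."""
    subqueries = split_into_subqueries(query)
    if len(subqueries) <= 1:
        # Simple path: a single retrieval on the original query.
        return {query: retrieve(query)}
    # Multi-step path: retrieve separately for each sub-query.
    return {sq: retrieve(sq) for sq in subqueries}


# Toy retriever (assumed) that just echoes what it was asked for.
def toy_retrieve(q: str) -> List[str]:
    return [f"doc about: {q}"]


print(adaptive_retrieve("What is RAG?", toy_retrieve))
print(adaptive_retrieve("Who founded Acme and when did it go public?", toy_retrieve))
```

The first call takes the direct path; the second is split into two sub-queries, each retrieved on its own.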

8) Agentic RAG

Uses AI agents with planning, reasoning (ReAct, CoT), and memory to orchestrate retrieval from multiple sources. Best suited for complex workflows that require tool use, external APIs, or combining multiple RAG techniques.

These are some issues with traditional RAG systems:

Classic RAG limits: single retrieve-and-read pass, weak multi-step reasoning, little strategy adaptation.

These systems retrieve once and generate once. If the retrieved context isn't enough, the LLM cannot dynamically search for more information.

RAG systems may provide relevant context but don't reason through complex queries. If a query requires multiple retrieval steps, traditional RAG falls short.

There's little adaptability. The LLM can't modify its strategy based on the problem at hand.

Because of these limits, Agentic RAG is becoming increasingly popular. Let's understand this in more detail.

Agentic RAG

The workflow of agentic RAG is depicted below:

Agentic RAG blueprint: rewrite, plan, route sources, iterate, and validate before answering.

Note: The diagram above is one of many blueprints that an agentic RAG system may follow. You can adapt it to your specific use case.

As shown above, the idea is to introduce agentic behaviors at each stage of RAG. Think of an agent as someone who actively thinks through a task (planning, adapting, and iterating until they arrive at the best solution) rather than just following a defined set of instructions. The powerful capabilities of LLMs make this possible. Let's understand this step-by-step:

Steps 1-2) The user inputs the query, and an agent rewrites it (removing spelling mistakes, simplifying it for embedding, etc.).

Step 3) Another agent decides whether it needs more details to answer the query.

Step 4) If not, the rewritten query is sent to the LLM as a prompt.

Steps 5-8) If yes, another agent looks through the relevant sources it has access to (vector database, tools & APIs, and the internet) and decides which source is likely to be useful. The relevant context is retrieved and sent to the LLM as a prompt.

Step 9) Either of the above two paths produces a response.

Step 10) A final agent checks if the answer is relevant to the query and context.

Step 11) If yes, return the response.

Step 12) If not, go back to Step 1. This procedure continues for a few iterations until the system admits it cannot answer the query.

This makes the RAG system much more robust since, at every step, agentic behavior ensures that individual outcomes are aligned with the final goal.

That said, it is also important to note that building RAG systems typically boils down to design preferences and choices. Apart from agentic approaches, another important improvement over traditional RAG comes from better retrieval itself, one popular method being HyDE.
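To make the control flow of Steps 1-12 concrete, here is a minimal sketch of the loop. It is one possible wiring, not a reference implementation: the `AgenticRAG` class, its field names, the default of three iterations, and the choice to pass the attempt number to `choose_source` (so that a retry can pick a different source) are all assumptions made for illustration. Each "agent" is a plain callable; in practice each would wrap an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Retriever = Callable[[str], List[str]]


@dataclass
class AgenticRAG:
    """Hypothetical wiring of the agents; every field is a stand-in callable."""
    rewrite: Callable[[str], str]                       # Steps 1-2
    needs_context: Callable[[str], bool]                # Step 3
    choose_source: Callable[[str, int], str]            # Steps 5-6
    sources: Dict[str, Retriever]                       # Step 7
    generate: Callable[[str, List[str]], str]           # Steps 4, 8, 9
    is_relevant: Callable[[str, List[str], str], bool]  # Step 10
    max_iters: int = 3                                  # "a few iterations"

    def answer(self, query: str) -> Optional[str]:
        for attempt in range(self.max_iters):
            rewritten = self.rewrite(query)
            context: List[str] = []
            if self.needs_context(rewritten):
                source = self.choose_source(rewritten, attempt)
                context = self.sources[source](rewritten)
            response = self.generate(rewritten, context)
            if self.is_relevant(rewritten, context, response):
                return response  # Step 11
            # Step 12: not relevant, so start over from Step 1.
        return None  # The system admits it cannot answer.


# Toy stand-ins (assumed) so the sketch runs without an LLM or a database.
toy = AgenticRAG(
    rewrite=lambda q: " ".join(q.lower().split()),
    needs_context=lambda q: "refund" in q,
    choose_source=lambda q, attempt: ["vector_db", "web"][min(attempt, 1)],
    sources={
        "vector_db": lambda q: [],  # pretend the vector DB has nothing useful
        "web": lambda q: ["Refunds are accepted within 30 days."],
    },
    generate=lambda q, ctx: ctx[0] if ctx else "I don't know.",
    is_relevant=lambda q, ctx, resp: resp != "I don't know.",
)

print(toy.answer("What  is the Refund window?"))
print(toy.answer("hello"))
```

In the first call, the vector database returns nothing, the checker rejects the answer, and the second pass routes to the web source instead. In the second call, no pass produces an accepted answer, so the loop gives up after `max_iters` attempts and returns `None`.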

Key takeaways

Traditional RAG retrieves once; agents can loop until the evidence suffices.
Design reflects your toolbox: vector DBs, APIs, web, and verification agents.