AI agents · topic 7 of 16

ReAct Implementation from Scratch

Notebook-style walkthrough: minimal agent class, a structured ReAct protocol prompt, manual tool injection, and automated loops with LiteLLM (PDF 206–231; 206 header shared with design patterns).


Below, we implement a ReAct agent in two ways:

● Manually executing each step for better clarity.

● Without manual intervention, fully automating the Reasoning and Action process.

Let's look at the manual process first.

#1) ReAct with manual execution

In this section, we'll implement a lightweight ReAct-style agent from scratch, without any orchestration framework such as CrewAI or LangChain. We'll manually simulate each round of the agent's reasoning, pausing, acting and observing, exactly as a ReAct loop is meant to function. Running the logic cell by cell gives us full visibility and control over the thinking process, so we can debug and validate the agent's behavior at each step. To begin, we load the environment variables (such as your LLM API key) and import completion from LiteLLM. Install it first with pip install litellm. LiteLLM is a lightweight wrapper for querying LLMs, whether OpenAI models or local models served via Ollama.
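The setup cell itself isn't reproduced here, so what follows is a minimal sketch under two assumptions. First, the API key sits in a local .env file read by the python-dotenv package. Second, the key is exposed as OPENAI_API_KEY, the variable LiteLLM reads for "openai/..." models. The hypothetical load_environment() helper doesn't fail when something is missing; it only reports what is absent. The import the walkthrough needs is simply `from litellm import completion`.

```python
import importlib.util
import os


def load_environment() -> dict:
    """Load ./.env if python-dotenv is installed, then report what is ready."""
    if importlib.util.find_spec("dotenv") is not None:
        from dotenv import load_dotenv

        load_dotenv(dotenv_path=".env")  # does nothing if the file does not exist
    return {
        # Install with: pip install litellm
        "litellm_installed": importlib.util.find_spec("litellm") is not None,
        # Assumption: the key for "openai/..." models lives in OPENAI_API_KEY.
        "openai_key_set": bool(os.environ.get("OPENAI_API_KEY")),
    }


status = load_environment()
print(status)
```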

Illustration from the AI Agents chapter of the course deck.

Next, we define a minimal Agent class. It wraps a conversational LLM and keeps track of its full message history, which lets it:

● reason step by step,

● follow a system prompt,

● remember prior inputs and outputs,

● carry on multi-turn interactions.

Here's what it looks like:

● system (str): This is the system prompt that sets the personality and behavioral constraints for the agent. If passed, it becomes the very first message in the conversation, just as in the OpenAI Chat API.

● self.messages: This list acts as the conversation memory. Every interaction, whether user input or assistant output, is appended to this list. This history is crucial for the LLM to behave coherently across multiple turns.

● If system is provided, it's added to the message list with the special "role": "system" identifier. This ensures that every completion that follows is conditioned on the system instructions.

Next, we define a complete method in this class:

This is the core interface you’ll use to interact with your agent.

● If a message is passed:

○ It gets appended as a "user" message to self.messages.

○ This simulates the human asking a question or giving instructions.

● Then, self.invoke() is called (we will define it shortly). This method sends the full conversation history to the LLM.

● The model's reply (stored in result) is then appended to self.messages with the "assistant" role.

● Finally, the reply is returned to the caller.

This method does three things in one call:

1. Records the user input.

2. Gets the model's reply.

3. Updates the message history for future turns.

Finally, we have the invoke method below:

This method handles the actual API call to your LLM provider. Here the call goes through LiteLLM and uses the "openai/gpt-4o" model.

● completion() is a wrapper around the chat completion API. It receives the entire message history and returns a response.

● We assume completion() returns a structure similar to OpenAI's format: a list of choices, where each choice has a .message.content field.

● We extract and return that content, which is the assistant's next response.
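Putting the three methods together, a minimal sketch of the class might look like this. It follows the description above and uses the name MyAgent, which the automated section refers to later. The chat_fn parameter and the litellm_chat helper are assumptions added so the class can be exercised offline with a stand-in function instead of a live API call. By default, it sends the history to "openai/gpt-4o" through litellm.completion.

```python
from typing import Callable, List, Optional

# A chat function receives the full message history and returns the reply text.
ChatFn = Callable[[List[dict]], str]


def litellm_chat(messages: List[dict]) -> str:
    """Send the whole history to GPT-4o through LiteLLM (needs network and an API key)."""
    from litellm import completion  # imported lazily so the class also works offline

    response = completion(model="openai/gpt-4o", messages=messages)
    # LiteLLM returns an OpenAI-style response: choices[0].message.content
    return response.choices[0].message.content


class MyAgent:
    def __init__(self, system: str = "", chat_fn: Optional[ChatFn] = None):
        self.system = system
        self.messages: List[dict] = []  # conversation memory
        self.chat_fn = chat_fn or litellm_chat
        if self.system:
            # The system prompt always becomes the very first message.
            self.messages.append({"role": "system", "content": system})

    def complete(self, message: str = "") -> str:
        if message:  # 1. record the user's turn, if there is one
            self.messages.append({"role": "user", "content": message})
        result = self.invoke()  # 2. get the model's reply
        self.messages.append({"role": "assistant", "content": result})  # 3. remember it
        return result

    def invoke(self) -> str:
        return self.chat_fn(self.messages)


def echo_history(messages: List[dict]) -> str:
    """Offline stand-in for the LLM: reports how many messages it was sent."""
    return f"I can see {len(messages)} messages"


agent = MyAgent(system="You are a helpful assistant.", chat_fn=echo_history)
print(agent.complete("Hi, my name is Ada."))  # I can see 2 messages
print(agent.complete("What is my name?"))     # I can see 4 messages
```

The echo stand-in makes the bookkeeping visible. The second call sees four messages: the system prompt, the first user turn, the first reply and the second user turn. Passing that full history is what lets a real model answer questions about earlier turns.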

As a test, we can quickly run a simple interaction below:

At this stage, if we ask it about the previous message, we get the correct output. This shows that the assistant can see the earlier context:

It correctly remembers and reflects!

Now that our conversational class is set up, we come to the most interesting part: defining a ReAct-style prompt. Before an LLM can behave like an agent, it needs clear instructions, not just on what to answer but on how to go about answering. That's exactly what this system_prompt does. It is defined below:
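Assembled from the fragments quoted in the breakdown that follows, the prompt reads roughly as below. Line breaks and exact wording are approximations. The middle of the sample run stays elided ("...") exactly as it is in the source.

```python
# Reassembled from the quoted fragments; wording and spacing are approximate.
SYSTEM_PROMPT = """
You run in a loop and do JUST ONE thing in a single iteration:

1) "Thought" to describe your thoughts about the input question.
2) "PAUSE" to pause and think about the action to take.
3) "Action" to decide what action to take from the list of actions available to you.
4) "PAUSE" to pause and wait for the result of the action.
5) "Observation" will be the output returned by the action.

At the end of the loop, you produce an Answer.

The actions available to you are:

math:
e.g. math: (14 * 5) / 4
Evaluates mathematical expressions using Python syntax.

lookup_population:
e.g. lookup_population: India
Returns the latest known population of the specified country.

Here's a sample run for your reference:

Question: What is double the population of Japan?

Iteration 1: Thought: I need to find the population of Japan first.
Iteration 2: PAUSE
...
Iteration 9: Observation: 250000000
Iteration 10: Answer: Double the population of Japan is 250 million.

Whenever you have the answer, stop the loop and output it to the user.

Now begin solving:
""".strip()

print(SYSTEM_PROMPT)
```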

This isn't just a prompt. It's a behavioral protocol that defines what structure the agent should follow, how it should reason, and when it should stop. Let's break it down line by line.

You run in a loop and do JUST ONE thing in a single iteration:

This is the framing sentence. It tells the LLM not to rush toward an answer. Instead, it should proceed step by step, following a defined pattern in a loop, which mirrors how a ReAct agent works.

1) "Thought" to describe your thoughts about the input question.

2) "PAUSE" to pause and think about the action to take.

3) "Action" to decide what action to take from the list of actions available to you.

4) "PAUSE" to pause and wait for the result of the action.

5) "Observation" will be the output returned by the action.

Here, we give the LLM a reasoning template. These are the same primitives found in all ReAct-style agents. Let’s break each down:

● Thought: The agent's internal monologue. What is it currently thinking about?

● PAUSE (1): Instead of jumping to action, this forces the model to take a breath, simulating asynchronous steps in a multi-agent environment.

● Action: The agent picks from the list of tools it is given.

● PAUSE (2): Wait again, this time for the actual tool result.

● Observation: You (the controller or human) inject this into the prompt after the tool runs.

Splitting the process into explicit parts reduces the risk that the model hallucinates tool results, and it keeps the agent working in a controlled loop.

At the end of the loop, you produce an Answer.

This tells the agent to break the loop and give the final answer once it has all the required information. It should not keep reasoning indefinitely.

The actions available to you are:

math:
e.g. math: (14 * 5) / 4
Evaluates mathematical expressions using Python syntax.

lookup_population:
e.g. lookup_population: India
Returns the latest known population of the specified country.

This is a mini API reference for the agent. We show:

● The name of each tool.

● How to invoke it.

● What kind of output it produces.

This is critical. Without a clear spec, the LLM might:

● Invent non-existent tools.

● Use incorrect syntax.

● Misinterpret what the tool is supposed to do.

By using clear formatting and examples, we teach the model how to interface with tools in a safe, predictable way.

Here's a sample run for your reference:

Question: What is double the population of Japan?

Iteration 1: Thought: I need to find the population of Japan first.
Iteration 2: PAUSE
...
Iteration 9: Observation: 250000000
Iteration 10: Answer: Double the population of Japan is 250 million.

This worked-out example gives the LLM a pattern to follow. Even more importantly, it gives the developer (you) a way to intervene at each step, whether to inject tool results or to check that the flow is working correctly. With this sample trace:

● The agent knows how to think.

● The agent knows how to act.

● The agent knows when to stop.

Whenever you have the answer, stop the loop and output it to the user.

Now begin solving:

These closing lines are essential. Without this explicit stop signal, the LLM might continue indefinitely. You're telling it: "When you have all the puzzle pieces, just say the answer and exit the loop."

The power of this system_prompt lies in its structure:

● It models intelligent behavior, not just question answering.

● It imposes strong constraints: think before acting, act within defined bounds, and wait for observations.

● It separates reasoning from execution, mimicking how humans operate.

● It creates a feedback-friendly iteration loop for multi-step problems.

Now that the prompt is defined, we implement the tools.
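The tool cell isn't shown, so here is a minimal sketch of both tools. It assumes they are plain Python functions named after their actions.

● math: a common shortcut is to pass the string to eval(), but that executes arbitrary code. Instead, this version walks the parsed syntax tree and accepts only numbers and arithmetic operators.

● lookup_population: a hypothetical stub hardcoded with the two figures the trace below uses, 1,400,000,000 for India and 125,000,000 for Japan. A real version would query a data source.

```python
import ast
import operator

# Arithmetic operators the calculator accepts; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node: ast.AST):
    """Recursively evaluate a parsed expression made only of numbers and operators."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")


def math(expression: str):
    """Tool 'math': evaluate an arithmetic expression written in Python syntax."""
    tree = ast.parse(expression.strip(), mode="eval")
    return _eval_node(tree.body)


# Hypothetical data: the two figures used in this walkthrough's trace.
_POPULATIONS = {"india": 1_400_000_000, "japan": 125_000_000}


def lookup_population(country: str) -> int:
    """Tool 'lookup_population': return the stored population of a country."""
    key = country.strip().lower()
    if key not in _POPULATIONS:
        raise KeyError(f"No population data for {country!r}")
    return _POPULATIONS[key]


# Action names (as written in the prompt) mapped to the functions that run them.
TOOLS = {"math": math, "lookup_population": lookup_population}

print(math("(14 * 5) / 4"))  # 17.5
print(lookup_population("India"))  # 1400000000
```

Naming the function math mirrors the action name, but it shadows Python's math module inside this file. Pick another name if you also need that module.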

Finally, we begin a manual ReAct session:

This produces the following output:

Iteration 1: Thought: I need to find the population of India first.

As the user, we have no input to give at this stage, so we just invoke the complete() method again:

This produces the following output:

Iteration 2: PAUSE

Once again, we have no input to give at this stage, so we just invoke the complete() method again:

This produces the following output:

Iteration 3: Action: lookup_population: India

Now it wants to act. We still have no input to give at this stage, so we just invoke the complete() method again:

This produces the following output:

Iteration 4: PAUSE

At this stage, it needs the tool output in the form of an observation. Here, let's intervene and provide it with the observation:

This produces the following output:

Iteration 5: Thought: Now I need to find the population of Japan.

We let it continue its execution:

This produces the following output:

Iteration 6: PAUSE

We again let it continue its execution:

We get the following output:

Iteration 7: Action: lookup_population: Japan

At this stage, it needs the tool output in the form of an observation. Here, let's again intervene and provide it with the observation:

This produces the following output:

Iteration 8: Thought: I now have the populations of both India and Japan. I need to add them together.

We again let it continue its execution:

We get the following output:

Iteration 9: Action: math: 1400000000 + 125000000

According to the specified pattern, we should now expect a pause:

Iteration 10: PAUSE

It is again seeking an observation, this time the sum of India's and Japan's populations. So we manually intervene once more and provide the output:

Finally, in this iteration, we get the following output:

Iteration 11: Answer: The sum of the population of India and the population of Japan is 1,525,000,000.

Great! With this process:

● The LLM thought about what steps to take.

● It chose actions to execute.

● We manually injected tool outputs like real-world observations.

● It looped until it had enough information to generate a final answer.
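To see the manual protocol without an API key, the session can be replayed with a scripted stand-in for the model. This is a sketch: the question wording and the "Observation: <value>" message format are assumptions, because the walkthrough shows only the model's replies. Notice that an empty complete() call adds no user turn, so consecutive assistant messages accumulate in the history.

```python
from typing import Callable, List

ChatFn = Callable[[list], str]


class MyAgent:
    """Compact version of the agent class: message history plus a pluggable chat function."""

    def __init__(self, system: str, chat_fn: ChatFn):
        self.messages = [{"role": "system", "content": system}]
        self.chat_fn = chat_fn

    def complete(self, message: str = "") -> str:
        if message:  # an empty call adds no user turn; the model simply continues
            self.messages.append({"role": "user", "content": message})
        result = self.chat_fn(self.messages)
        self.messages.append({"role": "assistant", "content": result})
        return result


# Hypothetical question wording; the walkthrough only shows the model's replies.
QUESTION = "What is the sum of the population of India and the population of Japan?"

# One entry per manual cell: (what we sent, what the model replied).
# "" means we had no input and just called complete() again.
SESSION: List[tuple] = [
    (QUESTION, "Thought: I need to find the population of India first."),
    ("", "PAUSE"),
    ("", "Action: lookup_population: India"),
    ("", "PAUSE"),
    ("Observation: 1400000000", "Thought: Now I need to find the population of Japan."),
    ("", "PAUSE"),
    ("", "Action: lookup_population: Japan"),
    ("Observation: 125000000",
     "Thought: I now have the populations of both India and Japan. I need to add them together."),
    ("", "Action: math: 1400000000 + 125000000"),
    ("", "PAUSE"),
    ("Observation: 1525000000",
     "Answer: The sum of the population of India and the population of Japan is 1,525,000,000."),
]

# Scripted stand-in for GPT-4o: hands back the recorded replies in order.
replies = iter(reply for _, reply in SESSION)
agent = MyAgent(system="<ReAct system prompt>", chat_fn=lambda messages: next(replies))

for step, (message, _) in enumerate(SESSION, start=1):
    print(f"Iteration {step}: {agent.complete(message)}")
```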

This gives us an explicit understanding of how reasoning and actions come together in ReAct-style agents. In the next part, we'll fully automate this so that no manual calls are required. We'll build a controller that runs the entire loop programmatically.

#2) ReAct without manual execution

Now that we have seen how the ReAct execution above unfolds step by step, we can automate it and remove our interventions. In this section, we'll create a controller function that:

● Sends an initial question to the agent,

● Reads its thoughts and actions step-by-step,

● Automatically runs external tools when asked,

● Feeds back observations to the agent,

● And stops the loop once a final answer is found.

This is the entire code that does this:
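The full cell isn't reproduced here, so below is a self-contained sketch that follows the step-by-step breakdown below. Several parts are assumptions:

● agent_loop takes an extra chat_fn argument so it can run offline. With a real model, you would pass a function that calls litellm.completion.

● max_steps is an added safeguard against endless loops.

● The tools are compact versions with hardcoded population data.

● scripted_llm replays the manual trace in place of GPT-4o.

```python
import ast
import operator
import re
from typing import Callable, Dict, List, Optional

ChatFn = Callable[[list], str]


class MyAgent:
    """Compact agent: keeps the message history and delegates replies to chat_fn."""

    def __init__(self, system: str, chat_fn: ChatFn):
        self.messages = [{"role": "system", "content": system}]
        self.chat_fn = chat_fn

    def complete(self, message: str = "") -> str:
        if message:
            self.messages.append({"role": "user", "content": message})
        result = self.chat_fn(self.messages)
        self.messages.append({"role": "assistant", "content": result})
        return result


_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def math(expression: str):
    """Evaluate +, -, *, / over numeric literals without calling eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")
    return ev(ast.parse(expression.strip(), mode="eval").body)


def lookup_population(country: str) -> int:
    # Hypothetical hardcoded data, matching the figures in the manual trace.
    return {"india": 1_400_000_000, "japan": 125_000_000}[country.strip().lower()]


ACTION_RE = re.compile(r"^Action:\s*(\w+):\s*(.+)$", re.MULTILINE)  # "Action: tool: arg"


def agent_loop(query: str, system_prompt: str, chat_fn: ChatFn,
               max_steps: int = 20) -> Optional[str]:
    agent = MyAgent(system=system_prompt, chat_fn=chat_fn)
    tools: Dict[str, Callable] = {"math": math, "lookup_population": lookup_population}
    current_prompt = query  # next message to send to the LLM
    previous_step = ""      # last stage seen (Thought, PAUSE, Action); bookkeeping only

    for _ in range(max_steps):  # step cap: an added safeguard, not in the walkthrough
        response = agent.complete(current_prompt)
        print(response)
        if "Answer:" in response:  # final answer: stop the loop
            return response

        match = ACTION_RE.search(response)
        if match:
            previous_step = "Action"
            tool_name, arg = match.group(1), match.group(2).strip()
            if tool_name in tools:  # run the tool and feed back its result
                current_prompt = f"Observation: {tools[tool_name](arg)}"
            else:  # unknown tool: ask the agent to retry
                current_prompt = f"Observation: unknown tool {tool_name!r}. Retry with a listed action."
        else:  # Thought or PAUSE: nothing to run, an empty prompt lets it continue
            previous_step = "Thought" if "Thought:" in response else "PAUSE"
            current_prompt = ""
    return None  # no answer within max_steps


def scripted_llm(replies: List[str]) -> ChatFn:
    """Offline stand-in for GPT-4o: replays fixed replies, logs the last message sent."""
    queue, seen = iter(replies), []

    def chat(messages: list) -> str:
        seen.append(messages[-1]["content"])
        return next(queue)

    chat.seen = seen  # type: ignore[attr-defined]
    return chat


fake = scripted_llm([
    "Thought: I need to find the population of India first.",
    "PAUSE",
    "Action: lookup_population: India",
    "Thought: Now I need to find the population of Japan.",
    "PAUSE",
    "Action: lookup_population: Japan",
    "Thought: I now have the populations of both India and Japan. I need to add them together.",
    "Action: math: 1400000000 + 125000000",
    "Answer: The sum of the population of India and the population of Japan is 1,525,000,000.",
])
answer = agent_loop("What is the sum of the populations of India and Japan?",
                    "<ReAct system prompt>", fake)
```

Unlike the manual session, the loop answers each Action immediately with an Observation. The scripted replies therefore contain no PAUSE between an Action and the next Thought.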

Let's break down the full loop. We begin by defining the agent_loop() function, which takes:

● query: the user's natural language question.

● system_prompt: the same ReAct system prompt we explored earlier (defining the behavior loop).

Next, inside this function, we initialize the agent and the available tools:

● Create a new MyAgent instance, using the structured ReAct prompt.

● Define the dictionary of callable tools available to the agent. These names must exactly match the ones the agent uses in its Action: lines.

Moving on, we define some state variables:

● current_prompt stores the next message to be sent to the LLM.

● previous_step tracks the last stage (e.g., Thought, Action) for better control flow.

Next, we run the reasoning loop, which continues until the agent produces a final answer. Based on our prompt design, the answer is expected to be marked with "Answer:":

Next, we feed the current_prompt into the agent.

The current_prompt could be:

● The initial user query,

● A blank string to let the agent continue reasoning,

● An observation from a tool.

We then print the agent's output so we can inspect each iteration.

Next, if the agent produces a final answer, we break the loop.

Otherwise, if the response includes a Thought: line, we:

● Record the step type as "Thought".

● Set current_prompt to an empty string to continue to the next stage (a PAUSE).

Next, we catch the first PAUSE right after the Thought. Nothing else needs to be done here; we just move to the next step.

If we detect an Action: line, we:

● Note that we're in the action step.

● Use a regex to extract the tool name and its argument.

For example, in Action: lookup_population: India, the regex pulls out:

● lookup_population as the tool.

● India as the argument.

Moving on, we execute the tool and capture the observation:

● If the tool name is valid, we call it like a Python function and capture the result.

● We format the output into Observation: ... so the agent can use it in the next step.

● If the tool doesn't exist, we ask the agent to retry.

This mimics tool execution + response injection. Done! Now we can run this function as follows:

This produces the following output, which is indeed correct:

You now have a fully working ReAct loop that needs no external framework. Note, however, that this implementation uses regex matching and hardcoded conditionals to parse the agent's actions and route them to the correct tools. This approach works well for a tightly controlled setup like this demo, but it's brittle:

● If the agent deviates slightly from the expected format (e.g., adds extra whitespace, uses different casing, or mislabels an action), the regex could fail to match.
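One way to soften this brittleness, sketched here as an assumption rather than as part of the walkthrough, combines two fixes:

● a case-insensitive, whitespace-tolerant pattern;

● fuzzy matching with difflib, which snaps slightly misspelled tool names such as lookup-populaton onto a registered one.

The 0.8 cutoff is an arbitrary choice. Names that match nothing are returned unchanged, so the caller can still ask the agent to retry.

```python
import difflib
import re
from typing import Iterable, Optional, Tuple

# "action" in any casing, flexible spacing around both colons,
# and hyphens allowed in the tool name.
LOOSE_ACTION_RE = re.compile(
    r"^\s*action\s*:\s*([\w-]+)\s*:\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE
)


def parse_action_loose(response: str, tool_names: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (tool_name, argument) from the first Action line, or None if there is none."""
    match = LOOSE_ACTION_RE.search(response)
    if match is None:
        return None
    raw_name = match.group(1).lower().replace("-", "_")
    # Snap near-misses onto a known tool; unknown names pass through unchanged.
    close = difflib.get_close_matches(raw_name, list(tool_names), n=1, cutoff=0.8)
    return (close[0] if close else raw_name), match.group(2)


TOOL_NAMES = ["math", "lookup_population"]
print(parse_action_loose("  ACTION :  Lookup_Population :  India  ", TOOL_NAMES))
# ('lookup_population', 'India')
print(parse_action_loose("Action: lookup-populaton: Japan", TOOL_NAMES))
# ('lookup_population', 'Japan')
```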

Key takeaways

Explicit Thought / Action / Observation templates keep multi-step tool use debuggable.
Separating reasoning text from tool execution prevents accidental hallucinated tool results.