Crafting Agentic Swarms#

Build a production AI agent swarm from a single httpx request — no frameworks, no magic, no skipped steps.#

By the end of this course you will have written — from scratch — every component that powers the AI agents you use every day: the API call, the tool loop, the memory system, the evaluation harness, the safety layer, and the orchestrator that coordinates dozens of parallel workers.

Start with Chapter 1 → Download the book → Open a Colab notebook →


Download the book#

203 pages · 10 chapters · 15 appendices · MIT-licensed · generated from the same markdown you're reading here.

  • PDF

    Print-ready. 203 pages, Georgia body, JetBrains Mono code, high-resolution diagrams. Best for reading on a laptop or tablet.

    Download PDF (4.3 MB) →

  • EPUB

    Works on Kindle, Apple Books, Kobo, iPhone, Android. Reflowable — respects your font size and line spacing.

    Download EPUB (17 MB) →

  • DOCX

    Microsoft Word. Useful for annotating chapters, dropping excerpts into slides, or importing into a publishing pipeline.

    Download DOCX (204 KB) →

  • AsciiDoc

    Source format. Convert to anything — O'Reilly Atlas, Manning AsciiDoctor, Pandoc, custom toolchains.

    Download AsciiDoc (320 KB) →

Prefer a marketplace? Leanpub (pay-what-you-want), Amazon Kindle/Paperback, and Gumroad links are rolling out — tracked on the Downloads page.


Run the code#

Every major primitive in the book has a matching Colab notebook — 11 in total. They run in your browser, free, on Google's infrastructure. Default mode is SWARM_MOCK=true so the exercises cost $0 until you explicitly plug in an API key.

  • 01 · Token Mechanics

    Tokenize a sentence, watch the attention mask, see why prompt length drives latency and cost.

    Open notebook →

  • 03a · The ReAct Loop

    A 30-line agent loop, live. Observe how a wrong tool call in step 2 corrupts the whole trace.

    Open notebook →

  • 03b · Tools & MCP

    Register a tool, call it, wrap the same server as MCP, swap transports without changing the agent.

    Open notebook →

  • 04 · Memory Visualisation

    The three-layer memory system — working, episodic, semantic — with a matplotlib timeline of what got consolidated and why.

    Open notebook →

  • 05 · Eval & Pareto

    LLM-as-judge with position-bias mitigation, bootstrap confidence intervals, Pareto frontiers over cost vs accuracy.

    Open notebook →

  • 07 · Cost Routing

    A router that picks between Haiku, Sonnet, and Opus per query. Charts show where the learned baseline beats the heuristic.

    Open notebook →

Plus five more — full notebook gallery →
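The `SWARM_MOCK=true` default mentioned above might look roughly like this — a hypothetical sketch of a mock-mode gate (the function and canned reply are illustrative, not the course's actual helper):

```python
import os

def call_model(prompt: str) -> str:
    """Return a zero-cost canned reply in mock mode; refuse otherwise."""
    # SWARM_MOCK is the env var the notebooks use; defaulting to "true"
    # means no API call (and no bill) until you opt in explicitly.
    if os.environ.get("SWARM_MOCK", "true").lower() == "true":
        return f"[mock] echo: {prompt}"
    raise RuntimeError("Set an API key and SWARM_MOCK=false for live calls")

os.environ["SWARM_MOCK"] = "true"
print(call_model("hello"))  # → [mock] echo: hello
```

The point of the default is determinism as much as cost: a canned reply makes every exercise reproducible before a real model adds variance.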


Ship a project this weekend#

Six portfolio projects, each ~500–1,500 lines, each with tests, each forkable. Pick the one closest to what you actually want to build.

  • Customer Support Agent

    Tool-using agent over a support ticket DB. Escalation logic, refund policy guardrails, deterministic eval harness.

    See project →

  • Code Review Bot

    Reviews PRs against a style guide. Worker-pool parallelisation across files. Posts inline GitHub comments.

    See project →

  • Data Analyst Agent

    Natural-language → SQL → chart. Critic checks the SQL before it runs. Sandbox isolates the database.

    See project →

  • Research Assistant

    Multi-source retrieval with citation tracking. LLM-as-judge scores answer grounding.

    See project →

  • Knowledge-Base RAG

    Vector retrieval over documentation. Hybrid search (BM25 + embeddings) with reranker.

    See project →

  • Multi-Agent Debate

    Two agents argue, a judge scores. Shows how adversarial setups improve answer quality on ambiguous questions.

    See project →


Why this course exists#

Most AI agent courses teach you to configure frameworks. This one teaches you to build the thing frameworks are wrapping.

There is a difference. When LangChain updates its API (it does, constantly), framework users scramble to update their imports and hope nothing downstream broke. Engineers who understand the underlying loop debug the actual failure, fix it in ten minutes, and move on. This course is for the second kind of engineer.

The structure is borrowed from Nand2Tetris — the famous computer-science course where you build a computer from NAND gates up to a running operating system. Same idea, new domain: one HTTP call up to a full production multi-agent swarm.

The one rule: you build every primitive before you're allowed to abstract it away.
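For a sense of scale, the "single httpx request" the course starts from looks roughly like this — a sketch against Anthropic's public Messages API (endpoint and headers per Anthropic's docs; the model alias may need updating), with nothing sent unless a key is present:

```python
import os

def build_request(prompt: str, model: str = "claude-3-5-haiku-latest") -> dict:
    """Assemble the bare HTTP request that every SDK is wrapping."""
    return {
        "url": "https://api.anthropic.com/v1/messages",
        "headers": {
            "x-api-key": os.environ.get("ANTHROPIC_API_KEY", ""),
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        "json": {
            "model": model,
            "max_tokens": 256,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = build_request("Say hello in five words.")
if os.environ.get("ANTHROPIC_API_KEY"):
    import httpx  # only needed for a live call
    resp = httpx.post(req["url"], headers=req["headers"], json=req["json"])
    print(resp.json()["content"][0]["text"])
```

Everything else in the course — tool loops, memory, evals, swarms — is layered on top of this one request.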


Who this is for#

  • Software engineers new to agents

    You've used LLMs but when something breaks, you don't know why. After Chapter 3 you can write a production tool executor from scratch. After Chapter 5 you can evaluate an agent system without outsourcing judgment to a leaderboard.

  • Backend engineers going deep

    You know distributed systems, async, observability. The course maps your existing mental models onto the agent layer. Jump to Chapter 8 for the production daemon + multi-tenant cost governance.

  • ML engineers transitioning

    You understand models. Concurrency, state management, crash recovery, and cost routing are new territory. The eval harness in Chapter 5 will feel native; Chapters 6-8 fill the system-engineering gap.


How the course works (repeat 9 times)#

1. Read the chapter          ← textbook section for this module
2. Study the reference code  ← the complete implementation
3. Fill in the exercises     ← your work, from scratch
4. Run the auto-grader       ← bash scripts/grade_module.sh NN
5. Observe the failure       ← run the deliberate-break demo
6. That failure motivates the next chapter → proceed

Do not skip the failure step. Each chapter deliberately ends on a broken system. That break is the emotional hook that makes the next chapter feel necessary rather than arbitrary.


What you'll ship#

  • A production call_agent() with multi-provider support, prompt caching, retry, streaming, and cost tracking
  • A ReAct agent loop with a sandboxed tool executor and a live MCP server you wrote yourself
  • A three-layer memory system with scheduled consolidation and optional vector-DB backend
  • A generator/critic pair that refines its own output
  • An evaluation harness with LLM-as-judge, position-bias mitigation, statistical-significance testing, OpenTelemetry traces
  • A parallel swarm with fork-join orchestration, KV cache inheritance, and a DAG executor for complex workflows
  • A cost-aware router (heuristic + a learned baseline) and five compaction strategies
  • A safety layer with a 29-event hook bus, Constitutional AI rules, human-in-the-loop gates, and prompt-injection defense
  • A production daemon with crash recovery, a Voyager-style skill library, and a per-tenant cost governor
  • A Claude Code plugin that bundles a skill, a hook, and an MCP server
  • A complete run on SWE-bench Lite and GAIA Level 1
  • 6 portfolio projects you can fork and ship this weekend: customer support, code review, data analyst, research assistant, knowledge-base RAG, multi-agent debate
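The ReAct loop at the heart of the list above fits in a few lines. This is a self-contained sketch with a canned stub in place of the model (no API key needed); the tool registry and function names are illustrative, not the book's actual API:

```python
def mock_model(messages):
    """Stub model: requests a tool on the first turn, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": f"The sum is {messages[-1]['content']}"}

# Tool registry: name -> callable. A real executor would sandbox these.
TOOLS = {"add": lambda a, b: a + b}

def run_agent(question, max_steps=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = mock_model(messages)
        if "answer" in reply:                           # model is done
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])  # execute the tool
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent exceeded max_steps")

print(run_agent("What is 2 + 3?"))  # → The sum is 5
```

Swap `mock_model` for a real API call and `TOOLS` for sandboxed executables and you have the skeleton of Chapter 3 — which is exactly why a wrong tool result in step 2 corrupts everything downstream: it gets appended to `messages` and conditions every later turn.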

What you'll need#

  • Python 3.11+ — async/await, tomllib, exception groups
  • An API key (optional) — every exercise runs offline in mock mode with SWARM_MOCK=true. An Anthropic key is enough when you're ready for the real thing; other providers are exercised in Chapter 2.
  • Git — Chapter 6 uses worktrees for parallel agent isolation
  • A terminal and a text editor. No notebook required — production agents run as processes, not notebooks.

First-time setup is in the repo README.


A note on "gold standard"#

Some people will ask whether this course is the Nand2Tetris of agent engineering.

It's not. Not yet. Nand2Tetris earned that label through 20+ years of use in hundreds of universities, thousands of errata cycles, and peer review by two generations of CS educators. This book is a promising first draft of something that could earn the label through the same path.

What you have here:

  • 203 pages, 10 chapters, 15 appendices
  • 12 modules with 50+ auto-graded exercises and reference solutions
  • 11 Colab notebooks with live visualisations
  • 6 runnable portfolio projects (~80 passing tests across them)
  • A production reference package (swarm/) with 100+ passing tests
  • The full supporting apparatus: instructor guide, debugging playbook, async primer, glossary, bibliography

Read it. Fork a project. Tell us what broke. Every errata issue, every forum question, every pull request is a step toward whatever this book actually becomes.


Get involved#

  • GitHub Discussions — ask questions, share what you built, flag errata
  • File an issue — typos, bugs, suggestions
  • Fork the repo — it's MIT-licensed; use it at your company, teach it at your university, extend it however you want

What's next#

If you're ready to start: Chapter 1 — The Raw Call →

If you've finished the course: What's next →

If you're teaching it: Instructor Guide →

If you want all the formats with a bibtex entry: Downloads →