# Preface
In 2005, Noam Nisan and Shimon Schocken published a course called Nand2Tetris. The premise was audacious: starting from a single NAND gate, students would build a complete working computer — logic gates, ALU, CPU, assembler, virtual machine, compiler, and operating system — entirely by hand, layer by layer. The course became legendary not because it produced chip designers, but because it produced people who understood computers all the way down.
We built this course because agentic systems need the same treatment.
The typical path into building AI agents goes like this: install a framework, read the docs, wire together pre-built components, call it done. You get something that works until it doesn't. When it breaks, you don't know why. When you want to extend it, you don't know how. When a new capability emerges, you can't reason about where it fits.
This course takes the opposite approach. We start with a single httpx request. By the end of Chapter 9, you will have built a production agentic swarm from scratch, one primitive at a time: parallel orchestrators, a memory system, an eval harness, safety hooks, a daemon that survives crashes, and a plugin format to ship it all.
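That first primitive is small enough to show here. A minimal sketch of a single httpx request, assuming an Anthropic-style Messages endpoint; the helper names and model id are illustrative, not the `call_llm()` the course ultimately builds:

```python
# The first primitive: one HTTP request to an LLM API.
# Request shape follows the Anthropic Messages API.
import os

API_URL = "https://api.anthropic.com/v1/messages"

def build_payload(prompt: str, model: str = "claude-sonnet-4-6") -> dict:
    """Assemble the JSON body for a single-turn request."""
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> str:
    """Send one prompt, return one completion. No retry, no caching."""
    import httpx  # the course's only HTTP dependency

    resp = httpx.post(
        API_URL,
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
        },
        json=build_payload(prompt),
        timeout=30.0,
    )
    resp.raise_for_status()
    return resp.json()["content"][0]["text"]
```

Everything in Chapters 2 through 9 is, in some sense, scar tissue grown around this one call.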
## Why Build Before You Abstract
Frameworks are not bad. LangGraph, CrewAI, the Anthropic Agents SDK — serious teams use them in production, and we survey them in Appendix A. But reach for a framework before you understand what it does and you lose the ability to reason about the system you're building. You become a configuration engineer, not a systems engineer.
The Nand2Tetris principle is: build each layer until you understand it well enough that you could hand it to someone else as a black box. Only then is it safe to treat it as one.
## The Failure-Motivation Chain
Most chapters end the same way: something breaks. Not randomly — it breaks in a specific, predictable, instructive way that you will have engineered yourself. That failure is the opening sentence of the next chapter.
After Chapter 1 you have a working LLM call with no retry and no caching, one that fails under load. Chapter 2 fixes that.
After Chapter 2 you have an abstracted, cached, multi-provider call, but it can't loop or use tools. Chapter 3 fixes that.
By Chapter 9 you have felt every failure. You know why the hook bus exists, why compaction has five strategies, why skills and plugins matter. None of it is arbitrary — you watched each problem emerge and built the fix.
## What You Will Have Built
By the end of the book, you will have:
- A production call_llm() with multi-provider support, prompt caching, retry, and cost tracking
- A tool-using agent with a sandboxed executor and a live MCP server you wrote yourself
- A three-layer memory system with scheduled consolidation
- A generator/critic pair that refines its own output
- An evaluation harness with LLM-as-judge, position-bias mitigation, and OpenTelemetry traces
- A parallel swarm with fork-join orchestration and KV cache inheritance
- A cost-aware router and five compaction strategies
- A safety layer with a hook bus, constitution rules, human-in-the-loop gates, and prompt-injection defense
- A production daemon with crash recovery and a skill library
- A Claude Code plugin that bundles your skill, a hook, and an MCP server
- A complete run on SWE-bench Lite and GAIA Level 1
The swarm/ directory is the answer key — the fully working production system. Every chapter builds toward it.
## Who This Is For
You should take this course if:
- You are a software engineer who has used LLMs but wants to understand how agentic systems actually work
- You are a researcher building on top of agent frameworks and want to understand what you're building on
- You are a technical founder shipping an AI product and need to understand what you're shipping
- You went through Nand2Tetris and want the same feeling about AI systems
You do not need prior agent framework experience. You need Python 3.11+, basic asyncio, and one API key (optional — everything runs offline in mock mode).
## A Note on the SOTA Guide
Alongside this course lives README_SOTA.md, a production reference grounded in the Claude Code source. The course teaches you to build it; the SOTA guide explains why it ships that way. Read both.
— Vamshi Krishna Rangu, April 2026
## What This Book Is Not
Not a framework tutorial. Frameworks change; a book built around a specific API is obsolete before it ships. This book is built around the patterns frameworks implement: the agent loop, the tool protocol, the memory layer, the evaluation harness, the plugin format. Those don't change when LangGraph releases 0.5.
Not an ML or training book. We use models; we do not build them. Understanding KV caching (Chapter 2) and attention in long contexts (Chapter 7) will make you a better consumer of the research, but this is not a transformers book.
Not a research paper. Every claim is grounded in code you can run. Where we reference academic results, we cite the primary source. Appendix E is the bibliography.
## Who Should Read This Book
Software engineers who want to understand agents deeply. You've used LLMs, but when something breaks, you don't know why. After Chapter 3 you can write a production tool executor from scratch. After Chapter 5 you can evaluate an agent system without outsourcing judgment to a leaderboard.
ML engineers transitioning to agent systems. You understand model behavior and training dynamics, but concurrency, state management, crash recovery, and cost routing are new territory. This book covers that layer.
Students who want the foundational layer before picking up frameworks. The fastest path to genuine depth: build the layer below the abstraction before you rely on it.
## How to Use This Book
Linear (recommended for first read). Each chapter is designed to be read in order. The failure at the end of each chapter is the opening problem of the next.
Reference (jump to specific patterns). Each chapter stands on its own. The glossary (Appendix D) and TOC are your entry points.
Course format with exercises. Each module in the modules/ directory has graded exercises that extend the code in the corresponding chapter. The exercises/ directory has stubs; swarm/ and modules/NN/solutions/ are the answer keys.
## What You'll Need
- Python 3.11+ — exception groups, tomllib, and better asyncio introspection. Setup instructions live in the repository README.md.
- An API key (optional) — all exercises run fully offline with SWARM_MOCK=true. An Anthropic API key is all you need for core work; other providers are exercised in Chapter 2.
- Git — Chapter 6 uses worktrees for parallel agent isolation.
- A terminal and a text editor. No notebook required — production agents run as processes, not notebooks.
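The mock-mode switch can be as simple as an environment check. A sketch, assuming the SWARM_MOCK variable above and a stand-in call_llm (the course's real implementation lives in swarm/):

```python
import os

def call_llm(prompt: str) -> str:
    """Return a canned completion when SWARM_MOCK=true, so every
    exercise runs offline at zero API cost; otherwise demand a key."""
    if os.environ.get("SWARM_MOCK", "").lower() == "true":
        return f"[mock] {prompt[:48]}"
    if "ANTHROPIC_API_KEY" not in os.environ:
        raise RuntimeError("Set SWARM_MOCK=true or export an API key")
    raise NotImplementedError("real provider call is built in Chapter 2")

os.environ["SWARM_MOCK"] = "true"
print(call_llm("Summarize the agent loop"))  # -> [mock] Summarize the agent loop
```

Because the switch sits at the lowest layer, every higher layer — tools, memory, swarm — inherits offline operation for free.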
## Conventions Used in This Book
Code formatting. Inline code uses monospace. Multi-line blocks show the full file path as a comment on the first line where relevant. Listings over 20 lines are excerpted in the text with a pointer to the full file in swarm/ or modules/.
Sidebars. Four kinds appear throughout:
- Under the Hood — what is actually happening at the API or protocol level, one layer below the code
- War Story — a real production failure that motivates the pattern
- Anti-pattern — a common wrong approach and why it fails
- Canonical Source — the primary paper, spec, or source file where this pattern originated
Jargon. The book uses standard terms first and our own mnemonics in parentheses. "Background daemon (we call it KAIROS)" leads with the plain English term; after the first mention, we use the plain form. The mnemonics are memorable hooks, not load-bearing names.
Version convention. Three layers:
- The pattern (ReAct, fork-join, plugin) — evergreen
- The protocol (MCP 2025-03-26, Anthropic Messages API) — changes slowly, noted where relevant
- The model identifier (claude-sonnet-4-6) — changes often, always isolated to config files
"The current model" in an example means whatever is configured in your environment. The pattern is the same regardless.