Course Guide: Adapting Crafting Agentic Swarms for Cohorts#
Practical notes for instructors, bootcamp designers, and teaching assistants running this course as a paid program. The repository itself contains the teaching material; this guide covers the parts that are not code.
Who this guide is for#
Three audiences:
- Instructors running a synchronous bootcamp, who need a pacing plan and an assessment rubric.
- University TAs teaching this as a semester or quarter course, who need the week-by-week breakdown and a homework/quiz split.
- Corporate training leads running it as a 2-week intensive, who need to know what can be compressed and what cannot.
You do not need to be an active researcher to teach it. You need to have built the entire swarm yourself once, end to end, with live API keys. The Nand2Tetris pedagogy the course uses is load-bearing: a student who sees you debug their code live learns more than one who watches a polished lecture. Plan for 40 hours of prep before day one of your cohort.
Syllabus templates#
Three shapes. Pick the one that matches your cohort's available time.
Semester format (14 weeks)#
Pace: one module per week across Weeks 1-12, with Week 7 reserved for the midterm. Two lectures plus one lab each week, roughly 4 hours synchronous plus 6 hours homework per module. Weeks 13-14 are the capstone from projects/, which is real enough that students ship something to an audience.
| Week | Module | Deliverable | Weight |
|---|---|---|---|
| 1 | M01 raw_call | httpx call + token counter | 5% |
| 2 | M02 providers | Multi-provider chokepoint | 5% |
| 3 | M03 agent_loop | ReAct calculator | 5% |
| 4 | M04 tools_sandbox | MCP filesystem demo | 5% |
| 5 | M05 memory | 3-layer memory + autoDream | 5% |
| 6 | M06 two_agents | Generator + critic loop | 5% |
| 7 | Midterm exam | Code-reading + short-build | 10% |
| 8 | M07 eval_observability | Judge + Pareto chart | 5% |
| 9 | M08 orchestrator_workers | Fork-join swarm | 5% |
| 10 | M09 routing_compaction | TriageRouter + compaction | 5% |
| 11 | M10 safety_hooks | HookBus + constitutional rules | 5% |
| 12 | M11 production_daemon | KAIROS daemon + skill library | 5% |
| 13 | Capstone (project) | Working deliverable | 15% |
| 14 | Capstone (demo) | Final presentation | 15% |
6-week bootcamp#
Pace: two modules per week, 6-8 hours synchronous per week, 10 hours homework. Exhausting for students; compensate with generous office-hour access.
| Week | Modules | Focus |
|---|---|---|
| 1 | M01-M02 | API fundamentals, provider chokepoint |
| 2 | M03-M04 | Agent loop, tools, safety boundaries |
| 3 | M05-M06 | Memory, two-agent patterns |
| 4 | M07-M08 | Evaluation, fork-join orchestration |
| 5 | M09-M10 | Routing, production safety hooks |
| 6 | M11 + capstone | Daemon, skill library, final demo |
Skip M12 at this pace unless the cohort is full-time; it takes longer than one week to do honestly.
2-week intensive#
Pace: all the core modules in 10 working days, 8 hours per day. Viable only for engineers who can commit to the course full-time. Days 1-10 cover one module each; expect students to work evenings on exercises. There is no capstone in this format; replace it with a Day-10 group demo of a single swarm feature.
Assessment rubrics#
Pick weights that match your institutional requirements. The defaults below are calibrated for a bootcamp where 70% is a passing grade.
Exercise grading (60%)#
Run scripts/grade_module.sh NN for each module. The pytest auto-grader returns pass/fail per exercise. Scale to a 0-100 score: (passed_tests / total_tests) * 100. Cap module scores at 100% but allow partial credit (a student with 7/10 passing tests earns 70% for that module). Do not hand-grade exercises unless a student has a legitimate reason the auto-grader failed.
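The scaling step is trivial but worth pinning down so every TA computes it the same way. A minimal sketch, assuming the auto-grader reports plain pass/fail counts:

```python
def module_score(passed_tests: int, total_tests: int) -> float:
    """Scale auto-grader results to a 0-100 module score, capped at 100."""
    if total_tests == 0:
        return 0.0  # a module with no tests earns nothing, rather than dividing by zero
    return min(passed_tests / total_tests, 1.0) * 100
```

So a student with 7/10 passing tests gets `module_score(7, 10)`, i.e. 70% for that module, and the cap only matters if a module ever awards bonus tests.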
Capstone project (30%)#
A working deliverable plus a demo. Rubric out of 30:
- Functionality (10): Does it do what the spec says? Does it handle the 3 edge cases listed in the project README?
- Code quality (10): Passes ruff and the repo's import conventions. No dead code. No TODOs in merged work.
- Documentation (5): README that a new contributor could follow. Inline docstrings on public entrypoints.
- Tests (5): At least 3 tests per new public function. Uses SWARM_MOCK=true for reproducibility.
Participation (10%)#
Pair programming attendance, code-review contributions, and office-hour engagement. Subjective but important: students who attend every lab and review peers' PRs learn faster than students who disappear between deliverables.
Common student mistakes per module#
M01 raw_call. Forgetting await on async functions. Missing the anthropic-version header on raw httpx calls. Confusing max_tokens (response cap) with total tokens.
M02 providers. Using OpenAI's prompt_tokens field name in the Anthropic adapter; Anthropic reports input_tokens. Forgetting to prepend static_doc on non-caching providers (the bug shows up as "cached content missing" on OpenAI).
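The chokepoint fix is a small normalization shim. A sketch using each provider's documented usage field names (the unified output shape is an assumption, not the repo's):

```python
def normalize_usage(provider: str, usage: dict) -> dict:
    """Collapse provider-specific token-count fields into one shape at the chokepoint."""
    if provider == "anthropic":
        # Anthropic's Messages API reports input_tokens / output_tokens...
        return {"input": usage["input_tokens"], "output": usage["output_tokens"]}
    if provider == "openai":
        # ...while OpenAI chat completions report prompt_tokens / completion_tokens
        return {"input": usage["prompt_tokens"], "output": usage["completion_tokens"]}
    raise ValueError(f"unknown provider: {provider}")
```

Students who mix these up get a KeyError at best and silently wrong cost accounting at worst; route every response through one function like this.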
M03 agent_loop. Re-initializing LoopState.messages = [] inside the iteration loop instead of outside it. Result: every turn starts fresh. Also: forgetting to append the tool-result message before the next model call.
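When you fix this live, show the correct placement rather than describing it. A minimal sketch with hypothetical message shapes (not the repo's LoopState API):

```python
def run_agent(model_call, run_tool, user_input: str, max_turns: int = 10) -> str:
    # initialized ONCE, outside the turn loop -- the M03 bug puts this line inside it
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        reply = model_call(messages)
        messages.append({"role": "assistant", "content": reply["content"]})
        tool_call = reply.get("tool_call")
        if tool_call is None:
            return reply["content"]  # final answer, no tool needed
        # the other M03 bug: forgetting this append before the next model call
        messages.append({"role": "tool", "content": run_tool(tool_call)})
    raise RuntimeError("max turns exceeded")
```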
M04 tools_sandbox. Not stripping \r before regex validation, so rm -rf /\r# comment bypasses the deny list. Catching the \r injection live is the pedagogical moment; do not spoil it by documenting the fix upfront.
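For the instructor's own notes (keep it out of student-facing material), the bypass and its fix fit in a few lines. The deny pattern below is illustrative, not the repo's actual deny list:

```python
import re

# a naive deny rule that anchors at end-of-line, as students tend to write it
NAIVE_DENY = re.compile(r"rm\s+-rf\s+/\s*$")

def is_blocked(cmd: str) -> bool:
    """Strip carriage returns BEFORE matching; a trailing '\r# comment' defeats the anchor otherwise."""
    cleaned = cmd.replace("\r", "")
    # drop trailing shell comments too, so the end-of-line anchor still bites
    cleaned = cleaned.split("#", 1)[0].rstrip()
    return bool(NAIVE_DENY.search(cleaned))
```

Running the naive pattern directly on the raw command lets the injection through; sanitizing first catches it.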
M05 memory. Writing to the index JSON without the PID lock, producing corrupted state on concurrent writes. On Windows: not handling the os.kill unavailability with the age-based fallback.
M06 two_agents. Infinite refinement loops when the critic always finds "more to improve." Cap iterations at 3 and force the generator to return if the critic's feedback is substantively identical to the previous round.
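Both guards fit in one small loop. A sketch in which exact string equality stands in for "substantively identical" (students can upgrade that check to a similarity measure later):

```python
def refine(generate, critique, prompt: str, max_rounds: int = 3) -> str:
    """Generator/critic loop with a hard round cap and a repeated-feedback escape hatch."""
    draft = generate(prompt)
    last_feedback = None
    for _ in range(max_rounds):
        feedback = critique(draft)
        # exact equality is a cheap proxy for "substantively identical" feedback
        if not feedback or feedback == last_feedback:
            break
        draft = generate(f"{prompt}\n\nRevise per this feedback: {feedback}")
        last_feedback = feedback
    return draft
```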
M07 eval_observability. Computing avg_score without handling the empty-cases edge case (division by zero). Position-bias detection that misses the case where one ordering ties and the other does not.
M08 orchestrator_workers. Forgetting return_exceptions=True on asyncio.gather; one failing worker cancels the rest. Not using a shared prefix for KV-cache savings across forks.
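The fix is a one-flag change, but students rarely believe it until they see a sibling survive a crash. A minimal fork-join sketch:

```python
import asyncio

async def run_workers(coros):
    """Fork-join where one failing worker cannot cancel its siblings."""
    results = await asyncio.gather(*coros, return_exceptions=True)
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return ok, failed
```

Without return_exceptions=True, the first exception propagates out of gather and the remaining workers are cancelled mid-flight.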
M09 routing_compaction. Compacting the system prompt instead of just the message history. Summarize strategy that loses the original user request.
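A compaction sketch that preserves both invariants, assuming a [system, user, ...] message list and a caller-supplied summarizer (the slicing policy is illustrative, not the repo's):

```python
def compact(messages: list[dict], summarize) -> list[dict]:
    """Compact the middle of the history; the system prompt and the original
    user request survive verbatim, and only older turns get summarized."""
    if len(messages) <= 4:
        return messages  # nothing worth compacting yet
    head = messages[:2]   # [system prompt, original user request] stay untouched
    body = messages[2:-2] # older turns are fair game
    tail = messages[-2:]  # keep the most recent exchange for continuity
    summary = {"role": "user", "content": f"[summary of earlier turns] {summarize(body)}"}
    return head + [summary] + tail
```

Both M09 bugs show up as slicing mistakes here: compacting head[0] mangles the system prompt, and dropping head[1] loses the original request.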
M10 safety_hooks. Registering a synchronous hook handler on an otherwise-async bus, then stalling the entire hook bus while that one slow handler runs inline.
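The standard remedy is to push sync handlers onto the executor. A dispatch sketch (not the repo's HookBus API; names are illustrative):

```python
import asyncio

async def dispatch(handlers, event):
    """Fan an event out; sync handlers run in the executor so they cannot block the bus."""
    loop = asyncio.get_running_loop()
    pending = []
    for handler in handlers:
        if asyncio.iscoroutinefunction(handler):
            pending.append(handler(event))
        else:
            # a plain function would otherwise run inline and stall every other hook
            pending.append(loop.run_in_executor(None, handler, event))
    return await asyncio.gather(*pending)
```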
M11 production_daemon. Writing to the append-only log without fsync, losing data on crash. Stale PID locks that prevent daemon restart.
M12 capstone. Overfitting to the SWE-bench Lite subset; the full set has regressions the Lite subset does not surface. Running evals against a stale baseline because git fetch was forgotten.
Running as a cohort#
Use GitHub Classroom: one repo per student, forked from the main course repo. Students open PRs against their own fork; TAs review and merge. For shared memory directories: every student gets their own memory/ folder, never shared. Budget $20 in API credits per student for the full course if they run with real models; $0 if they stick to SWARM_MOCK=true. Cap individual runs at max_cost=$1.00 via CostGovernor to prevent runaway loops from burning a student's budget in one evening.
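The repo's CostGovernor API is whatever the code says it is; as a mental model for TAs, the per-run cap behaves roughly like this hypothetical sketch (class and method names are assumptions, not the repo's):

```python
class BudgetCap:
    """Hypothetical stand-in for the repo's CostGovernor; names are illustrative."""

    def __init__(self, max_cost: float = 1.00):
        self.max_cost = max_cost
        self.spent = 0.0

    def charge(self, dollars: float) -> None:
        """Record spend and abort the run once the cap is crossed."""
        self.spent += dollars
        if self.spent > self.max_cost:
            raise RuntimeError(f"run exceeded its ${self.max_cost:.2f} budget")
```

The important property is that the cap raises mid-run rather than reporting overspend afterwards; a runaway loop dies on the call that crosses $1.00, not at the end of the evening.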