Appendix D: Glossary
Terms used throughout the course, alphabetised, with concise definitions and chapter cross-references.
A2A (Agent-to-Agent protocol) — Google's open protocol for standardised inter-agent communication across frameworks. (Appendix A)
ACI (Agent-Computer Interface) — The layer through which an agent interacts with a computer environment: file system, shell, browser, APIs. The quality of the ACI determines the agent's practical capability. (Chapter 4)
agent — An LLM in a loop that can take actions (call tools, read/write memory) until a task is complete. (Chapter 3a)
agentic RAG — Retrieval-augmented generation in an agentic context: the agent decides when and what to retrieve, rather than retrieval being a fixed pipeline step. (Chapter 7)
anti-distillation — A legal or contractual constraint in some API terms of service prohibiting using outputs to train competing models. (Chapter 2)
append-only log — A persistence pattern where state transitions are appended as records rather than overwriting existing state, enabling recovery by replay. (Chapter 8)
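A minimal in-memory sketch of the pattern (a real implementation would append to a durable file or database, but the replay logic is the same):

```python
import json

def append_event(log: list[str], event: dict) -> None:
    """Append a state transition as a JSON record; never overwrite."""
    log.append(json.dumps(event))

def replay(log: list[str]) -> dict:
    """Rebuild current state by replaying every record in order."""
    state: dict = {}
    for line in log:
        state.update(json.loads(line))
    return state

log: list[str] = []
append_event(log, {"step": 1, "status": "started"})
append_event(log, {"step": 2, "status": "done"})
```

Because records are never mutated, a crash mid-run loses at most the record being written; `replay` recovers everything before it.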
augmented LLM — An LLM extended with retrieval, tools, and memory so it can act on external state. The base unit of the Anthropic "Building Effective Agents" taxonomy. (Chapters 1 to 4)
autoDream (mnemonic: memory consolidation) — A consolidation step inspired by sleep/dream cycles where the agent periodically compresses episodic memory into semantic summaries. (Chapter 4)
cache_control — An Anthropic Messages API field marking a prompt block as a cache breakpoint; content up to that breakpoint is eligible for prompt caching on subsequent calls. (Chapter 2)
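The shape of a Messages API request body using the field (a sketch; the model name is illustrative):

```python
# The long, stable system prompt is marked as a cache breakpoint;
# everything after it (the user turn) varies per call and is not cached.
request = {
    "model": "claude-sonnet-4",  # illustrative model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a coding agent. <long stable instructions>",
            "cache_control": {"type": "ephemeral"},  # cache up to here
        }
    ],
    "messages": [{"role": "user", "content": "Fix the failing test."}],
}
```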
capability sandboxing — Restricting the set of actions available to an agent or tool to the minimum required for its task. (Chapter 3b, Chapter 7)
checkpoint — A snapshot of agent state written to durable storage before a state transition, enabling recovery to a known-good state after a crash. (Chapter 8)
compaction — Reducing context size by summarising, truncating, or selecting from conversation history to stay within effective context limits. (Chapter 7)
constitutional AI — An alignment technique (Anthropic) where a model critiques and revises its own outputs against a written set of principles. (Chapter 7)
context cliff — The point at which adding more content to the context window causes a sharp degradation in model performance, due to lost-in-the-middle effects or exceeding effective attention range. (Chapter 7)
context rot — The degradation in agent performance as the context window fills with stale, redundant, or contradictory information. (Chapter 7)
cost-vs-accuracy Pareto — The trade-off curve between inference cost and task accuracy; effective routing finds the Pareto-optimal model assignment per task type. (Chapter 5)
dual-LLM defense — A prompt injection mitigation (proposed by Simon Willison) that splits work between a privileged LLM, which can use tools but never sees untrusted content, and a quarantined LLM, which processes untrusted content but has no tool access; the quarantined model's outputs are treated as data, never as instructions. (Chapter 7)
DSPy — A framework (Stanford) for programming language models using declarative signatures and automated prompt optimisation. (Appendix A)
dynamic boundary — The point in a prompt beyond which content changes between calls; everything before benefits from caching, everything after does not. (Chapter 2)
evaluator-optimizer — An agentic pattern where one agent produces output and a second evaluates it, with the cycle repeating until a quality threshold is met. Formalised by Self-Refine (Madaan et al. 2023). (Chapter 6)
fork-join — A parallelism pattern: fork (spawn N workers simultaneously), then join (collect all results before continuing). (Chapter 6)
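A minimal sketch using a thread pool, with a stand-in worker where a real swarm would invoke agents:

```python
from concurrent.futures import ThreadPoolExecutor

def worker(task: str) -> str:
    # Stand-in for an agent call; in practice this would invoke an LLM.
    return task.upper()

tasks = ["draft intro", "draft body", "draft outro"]

# Fork: spawn one worker per subtask. Join: pool.map blocks until all
# results are in, preserving task order.
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    results = list(pool.map(worker, tasks))
```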
GAIA — General AI Assistants benchmark; real-world question answering requiring multi-step reasoning and tool use. (Appendix B)
gen_ai.* OTel conventions — The OpenTelemetry semantic conventions for generative AI systems, defining standard attribute names for LLM spans (gen_ai.request.model, gen_ai.usage.input_tokens, and so on). (Chapter 5)
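A sketch of span attributes following those conventions (attribute names are from the spec; values are illustrative):

```python
# Standard gen_ai.* attribute names let any OTel backend aggregate
# token usage and model choice across vendors.
span_attributes = {
    "gen_ai.system": "anthropic",
    "gen_ai.request.model": "claude-sonnet-4",  # illustrative model name
    "gen_ai.usage.input_tokens": 1432,
    "gen_ai.usage.output_tokens": 208,
}
```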
graph RAG — A RAG variant that represents the knowledge base as a graph and retrieves by traversing relationships, enabling multi-hop reasoning. (Chapter 7)
HITL (Human-in-the-Loop) — A design pattern where certain actions or decisions are paused and routed to a human for approval before execution, providing a safety backstop for high-stakes or irreversible operations. (Chapter 7)
hook bus — An event system that lets safety monitors, loggers, and other observers intercept agent actions at defined lifecycle events. (Chapter 7)
JSON-RPC — A lightweight remote-procedure-call protocol encoded in JSON, each request carrying a method, params, and id; MCP transports JSON-RPC 2.0 as newline-delimited messages over stdio. (Chapter 3b)
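A sketch of one such message (`tools/call` is a real MCP method; the tool name and arguments are illustrative):

```python
import json

# A minimal JSON-RPC 2.0 request as MCP frames it over stdio:
# one JSON object per line, carrying jsonrpc, method, params, and id.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "read_file", "arguments": {"path": "notes.txt"}},
}
wire = json.dumps(request)  # single line, ready to write to stdout
```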
KAIROS (mnemonic: background daemon) — A daemon architecture pattern (named for the Greek concept of opportune time) that runs the agent as a long-lived process with scheduled tasks and crash recovery. (Chapter 8)
KV cache — The key-value cache maintained by a transformer at inference time; reusing it across calls dramatically reduces cost and latency. (Chapter 2)
LLM-as-judge — Using a language model to evaluate another language model's output rather than using human raters or heuristic metrics. (Chapter 5)
LoopState — The dataclass the agent loop carries between iterations, typically messages, iteration, max_iterations, and any accumulated tool-use records. (Chapter 3a)
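A sketch of the dataclass (field names beyond those listed in the definition are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class LoopState:
    """State the agent loop threads through each iteration."""
    messages: list = field(default_factory=list)
    iteration: int = 0
    max_iterations: int = 10
    tool_records: list = field(default_factory=list)

state = LoopState()
state.messages.append({"role": "user", "content": "hello"})
state.iteration += 1
```

Keeping the state in one object makes it trivial to checkpoint between iterations (see checkpoint, Chapter 8).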
lost-in-the-middle — The empirical finding that LLMs underweight information placed in the middle of long contexts relative to the beginning and end. (Chapter 7)
MCP (Model Context Protocol) — Anthropic's open protocol for standardising how models connect to external tools and data sources; wire format is JSON-RPC 2.0 over stdio. (Chapter 3b)
MoA (Mixture of Agents) — An ensemble pattern where multiple agents independently generate responses which are then synthesised by an aggregator model. Formalised by Wang et al. (2024). (Chapter 6)
orchestrator — An agent whose primary job is to decompose tasks and delegate to worker agents rather than executing work directly. (Chapter 6)
orchestrator-workers — An agentic pattern where a coordinator agent decomposes a task, assigns subtasks to specialised workers, and synthesises results. (Chapter 6)
OTel (OpenTelemetry) — A vendor-neutral observability framework for traces, metrics, and logs; used throughout this course for agent instrumentation. (Chapter 5)
pairwise preference — An evaluation protocol where a judge compares two responses head-to-head rather than scoring each independently. (Chapter 5)
parallelization (sectioning) — A workflow pattern where independent subtasks are divided among concurrent workers and their outputs aggregated. (Chapter 6)
parallelization (voting) — A workflow pattern where the same task is sent to multiple agents independently and outputs combined by majority or synthesis. (Chapter 6)
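The "combine by majority" step can be as simple as (a sketch; real systems may weight votes or synthesise instead):

```python
from collections import Counter

def majority_vote(outputs: list[str]) -> str:
    """Return the most common answer across independent agent runs."""
    return Counter(outputs).most_common(1)[0][0]
```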
Pareto frontier — The set of options not strictly dominated by any other option on every axis; in agent routing the frontier is drawn over cost and quality, and models that lose on both axes are discarded. (Chapter 5)
poka-yoke — (Japanese: "mistake-proofing") A design principle that makes incorrect usage impossible or immediately visible, applied via schema validation, output quarantine, and typed tool contracts. (Chapter 4, Chapter 7)
position bias — The tendency of LLM judges to prefer responses in a specific position (typically the first), independent of quality; mitigated by swapping order and averaging. (Chapter 5)
prompt caching — An API-level feature (Anthropic, OpenAI) that lets a stable prefix of a prompt be cached server-side, avoiding recomputation on repeated calls. (Chapter 2)
prompt chaining — A workflow pattern where the output of one LLM call is the input to the next, decomposing a complex task into a fixed sequence. (Chapter 6)
prompt injection — An attack where malicious content in the environment (web pages, files, tool outputs) attempts to hijack the agent's instructions. (Chapter 7)
ReAct — (Reason + Act) A prompting pattern where the model alternates between a Thought step (reasoning) and an Action step (tool call), iterating until the task is complete. Introduced by Yao et al. (2022). (Chapter 3a)
Responses API — OpenAI's stateful agent primitive that unifies chat, tool use, and state across calls behind a single endpoint, as opposed to the older stateless Chat Completions API. (Chapter 2)
routing — A workflow pattern where a classifier decides which model, agent, or pipeline handles a given input, enabling specialisation and cost optimisation. (Chapter 7)
Self-Refine — An iterative improvement pattern where a generator produces output, a critic evaluates it with specific feedback, and the generator revises until convergence. Madaan et al. (2023). (Chapter 6)
semantic cache — A cache keyed on semantic similarity rather than exact string match; allows cache hits for paraphrased queries. (Chapter 7)
simplicity (Anthropic principle) — Agents should prefer simpler, more reliable actions and request only the permissions they need. (Chapter 1, Chapter 7)
skill library (Voyager) — A growing collection of reusable, tested agent capabilities stored as executable code, retrieved by the agent to solve new tasks without re-deriving solutions. From Wang et al. (2023). (Chapter 8)
stop_reason — A field on the Anthropic Messages API response indicating why generation stopped: end_turn (done), max_tokens (truncated), tool_use (paused for tool results), or stop_sequence (hit a sentinel). (Chapter 1)
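A sketch of how an agent loop might dispatch on the field (the four values are as defined above; the return strings are illustrative):

```python
def handle_stop(stop_reason: str) -> str:
    """Dispatch on the Messages API stop_reason."""
    if stop_reason == "end_turn":
        return "done"           # model finished its turn
    if stop_reason == "tool_use":
        return "run tools"      # execute requested tools, send results back
    if stop_reason == "max_tokens":
        return "truncated"      # response was cut off; consider retrying
    if stop_reason == "stop_sequence":
        return "hit sentinel"   # a configured stop string was emitted
    raise ValueError(f"unknown stop_reason: {stop_reason}")
```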
SWE-bench — A benchmark of real GitHub issue resolutions used to evaluate software engineering agents. (Appendix B)
tool_result block — The user-turn content block your loop sends back after executing a tool, containing tool_use_id matching the prior tool_use.id and the tool's output. (Chapter 3a)
tool_use block — A content block in the assistant response where the model requests a tool invocation, with fields id, name, and input. (Chapter 3a)
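How the two blocks pair up (shapes per the definitions above; the id, tool name, and values are illustrative):

```python
# Assistant turn: the model requests a tool invocation.
tool_use = {
    "type": "tool_use",
    "id": "toolu_abc123",
    "name": "get_weather",
    "input": {"city": "Lisbon"},
}

# Next user turn: your loop executes the tool and sends the result back,
# with tool_use_id matching the request's id.
tool_result = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": tool_use["id"],  # must match the tool_use id
        "content": "18°C, clear",
    }],
}
```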
tool-use — The ability of an LLM to call external functions whose outputs are fed back into the conversation. (Chapter 3a)
transparency (Anthropic principle) — Agents should not deceive users or pursue hidden agendas, even when declining to share information. (Chapter 7)
triage — The routing step that classifies an incoming task and assigns it to the appropriate model, agent, or pipeline. (Chapter 7)
triple-gate — A safety pattern requiring three independent checks (policy, harm, confidence) before an agent takes an irreversible action. (Chapter 7)
worker — An agent that receives a scoped subtask from an orchestrator, executes it, and returns a result. (Chapter 6)
worktree — A Git feature allowing multiple working trees from the same repository; used in parallel agent swarms to give each worker isolated file state. (Chapter 6)