Appendix D: Glossary
Terms used throughout the course, alphabetised, with concise definitions and chapter cross-references.
A2A (Agent-to-Agent protocol) — Google's open protocol for standardised inter-agent communication across frameworks. (Appendix A)
ACI (Agent-Computer Interface) — The layer through which an agent interacts with a computer environment: file system, shell, browser, APIs. The quality of the ACI determines the agent's practical capability. (Chapter 4)
agent — An LLM in a loop that can take actions (call tools, read/write memory) until a task is complete. (Chapter 3a)
agentic RAG — Retrieval-augmented generation in an agentic context: the agent decides when and what to retrieve, rather than retrieval being a fixed pipeline step. (Chapter 7)
anti-distillation — A legal or contractual constraint in some API terms of service prohibiting using outputs to train competing models. (Chapter 2)
append-only log — A persistence pattern where state transitions are appended as records rather than overwriting existing state, enabling recovery by replay. (Chapter 8)
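A minimal in-memory sketch of the pattern (a real implementation would append to a durable file or database, but the replay logic is the same):

```python
import json

def append_event(log: list[str], event: dict) -> None:
    """Append a state transition as a JSON record; never overwrite."""
    log.append(json.dumps(event))

def replay(log: list[str]) -> dict:
    """Rebuild current state by replaying every record in order."""
    state: dict = {}
    for line in log:
        state.update(json.loads(line))
    return state

log: list[str] = []
append_event(log, {"step": 1, "status": "started"})
append_event(log, {"step": 2, "status": "done"})
```

Because records are never mutated, a crash mid-run loses at most the record being written; `replay` recovers everything before it.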
augmented LLM — An LLM extended with retrieval, tools, and memory so it can act on external state. The base unit of the Anthropic "Building Effective Agents" taxonomy. (Chapters 1 to 4)
autoDream (mnemonic: memory consolidation) — A consolidation step inspired by sleep/dream cycles where the agent periodically compresses episodic memory into semantic summaries. (Chapter 4)
cache_control — An Anthropic Messages API field marking a prompt block as a cache breakpoint; content up to that breakpoint is eligible for prompt caching on subsequent calls. (Chapter 2)
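The shape of a Messages API request body using the field (a sketch; the model name is illustrative):

```python
# The long, stable system prompt is marked as a cache breakpoint;
# everything after it (the user turn) varies per call and is not cached.
request = {
    "model": "claude-sonnet-4",  # illustrative model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a coding agent. <long stable instructions>",
            "cache_control": {"type": "ephemeral"},  # cache up to here
        }
    ],
    "messages": [{"role": "user", "content": "Fix the failing test."}],
}
```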
capability sandboxing — Restricting the set of actions available to an agent or tool to the minimum required for its task. (Chapter 3b, Chapter 7)
checkpoint — A snapshot of agent state written to durable storage before a state transition, enabling recovery to a known-good state after a crash. (Chapter 8)
compaction — Reducing context size by summarising, truncating, or selecting from conversation history to stay within effective context limits. (Chapter 7)
constitutional AI — An alignment technique (Anthropic) where a model critiques and revises its own outputs against a written set of principles. (Chapter 7)
context cliff — The point at which adding more content to the context window causes a sharp degradation in model performance, due to lost-in-the-middle effects or exceeding effective attention range. (Chapter 7)
context rot — The degradation in agent performance as the context window fills with stale, redundant, or contradictory information. (Chapter 7)
cost-vs-accuracy Pareto — The trade-off curve between inference cost and task accuracy; effective routing finds the Pareto-optimal model assignment per task type. (Chapter 5)
dual-LLM defense — A prompt injection mitigation (proposed by Simon Willison) that splits work between a privileged LLM, which can use tools but never sees untrusted content, and a quarantined LLM, which processes untrusted content but has no tool access; the quarantined model's outputs are treated as data, never as instructions. (Chapter 7)
DSPy — A framework (Stanford) for programming language models using declarative signatures and automated prompt optimisation. (Appendix A)
dynamic boundary — The point in a prompt beyond which content changes between calls; everything before benefits from caching, everything after does not. (Chapter 2)
evaluator-optimizer — An agentic pattern where one agent produces output and a second evaluates it, with the cycle repeating until a quality threshold is met. Formalised by Self-Refine (Madaan et al. 2023). (Chapter 6)
fork-join — A parallelism pattern: fork (spawn N workers simultaneously), then join (collect all results before continuing). (Chapter 6)
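A minimal sketch using a thread pool, with a stand-in worker where a real swarm would invoke agents:

```python
from concurrent.futures import ThreadPoolExecutor

def worker(task: str) -> str:
    # Stand-in for an agent call; in practice this would invoke an LLM.
    return task.upper()

tasks = ["draft intro", "draft body", "draft outro"]

# Fork: spawn one worker per subtask. Join: pool.map blocks until all
# results are in, preserving task order.
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    results = list(pool.map(worker, tasks))
```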
GAIA — General AI Assistants benchmark; real-world question answering requiring multi-step reasoning and tool use. (Appendix B)
gen_ai.* OTel conventions — The OpenTelemetry semantic conventions for generative AI systems, defining standard attribute names for LLM spans (gen_ai.request.model, gen_ai.usage.input_tokens, and so on). (Chapter 5)
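A sketch of span attributes following those conventions (attribute names are from the spec; values are illustrative):

```python
# Standard gen_ai.* attribute names let any OTel backend aggregate
# token usage and model choice across vendors.
span_attributes = {
    "gen_ai.system": "anthropic",
    "gen_ai.request.model": "claude-sonnet-4",  # illustrative model name
    "gen_ai.usage.input_tokens": 1432,
    "gen_ai.usage.output_tokens": 208,
}
```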
graph RAG — A RAG variant that represents the knowledge base as a graph and retrieves by traversing relationships, enabling multi-hop reasoning. (Chapter 7)
HITL (Human-in-the-Loop) — A design pattern where certain actions or decisions are paused and routed to a human for approval before execution, providing a safety backstop for high-stakes or irreversible operations. (Chapter 7)
hook bus — An event system that lets safety monitors, loggers, and other observers intercept agent actions at defined lifecycle events. (Chapter 7)
JSON-RPC — A lightweight remote-procedure-call protocol encoded in JSON, each request carrying a method, params, and id; MCP transports JSON-RPC 2.0 as newline-delimited messages over stdio. (Chapter 3b)
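A sketch of one such message (`tools/call` is a real MCP method; the tool name and arguments are illustrative):

```python
import json

# A minimal JSON-RPC 2.0 request as MCP frames it over stdio:
# one JSON object per line, carrying jsonrpc, method, params, and id.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "read_file", "arguments": {"path": "notes.txt"}},
}
wire = json.dumps(request)  # single line, ready to write to stdout
```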
KAIROS (mnemonic: background daemon) — A daemon architecture pattern (named for the Greek concept of opportune time) that runs the agent as a long-lived process with scheduled tasks and crash recovery. (Chapter 8)
KV cache — The key-value cache maintained by a transformer at inference time; reusing it across calls dramatically reduces cost and latency. (Chapter 2)
LLM-as-judge — Using a language model to evaluate another language model's output rather than using human raters or heuristic metrics. (Chapter 5)
LoopState — The dataclass the agent loop carries between iterations, typically messages, iteration, max_iterations, and any accumulated tool-use records. (Chapter 3a)
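A sketch of the dataclass (field names beyond those listed in the definition are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class LoopState:
    """State the agent loop threads through each iteration."""
    messages: list = field(default_factory=list)
    iteration: int = 0
    max_iterations: int = 10
    tool_records: list = field(default_factory=list)

state = LoopState()
state.messages.append({"role": "user", "content": "hello"})
state.iteration += 1
```

Keeping the state in one object makes it trivial to checkpoint between iterations (see checkpoint, Chapter 8).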
lost-in-the-middle — The empirical finding that LLMs underweight information placed in the middle of long contexts relative to the beginning and end. (Chapter 7)
MCP (Model Context Protocol) — Anthropic's open protocol for standardising how models connect to external tools and data sources; wire format is JSON-RPC 2.0 over stdio. (Chapter 3b)
MoA (Mixture of Agents) — An ensemble pattern where multiple agents independently generate responses which are then synthesised by an aggregator model. Formalised by Wang et al. (2024). (Chapter 6)
orchestrator — An agent whose primary job is to decompose tasks and delegate to worker agents rather than executing work directly. (Chapter 6)
orchestrator-workers — An agentic pattern where a coordinator agent decomposes a task, assigns subtasks to specialised workers, and synthesises results. (Chapter 6)
OTel (OpenTelemetry) — A vendor-neutral observability framework for traces, metrics, and logs; used throughout this course for agent instrumentation. (Chapter 5)
pairwise preference — An evaluation protocol where a judge compares two responses head-to-head rather than scoring each independently. (Chapter 5)
parallelization (sectioning) — A workflow pattern where independent subtasks are divided among concurrent workers and their outputs aggregated. (Chapter 6)
parallelization (voting) — A workflow pattern where the same task is sent to multiple agents independently and outputs combined by majority or synthesis. (Chapter 6)
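The "combine by majority" step can be as simple as (a sketch; real systems may weight votes or synthesise instead):

```python
from collections import Counter

def majority_vote(outputs: list[str]) -> str:
    """Return the most common answer across independent agent runs."""
    return Counter(outputs).most_common(1)[0][0]
```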
Pareto frontier — The set of options not strictly dominated by any other option on every axis; in agent routing the frontier is drawn over cost and quality, and models that lose on both axes are discarded. (Chapter 5)
poka-yoke — (Japanese: "mistake-proofing") A design principle that makes incorrect usage impossible or immediately visible, applied via schema validation, output quarantine, and typed tool contracts. (Chapter 4, Chapter 7)
position bias — The tendency of LLM judges to prefer responses in a specific position (typically the first), independent of quality; mitigated by swapping order and averaging. (Chapter 5)
prompt caching — An API-level feature (Anthropic, OpenAI) that lets a stable prefix of a prompt be cached server-side, avoiding recomputation on repeated calls. (Chapter 2)
prompt chaining — A workflow pattern where the output of one LLM call is the input to the next, decomposing a complex task into a fixed sequence. (Chapter 6)
prompt injection — An attack where malicious content in the environment (web pages, files, tool outputs) attempts to hijack the agent's instructions. (Chapter 7)
ReAct — (Reason + Act) A prompting pattern where the model alternates between a Thought step (reasoning) and an Action step (tool call), iterating until the task is complete. Introduced by Yao et al. (2022). (Chapter 3a)
Responses API — OpenAI's stateful agent primitive that unifies chat, tool use, and state across calls behind a single endpoint, as opposed to the older stateless Chat Completions API. (Chapter 2)
routing — A workflow pattern where a classifier decides which model, agent, or pipeline handles a given input, enabling specialisation and cost optimisation. (Chapter 7)
Self-Refine — An iterative improvement pattern where a generator produces output, a critic evaluates it with specific feedback, and the generator revises until convergence. Madaan et al. (2023). (Chapter 6)
semantic cache — A cache keyed on semantic similarity rather than exact string match; allows cache hits for paraphrased queries. (Chapter 7)
simplicity (Anthropic principle) — Agents should prefer simpler, more reliable actions and request only the permissions they need. (Chapter 1, Chapter 7)
skill library (Voyager) — A growing collection of reusable, tested agent capabilities stored as executable code, retrieved by the agent to solve new tasks without re-deriving solutions. From Wang et al. (2023). (Chapter 8)
stop_reason — A field on the Anthropic Messages API response indicating why generation stopped: end_turn (done), max_tokens (truncated), tool_use (paused for tool results), or stop_sequence (hit a sentinel). (Chapter 1)
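A sketch of how an agent loop might dispatch on the field (the four values are as defined above; the return strings are illustrative):

```python
def handle_stop(stop_reason: str) -> str:
    """Dispatch on the Messages API stop_reason."""
    if stop_reason == "end_turn":
        return "done"           # model finished its turn
    if stop_reason == "tool_use":
        return "run tools"      # execute requested tools, send results back
    if stop_reason == "max_tokens":
        return "truncated"      # response was cut off; consider retrying
    if stop_reason == "stop_sequence":
        return "hit sentinel"   # a configured stop string was emitted
    raise ValueError(f"unknown stop_reason: {stop_reason}")
```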
SWE-bench — A benchmark of real GitHub issue resolutions used to evaluate software engineering agents. (Appendix B)
tool_result block — The user-turn content block your loop sends back after executing a tool, containing tool_use_id matching the prior tool_use.id and the tool's output. (Chapter 3a)
tool_use block — A content block in the assistant response where the model requests a tool invocation, with fields id, name, and input. (Chapter 3a)
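How the two blocks pair up (shapes per the definitions above; the id, tool name, and values are illustrative):

```python
# Assistant turn: the model requests a tool invocation.
tool_use = {
    "type": "tool_use",
    "id": "toolu_abc123",
    "name": "get_weather",
    "input": {"city": "Lisbon"},
}

# Next user turn: your loop executes the tool and sends the result back,
# with tool_use_id matching the request's id.
tool_result = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": tool_use["id"],  # must match the tool_use id
        "content": "18°C, clear",
    }],
}
```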
tool-use — The ability of an LLM to call external functions whose outputs are fed back into the conversation. (Chapter 3a)
transparency (Anthropic principle) — Agents should not deceive users or pursue hidden agendas, even when declining to share information. (Chapter 7)
triage — The routing step that classifies an incoming task and assigns it to the appropriate model, agent, or pipeline. (Chapter 7)
triple-gate — A safety pattern requiring three independent checks (policy, harm, confidence) before an agent takes an irreversible action. (Chapter 7)
worker — An agent that receives a scoped subtask from an orchestrator, executes it, and returns a result. (Chapter 6)
worktree — A Git feature allowing multiple working trees from the same repository; used in parallel agent swarms to give each worker isolated file state. (Chapter 6)