# Knowledge Base RAG
Hybrid keyword + semantic search over a realistic engineering knowledge base, with answer grounding verification so every claim maps to a real source.
## What it does
Answers "How do I deploy to staging?" against a knowledge base of 15+ markdown runbook docs. Every sentence in the response is traceable to a specific chunk of a specific doc — no hallucinations.
## Architecture
- Ch 4 (State & Collaboration) — chunks and indexes the KB; retrieves at query time
- Ch 3b (Tools & MCP) — `search_kb`, `fetch_doc`, `list_kb_topics`, `check_staleness` exposed as tools
- Ch 5 (Evaluation) — `grounding_checker.py` verifies every claim in the answer has supporting evidence in the retrieved chunks
- Appendix: Vector Memory — optional sentence-transformers backend for the semantic half of hybrid search
```
Question ──► search_kb (keyword + semantic reranked) ──► top-K chunks
                                                             │
                        ┌────────────────────────────────────┤
                        ▼                                    ▼
                synthesize answer                   grounding_checker
                        │                                    │
                        └──► verified answer + sources ◄─────┘
```
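The hybrid scoring at the heart of `search_kb` can be sketched in a few lines. This is a minimal, dependency-free approximation: the keyword half uses plain term-frequency cosine (the real backend uses TF-IDF), and the semantic scores are assumed to come from an embedding backend such as sentence-transformers. The function names and the `alpha` blend weight are illustrative, not the project's actual API:

```python
from collections import Counter
import math

def keyword_score(query: str, chunk: str) -> float:
    """Cosine similarity over raw term-frequency vectors (stand-in for TF-IDF)."""
    q, c = Counter(query.lower().split()), Counter(chunk.lower().split())
    dot = sum(q[t] * c[t] for t in set(q) & set(c))
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in c.values())))
    return dot / norm if norm else 0.0

def hybrid_search(query, chunks, semantic_scores, k=5, alpha=0.5):
    """Blend keyword and semantic scores, then return the top-k chunks.
    `semantic_scores` would come from an embedding backend in the real pipeline."""
    scored = [
        (alpha * keyword_score(query, ch) + (1 - alpha) * sem, ch)
        for ch, sem in zip(chunks, semantic_scores)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ch for _, ch in scored[:k]]
```

Blending both signals means a chunk can rank highly on exact keyword matches even when its embedding is mediocre, and vice versa.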
## Tools
| Tool | Purpose |
|---|---|
| `search_kb(query, k=5)` | Ranked chunks by hybrid score |
| `fetch_doc(doc_id)` | Full markdown of a doc |
| `list_kb_topics()` | Top-level categories |
| `check_staleness(doc_id, max_age_days=30)` | Flag out-of-date content |
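A staleness check like the one the table describes can be as simple as comparing a doc's file modification time against the age threshold. This sketch assumes docs are plain files on disk and uses mtime as the freshness signal; a real implementation might instead read a `last_reviewed` field from the doc's front matter:

```python
import os
import time

def check_staleness(doc_path: str, max_age_days: int = 30) -> dict:
    """Flag a doc as stale if its file hasn't been modified within max_age_days.
    Uses the filesystem mtime as a proxy for "last reviewed" (an assumption)."""
    age_days = (time.time() - os.path.getmtime(doc_path)) / 86400
    return {
        "doc": doc_path,
        "age_days": round(age_days, 1),
        "stale": age_days > max_age_days,
    }
```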
## Knowledge base
`knowledge_base/` contains 15+ realistic markdown docs on a mock SaaS company's engineering operations:
- Onboarding (dev setup, IDE config, repo access)
- Deploy runbooks (staging, production, rollback)
- Incident response (on-call, escalation paths)
- Coding standards (style, review process)
- FAQ docs (billing, features, support)
- Release process, secrets management, feature flags
Enough to test retrieval on a real-sized corpus without licensing issues.
## Grounding checker
`grounding_checker.py` runs after the answer is generated. For each claim in the answer, it verifies:
- The claim appears (lexically or semantically) in one of the retrieved chunks
- The citation (`[1]`, `[2]`, …) resolves to a real chunk id
- No claim is "unsupported" (i.e., invented by the model with no source)

Returns `(grounded: bool, ungrounded_claims: list[str])`. The agent refuses to output ungrounded answers in production mode; in mock mode it prints a warning so you can see the failure pattern.
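The lexical half of that check can be sketched as follows. This is a simplified illustration, not the contents of `grounding_checker.py`: a claim counts as supported when enough of its content words appear in some retrieved chunk, and the `threshold` value and word-length cutoff are arbitrary assumptions (the real checker also has a semantic path):

```python
def is_supported(claim: str, chunks: list[str], threshold: float = 0.5) -> bool:
    """Lexical support: enough of the claim's content words appear in one chunk.
    A production checker would add a semantic (embedding) fallback."""
    words = {w for w in claim.lower().split() if len(w) > 3}
    if not words:
        return True  # nothing substantive to verify
    return any(
        len(words & set(ch.lower().split())) / len(words) >= threshold
        for ch in chunks
    )

def check_grounding(claims: list[str], chunks: list[str]) -> tuple[bool, list[str]]:
    """Mirror the documented return shape: (grounded, ungrounded_claims)."""
    ungrounded = [c for c in claims if not is_supported(c, chunks)]
    return (not ungrounded, ungrounded)
```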
## Cost estimate
~$0.005 per query on Haiku (retrieval is cheap; only synthesis calls the LLM).
## Run it
Expected: cited answer pointing at knowledge_base/deploy_staging.md (or similar), grounding check passes, response includes numbered source citations.
Expected: 10+ tests passing (exact-match, paraphrase, out-of-scope, stale-doc warning, multi-doc synthesis, grounding pass/fail detection).
## Extending for production
- Swap the backend: default is hand-rolled TF-IDF + cosine similarity. For >10k docs, move to sentence-transformers + FAISS (see Appendix: Vector Memory). For multilingual or domain-specific, use Anthropic or OpenAI embeddings.
- Add reranking: retrieve 20, rerank with a cross-encoder, keep top 5. Dramatic precision boost.
- Chunk tuning: current default is ~500 tokens per chunk. Domain-specific KBs (legal, medical) often do better with larger chunks and more context.
- Freshness alerts: wire `check_staleness` to Slack so doc owners get pinged when their content ages past the retention window
- Audit logging: log every query + retrieval + grounding result so you can investigate "why did the agent say X" after the fact
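The chunk-tuning point above hinges on how docs are split. A minimal chunker consistent with the ~500-token default might split on markdown headings and pack sections up to the budget; the whitespace word count used here is only an approximation of real tokenization, and the function name is illustrative:

```python
def chunk_markdown(text: str, max_tokens: int = 500) -> list[str]:
    """Split a markdown doc on headings, then pack sections into ~max_tokens
    chunks. Token count is approximated by whitespace-separated words."""
    sections, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            sections.append("\n".join(current))  # close the previous section
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    chunks, buf, count = [], [], 0
    for sec in sections:
        n = len(sec.split())
        if count + n > max_tokens and buf:
            chunks.append("\n\n".join(buf))  # flush when the budget is exceeded
            buf, count = [], 0
        buf.append(sec)
        count += n
    if buf:
        chunks.append("\n\n".join(buf))
    return chunks
```

Raising `max_tokens` trades retrieval precision for more surrounding context per hit, which is the knob the legal/medical note above is about.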