# Research Assistant
A corpus-grounded research agent that searches a local document collection, takes notes, synthesises a cited answer, and verifies every citation maps back to a real note.
## Problem
LLMs are confident prose machines. Ask one a factual question and it will answer — often correctly, sometimes confidently wrong. For research tasks the "sometimes" is the whole problem: you cannot trust the output unless you can trace every claim back to a source.
This agent enforces that trace. It:
- Retrieves sources from a local corpus (swap for a real browser MCP in production — see the integration section).
- Records every fact it plans to cite as a note, with a pointer to the source URL.
- Writes a final answer that must use `[N]` markers, where each `N` is a real note ID.
- Verifies every citation after the fact, flagging unknown IDs, unsupported claims, and factual sentences that forgot a citation.
## Architecture

```
question
   |
   v
plan_sub_questions (deterministic)
   |
   v
search_web -> fetch_document (per sub-question)
   |
   v
harvest_notes -> ResearchState
   |
   v
call_agent (synthesize with [N] citations)
   |
   v
citation_checker.check
   |
   v
ResearchResult(answer, notes, citations)
```
The LLM is only called once — for the final synthesis. Retrieval, note harvesting, and verification are Python. That split keeps the cost predictable, makes the citation guarantee auditable, and means the failure modes are debuggable by reading code.
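That single-call split can be sketched end to end. Every helper below is a trivial stand-in (stub bodies, not the project's real implementations); only the control flow mirrors the diagram, and only `call_agent` would hit an LLM in the real agent.

```python
import re

# Illustrative sketch of the pipeline above. The helper names mirror the
# diagram, but every body here is a stub -- in the real agent only
# call_agent reaches a model, and only once per query.

def plan_sub_questions(question: str) -> list[str]:
    return [question]  # the real planner expands into several sub-queries

def search_web(query: str) -> list[str]:
    return ["local://unix-history"]  # stub corpus hit

def fetch_document(url: str) -> str:
    return "The pipe operator was added to Unix in 1973."  # stub body

def harvest_notes(query: str, url: str, body: str) -> list[dict]:
    return [{"id": 1, "claim": body, "source_url": url}]

def call_agent(question: str, notes: list[dict]) -> str:
    return "Unix gained the pipe operator in 1973 [1]."  # stub LLM output

def check_citations(answer: str, notes: list[dict]) -> bool:
    # Every [N] marker must map back to a harvested note ID.
    known = {n["id"] for n in notes}
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and cited <= known

def research_pipeline(question: str) -> tuple[str, bool]:
    notes: list[dict] = []
    for sq in plan_sub_questions(question):
        for url in search_web(sq):
            notes.extend(harvest_notes(sq, url, fetch_document(url)))
    answer = call_agent(question, notes)
    return answer, check_citations(answer, notes)
```

Because everything except `call_agent` is plain Python, each stage can be unit-tested without a model in the loop.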
## Working memory
Notes are the agent's working memory for a single query. We chose notes rather than full-document context because:
- The model only sees the facts it will cite. Less text in the prompt = less opportunity for the model to invent claims from stale context.
- Each note carries a source URL, so citation verification is a lookup, not a reasoning step.
- Notes are cheap to re-use if the agent needs to revise the answer.
## Cost estimate

At default settings (Claude Haiku, 4 sources, ~12 notes, 700 max output tokens):

- A single `call_agent` call with ~450 input / 60-100 output tokens.
- Haiku 4.5: ~$0.0004 per query.
- Sonnet 4.6: ~$0.005 per query.
- Opus 4.7: ~$0.10 per query.
Retrieval cost is near-zero because it's local regex. With a real web search, expect to add the cost of 1-2 search API calls and 3-5 fetch-document calls per query.
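The per-query figures above are simple token arithmetic. A one-liner to re-check them against whatever model you run; the prices passed in the test are placeholders, not current Anthropic rates.

```python
def query_cost(input_tokens: int, output_tokens: int,
               usd_per_mtok_input: float, usd_per_mtok_output: float) -> float:
    """Cost of one model call, given per-million-token prices."""
    return (input_tokens * usd_per_mtok_input
            + output_tokens * usd_per_mtok_output) / 1_000_000
```

At ~450 input and ~80 output tokens, even a 10x price gap between models moves the per-query cost by fractions of a cent, which is why the single-call design keeps cost predictable.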
## Tools

| Tool | Purpose |
|---|---|
| `search_web` | Keyword search over the corpus. |
| `fetch_document` | Full body + metadata of a corpus URL. |
| `take_note` | Record a claim + source for later citation. |
| `list_notes` | Dump the current working-memory notes. |
| `verify_claim` | Lexical overlap check: is this claim supported? |
In production, `search_web` and `fetch_document` are the only two you would replace.
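Of the five, `verify_claim` is the least obvious from its one-line description. A sketch of the kind of lexical-overlap test it names; the stopword list and the 0.6 threshold are illustrative choices, not the project's actual values.

```python
import re

# Tiny illustrative stopword list -- a real checker would use a fuller one.
STOPWORDS = {"the", "a", "an", "in", "of", "to", "and", "is", "was", "it",
             "by", "for", "on", "with"}

def content_tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS}

def claim_supported(claim: str, source_text: str,
                    threshold: float = 0.6) -> bool:
    """True if most of the claim's content tokens appear in the source."""
    claim_toks = content_tokens(claim)
    if not claim_toks:
        return False  # a claim with no content tokens can't be verified
    overlap = len(claim_toks & content_tokens(source_text))
    return overlap / len(claim_toks) >= threshold
```

This is a lookup-style check, not semantic entailment: it catches citations whose wording has drifted away from the source, at the cost of occasional false alarms on heavy paraphrase.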
## Running

Mock mode is the default when there's no API key.

```shell
SWARM_MOCK=true python -m projects.research_assistant.agent \
  "When did Unix get the pipe operator?"
```
Programmatic:

```python
from projects.research_assistant.agent import research

result = await research("What is ReAct?")
print(result.answer)

for n in result.notes:
    print(n.id, n.source_url, n.section)

if not result.citations.all_valid:
    print("CITATION ISSUES:", result.citations.as_dict())
```
## Integration: real web browsing in production

The mock corpus is just a stand-in behind `search_web` and `fetch_document`. To wire up real browsing, replace those two tools with an MCP client call or a direct HTTP fetch.
### Option A: MCP browser server

- Run an MCP server that exposes `search_web` and `fetch_document` (Firecrawl, Tavily, and Brave Search all have MCP wrappers).
- Connect via `swarm.tools.mcp_client.MCPClient`.
- Register the MCP tools under the same names (`search_web`, `fetch_document`). Nothing else in the agent changes; the registry lookup is by name.
### Option B: direct HTTP

- Replace `search_web` with a call to your search provider's API.
- Replace `fetch_document` with `httpx.get(url)` plus an HTML-to-text pass.
- Add a robots.txt check before fetching, and rate-limit at 2-3 requests/s per domain.
Either way, `take_note`, `list_notes`, and `verify_claim` stay local; they do not depend on the source being a local file.
## Failure modes

- Hallucinated citation. The model emits `[4]` when only three notes exist. Caught by the citation checker's `unknown_note_ids`.
- Unsupported citation. The model emits a real note ID, but the claim's content tokens don't overlap the cited source. Caught by the overlap check (`unsupported_claims`).
- Uncited factual claim. A sentence with a year or proper noun that doesn't carry an `[N]` marker. Caught by `uncited_claims`.
- Incomplete research. The retrieval stage missed a source the answer needed. Symptom: the final answer is shorter than expected, or says "I do not have enough information" when the corpus does cover it. Mitigation: the planner expands the question into multiple sub-queries before searching.
- Contradictory sources. Two corpus docs disagree. The agent is instructed (system prompt rule 4) to cite both sides rather than pick one. The test `test_research_contradictory_sources_are_both_retrieved` exercises the retrieval side of this.
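The first and third failure modes are cheap to catch mechanically. A sketch of that half of a citation checker; the "factual sentence" heuristic here (a year, or a capitalised two-word name) is illustrative, and the real checker may use richer rules.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")
# Heuristic for "factual-looking": a plausible year, or a Proper Noun pair.
FACTUAL_HINT = re.compile(r"\b(1\d{3}|20\d{2})\b|\b[A-Z][a-z]+ [A-Z][a-z]+\b")

def check_citations(answer: str, known_ids: set[int]) -> dict:
    """Flag [N] markers with no matching note, and factual sentences with no marker."""
    cited = {int(m) for m in CITATION.findall(answer)}
    unknown_note_ids = sorted(cited - known_ids)
    uncited_claims = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", answer)
        if FACTUAL_HINT.search(s) and not CITATION.search(s)
    ]
    return {
        "unknown_note_ids": unknown_note_ids,
        "uncited_claims": uncited_claims,
        "all_valid": not unknown_note_ids and not uncited_claims,
    }
```

The second failure mode (unsupported citation) additionally needs the note bodies, so it composes this check with the lexical-overlap test behind `verify_claim`.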
## Tests
Covers corpus loading, search, fetch, note-taking, claim verification, the citation checker (all four issue types), and end-to-end pipeline behaviour on a handful of demo questions.