Chapter 4b - Vector Memory¶
Companion to book/appendix/vector_memory.md. Runs top-to-bottom in Google Colab in mock mode with no API key required.
import os

if not os.path.exists("crafting-agentic-swarms"):
    !git clone https://github.com/TheAiSingularity/crafting-agentic-swarms.git
    %cd crafting-agentic-swarms
    !pip install -e ".[dev]" --quiet
    !pip install matplotlib pandas --quiet
import os

try:
    from google.colab import userdata
    os.environ["ANTHROPIC_API_KEY"] = userdata.get("ANTHROPIC_API_KEY")
    print("Using real API (key from Colab secrets).")
except Exception:  # not on Colab, or no secret configured
    os.environ.setdefault("SWARM_MOCK", "true")
    print("Running in mock mode (no API key needed).")
What you'll build¶
- A 50-entry transcript-like fixture covering varied topics.
- Index it with the tfidf backend and run semantic queries.
- Apply metadata filters, inspect scores as a pandas table.
- Plot the cosine-similarity distribution for a representative query.
- Compare tfidf against sentence-transformers on the same query (if installed).
1. Build a fixture¶
These are the kind of short transcript summaries a memory layer would accumulate across a week of agent work: 50 entries spread over eight doc types.
FIXTURE = [
# Ops and infra
("ops_01", "unusual AWS bill on Sunday, RDS cluster costs tripled", "ops"),
("ops_02", "redis memory usage spiked after new caching layer deployed", "ops"),
("ops_03", "kubernetes pod restarts climbing in production cluster", "ops"),
("ops_04", "postgres connection pool exhausted during peak hours", "ops"),
("ops_05", "S3 egress charges doubled after new analytics job", "ops"),
("ops_06", "grafana dashboards showing elevated p99 latency", "ops"),
("ops_07", "disk space alert on primary database node", "ops"),
("ops_08", "load balancer health checks failing intermittently", "ops"),
("ops_09", "cron job timeout after dependency upgrade", "ops"),
("ops_10", "CDN cache hit ratio dropped below 80 percent", "ops"),
# Security
("sec_01", "suspicious login attempts from new geographic region", "security"),
("sec_02", "vulnerability scanner flagged outdated openssl in base image", "security"),
("sec_03", "IAM policy review requested for service account access", "security"),
("sec_04", "secrets leaked in git history, rotation in progress", "security"),
("sec_05", "TLS certificate expiring in 14 days, renewal scheduled", "security"),
# Product / features
("feat_01", "checkout flow conversion rate improved after button redesign", "product"),
("feat_02", "search autocomplete prototype shipped to beta users", "product"),
("feat_03", "onboarding carousel removed, replaced with single-step signup", "product"),
("feat_04", "dark mode toggle added to user settings page", "product"),
("feat_05", "bulk export feature requested by enterprise customers", "product"),
# Support / FAQ
("faq_01", "how to reset your password from the account page", "faq"),
("faq_02", "how to cancel your subscription and get a refund", "faq"),
("faq_03", "two-factor authentication setup guide", "faq"),
("faq_04", "exporting your data as CSV or JSON", "faq"),
("faq_05", "invite teammates to your organization", "faq"),
("faq_06", "change your billing email and card on file", "faq"),
("faq_07", "keyboard shortcuts reference for power users", "faq"),
# Engineering
("eng_01", "refactored auth module to use dependency injection", "engineering"),
("eng_02", "added pagination to the public API, cursor based", "engineering"),
("eng_03", "migrated logging to structured JSON with request IDs", "engineering"),
("eng_04", "circuit breaker added around flaky upstream service", "engineering"),
("eng_05", "test suite parallelized, CI runtime cut in half", "engineering"),
("eng_06", "feature flag rollout for new recommendation engine", "engineering"),
("eng_07", "database index added to speed up user lookup queries", "engineering"),
("eng_08", "removed deprecated v1 endpoints from the gateway", "engineering"),
("eng_09", "upgrade python runtime from 3.10 to 3.11 across services", "engineering"),
# Research / notes
("note_01", "paper notes: MemGPT treats LLMs as operating systems", "research"),
("note_02", "reflection improves single-agent accuracy by 20 percent", "research"),
("note_03", "chain of thought works best with examples in the prompt", "research"),
("note_04", "retrieval augmented generation reduces hallucination rates", "research"),
("note_05", "tool use benchmarks show gap between planning and execution", "research"),
# Meeting summaries
("mtg_01", "weekly sync: shipping velocity up, two tickets slipped", "meeting"),
("mtg_02", "design review on the new notifications system, approved", "meeting"),
("mtg_03", "postmortem on last weeks outage, action items assigned", "meeting"),
("mtg_04", "budget planning for next quarter, headcount requested", "meeting"),
("mtg_05", "customer feedback session on the mobile app experience", "meeting"),
# Bugs / incidents
("bug_01", "race condition in payment webhook handler, double charges", "bug"),
("bug_02", "timezone bug on recurring events during DST transition", "bug"),
("bug_03", "memory leak in background worker, OOM every 48 hours", "bug"),
("bug_04", "file upload fails silently for files over 100MB", "bug"),
]
assert len(FIXTURE) == 50, f"fixture is {len(FIXTURE)}, want 50"
print(f"Fixture size: {len(FIXTURE)}")
print(f"Unique doc types: {sorted({t for _, _, t in FIXTURE})}")
2. Index with the tfidf backend¶
from swarm.memory.vector_store import VectorStore
store = VectorStore(backend="tfidf")
for doc_id, text, doc_type in FIXTURE:
    await store.add(doc_id, text, metadata={"doc_type": doc_type})
print(f"Store size: {await store.size()}")
print(f"Backend: {store.backend}")
3. Run a query, inspect as a table¶
import pandas as pd
FIXTURE_TEXT = {doc_id: text for doc_id, text, _ in FIXTURE}
query = "database costs exploded"
results = await store.search(query, k=8)
df = pd.DataFrame(
    [
        {
            "doc_id": doc_id,
            "score": round(score, 3),
            "doc_type": meta["doc_type"],
            "text": FIXTURE_TEXT[doc_id],
        }
        for doc_id, score, meta in results
    ]
)
print(f"Query: {query!r}\n")
print(df.to_string(index=False))
The top hits will be the entries that share literal tokens with the query, such as "database" or "costs". Notice that ops_01 ("unusual AWS bill") ranks somewhere in the middle even though it is semantically on-topic: tfidf can match its one overlapping word, "costs", but has no way to know that "AWS bill" is a paraphrase of the same idea. We will see sentence-transformers do better on this in step 6.
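To see the token-overlap limitation in isolation, here is a minimal sketch using scikit-learn's `TfidfVectorizer` (an assumption for illustration; the `VectorStore` tfidf backend may differ in tokenization and weighting details):

```python
# Illustration only: tfidf cosine similarity rewards shared tokens,
# not shared meaning. Assumes scikit-learn is installed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "unusual AWS bill on Sunday, RDS cluster costs tripled",  # ops_01
    "S3 egress charges doubled after new analytics job",      # ops_05
    "postgres connection pool exhausted during peak hours",   # ops_04
]
query = "database costs exploded"

vec = TfidfVectorizer()
matrix = vec.fit_transform(docs + [query])  # last row is the query
sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
for doc, sim in zip(docs, sims):
    print(f"{sim:.3f}  {doc}")
```

Only ops_01 gets a nonzero score, and only because it literally contains "costs"; "egress charges doubled" scores zero despite being about the same thing.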
4. Metadata filter: only FAQ entries¶
faq_results = await store.search("how do I reset my password", k=5, filter={"doc_type": "faq"})
print("FAQ-only results:")
for doc_id, score, meta in faq_results:
print(f" {doc_id:10s} score={score:.3f} {FIXTURE_TEXT[doc_id]}")
5. Cosine-similarity distribution for a query¶
For one query, score every document and plot the histogram. The tail at the right is the set of docs the agent would actually retrieve. Everything under ~0.3 is noise.
import matplotlib.pyplot as plt
query_for_hist = "database costs exploded"
all_results = await store.search(query_for_hist, k=len(FIXTURE))
scores = [s for _, s, _ in all_results]
fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(scores, bins=20, color="#3b82f6", edgecolor="white")
ax.axvline(0.3, color="#ef4444", linestyle="--", label="noise threshold (0.3)")
ax.set_xlabel("cosine similarity")
ax.set_ylabel("document count")
ax.set_title(f"Score distribution for query: {query_for_hist!r}")
ax.legend()
plt.tight_layout()
plt.show()
above_noise = [s for s in scores if s > 0.3]
print(f"Docs above 0.3: {len(above_noise)} / {len(scores)}")
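The threshold in the plot above suggests a simple retrieval gate. Here is one possible sketch (`gate` is a hypothetical helper, not part of `VectorStore`) that drops low-score hits before they reach a prompt:

```python
# Hypothetical helper: keep only hits above a score threshold, capped at
# max_docs. Assumes results are (doc_id, score, metadata) tuples sorted
# by score, descending, as returned by store.search.
def gate(results, threshold=0.3, max_docs=5):
    kept = [(doc_id, score) for doc_id, score, *_ in results if score > threshold]
    return kept[:max_docs]

# Example with made-up scores:
hits = [("ops_05", 0.62, {}), ("ops_01", 0.34, {}), ("faq_02", 0.08, {})]
print(gate(hits))  # the 0.08 hit never reaches the prompt
```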
6. Why semantic beats keyword on paraphrase¶
Pick a query whose surface form does not overlap the target doc. Compare tfidf (which misses) against sentence-transformers (which catches it), when available.
import importlib.util
HAS_ST = importlib.util.find_spec("sentence_transformers") is not None
print(f"sentence-transformers available: {HAS_ST}")
paraphrase_query = "cloud bill went way up"
target = "ops_01" # "unusual AWS bill on Sunday"
tfidf_hits = await store.search(paraphrase_query, k=5)
print(f"[tfidf] top-5 for {paraphrase_query!r}:")
for rank, (doc_id, score, _) in enumerate(tfidf_hits, 1):
    marker = " <-- target" if doc_id == target else ""
    print(f"  {rank}. {doc_id:10s} {score:.3f} {FIXTURE_TEXT[doc_id]}{marker}")
if HAS_ST:
    st_store = VectorStore(backend="sentence-transformers")
    for doc_id, text, doc_type in FIXTURE:
        await st_store.add(doc_id, text, metadata={"doc_type": doc_type})
    st_hits = await st_store.search(paraphrase_query, k=5)
    print(f"[sentence-transformers] top-5 for {paraphrase_query!r}:")
    for rank, (doc_id, score, _) in enumerate(st_hits, 1):
        marker = " <-- target" if doc_id == target else ""
        print(f"  {rank}. {doc_id:10s} {score:.3f} {FIXTURE_TEXT[doc_id]}{marker}")
else:
    print("sentence-transformers not installed; skipping dense comparison.")
    print("To install: pip install sentence-transformers")
    print("The tfidf fallback is what the VectorStore returned above.")
In mock-mode runs without sentence-transformers, only the tfidf results appear. That graceful degradation is exactly the point of the `_resolve_backend` logic: asking for sentence-transformers on a fresh Colab that has not installed it must not crash.
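The fallback can be sketched in a few lines. This is an illustrative stand-in, not the actual `_resolve_backend` implementation inside `VectorStore`:

```python
# Illustrative sketch of backend fallback resolution. The real
# _resolve_backend lives inside VectorStore; names here are assumptions.
import importlib.util

def resolve_backend(requested: str) -> str:
    """Fall back to tfidf when the dense backend's package is missing."""
    if requested == "sentence-transformers":
        if importlib.util.find_spec("sentence_transformers") is None:
            return "tfidf"  # degrade gracefully instead of raising ImportError
    return requested

print(resolve_backend("tfidf"))
print(resolve_backend("sentence-transformers"))
```

The key design choice is that the probe uses `find_spec` rather than a trial import, so resolution stays cheap and side-effect free.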
7. Side-by-side score table (if dense backend available)¶
if HAS_ST:
    comparison_queries = [
        "cloud bill went way up",
        "users logging in from weird places",
        "making the tests run faster",
    ]
    rows = []
    for q in comparison_queries:
        tfidf_top = (await store.search(q, k=1))[0]
        st_top = (await st_store.search(q, k=1))[0]
        rows.append({
            "query": q,
            "tfidf_doc": tfidf_top[0],
            "tfidf_score": round(tfidf_top[1], 3),
            "st_doc": st_top[0],
            "st_score": round(st_top[1], 3),
        })
    print(pd.DataFrame(rows).to_string(index=False))
else:
    print("sentence-transformers not installed; skipping comparison table.")
Takeaways¶
- `VectorStore` with the tfidf backend is zero-dep and handles the 10K-doc English case well.
- Scores below 0.3 are noise; gate your retrieval with a threshold before feeding results into a prompt.
- Metadata filters cheaply scope the search to one doc type; set `k` high enough that the filter has room to keep a reasonable number of hits.
- sentence-transformers wins on paraphrase and synonym matches; swap the backend string and nothing else changes.
- A duplicate `doc_id` overwrites in place, so re-indexing on edit is safe and idempotent.
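The overwrite-in-place takeaway can be demonstrated with a plain dict standing in for the store (a sketch of the semantics, not the `VectorStore` internals):

```python
# Sketch: same doc_id replaces the old entry, so store size stays stable
# across re-indexing. A dict models the overwrite semantics.
index = {}

def add(doc_id, text):
    index[doc_id] = text  # duplicate ids overwrite rather than append

add("ops_01", "unusual AWS bill on Sunday")
add("ops_01", "unusual AWS bill on Sunday, RDS cluster costs tripled")  # edited
print(len(index))  # still 1 entry
```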