Chapter 4b - Vector Memory¶
Companion to book/appendix/vector_memory.md. Runs top-to-bottom in Google Colab in mock mode with no API key required.
import os

if not os.path.exists("crafting-agentic-swarms"):
    !git clone https://github.com/TheAiSingularity/crafting-agentic-swarms.git
    %cd crafting-agentic-swarms
    !pip install -e ".[dev]" --quiet
    !pip install matplotlib pandas --quiet
import os

try:
    from google.colab import userdata
    os.environ["ANTHROPIC_API_KEY"] = userdata.get("ANTHROPIC_API_KEY")
    print("Using real API (key from Colab secrets).")
except Exception:  # not on Colab, or no secret configured
    os.environ.setdefault("SWARM_MOCK", "true")
    print("Running in mock mode (no API key needed).")
What you'll build¶
- A 50-entry transcript-like fixture covering varied topics.
- Index it with the tfidf backend and run semantic queries.
- Apply metadata filters, inspect scores as a pandas table.
- Plot the cosine-similarity distribution for a representative query.
- Compare tfidf against sentence-transformers on the same query (if installed).
1. Build a fixture¶
These are the kind of short transcript summaries a memory layer would accumulate across a week of agent work: 50 entries spread over eight doc types.
FIXTURE = [
# Ops and infra
("ops_01", "unusual AWS bill on Sunday, RDS cluster costs tripled", "ops"),
("ops_02", "redis memory usage spiked after new caching layer deployed", "ops"),
("ops_03", "kubernetes pod restarts climbing in production cluster", "ops"),
("ops_04", "postgres connection pool exhausted during peak hours", "ops"),
("ops_05", "S3 egress charges doubled after new analytics job", "ops"),
("ops_06", "grafana dashboards showing elevated p99 latency", "ops"),
("ops_07", "disk space alert on primary database node", "ops"),
("ops_08", "load balancer health checks failing intermittently", "ops"),
("ops_09", "cron job timeout after dependency upgrade", "ops"),
("ops_10", "CDN cache hit ratio dropped below 80 percent", "ops"),
# Security
("sec_01", "suspicious login attempts from new geographic region", "security"),
("sec_02", "vulnerability scanner flagged outdated openssl in base image", "security"),
("sec_03", "IAM policy review requested for service account access", "security"),
("sec_04", "secrets leaked in git history, rotation in progress", "security"),
("sec_05", "TLS certificate expiring in 14 days, renewal scheduled", "security"),
# Product / features
("feat_01", "checkout flow conversion rate improved after button redesign", "product"),
("feat_02", "search autocomplete prototype shipped to beta users", "product"),
("feat_03", "onboarding carousel removed, replaced with single-step signup", "product"),
("feat_04", "dark mode toggle added to user settings page", "product"),
("feat_05", "bulk export feature requested by enterprise customers", "product"),
# Support / FAQ
("faq_01", "how to reset your password from the account page", "faq"),
("faq_02", "how to cancel your subscription and get a refund", "faq"),
("faq_03", "two-factor authentication setup guide", "faq"),
("faq_04", "exporting your data as CSV or JSON", "faq"),
("faq_05", "invite teammates to your organization", "faq"),
("faq_06", "change your billing email and card on file", "faq"),
("faq_07", "keyboard shortcuts reference for power users", "faq"),
# Engineering
("eng_01", "refactored auth module to use dependency injection", "engineering"),
("eng_02", "added pagination to the public API, cursor based", "engineering"),
("eng_03", "migrated logging to structured JSON with request IDs", "engineering"),
("eng_04", "circuit breaker added around flaky upstream service", "engineering"),
("eng_05", "test suite parallelized, CI runtime cut in half", "engineering"),
("eng_06", "feature flag rollout for new recommendation engine", "engineering"),
("eng_07", "database index added to speed up user lookup queries", "engineering"),
("eng_08", "removed deprecated v1 endpoints from the gateway", "engineering"),
("eng_09", "upgrade python runtime from 3.10 to 3.11 across services", "engineering"),
# Research / notes
("note_01", "paper notes: MemGPT treats LLMs as operating systems", "research"),
("note_02", "reflection improves single-agent accuracy by 20 percent", "research"),
("note_03", "chain of thought works best with examples in the prompt", "research"),
("note_04", "retrieval augmented generation reduces hallucination rates", "research"),
("note_05", "tool use benchmarks show gap between planning and execution", "research"),
# Meeting summaries
("mtg_01", "weekly sync: shipping velocity up, two tickets slipped", "meeting"),
("mtg_02", "design review on the new notifications system, approved", "meeting"),
("mtg_03", "postmortem on last weeks outage, action items assigned", "meeting"),
("mtg_04", "budget planning for next quarter, headcount requested", "meeting"),
("mtg_05", "customer feedback session on the mobile app experience", "meeting"),
# Bugs / incidents
("bug_01", "race condition in payment webhook handler, double charges", "bug"),
("bug_02", "timezone bug on recurring events during DST transition", "bug"),
("bug_03", "memory leak in background worker, OOM every 48 hours", "bug"),
("bug_04", "file upload fails silently for files over 100MB", "bug"),
]
assert len(FIXTURE) == 50, f"fixture is {len(FIXTURE)}, want 50"
print(f"Fixture size: {len(FIXTURE)}")
print(f"Unique doc types: {sorted({t for _, _, t in FIXTURE})}")
2. Index with the tfidf backend¶
from swarm.memory.vector_store import VectorStore
store = VectorStore(backend="tfidf")
for doc_id, text, doc_type in FIXTURE:
    await store.add(doc_id, text, metadata={"doc_type": doc_type})
print(f"Store size: {await store.size()}")
print(f"Backend: {store.backend}")
3. Run a query, inspect as a table¶
import pandas as pd
FIXTURE_TEXT = {doc_id: text for doc_id, text, _ in FIXTURE}
query = "database costs exploded"
results = await store.search(query, k=8)
df = pd.DataFrame(
    [
        {
            "doc_id": doc_id,
            "score": round(score, 3),
            "doc_type": meta["doc_type"],
            "text": FIXTURE_TEXT[doc_id],
        }
        for doc_id, score, meta in results
    ]
)
print(f"Query: {query!r}\n")
print(df.to_string(index=False))
The top hits will be the entries that share literal tokens with the query, such as "database" or "costs". Notice that ops_01 ("unusual AWS bill") ranks somewhere in the middle even though it is semantically on-topic: tfidf can match its one overlapping word, "costs", but has no way to know that "AWS bill" is a paraphrase of the same idea. We will see sentence-transformers do better on this in step 6.
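To see the token-overlap limitation in isolation, here is a minimal sketch using scikit-learn's `TfidfVectorizer` (an assumption for illustration; the `VectorStore` tfidf backend may differ in tokenization and weighting details):

```python
# Illustration only: tfidf cosine similarity rewards shared tokens,
# not shared meaning. Assumes scikit-learn is installed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "unusual AWS bill on Sunday, RDS cluster costs tripled",  # ops_01
    "S3 egress charges doubled after new analytics job",      # ops_05
    "postgres connection pool exhausted during peak hours",   # ops_04
]
query = "database costs exploded"

vec = TfidfVectorizer()
matrix = vec.fit_transform(docs + [query])  # last row is the query
sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
for doc, sim in zip(docs, sims):
    print(f"{sim:.3f}  {doc}")
```

Only ops_01 gets a nonzero score, and only because it literally contains "costs"; "egress charges doubled" scores zero despite being about the same thing.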
4. Metadata filter: only FAQ entries¶
faq_results = await store.search("how do I reset my password", k=5, filter={"doc_type": "faq"})
print("FAQ-only results:")
for doc_id, score, meta in faq_results:
print(f" {doc_id:10s} score={score:.3f} {FIXTURE_TEXT[doc_id]}")
5. Cosine-similarity distribution for a query¶
For one query, score every document and plot the histogram. The tail at the right is the set of docs the agent would actually retrieve. Everything under ~0.3 is noise.
import matplotlib.pyplot as plt
query_for_hist = "database costs exploded"
all_results = await store.search(query_for_hist, k=len(FIXTURE))
scores = [s for _, s, _ in all_results]
fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(scores, bins=20, color="#3b82f6", edgecolor="white")
ax.axvline(0.3, color="#ef4444", linestyle="--", label="noise threshold (0.3)")
ax.set_xlabel("cosine similarity")
ax.set_ylabel("document count")
ax.set_title(f"Score distribution for query: {query_for_hist!r}")
ax.legend()
plt.tight_layout()
plt.show()
above_noise = [s for s in scores if s > 0.3]
print(f"Docs above 0.3: {len(above_noise)} / {len(scores)}")
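The threshold in the plot above suggests a simple retrieval gate. Here is one possible sketch (`gate` is a hypothetical helper, not part of `VectorStore`) that drops low-score hits before they reach a prompt:

```python
# Hypothetical helper: keep only hits above a score threshold, capped at
# max_docs. Assumes results are (doc_id, score, metadata) tuples sorted
# by score, descending, as returned by store.search.
def gate(results, threshold=0.3, max_docs=5):
    kept = [(doc_id, score) for doc_id, score, *_ in results if score > threshold]
    return kept[:max_docs]

# Example with made-up scores:
hits = [("ops_05", 0.62, {}), ("ops_01", 0.34, {}), ("faq_02", 0.08, {})]
print(gate(hits))  # the 0.08 hit never reaches the prompt
```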
6. Why semantic beats keyword on paraphrase¶
Pick a query whose surface form does not overlap the target doc. Compare tfidf (which misses) against sentence-transformers (which catches it), when available.
import importlib.util
HAS_ST = importlib.util.find_spec("sentence_transformers") is not None
print(f"sentence-transformers available: {HAS_ST}")
paraphrase_query = "cloud bill went way up"
target = "ops_01" # "unusual AWS bill on Sunday"
tfidf_hits = await store.search(paraphrase_query, k=5)
print(f"[tfidf] top-5 for {paraphrase_query!r}:")
for rank, (doc_id, score, _) in enumerate(tfidf_hits, 1):
    marker = " <-- target" if doc_id == target else ""
    print(f"  {rank}. {doc_id:10s} {score:.3f} {FIXTURE_TEXT[doc_id]}{marker}")
if HAS_ST:
    st_store = VectorStore(backend="sentence-transformers")
    for doc_id, text, doc_type in FIXTURE:
        await st_store.add(doc_id, text, metadata={"doc_type": doc_type})
    st_hits = await st_store.search(paraphrase_query, k=5)
    print(f"[sentence-transformers] top-5 for {paraphrase_query!r}:")
    for rank, (doc_id, score, _) in enumerate(st_hits, 1):
        marker = " <-- target" if doc_id == target else ""
        print(f"  {rank}. {doc_id:10s} {score:.3f} {FIXTURE_TEXT[doc_id]}{marker}")
else:
    print("sentence-transformers not installed; skipping dense comparison.")
    print("To install: pip install sentence-transformers")
    print("The tfidf fallback is what the VectorStore returned above.")
In mock-mode runs without sentence-transformers, only the tfidf results appear. That graceful degradation is exactly the point of the `_resolve_backend` logic: asking for sentence-transformers on a fresh Colab that has not installed it must not crash.
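The fallback can be sketched in a few lines. This is an illustrative stand-in, not the actual `_resolve_backend` implementation inside `VectorStore`:

```python
# Illustrative sketch of backend fallback resolution. The real
# _resolve_backend lives inside VectorStore; names here are assumptions.
import importlib.util

def resolve_backend(requested: str) -> str:
    """Fall back to tfidf when the dense backend's package is missing."""
    if requested == "sentence-transformers":
        if importlib.util.find_spec("sentence_transformers") is None:
            return "tfidf"  # degrade gracefully instead of raising ImportError
    return requested

print(resolve_backend("tfidf"))
print(resolve_backend("sentence-transformers"))
```

The key design choice is that the probe uses `find_spec` rather than a trial import, so resolution stays cheap and side-effect free.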
7. Side-by-side score table (if dense backend available)¶
if HAS_ST:
    comparison_queries = [
        "cloud bill went way up",
        "users logging in from weird places",
        "making the tests run faster",
    ]
    rows = []
    for q in comparison_queries:
        tfidf_top = (await store.search(q, k=1))[0]
        st_top = (await st_store.search(q, k=1))[0]
        rows.append({
            "query": q,
            "tfidf_doc": tfidf_top[0],
            "tfidf_score": round(tfidf_top[1], 3),
            "st_doc": st_top[0],
            "st_score": round(st_top[1], 3),
        })
    print(pd.DataFrame(rows).to_string(index=False))
else:
    print("sentence-transformers not installed; skipping comparison table.")
Takeaways¶
- `VectorStore` with the tfidf backend is zero-dep and handles the 10K-doc English case well.
- Scores below 0.3 are noise; gate your retrieval with a threshold before feeding results into a prompt.
- Metadata filters cheaply scope the search to one doc type; set `k` high enough that the filter has room to keep a reasonable number of hits.
- sentence-transformers wins on paraphrase and synonym matches; swap the backend string and nothing else changes.
- A duplicate `doc_id` overwrites in place, so re-indexing on edit is safe and idempotent.
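The overwrite-in-place takeaway can be demonstrated with a plain dict standing in for the store (a sketch of the semantics, not the `VectorStore` internals):

```python
# Sketch: same doc_id replaces the old entry, so store size stays stable
# across re-indexing. A dict models the overwrite semantics.
index = {}

def add(doc_id, text):
    index[doc_id] = text  # duplicate ids overwrite rather than append

add("ops_01", "unusual AWS bill on Sunday")
add("ops_01", "unusual AWS bill on Sunday, RDS cluster costs tripled")  # edited
print(len(index))  # still 1 entry
```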