# Data Analyst Agent
Natural-language-to-SQL-to-chart pipeline. Ask a question in English; get an answer, a chart, and the SQL that produced them.
## What it does
Takes a question like "What are the top 5 customers by total spend?" and produces:
- A SQL query against the fixture database
- A matplotlib chart saved to `./charts/`
- A 2-sentence plain-English summary
- The SQL itself (so you can sanity-check it)
All grounded in a real SQLite database — no hallucinated numbers.
## Architecture
- Ch 3a (Agent Loop) — the reason-act-observe cycle
- Ch 3b (Tools & MCP) — 6-tool registry: schema introspection, SQL execution, chart rendering, summarization
- Ch 5 (Evaluation) — structured output validation ensures the SQL is syntactically correct before execution
```
Question ──► list_tables + describe_table ──► run_sql ──► make_chart ──► summarize_result
                                                 │
                                                 └── explain_query
```
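The pipeline above can be sketched as a plain dispatch over a tool registry. This is a minimal illustration of the happy path, not the real agent loop from Ch 3a; `generate_sql` stands in for the LLM step and is hypothetical.

```python
# Minimal sketch of the happy-path dispatch (hypothetical; the real
# reason-act-observe loop lives in the Ch 3a code). Tool names match
# the registry, except generate_sql, which stands in for the LLM call.

def run_pipeline(question, tools):
    """Walk the fixed path: schema -> SQL -> execute -> chart -> summary."""
    tables = tools["list_tables"]()                       # observe the schema
    schema = {t: tools["describe_table"](t) for t in tables}
    sql = tools["generate_sql"](question, schema)         # hypothetical LLM step
    rows = tools["run_sql"](sql)
    chart_path = tools["make_chart"](rows, "bar", question)
    summary = tools["summarize_result"](rows)
    return {"sql": sql, "rows": rows, "chart": chart_path, "summary": summary}
```

In the real loop the model chooses which tool to call at each step; the fixed ordering here just shows how the pieces in the diagram connect.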
## Tools

| Tool | Purpose |
|---|---|
| `list_tables()` | Schema: all table names |
| `describe_table(name)` | Columns, types, 3-row sample |
| `run_sql(query)` | Execute against `fixtures/sample.db` |
| `make_chart(data, chart_type, title)` | Save a PNG with matplotlib |
| `summarize_result(data)` | 2-sentence takeaway |
| `explain_query(sql)` | Human-readable SQL explanation |
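The two database-facing tools are thin wrappers over Python's built-in `sqlite3`. A sketch of how `list_tables` and `run_sql` might look — the actual signatures and return shapes in the repo may differ:

```python
import sqlite3

DB_PATH = "fixtures/sample.db"  # path taken from this README

def list_tables(db_path=DB_PATH):
    """Return all user table names from sqlite_master."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"
        ).fetchall()
    return [r[0] for r in rows]

def run_sql(query, db_path=DB_PATH):
    """Execute a query and return (column_names, rows)."""
    with sqlite3.connect(db_path) as conn:
        cur = conn.execute(query)
        cols = [d[0] for d in cur.description]
        return cols, cur.fetchall()
```

Returning column names alongside rows lets `make_chart` and `summarize_result` label their output without re-querying the schema.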
## Fixture database

`fixtures/sample.db` is seeded by `fixtures/seed.py` with three tables:

- `customers(id, name, email, signup_date, country)` — 100 rows
- `orders(id, customer_id, total, date, status)` — 250 rows
- `products(id, sku, name, category, price)` — 50 rows

Run `python fixtures/seed.py` to regenerate.
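For a sense of what seeding involves, here is a stripped-down sketch of what `fixtures/seed.py` plausibly does. The column lists come from this README; the row data here is placeholder, and the real script may generate data differently.

```python
import sqlite3

def seed(db_path):
    """Create the three fixture tables and insert a few placeholder rows."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        DROP TABLE IF EXISTS customers;
        DROP TABLE IF EXISTS orders;
        DROP TABLE IF EXISTS products;
        CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT, email TEXT,
                               signup_date TEXT, country TEXT);
        CREATE TABLE orders(id INTEGER PRIMARY KEY, customer_id INTEGER,
                            total REAL, date TEXT, status TEXT);
        CREATE TABLE products(id INTEGER PRIMARY KEY, sku TEXT, name TEXT,
                              category TEXT, price REAL);
    """)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
        [(i, f"Customer {i}", f"c{i}@example.com", "2024-01-01", "US")
         for i in range(1, 6)],
    )
    conn.commit()
    return conn
```

Dropping and recreating tables up front is what makes re-running the script a clean regeneration rather than an append.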
## Cost estimate
~$0.02 per query on a Haiku+Sonnet mix (Haiku for schema, Sonnet for SQL generation).
## Run it

```shell
SWARM_MOCK=true python -m projects.data_analyst_agent.agent \
  "What are the top 5 customers by total spend?"
```

Expected: the agent prints the SQL, runs it, saves a bar chart to `charts/`, and prints a summary. The test suite covers 10 cases: simple aggregation, joins, filters, ambiguous questions handled gracefully, and unknown tables rejected.
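One way the "unknown tables rejected" behavior can be implemented is to ask SQLite to plan the query without running it: `EXPLAIN` fails fast on missing tables and syntax errors. A sketch of this approach (an assumption about the mechanism, not necessarily what the repo does):

```python
import sqlite3

def validate_sql(conn, query):
    """Return (ok, error) by planning the query without executing it.

    EXPLAIN prepares the statement and emits its bytecode, so a missing
    table or a syntax error surfaces here before any real execution.
    """
    try:
        conn.execute(f"EXPLAIN {query}")
        return True, None
    except sqlite3.Error as exc:
        return False, str(exc)
```

This matches the Ch 5 bullet above: the SQL is checked for validity before `run_sql` ever touches the data.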
## Sample questions
`sample_questions.md` lists 10 natural-language queries the agent handles well. A good starter set for testing your extensions.
## Extending for production
- Swap SQLite for your real analytics DB (Postgres, Snowflake, BigQuery). Only `run_sql` needs to change.
- Add a permission layer: not every query should be allowed on production data (no DELETE, no DROP, row-limit caps)
- Cache SQL + chart output — common questions will repeat
- Integrate with your BI tool: render charts as Superset/Metabase embeds instead of local PNGs
- Track query cost in the hook bus (Ch07) and alert on expensive query patterns
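The permission-layer bullet can start as simple keyword and row-limit guards. A minimal sketch — the specific denylist and cap here are illustrative policy choices, not the project's:

```python
import re

# Illustrative policy: block write/DDL statements, cap result size.
FORBIDDEN = re.compile(r"\b(DELETE|DROP|UPDATE|INSERT|ALTER|TRUNCATE)\b", re.I)
MAX_ROWS = 1000

def guard(query):
    """Reject write statements and append a LIMIT if the query lacks one."""
    if FORBIDDEN.search(query):
        raise PermissionError(f"write statements are not allowed: {query!r}")
    if not re.search(r"\bLIMIT\b", query, re.I):
        query = f"{query.rstrip().rstrip(';')} LIMIT {MAX_ROWS}"
    return query
```

Keyword matching is a blunt first pass (it will also reject a column literally named `updated`, for example); a production version would parse the statement or rely on a read-only database role instead.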