# Data Analyst Agent
Natural-language-to-SQL-to-chart pipeline. Ask a question in English; get an answer, a chart, and the SQL that produced them.
## What it does
Takes a question like "What are the top 5 customers by total spend?" and produces:
- A SQL query against the fixture database
- A matplotlib chart saved to `./charts/`
- A 2-sentence plain-English summary
- The SQL itself (so you can sanity-check it)
All grounded in a real SQLite database — no hallucinated numbers.
## Architecture
- Ch 3a (Agent Loop) — the reason-act-observe cycle
- Ch 3b (Tools & MCP) — 6-tool registry: schema introspection, SQL execution, chart rendering, summarization
- Ch 5 (Evaluation) — structured output validation ensures the SQL is syntactically correct before execution
```
Question ──► list_tables + describe_table ──► run_sql ──► make_chart ──► summarize_result
                                                 │
                                                 └── explain_query
```
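The pipeline above can be sketched as a plain dispatch over a tool registry. This is a minimal illustration of the happy path, not the real agent loop from Ch 3a; `generate_sql` stands in for the LLM step and is hypothetical.

```python
# Minimal sketch of the happy-path dispatch (hypothetical; the real
# reason-act-observe loop lives in the Ch 3a code). Tool names match
# the registry, except generate_sql, which stands in for the LLM call.

def run_pipeline(question, tools):
    """Walk the fixed path: schema -> SQL -> execute -> chart -> summary."""
    tables = tools["list_tables"]()                       # observe the schema
    schema = {t: tools["describe_table"](t) for t in tables}
    sql = tools["generate_sql"](question, schema)         # hypothetical LLM step
    rows = tools["run_sql"](sql)
    chart_path = tools["make_chart"](rows, "bar", question)
    summary = tools["summarize_result"](rows)
    return {"sql": sql, "rows": rows, "chart": chart_path, "summary": summary}
```

In the real loop the model chooses which tool to call at each step; the fixed ordering here just shows how the pieces in the diagram connect.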
## Tools

| Tool | Purpose |
|---|---|
| `list_tables()` | Schema: all table names |
| `describe_table(name)` | Columns, types, 3-row sample |
| `run_sql(query)` | Execute against `fixtures/sample.db` |
| `make_chart(data, chart_type, title)` | Save a PNG with matplotlib |
| `summarize_result(data)` | 2-sentence takeaway |
| `explain_query(sql)` | Human-readable SQL explanation |
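The two database-facing tools are thin wrappers over Python's built-in `sqlite3`. A sketch of how `list_tables` and `run_sql` might look — the actual signatures and return shapes in the repo may differ:

```python
import sqlite3

DB_PATH = "fixtures/sample.db"  # path taken from this README

def list_tables(db_path=DB_PATH):
    """Return all user table names from sqlite_master."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"
        ).fetchall()
    return [r[0] for r in rows]

def run_sql(query, db_path=DB_PATH):
    """Execute a query and return (column_names, rows)."""
    with sqlite3.connect(db_path) as conn:
        cur = conn.execute(query)
        cols = [d[0] for d in cur.description]
        return cols, cur.fetchall()
```

Returning column names alongside rows lets `make_chart` and `summarize_result` label their output without re-querying the schema.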
## Fixture database

`fixtures/sample.db` is seeded by `fixtures/seed.py` with three tables:

- `customers(id, name, email, signup_date, country)` — 100 rows
- `orders(id, customer_id, total, date, status)` — 250 rows
- `products(id, sku, name, category, price)` — 50 rows

Run `python fixtures/seed.py` to regenerate.
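For a sense of what seeding involves, here is a stripped-down sketch of what `fixtures/seed.py` plausibly does. The column lists come from this README; the row data here is placeholder, and the real script may generate data differently.

```python
import sqlite3

def seed(db_path):
    """Create the three fixture tables and insert a few placeholder rows."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        DROP TABLE IF EXISTS customers;
        DROP TABLE IF EXISTS orders;
        DROP TABLE IF EXISTS products;
        CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT, email TEXT,
                               signup_date TEXT, country TEXT);
        CREATE TABLE orders(id INTEGER PRIMARY KEY, customer_id INTEGER,
                            total REAL, date TEXT, status TEXT);
        CREATE TABLE products(id INTEGER PRIMARY KEY, sku TEXT, name TEXT,
                              category TEXT, price REAL);
    """)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
        [(i, f"Customer {i}", f"c{i}@example.com", "2024-01-01", "US")
         for i in range(1, 6)],
    )
    conn.commit()
    return conn
```

Dropping and recreating tables up front is what makes re-running the script a clean regeneration rather than an append.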
## Cost estimate
~$0.02 per query on a Haiku+Sonnet mix (Haiku for schema, Sonnet for SQL generation).
## Run it

```shell
SWARM_MOCK=true python -m projects.data_analyst_agent.agent \
  "What are the top 5 customers by total spend?"
```

Expected: the agent prints the SQL, runs it, saves a bar chart to `charts/`, and prints a summary. The test suite covers 10 cases: simple aggregation, joins, filters, ambiguous questions handled gracefully, and unknown tables rejected.
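One way the "unknown tables rejected" behavior can be implemented is to ask SQLite to plan the query without running it: `EXPLAIN` fails fast on missing tables and syntax errors. A sketch of this approach (an assumption about the mechanism, not necessarily what the repo does):

```python
import sqlite3

def validate_sql(conn, query):
    """Return (ok, error) by planning the query without executing it.

    EXPLAIN prepares the statement and emits its bytecode, so a missing
    table or a syntax error surfaces here before any real execution.
    """
    try:
        conn.execute(f"EXPLAIN {query}")
        return True, None
    except sqlite3.Error as exc:
        return False, str(exc)
```

This matches the Ch 5 bullet above: the SQL is checked for validity before `run_sql` ever touches the data.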
## Sample questions
`sample_questions.md` lists 10 natural-language queries the agent handles well. A good starter set for testing your extensions.
## Extending for production
- Swap SQLite for your real analytics DB (Postgres, Snowflake, BigQuery). Only `run_sql` needs to change.
- Add a permission layer: not every query should be allowed on production data (no DELETE, no DROP, row-limit caps)
- Cache SQL + chart output — common questions will repeat
- Integrate with your BI tool: render charts as Superset/Metabase embeds instead of local PNGs
- Track query cost in the hook bus (Ch07) and alert on expensive query patterns
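The permission-layer bullet can start as simple keyword and row-limit guards. A minimal sketch — the specific denylist and cap here are illustrative policy choices, not the project's:

```python
import re

# Illustrative policy: block write/DDL statements, cap result size.
FORBIDDEN = re.compile(r"\b(DELETE|DROP|UPDATE|INSERT|ALTER|TRUNCATE)\b", re.I)
MAX_ROWS = 1000

def guard(query):
    """Reject write statements and append a LIMIT if the query lacks one."""
    if FORBIDDEN.search(query):
        raise PermissionError(f"write statements are not allowed: {query!r}")
    if not re.search(r"\bLIMIT\b", query, re.I):
        query = f"{query.rstrip().rstrip(';')} LIMIT {MAX_ROWS}"
    return query
```

Keyword matching is a blunt first pass (it will also reject a column literally named `updated`, for example); a production version would parse the statement or rely on a read-only database role instead.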