
# Data Analyst Agent

Natural-language-to-SQL-to-chart pipeline. Ask a question in English; get an answer, a chart, and the SQL that produced them.

## What it does

Takes a question like "What are the top 5 customers by total spend?" and produces:

  1. A SQL query against the fixture database
  2. A matplotlib chart saved to ./charts/
  3. A 2-sentence plain-English summary
  4. The SQL itself (so you can sanity-check)

All grounded in a real SQLite database — no hallucinated numbers.

## Architecture

  • Ch 3a (Agent Loop) — the reason-act-observe cycle
  • Ch 3b (Tools & MCP) — 6-tool registry: schema introspection, SQL execution, chart rendering, summarization
  • Ch 5 (Evaluation) — structured output validation ensures the SQL is syntactically correct before execution

```
Question ──► list_tables + describe_table ──► run_sql ──► make_chart ──► summarize_result
                                                                            └── explain_query
```
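The tool order above can be sketched as a plain function. This is an illustrative stand-in: the `tools` dict and the `generate_sql` entry are hypothetical names, and in the real agent the model chooses tools turn by turn rather than following a fixed script.

```python
def answer_question(question: str, tools: dict) -> dict:
    """Sketch of the pipeline: schema -> SQL -> data -> chart -> summary."""
    tables = tools["list_tables"]()                        # reason: what data exists?
    schema = {t: tools["describe_table"](t) for t in tables}
    sql = tools["generate_sql"](question, schema)          # act: model writes SQL (hypothetical tool)
    rows = tools["run_sql"](sql)                           # observe: real rows, not guesses
    chart = tools["make_chart"](rows, "bar", question)
    return {
        "sql": sql,
        "chart": chart,
        "summary": tools["summarize_result"](rows),
    }
```

Because every number in the summary comes from `run_sql` output, the grounding claim above holds by construction.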

## Tools

| Tool | Purpose |
| --- | --- |
| `list_tables()` | Schema: all table names |
| `describe_table(name)` | Columns, types, 3-row sample |
| `run_sql(query)` | Execute against `fixtures/sample.db` |
| `make_chart(data, chart_type, title)` | Save a PNG with matplotlib |
| `summarize_result(data)` | 2-sentence takeaway |
| `explain_query(sql)` | Human-readable SQL explanation |
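As a rough sketch of what one of these tools looks like, a minimal `run_sql` could be written with the standard-library `sqlite3` module (the shipped tool presumably adds validation and error handling on top of this):

```python
import sqlite3

def run_sql(query: str, db_path: str = "fixtures/sample.db") -> list[dict]:
    """Execute a query against the fixture DB; return rows as dicts."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # lets us read columns by name
    try:
        return [dict(row) for row in conn.execute(query).fetchall()]
    finally:
        conn.close()
```

Returning plain dicts keeps the result JSON-serializable, which is what the downstream `make_chart` and `summarize_result` steps want.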

## Fixture database

`fixtures/sample.db` is seeded by `fixtures/seed.py` with three tables:

  • customers (id, name, email, signup_date, country) — 100 rows
  • orders (id, customer_id, total, date, status) — 250 rows
  • products (id, sku, name, category, price) — 50 rows

Run `python fixtures/seed.py` to regenerate.
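The table-creation half of the seed script might look like the following sketch. Column names are taken from the list above; the real `seed.py` also generates the 100/250/50 sample rows, which is elided here.

```python
import sqlite3

def create_tables(db_path: str) -> None:
    """Recreate the three fixture tables (schema as listed above)."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        DROP TABLE IF EXISTS customers;
        DROP TABLE IF EXISTS orders;
        DROP TABLE IF EXISTS products;
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT,
                                signup_date TEXT, country TEXT);
        CREATE TABLE orders    (id INTEGER PRIMARY KEY,
                                customer_id INTEGER REFERENCES customers(id),
                                total REAL, date TEXT, status TEXT);
        CREATE TABLE products  (id INTEGER PRIMARY KEY, sku TEXT, name TEXT,
                                category TEXT, price REAL);
    """)
    conn.commit()
    conn.close()
```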

## Cost estimate

~$0.02 per query on a Haiku+Sonnet mix (Haiku for schema, Sonnet for SQL generation).

## Run it

```shell
SWARM_MOCK=true python -m projects.data_analyst_agent.agent \
    "What are the top 5 customers by total spend?"
```

Expected: the agent prints the SQL, runs it, saves a bar chart to charts/, and prints a summary.

```shell
SWARM_MOCK=true .venv/bin/pytest projects/data_analyst_agent/ -v
```

Expected: 10 tests passing (simple aggregation, joins, filters, ambiguous questions handled gracefully, unknown tables rejected).
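The "handled gracefully" behavior comes down to catching database errors and returning them as structured results instead of crashing the agent loop. A minimal sketch (`safe_query` is a hypothetical helper name, not the project's API):

```python
import sqlite3

def safe_query(conn: sqlite3.Connection, sql: str) -> dict:
    """Return rows on success, or a structured error the agent can relay."""
    try:
        return {"ok": True, "rows": conn.execute(sql).fetchall()}
    except sqlite3.Error as exc:
        # e.g. "no such table: foo" -- the agent can quote this back and retry
        return {"ok": False, "error": str(exc)}
```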

## Sample questions

`sample_questions.md` lists 10 natural-language queries the agent handles well; it's a good starter set for testing your extensions.

## Extending for production

  • Swap SQLite for your real analytics DB (Postgres, Snowflake, BigQuery); only run_sql needs to change.
  • Add a permission layer: not every query should be allowed on production data (no DELETE, no DROP, row-limit caps).
  • Cache SQL and chart output; common questions repeat.
  • Integrate with your BI tool: render charts as Superset/Metabase embeds instead of local PNGs.
  • Track query cost in the hook bus (Ch07) and alert on expensive query patterns.
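The permission-layer idea could start as a small pre-execution guard like the sketch below (illustrative names; a coarse keyword scan like this can false-positive on string literals, so production deployments should also rely on a database-level read-only role):

```python
import re

# Keywords that indicate a mutating statement (coarse check, by design).
MUTATING = re.compile(r"\b(insert|update|delete|drop|alter|truncate|create)\b", re.I)

def guard_sql(sql: str, max_rows: int = 1000) -> str:
    """Reject non-SELECT statements and cap result size with a LIMIT."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith(("select", "with")):
        raise ValueError("only SELECT queries are allowed")
    if MUTATING.search(stripped):
        raise ValueError("mutating keyword found; query rejected")
    if not re.search(r"\blimit\b", stripped, re.I):
        stripped += f" LIMIT {max_rows}"
    return stripped
```

Calling `guard_sql` on every model-generated query before it reaches `run_sql` gives you the no-DELETE/no-DROP and row-cap behavior in one place.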