🗣️ Just ask, get answers! Let AI Agents handle the SQL 🤖

This post dives into how AI agents can revolutionize your data interaction by translating plain English questions directly into SQL queries and answers.

May 29, 2025

👋 Introduction

Imagine being able to ask any question to your SQL database in plain English and instantly get the answers you need – no complex SQL queries required! That's the magic of AI agents, and it's changing the game for data access.

I've built a Streamlit app that shows you this in action (you can check it out here!) with a sample database. It's all about how AI agents are tearing down the walls between you and your data.

𝐖𝐡𝐲 𝐬𝐡𝐨𝐮𝐥𝐝 𝐲𝐨𝐮 𝐜𝐚𝐫𝐞?

❄️ 𝒀𝒐𝒖 𝒈𝒆𝒕 𝒊𝒏𝒔𝒕𝒂𝒏𝒕 𝒂𝒏𝒔𝒘𝒆𝒓𝒔: Forget waiting on reports. Get the info you need, now.
❄️ 𝑫𝒂𝒕𝒂 𝒃𝒆𝒄𝒐𝒎𝒆𝒔 𝒆𝒗𝒆𝒓𝒚𝒐𝒏𝒆'𝒔 𝒑𝒍𝒂𝒚𝒈𝒓𝒐𝒖𝒏𝒅: No SQL skills? No problem. If you can ask a question, you can get an answer.
❄️ 𝑺𝒎𝒂𝒓𝒕𝒆𝒓 𝒅𝒆𝒄𝒊𝒔𝒊𝒐𝒏𝒔, 𝒇𝒂𝒔𝒕𝒆𝒓: AI-powered insights mean you can make data-driven choices at lightning speed.

𝐓𝐡𝐢𝐧𝐤 𝐨𝐟 𝐭𝐡𝐞 𝐩𝐨𝐬𝐬𝐢𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬:

Marketing teams can quickly understand customer trends.
Sales teams can easily track performance metrics.
Business leaders can get instant answers to critical questions.

The future of data interaction is intuitive and accessible. AI agents are making it a reality, allowing us to focus on the questions that matter, not the code required to answer them.

Note - For those seeking to build this application themselves after reading the post, you can find the complete code in this GitHub repo.

About the Author:

Arun Subramanian: Arun is an Associate Principal of Analytics & Insights at Amazon Ads, where he leads development and deployment of innovative insights to optimize advertising performance at scale. He has over 12 years of experience and is skilled in crafting strategic analytics roadmap, nurturing talent, collaborating with cross-functional teams, and communicating complex insights to diverse stakeholders.

🏛️ High-Level Architecture

So, how does this text-to-SQL wizardry actually work? At its core, the application combines several powerful components:

User Interface (Streamlit): This is what you see and interact with – a friendly web app where you type your questions.
Database (SQLite - Northwind): A sample SQL database holds the information we want to query. More on this treasure trove of data below!
Large Language Model (OpenAI's GPT-3.5-turbo): The brain of the operation! It understands your natural language questions and helps generate the SQL queries.
SQL Toolkit (Langchain): This is the agent's connection to the database, providing a suite of specialized tools. This toolkit comprises of a) sql_db_list_tables - to list all tables available in the connected database, b) sql_db_schema - to retrieve the table schema and sample rows for a specified list of tables, c) sql_db_query - to execute a given SQL query against the database and, d) sql_db_query_checker - to validate the syntax and correctness of an sql query before execution.
AI Agent (LangGraph - ReAct): The conductor of the orchestra. This agent, built using LangGraph, uses the ReAct (Reason + Act) framework. It reasons about your question, decides what tools to use, uses them, observes the results, and then decides the next step until it arrives at your answer.

Spotlight on: The Northwind Database 🛍️

The Northwind database is a classic sample database originally created by Microsoft. It's been a go-to for tutorials for decades because it beautifully represents the data of a fictitious specialty food import/export company called "Northwind Traders."

Think of it as a mini-ERP system, covering:

Suppliers: Who provides the goods.
Customers: Who buys the products.
Employees: The hardworking team at Northwind.
Products: All the specialty foods.
Shippers: How products get from A to B.
Orders and Order Details: The nitty-gritty of sales transactions.

The Northwind dataset is rich with information, typically including 14 tables. These tables are interconnected, showing relationships between customers and their orders, products and their suppliers, and so on. This relational structure is what makes querying it for insights so valuable (and fun for an AI agent!).

Image Reference: https://docs.yugabyte.com/images/sample-data/northwind/northwind-er-diagram.png

⚙️ How It Works: A User's Journey

Let's walk through what happens when you ask the app a question:

You Ask: You type your question in plain English into the Streamlit app's chat interface, for example, "Which customers are located in London?"
API Keys & Setup: Before you start, you'll provide your OpenAI API key (to access the LLM) and a LangSmith API key (to access the prebuilt ReAct Agent from Hub). The app uses these to initialize the core components: the LLM, ReAct Agent, the database connection, and the SQL toolkit.
Agent Takes Over: Your question is passed to the AI agent.
- Understanding the Goal: The agent, powered by the LLM and a specialized system prompt, first understands what you're trying to find out.
- Tooling Up: It consults its SQL toolkit. It might first use the sql_db_list_tables to see all available tables. Then, it might use sql_db_schema on relevant tables (like Customers) to understand their columns. This is like a detective figuring out which files to look into and what information they contain.
- Generating SQL: Based on your question and its understanding of the database structure, the LLM within the agent crafts the appropriate SQL query. For our example, it might generate something like: SELECT * FROM Customers WHERE City = 'London';
- Checking the SQL (Optional but Smart!): The agent might use the sql_db_query_checker to validate this SQL query before execution.
- Executing the Query: The agent then uses the sql_db_query to run this (potentially checked) SQL query against the Northwind database.
- Getting the Results: The database returns the raw data (e.g., a list of customer IDs and names).
Delivering the Answer: The agent doesn't just dump raw data on you. The LLM helps formulate a natural language response based on the query results. So, instead of just a table, you'll see a clear answer like, "The customers located in London are..." followed by the list.
See the Steps (Transparency!): Throughout this process, the Streamlit app shows you the agent's "thoughts" and actions – which tools it's calling (e.g., InfoSQLDatabaseTool with input 'Customers'), the SQL it generates, and the results it gets. This transparency is fantastic for understanding how the agent arrives at the answer and for building trust.

🔍Verifying Results with DB Browser

Once the AI agent has done its work, you might be wondering how to quickly double-check the accuracy of the SQL query it generated and the results it retrieved. This is where DB Browser for SQLite comes in handy.

DB Browser is a free, open-source tool that allows you to easily manage and query SQLite databases. Since the Northwind database is in a SQLite format, DB Browser provides a straightforward way to confirm the AI's output.

Here's how you can use it:

Open the Northwind Database: Launch DB Browser for SQLite and open your Northwind database file.
Navigate to the "Execute SQL" Tab: In the DB Browser interface, there's a dedicated tab for executing SQL queries.
Paste and Run the Query: Copy the exact SQL query that the AI agent generated and paste it into the SQL editor pane within DB Browser. Then, simply click the "Execute SQL" button.
Compare the Results: DB Browser will display the results of your query in a clear, tabular format. You can then directly compare these results with the data retrieved by your AI application. This quick visual check helps ensure that the AI's translation of your natural language into SQL is accurate and that the correct data is being fetched.

This verification step is a great way to build confidence in your AI agent's performance and to understand how it's interpreting your requests.

🤝 Bringing Everything Together: The Symphony of AI

The beauty of this application lies in how these individual pieces – the intuitive Streamlit UI, the structured Northwind SQL database, the intelligent LLM, the versatile Langchain SQL toolkit, and the reasoning power of the LangGraph agent – all come together.

It's a shift from you needing to know the database's language (SQL) to the AI agent learning to understand your language and then acting as your expert translator and data retriever. The agent isn't just blindly converting words to SQL; it's engaging in a multi-step reasoning process, much like a human data analyst would, but at incredible speed.

✨ Conclusion: The Future is Conversational

This text-to-SQL Streamlit app is more than just a cool demo; it's a glimpse into a future where interacting with complex data systems becomes as easy as having a conversation. By leveraging the power of AI agents, we can:

Democratize Data Access: Empowering anyone, regardless of their technical skills, to explore and understand data.
Boost Productivity: Getting insights faster means making decisions quicker and more effectively.
Unlock New Possibilities: When data interaction is this fluid, it opens up new avenues for discovery and innovation.

The journey of AI agents in data analytics is just beginning, and it promises to make our data-driven world more accessible and intuitive than ever before. So, go ahead, just ask your data!

Demystifying Machine Learning with Arun

Discussion about this post