๐ฃ๏ธ Just ask, get answers! Let AI Agents handle the SQL ๐ค
This post dives into how AI agents can revolutionize your data interaction by translating plain English questions directly into SQL queries and answers.
๐ Introduction
Imagine being able to ask any question to your SQL database in plain English and instantly get the answers you need โ no complex SQL queries required! That's the magic of AI agents, and it's changing the game for data access.
I've built a Streamlit app that shows you this in action (you can check it out here!) with a sample database. It's all about how AI agents are tearing down the walls between you and your data.
๐๐ก๐ฒ ๐ฌ๐ก๐จ๐ฎ๐ฅ๐ ๐ฒ๐จ๐ฎ ๐๐๐ซ๐?
โ๏ธ ๐๐๐ ๐๐๐ ๐๐๐๐๐๐๐ ๐๐๐๐๐๐๐: Forget waiting on reports. Get the info you need, now.
โ๏ธ ๐ซ๐๐๐ ๐๐๐๐๐๐๐ ๐๐๐๐๐๐๐๐'๐ ๐๐๐๐๐๐๐๐๐๐ : No SQL skills? No problem. If you can ask a question, you can get an answer.
โ๏ธ ๐บ๐๐๐๐๐๐ ๐ ๐๐๐๐๐๐๐๐, ๐๐๐๐๐๐: AI-powered insights mean you can make data-driven choices at lightning speed.
๐๐ก๐ข๐ง๐ค ๐จ๐ ๐ญ๐ก๐ ๐ฉ๐จ๐ฌ๐ฌ๐ข๐๐ข๐ฅ๐ข๐ญ๐ข๐๐ฌ:
Marketing teams can quickly understand customer trends.
Sales teams can easily track performance metrics.
Business leaders can get instant answers to critical questions.
The future of data interaction is intuitive and accessible. AI agents are making it a reality, allowing us to focus on the questions that matter, not the code required to answer them.
Note - For those seeking to build this application themselves after reading the post, you can find the complete code in this GitHub repo.
About the Author:
Arun Subramanian: Arun is an Associate Principal of Analytics & Insights at Amazon Ads, where he leads development and deployment of innovative insights to optimize advertising performance at scale. He has over 12 years of experience and is skilled in crafting strategic analytics roadmap, nurturing talent, collaborating with cross-functional teams, and communicating complex insights to diverse stakeholders.
๐๏ธ High-Level Architecture
So, how does this text-to-SQL wizardry actually work? At its core, the application combines several powerful components:
User Interface (Streamlit): This is what you see and interact with โ a friendly web app where you type your questions.
Database (SQLite - Northwind): A sample SQL database holds the information we want to query. More on this treasure trove of data below!
Large Language Model (OpenAI's GPT-3.5-turbo): The brain of the operation! It understands your natural language questions and helps generate the SQL queries.
SQL Toolkit (Langchain): This is the agent's connection to the database, providing a suite of specialized tools. This toolkit comprises of a)
sql_db_list_tables
- to list all tables available in the connected database, b)sql_db_schema
- to retrieve the table schema and sample rows for a specified list of tables, c)sql_db_query
- to execute a given SQL query against the database and, d)sql_db_query_checker
- to validate the syntax and correctness of an sql query before execution.AI Agent (LangGraph - ReAct): The conductor of the orchestra. This agent, built using LangGraph, uses the ReAct (Reason + Act) framework. It reasons about your question, decides what tools to use, uses them, observes the results, and then decides the next step until it arrives at your answer.
Spotlight on: The Northwind Database ๐๏ธ
The Northwind database is a classic sample database originally created by Microsoft. It's been a go-to for tutorials for decades because it beautifully represents the data of a fictitious specialty food import/export company called "Northwind Traders."
Think of it as a mini-ERP system, covering:
Suppliers: Who provides the goods.
Customers: Who buys the products.
Employees: The hardworking team at Northwind.
Products: All the specialty foods.
Shippers: How products get from A to B.
Orders and Order Details: The nitty-gritty of sales transactions.
The Northwind dataset is rich with information, typically including 14 tables. These tables are interconnected, showing relationships between customers and their orders, products and their suppliers, and so on. This relational structure is what makes querying it for insights so valuable (and fun for an AI agent!).
Image Reference: https://docs.yugabyte.com/images/sample-data/northwind/northwind-er-diagram.png
โ๏ธ How It Works: A User's Journey
Let's walk through what happens when you ask the app a question:
You Ask: You type your question in plain English into the Streamlit app's chat interface, for example, "Which customers are located in London?"
API Keys & Setup: Before you start, you'll provide your OpenAI API key (to access the LLM) and a LangSmith API key (to access the prebuilt ReAct Agent from Hub). The app uses these to initialize the core components: the LLM, ReAct Agent, the database connection, and the SQL toolkit.
Agent Takes Over: Your question is passed to the AI agent.
Understanding the Goal: The agent, powered by the LLM and a specialized system prompt, first understands what you're trying to find out.
Tooling Up: It consults its SQL toolkit. It might first use the
sql_db_list_tables
to see all available tables. Then, it might usesql_db_schema
on relevant tables (likeCustomers
) to understand their columns. This is like a detective figuring out which files to look into and what information they contain.Generating SQL: Based on your question and its understanding of the database structure, the LLM within the agent crafts the appropriate SQL query. For our example, it might generate something like:
SELECT * FROM Customers WHERE City = 'London';
Checking the SQL (Optional but Smart!): The agent might use the
sql_db_query_checker
to validate this SQL query before execution.Executing the Query: The agent then uses the
sql_db_query
to run this (potentially checked) SQL query against the Northwind database.Getting the Results: The database returns the raw data (e.g., a list of customer IDs and names).
Delivering the Answer: The agent doesn't just dump raw data on you. The LLM helps formulate a natural language response based on the query results. So, instead of just a table, you'll see a clear answer like, "The customers located in London are..." followed by the list.
See the Steps (Transparency!): Throughout this process, the Streamlit app shows you the agent's "thoughts" and actions โ which tools it's calling (e.g.,
InfoSQLDatabaseTool
with input 'Customers'), the SQL it generates, and the results it gets. This transparency is fantastic for understanding how the agent arrives at the answer and for building trust.
๐Verifying Results with DB Browser
Once the AI agent has done its work, you might be wondering how to quickly double-check the accuracy of the SQL query it generated and the results it retrieved. This is where DB Browser for SQLite comes in handy.
DB Browser is a free, open-source tool that allows you to easily manage and query SQLite databases. Since the Northwind database is in a SQLite format, DB Browser provides a straightforward way to confirm the AI's output.
Here's how you can use it:
Open the Northwind Database: Launch DB Browser for SQLite and open your Northwind database file.
Navigate to the "Execute SQL" Tab: In the DB Browser interface, there's a dedicated tab for executing SQL queries.
Paste and Run the Query: Copy the exact SQL query that the AI agent generated and paste it into the SQL editor pane within DB Browser. Then, simply click the "Execute SQL" button.
Compare the Results: DB Browser will display the results of your query in a clear, tabular format. You can then directly compare these results with the data retrieved by your AI application. This quick visual check helps ensure that the AI's translation of your natural language into SQL is accurate and that the correct data is being fetched.
This verification step is a great way to build confidence in your AI agent's performance and to understand how it's interpreting your requests.
๐ค Bringing Everything Together: The Symphony of AI
The beauty of this application lies in how these individual pieces โ the intuitive Streamlit UI, the structured Northwind SQL database, the intelligent LLM, the versatile Langchain SQL toolkit, and the reasoning power of the LangGraph agent โ all come together.
It's a shift from you needing to know the database's language (SQL) to the AI agent learning to understand your language and then acting as your expert translator and data retriever. The agent isn't just blindly converting words to SQL; it's engaging in a multi-step reasoning process, much like a human data analyst would, but at incredible speed.
โจ Conclusion: The Future is Conversational
This text-to-SQL Streamlit app is more than just a cool demo; it's a glimpse into a future where interacting with complex data systems becomes as easy as having a conversation. By leveraging the power of AI agents, we can:
Democratize Data Access: Empowering anyone, regardless of their technical skills, to explore and understand data.
Boost Productivity: Getting insights faster means making decisions quicker and more effectively.
Unlock New Possibilities: When data interaction is this fluid, it opens up new avenues for discovery and innovation.
The journey of AI agents in data analytics is just beginning, and it promises to make our data-driven world more accessible and intuitive than ever before. So, go ahead, just ask your data!