Data Analysis with Agentic AI (Part 2 - The Architecture)

Structuring Your Agentic AI Workflows for Scale and Success

Jul 17, 2025

Introduction

In Part 1, we unveiled the vision for automating data analysis with agentic AI, and you got a taste of LangGraph's core power through a simple example. You saw how intelligent agents can tackle complex decision-making, moving beyond the limitations of traditional dashboards.

Now, it's time to get practical! You're about to dive deep into the full architecture of our LangGraph-powered system. We'll walk you through its thoughtful project structure, revealing how different components fit together. Get ready to understand the blueprint for building your own robust agentic system!

About the Author:

Arun Subramanian: Arun is an Associate Principal of Analytics & Insights at Amazon Ads, where he leads the development and deployment of innovative insights to optimize advertising performance at scale. He has over 12 years of experience and is skilled in crafting strategic analytics roadmaps, nurturing talent, collaborating with cross-functional teams, and communicating complex insights to diverse stakeholders.

Project Structure & Modularity

Building a sophisticated agentic system requires a clear, modular structure. This isn't just about neatness; it's about maintainability, scalability, and collaboration. Our project adheres to a well-defined hierarchy, ensuring every component has its place and purpose.

At its core are Agent Components like LLMs, tools, state (aka memory management) and nodes (aka agents) that form the basic building blocks of the AI’s intelligence and its ability to interact with its environment. The Orchestration Layer (graph) coordinates and sequences the agents to accomplish tasks, effectively acting as the conductor for the agent team. The Notebooks directory is where we iterate on and test different ideas before settling on the final version. The User Interface (ui) acts as the human-agent interface, taking user inputs and displaying agent responses in real time. Supporting all of these are Configuration and Utilities (utils), which house general helper functions.

Finally, the Project Root hosts essential setup files like .gitignore, README.md and other dependency management files along with the script to execute the project. This structured approach enables developers to focus on specific modules, integrate new technologies seamlessly and maintain a robust, adaptable agentic AI system.

Let's walk through the Langgraph_workflows/src/langgraphagenticai/ directory:

graph/: This is where the "brains" of your workflow reside.
- graph_builder.py: This module is responsible for constructing the entire LangGraph workflow. It defines the nodes, the edges (the flow of data and control), and sets up the overall orchestration.
- graph_executor.py: Once the graph is built, this module takes over, executing the workflow step-by-step, managing agent transitions, and ensuring smooth data flow through the system.
LLMs/: Your AI powerhouses live here.
- groqllm.py: This module handles the integration and specific utilities for interacting with the Groq LLM.
- openllm.py: Similarly, this module provides the necessary integration and utilities for leveraging OpenAI's LLMs.
- Why separate? This modularity allows you to easily swap out or add new LLM providers without disrupting the core workflow logic.
nodes/: These are your specialized AI agents, each with a distinct mission in the data analysis pipeline. (We'll dive into the specifics of each node's mission in Part 3!).
- node_data_profiling.py
- node_data_cleaning.py
- node_univariate_analysis.py
- node_bivariate_analysis.py
- node_final_report.py
state/: The system's memory.
- state.py: This crucial module defines and manages the agent's memory, context, and persistent state throughout the workflow. It ensures that information is consistently passed between nodes and that the system can pick up exactly where it left off, even after interruptions.
tools/: Your agents' hands-on power.
- coding_tool.py: This is a critical component: a secure sandbox environment for executing dynamically generated Python code. It allows our LLM-powered agents to "act" on the data, perform calculations, and generate visualizations without compromising system integrity.
ui/: Your user's gateway.
- streamlitui.py: This module contains the Streamlit UI logic and layout, providing an intuitive and interactive front-end for users to upload data, configure settings, and view results.
- config.py: Handles UI and workflow configuration logic, ensuring flexibility.
- uiconfigfile.ini: A configuration file for UI and workflow settings, allowing for easy adjustments.
utils/: Your helpful assistants.
- clean_directory.py: Utilities for managing and cleaning up temporary directories.
- code_utils.py: Helper functions specifically designed for code generation and execution tasks.
- constants.py: Stores project-wide constants, ensuring consistency across modules.
- encode_image_to_base64.py: A utility for encoding images, often used for displaying visualizations within the UI.
main.py: The central command center. This file orchestrates the UI, LLM selection, file handling, and the entire LangGraph workflow, ensuring a seamless and interactive user experience from raw data to actionable insights.
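To make the relationship between state/, nodes/ and graph/ concrete, here is a minimal sketch in plain Python. It is not the project's code and it does not use LangGraph's API: the `AnalysisState` fields, the three node functions and `run_pipeline` are hypothetical names chosen for illustration. In the real project, LangGraph's graph plays the role of the simple loop shown here. The idea it illustrates is that each node reads the shared state and returns only the keys it wants to update.

```python
from typing import Callable, TypedDict


class AnalysisState(TypedDict, total=False):
    """Shared memory passed between nodes (hypothetical fields)."""
    rows: list
    profile: dict
    cleaned_rows: list
    report: str


# A node takes the current state and returns a partial update.
Node = Callable[[AnalysisState], AnalysisState]


def profile_node(state: AnalysisState) -> AnalysisState:
    # Count rows and missing values; stands in for a data profiling agent.
    rows = state["rows"]
    missing = sum(1 for row in rows if row.get("value") is None)
    return {"profile": {"n_rows": len(rows), "n_missing": missing}}


def cleaning_node(state: AnalysisState) -> AnalysisState:
    # Drop rows with missing values; stands in for a data cleaning agent.
    kept = [row for row in state["rows"] if row.get("value") is not None]
    return {"cleaned_rows": kept}


def report_node(state: AnalysisState) -> AnalysisState:
    # Summarize what earlier nodes wrote into the shared state.
    profile = state["profile"]
    kept = len(state["cleaned_rows"])
    summary = (
        f"{profile['n_rows']} rows profiled, "
        f"{profile['n_missing']} missing, {kept} kept"
    )
    return {"report": summary}


def run_pipeline(nodes: list, state: AnalysisState) -> AnalysisState:
    # Run nodes in order, merging each partial update into the state.
    for node in nodes:
        state = {**state, **node(state)}
    return state


final_state = run_pipeline(
    [profile_node, cleaning_node, report_node],
    {"rows": [{"value": 1.0}, {"value": None}, {"value": 3.5}]},
)
print(final_state["report"])
```

Because every node only touches the state, a new analysis step can be added by writing one more function and appending it to the list, without editing the existing nodes.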

And finally, Langgraph_workflows/app.py is the user’s entry point: it runs src/main.py to initialize the entire setup. This modular structure keeps the project maintainable and scalable. It's designed to grow with your needs, making it easy to add new analysis types, integrate more tools, or adapt to evolving data challenges.

Conclusion

You've just explored the architecture behind this agentic workflow and seen how its modular design keeps each component focused and replaceable. You now understand the foundational structure that enables specialized AI agents to tackle complex data analysis.

Next up: "Part 3: Agentic Workflow" – where we'll unpack each agent's specific mission, how they use LLMs and tools in detail, and show you the true power of their collaboration!

Demystifying Machine Learning with Arun
