Build a LangGraph AI Agent with Tool Calling and Memory
Discover how to construct sophisticated AI agents using LangGraph by integrating tool calling for enhanced functionality and memory capabilities for context-aware conversations.
Introduction
Ever felt like your AI agent was stuck in Groundhog Day, forgetting past interactions and unable to reach beyond its basic knowledge? What if you could equip your AI with the ability to not only remember past conversations but also actively use external tools to accomplish real-world tasks? Buckle up, because LangGraph is here to revolutionize how you build intelligent, dynamic agents.
In this article, you'll discover how LangGraph empowers you to create AI agents that are more than just chatbots. You'll learn step-by-step how to give your agents:
Short-Term Memory: Enable your agents to recall previous turns in a conversation, leading to more context-aware and personalized interactions.
Powerful Tool Calling: Equip your agents with the ability to interact with external tools – think search engines, databases, APIs – to gather information, perform actions, and provide richer responses.
Forget static, one-off interactions. LangGraph lets you build conversational flows where your AI learns from the past and actively uses tools to solve problems. This isn't just about making your AI smarter; it's about making it more useful and more engaging for your users.
Note - For those seeking to build this application themselves after reading the post, you can find the complete code in this GitHub repo.
About the Author:
Arun Subramanian: Arun is an Associate Principal of Analytics & Insights at Amazon Ads, where he leads the development and deployment of innovative insights to optimize advertising performance at scale. He has over 12 years of experience and is skilled in crafting strategic analytics roadmaps, nurturing talent, collaborating with cross-functional teams, and communicating complex insights to diverse stakeholders.
High-level Architecture
As with our previous examples, our LangGraph agent follows a familiar two-tier architecture.
Frontend: This is the user-facing layer, the part you directly interact with. In our case, we've chosen Streamlit for its rapid development capabilities and intuitive interface, allowing you to have a conversational UI up and running with minimal code.
Backend: This is the engine room where the heavy lifting happens. Here, we leverage FastAPI to create a robust and efficient API that handles user requests and orchestrates the LangGraph. LangGraph itself provides the framework for building our AI agent's conversational flow, integrating memory and tool usage.
The frontend sends user input to the backend, which processes it using LangGraph. LangGraph manages the conversation history and decides when and how to use external tools. The backend then sends the AI's response back to the frontend for display to the user.
Backend - FastAPI and LangGraph
Let's delve into the heart of our intelligent agent: the backend powered by FastAPI and LangGraph.
Short-term memory
Short-term memory lets your application remember previous interactions within a single thread or conversation. A thread organizes multiple interactions in a session, similar to the way email groups messages in a single conversation.
LangGraph manages short-term memory as part of the agent's state, persisted via thread-scoped checkpoints. This state typically includes the conversation history along with other stateful data, such as uploaded files, retrieved documents, or generated artifacts. By storing these in the graph's state, the bot can access the full context for a given conversation while maintaining separation between different threads.
# Import the in-memory checkpointer
from langgraph.checkpoint.memory import MemorySaver

# Define the memory checkpointer
memory = MemorySaver()
# Assign it while compiling the graph
graph = graph_builder.compile(checkpointer=memory)
Reference Image: https://langchain-ai.github.io/langgraph/concepts/img/memory/short-vs-long.png
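To see what thread scoping buys you, here is a minimal sketch (not part of the app itself) of invoking the compiled graph with two different thread_ids; each thread accumulates its own history, so the agent only "remembers" what happened in that thread:
# Minimal sketch: two separate conversation threads.
# Assumes `graph` was compiled with the MemorySaver checkpointer above.
config_arun = {"configurable": {"thread_id": "arun"}}
config_john = {"configurable": {"thread_id": "john"}}

graph.invoke({"messages": ["Hi, my name is Arun."]}, config_arun)
graph.invoke({"messages": ["Hi, my name is John."]}, config_john)

# The "arun" thread recalls only its own history.
result = graph.invoke({"messages": ["What is my name?"]}, config_arun)
print(result["messages"][-1].content)  # expected to mention "Arun"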
Tool calling
To handle queries our chatbot can't answer directly from its pre-trained knowledge, we'll integrate a web search tool. The bot can use this tool to find relevant information and provide better responses. First, install the requirements for the Tavily Search Engine and set the TAVILY_API_KEY environment variable (a minimal setup sketch follows). Next, we define the tools (an internet search tool and an LLM search tool) in the code block after that. The search results are page summaries our chatbot can use to answer questions.
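As a rough setup sketch (the package names below assume the langchain-community Tavily integration; adjust to your environment):
# Setup sketch: install dependencies first, e.g.
#   pip install -U langchain-community tavily-python
# then make the Tavily API key available before the tool is used.
import os
os.environ["TAVILY_API_KEY"] = "tvly-..."  # placeholder; use your real key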
from langchain_core.tools import tool
from langchain_community.tools.tavily_search import TavilySearchResults

@tool
def internet_search(query: str):
    """
    Search the web for real-time and latest information,
    for example: news, stock market, weather updates, etc.

    Args:
        query: The search query
    """
    search = TavilySearchResults(max_results=2)
    response = search.invoke(query)
    return response

@tool
def llm_search(query: str):
    """
    Use the LLM model for general and basic information.

    Args:
        query: The search query
    """
    response = llm.invoke(query)
    return response

# Bind the tools to the LLM so it can emit tool calls
tools = [internet_search, llm_search]
llm_with_tools = llm.bind_tools(tools)
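The snippets above don't show how graph_builder itself is assembled. A typical wiring, sketched below under the assumption that LangGraph's prebuilt ToolNode and tools_condition are used, routes each LLM turn either to the tool node or to the end of the turn:
from langgraph.graph import StateGraph, MessagesState, START
from langgraph.prebuilt import ToolNode, tools_condition

def chatbot(state: MessagesState):
    # The tool-bound LLM either answers directly or emits a tool call.
    return {"messages": [llm_with_tools.invoke(state["messages"])]}

graph_builder = StateGraph(MessagesState)
graph_builder.add_node("chatbot", chatbot)
graph_builder.add_node("tools", ToolNode(tools))
graph_builder.add_edge(START, "chatbot")
# If the last message contains tool calls, go to "tools"; otherwise end.
graph_builder.add_conditional_edges("chatbot", tools_condition)
graph_builder.add_edge("tools", "chatbot")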
The /chat Endpoint
Each new user message is passed directly to the LangGraph's ainvoke method, along with the thread_id in the config. The LangGraph, with its integrated MemorySaver, manages the conversation history based on this thread_id. After receiving a user message, the LLM autonomously decides which tool to use depending on the query. The generated response is then parsed and sent to the frontend.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Request body: the user's message plus the conversation thread id
class UserInput(BaseModel):
    message: str
    thread_id: str

@app.post("/chat")
async def chat_endpoint(user_input: UserInput):
    """
    Endpoint to receive user input and process it with the LangGraph.
    """
    thread_id = user_input.thread_id
    config = {"configurable": {"thread_id": thread_id}}
    inputs = {"messages": [user_input.message]}
    results = await graph.ainvoke(inputs, config)
    return {"response": results["messages"][-1].content}
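Note that the handler awaits ainvoke, so FastAPI's event loop stays free to serve other users while the graph (and any tool calls) run.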
Frontend - Streamlit
Streamlit provides a user-friendly interface where you enter your thread_id and your chat message. Streamlit then sends this information to the FastAPI backend and displays the agent's response. The thread_id is key to maintaining the conversational context.
import requests
import streamlit as st

# Function to get a response from the server
def get_openai_response(message, thread_id):
    json_body = {"message": message, "thread_id": thread_id}
    headers = {"Content-Type": "application/json"}  # Add Content-Type header
    response = requests.post("http://127.0.0.1:8000/chat", json=json_body, headers=headers)
    return response.json()

## Streamlit app
st.set_page_config(page_title="LangGraph AI Agent Chat", page_icon="💬", layout="wide")
st.title("LangGraph AI Agent Chat")

with st.sidebar:
    thread_id = st.text_input("Enter a user session id", value="default")

message = st.chat_input("Enter your question")

if message and thread_id:
    with st.spinner("Generating answer..."):
        answer_data = get_openai_response(message, thread_id)
        st.write(answer_data["response"])
Testing the App - See the Magic in Action!
Curious to see this LangGraph agent flex its memory and tool-calling muscles? Let's dive into how you can experience the power firsthand!
Fire up both the FastAPI backend (server.py) and the Streamlit frontend (client.py) – you're about to witness some AI wizardry.
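Assuming the FastAPI instance is named app inside server.py, you would typically start the backend with uvicorn server:app --reload and the frontend with streamlit run client.py (both commands are inferred from the file names; adjust to your setup).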
Check out this quick video I put together. You'll see me interacting with the agent using two separate "identities" – think of them as different conversations with distinct memories. I created one thread with the ID "arun," feeding it info about myself and my newsletter. Then, I created another with the ID "john," this time sharing details about the one and only John Cena!
What's truly mind-blowing is how the AI agent seamlessly switches context between these threads. Watch how "arun" receives personalized responses related to my interests and newsletter, while "john" gets answers tailored to the world of wrestling.
For those who love to peek behind the curtain, LangSmith provides incredible insights into the AI's thought process. You can trace exactly how the Large Language Model (LLM) reasons and strategically uses the tools at its disposal.
For instance, when I gave the agent information about my newsletter and then asked it to find my recent articles, you can see in the LangSmith trace how it intelligently performed a web search. It correctly identified the name of my newsletter and even found some articles (though, admittedly, not the very recent ones).
Conclusion
Ready to experiment? Spin up the app and try creating your own "threads" with different profiles or topics. See how the agent personalizes its responses and utilizes the tools. Share your cool findings in the comments below – I'd love to hear what you discover!
While LangGraph makes it easy to manage conversation history and short-term memory, long conversations still pose a challenge. The full history may not fit inside an LLM's context window, resulting in errors. Techniques like summarization, selective memory retrieval, and context window management are therefore important considerations for building robust and scalable conversational AI applications with LangGraph. If you are interested, you can find more info here.
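As one illustration, you can trim the history before it reaches the model. The sketch below uses langchain-core's trim_messages helper inside the hypothetical chatbot node sketched earlier; the budget and counting strategy are placeholders:
from langchain_core.messages import trim_messages

def chatbot(state: MessagesState):
    # Keep only the most recent messages that fit the budget.
    # Counting each message as 1 here (token_counter=len) for simplicity;
    # in practice you would count real tokens with your model's tokenizer.
    trimmed = trim_messages(
        state["messages"],
        max_tokens=20,      # placeholder budget (messages, given len counter)
        strategy="last",    # keep the most recent messages
        token_counter=len,
    )
    return {"messages": [llm_with_tools.invoke(trimmed)]}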
Further, in production environments, the in-memory persistence handled by MemorySaver would typically be replaced with SqliteSaver or PostgresSaver to ensure data durability.
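For example, a durable SQLite-backed checkpointer might be swapped in as sketched below (assumes the langgraph-checkpoint-sqlite package; exact usage can vary between versions):
import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver

# Persist checkpoints to a local SQLite file instead of process memory.
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
memory = SqliteSaver(conn)
graph = graph_builder.compile(checkpointer=memory)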
Our next article will explore the exciting use case of building a Text-to-SQL agent with LangGraph. You'll see how the tool calling capabilities can be applied to interact with databases, enabling natural language queries to be translated into actionable SQL commands. Stay tuned to witness the practical application of LangGraph's power!