How to Build a Production-Ready AI Agent with LangGraph in 2026

LangGraph is a framework built on LangChain that enables stateful, multi-actor AI workflows using directed graphs. Unlike single-pass LLM calls, a LangGraph agent maintains state across steps, supports conditional branching, and can call tools, retry on failure, and loop until a stopping condition is met. A production-ready agent requires: typed state, error handling nodes, a human-in-the-loop checkpoint, and persistent memory via a checkpointer.
Most LangGraph tutorials get you to a working agent in 50 lines of code. Then you try to deploy it and realize you have no idea how to handle a failed tool call, where conversation memory lives, or why the agent loops infinitely when the LLM returns unexpected output.
This is the guide that covers what comes after the tutorial — the patterns, the failure modes, and the production architecture decisions.
What LangGraph Actually Is
LangGraph is a stateful graph execution framework for AI agents. It models an agent's decision-making process as a directed graph where nodes are functions (LLM calls, tool invocations, conditional logic) and edges are the transitions between them — including conditional edges that route based on the previous node's output.
The critical difference from LangChain chains: LangGraph supports cycles. An agent can call a tool, evaluate the result, decide to call a different tool, and loop back — until a stopping condition is met. This is what makes LangGraph suitable for real agentic behavior, not just pipeline automation.
Three core concepts:
- State: A typed Python class (TypedDict or Pydantic BaseModel) that flows through every node. Nodes receive state, modify it, and return the updated state.
- Nodes: Python functions that accept state and return state updates. Can be LLM calls, tool calls, conditional logic, or human checkpoints.
- Edges: Connections between nodes. Normal edges are unconditional. Conditional edges call a router function that returns the name of the next node based on current state.
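To make these concrete, here is a minimal, hypothetical two-node loop (the names and the stopping condition are arbitrary); it exercises all three concepts, plus the cycles discussed above:
from typing import TypedDict
from langgraph.graph import StateGraph, END
class CounterState(TypedDict):
    count: int
def increment(state: CounterState) -> CounterState:
    # Nodes receive state and return a partial update
    return {"count": state["count"] + 1}
def route(state: CounterState) -> str:
    # Conditional edges route based on current state
    return "increment" if state["count"] < 3 else "done"
graph = StateGraph(CounterState)
graph.add_node("increment", increment)
graph.set_entry_point("increment")
graph.add_conditional_edges("increment", route, {"increment": "increment", "done": END})
print(graph.compile().invoke({"count": 0}))  # {'count': 3}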
The Minimal Production-Ready Structure
Most tutorials build a single-node agent. Production agents need at minimum:
- Typed state class — not a dict, a TypedDict with explicit fields
- Tool node — handles tool dispatch and error catching
- Error handling node — catches tool failures and decides: retry, skip, or escalate
- Evaluator node — validates LLM output before returning it (optional in general, but critical in regulated domains)
- Checkpointer — persists state across sessions
- Timeout — prevents infinite loops from LLM indecision
Here is the skeleton:
from typing import TypedDict, Annotated, Literal
import operator
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.sqlite import SqliteSaver
from langchain_core.messages import BaseMessage, HumanMessage, ToolMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
# 1. Typed state — explicit fields, not a generic dict
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], operator.add]
    tool_calls_attempted: int
    last_error: str | None
    output_validated: bool
# 2. LLM with bound tools (the tool below is a placeholder; swap in your own)
@tool
def search_docs(query: str) -> str:
    """Search internal documentation (placeholder implementation)."""
    return f"No results found for: {query}"
tools = [search_docs]
tool_registry = {t.name: t for t in tools}
llm = ChatOpenAI(model="gpt-4o", temperature=0)
llm_with_tools = llm.bind_tools(tools)
# 3. Core agent node
def agent_node(state: AgentState) -> AgentState:
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response], "last_error": None}
# 4. Tool node with error catching
def tool_node(state: AgentState) -> AgentState:
    last_message = state["messages"][-1]
    results = []
    last_error = None
    for tool_call in last_message.tool_calls:
        try:
            tool_fn = tool_registry[tool_call["name"]]
            result = tool_fn.invoke(tool_call["args"])
            results.append(ToolMessage(content=str(result), tool_call_id=tool_call["id"]))
        except Exception as e:
            # Return error as tool result — don't crash the graph,
            # and keep processing any remaining tool calls
            last_error = str(e)
            results.append(ToolMessage(
                content=f"ERROR: {last_error}",
                tool_call_id=tool_call["id"]
            ))
    return {
        "messages": results,
        "last_error": last_error,
        "tool_calls_attempted": state["tool_calls_attempted"] + 1
    }
# 5. Router — where does the agent go next?
def route_after_agent(state: AgentState) -> Literal["tools", "validate", "end"]:
    last_message = state["messages"][-1]
    # Circuit breaker: stop after 10 tool calls
    if state["tool_calls_attempted"] >= 10:
        return "end"
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return "validate"
# 6. Evaluator node — validates output before returning
def evaluator_node(state: AgentState) -> AgentState:
    last_message = state["messages"][-1]
    # Domain-specific validation — customize this
    if not last_message.content or len(last_message.content) < 10:
        # Output too short — likely a failure
        return {
            "messages": [HumanMessage(content="Your response was incomplete. Please provide a complete answer.")],
            "output_validated": False
        }
    return {"output_validated": True}
def route_after_evaluate(state: AgentState) -> Literal["agent", "end"]:
    if state.get("output_validated"):
        return "end"
    return "agent"  # Loop back to regenerate
# 7. Build the graph
def build_agent(checkpointer=None):
    workflow = StateGraph(AgentState)
    workflow.add_node("agent", agent_node)
    workflow.add_node("tools", tool_node)
    workflow.add_node("validate", evaluator_node)
    workflow.set_entry_point("agent")
    workflow.add_conditional_edges("agent", route_after_agent, {
        "tools": "tools",
        "validate": "validate",
        "end": END,
    })
    workflow.add_edge("tools", "agent")
    workflow.add_conditional_edges("validate", route_after_evaluate, {
        "agent": "agent",
        "end": END,
    })
    return workflow.compile(checkpointer=checkpointer)
# 8. Persistent memory via SQLite checkpointer
with SqliteSaver.from_conn_string("agent_memory.db") as checkpointer:
    agent = build_agent(checkpointer=checkpointer)
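A quick smoke test of the compiled agent (the thread_id and question here are arbitrary). Note that the non-message state fields need initial values, since the nodes above read them before writing them:
with SqliteSaver.from_conn_string("agent_memory.db") as checkpointer:
    agent = build_agent(checkpointer=checkpointer)
    config = {"configurable": {"thread_id": "demo-thread-1"}}
    result = agent.invoke(
        {
            "messages": [HumanMessage(content="What do our docs say about refunds?")],
            "tool_calls_attempted": 0,
            "last_error": None,
            "output_validated": False,
        },
        config=config,
    )
    print(result["messages"][-1].content)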
State Management: The Most Common Mistake
The single most common mistake in the LangGraph agents we review: using a plain dict instead of a typed state class.
# Wrong — no type safety, no IDE support, silent failures
state = {"messages": [], "count": 0}
# Right — explicit types and clear contracts (swap TypedDict for a
# Pydantic BaseModel if you also want runtime validation)
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], operator.add]
    tool_calls_attempted: int
    last_error: str | None
    retrieval_context: list[str]
    output_validated: bool
The Annotated[list[BaseMessage], operator.add] pattern is LangGraph-specific: it tells the framework to merge (append) new messages into the list rather than overwrite the entire list. Without this, every node that returns a messages update would erase the conversation history. LangGraph also ships a purpose-built reducer for message lists, add_messages (from langgraph.graph.message), which additionally supports deleting entries via RemoveMessage; the summarization pattern later in this guide relies on it.
Design principle: Every piece of information the agent needs to make routing decisions should be explicit state fields — not buried in message content that requires parsing.
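For example, a hypothetical router that needs to know whether retrieval has already run should read a dedicated field (retrieval_context, from the state class above) instead of scanning message text:
def route_after_retrieval_check(state: AgentState) -> str:
    # Reads an explicit, testable field rather than parsing message content
    # (e.g., searching the last message for "no documents found" breaks
    # whenever the model changes its wording)
    if not state["retrieval_context"]:
        return "retrieve"
    return "agent"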
The Circuit Breaker Pattern
Every production agent needs a circuit breaker. LLMs can get stuck in loops when a tool keeps failing or the model keeps requesting tools when it should respond directly.
def route_after_agent(state: AgentState) -> str:
    # Hard limit on tool calls
    if state["tool_calls_attempted"] >= 10:
        return "error_handler"
    # Detect loop: same tool called 3x with same args
    recent_messages = state["messages"][-6:]
    tool_call_counts = {}
    for msg in recent_messages:
        if hasattr(msg, "tool_calls"):
            for tc in msg.tool_calls:
                key = f"{tc['name']}:{str(tc['args'])}"
                tool_call_counts[key] = tool_call_counts.get(key, 0) + 1
                if tool_call_counts[key] >= 3:
                    return "error_handler"
    if hasattr(state["messages"][-1], "tool_calls") and state["messages"][-1].tool_calls:
        return "tools"
    return "validate"
The error handler node should return a safe fallback response and log the loop pattern for debugging — never let an infinite loop burn tokens silently.
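A minimal sketch of such a node, assuming AIMessage from langchain_core.messages and the stdlib logging module (adapt the fallback wording and routing to your product):
from langchain_core.messages import AIMessage
import logging
def error_handler_node(state: AgentState) -> AgentState:
    # Log the failure pattern for debugging; never fail silently
    logging.error({
        "event": "circuit_breaker_tripped",
        "tool_calls_attempted": state["tool_calls_attempted"],
        "last_error": state["last_error"],
    })
    fallback = AIMessage(content="I wasn't able to complete this request reliably. It has been flagged for review.")
    return {"messages": [fallback], "output_validated": True}
# Register it and route it straight to END:
# workflow.add_node("error_handler", error_handler_node)
# workflow.add_edge("error_handler", END)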
Memory and Persistence
LangGraph's checkpointer system handles multi-session memory. The framework saves complete graph state to a store keyed by thread_id. On the next request with the same thread_id, it restores the full state including message history.
For production, use PostgreSQL:
from langgraph.checkpoint.postgres import PostgresSaver  # requires the psycopg driver installed
conn_string = "postgresql://user:password@localhost:5432/agent_db"
with PostgresSaver.from_conn_string(conn_string) as checkpointer:
    checkpointer.setup()  # Creates tables on first run
    agent = build_agent(checkpointer=checkpointer)
    # Each user gets their own thread
    config = {"configurable": {"thread_id": f"user_{user_id}_session_{session_id}"}}
    result = agent.invoke({"messages": [HumanMessage(content=user_input)]}, config=config)
Memory management consideration: A thread accumulates messages indefinitely. At scale, long threads become expensive (more tokens per call) and slow. Implement a summarization node that condenses old messages every N turns:
from langchain_core.messages import RemoveMessage
# Note: deleting old messages requires the add_messages reducer on the
# messages field (operator.add is append-only and cannot remove entries)
def summarize_if_needed(state: AgentState) -> AgentState:
    if len(state["messages"]) < 20:
        return {}
    # Summarize messages older than the last 10, then delete them
    old_messages = state["messages"][:-10]
    summary_response = llm.invoke([
        HumanMessage(content=f"Summarize this conversation concisely: {str(old_messages)}")
    ])
    return {
        "messages": [RemoveMessage(id=m.id) for m in old_messages]
        + [HumanMessage(content=f"[Conversation summary: {summary_response.content}]")]
    }
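One way to wire this into the graph built earlier: run it at the start of every turn, before the agent node (this replaces set_entry_point("agent")):
workflow.add_node("summarize", summarize_if_needed)
workflow.set_entry_point("summarize")
workflow.add_edge("summarize", "agent")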
The Evaluator-Optimizer Pattern for Regulated Domains
In healthcare, finance, and legal applications, a single LLM pass is insufficient — one-shot accuracy in these domains is approximately 74%. The Evaluator-Optimizer pattern adds a validation layer:
import json
def compliance_evaluator(state: AgentState) -> AgentState:
    """
    Checks agent output against domain rules before returning.
    Sets output_validated=True if output passes, False to trigger retry.
    """
    last_output = state["messages"][-1].content
    # Domain-specific rules — customize for your use case
    validation_prompt = f"""
    Review this AI-generated response for compliance issues:
    Response: {last_output}
    Check for:
    1. Any specific medical advice (flag if present)
    2. Any definitive legal conclusions (flag if present)
    3. Missing required disclaimers
    4. Factual claims that require source citation
    Return JSON: {{"passes": true/false, "issues": ["list", "of", "issues"]}}
    """
    validator_response = llm.invoke([HumanMessage(content=validation_prompt)])
    try:
        result = json.loads(validator_response.content)
        if not result["passes"]:
            correction_prompt = f"Your previous response had these issues: {result['issues']}. Please revise it."
            return {
                "messages": [HumanMessage(content=correction_prompt)],
                "output_validated": False
            }
    except json.JSONDecodeError:
        pass  # Validator itself failed — accept output
    return {"output_validated": True}
This pattern raises accuracy to 97%+ in regulated domains at the cost of 2–3x more LLM calls. Worth it for HIPAA/SOX/FINRA-relevant outputs.
Observability: What to Log
A production agent with no observability is a black box. At minimum, instrument:
import logging
import time
from functools import wraps
def instrument_node(node_name: str):
    def decorator(func):
        @wraps(func)
        def wrapper(state: AgentState) -> AgentState:
            start = time.time()
            try:
                result = func(state)
                logging.info({
                    "event": "node_complete",
                    "node": node_name,
                    "duration_ms": int((time.time() - start) * 1000),
                    "tool_calls": state.get("tool_calls_attempted", 0),
                    "message_count": len(state.get("messages", [])),
                })
                return result
            except Exception as e:
                logging.error({
                    "event": "node_error",
                    "node": node_name,
                    "error": str(e),
                    "duration_ms": int((time.time() - start) * 1000),
                })
                raise
        return wrapper
    return decorator
@instrument_node("agent")
def agent_node(state: AgentState) -> AgentState:
    # ... node implementation
    ...
For LangSmith integration, set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY in your environment — LangGraph will automatically send traces to LangSmith with zero additional code.
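If you would rather set these from Python, for example in a local dev script, something like this works (the key and project name are placeholders):
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls-..."  # placeholder: your LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "my-agent"  # optional: groups traces by project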
Deployment Architecture
For a production LangGraph service:
- API layer: FastAPI with async endpoints (LangGraph supports ainvoke and astream)
- Checkpointer: PostgreSQL for persistent memory, Redis for high-throughput stateless sessions
- LLM provider: Azure OpenAI or AWS Bedrock for enterprise data residency requirements
- Queue: Celery or AWS SQS for async task processing (long-running agents)
- Observability: LangSmith for agent traces, Prometheus for infrastructure metrics
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
DATABASE_URL = "postgresql://user:password@localhost:5432/agent_db"
class AgentRequest(BaseModel):
    thread_id: str
    message: str
app = FastAPI()
@app.post("/agent/invoke")
async def invoke_agent(request: AgentRequest):
    async with AsyncPostgresSaver.from_conn_string(DATABASE_URL) as checkpointer:
        agent = build_agent(checkpointer=checkpointer)
        config = {"configurable": {"thread_id": request.thread_id}}
        result = await agent.ainvoke(
            {"messages": [HumanMessage(content=request.message)]},
            config=config
        )
        return {"response": result["messages"][-1].content}
Common Failure Modes and How to Prevent Them
| Failure | Root Cause | Fix |
|---|---|---|
| Infinite tool call loop | No circuit breaker | Add max_tool_calls counter + loop detection |
| Lost conversation history | Missing Annotated operator.add | Use Annotated[list, operator.add] on messages field |
| Silent tool failures | Exceptions crash graph | Wrap all tool calls in try/except, return error as ToolMessage |
| Token budget blowout | Unbounded message accumulation | Add summarization node every N turns |
| Inconsistent output | No output validation | Add evaluator node for regulated domains |
| Agent "forgets" context | No checkpointer | Use PostgreSQL or SQLite checkpointer in production |
What Production Actually Costs
Based on Ortem Technologies' deployed AI agents for fintech and healthcare clients:
- Development time: 6–12 weeks for a production-ready agent (not a prototype)
- Infrastructure: $200–500/month for PostgreSQL + Redis + API hosting at moderate scale
- LLM costs at 10K tasks/day: $50–200/day depending on model and average task complexity
- Optimization potential: Switching GPT-4o → GPT-4o-mini for non-critical nodes typically cuts LLM costs 60–80%
The production work — state design, error handling, observability, testing — takes 3–4x longer than building the initial prototype. Plan for it.
Ortem Technologies builds production-grade AI agents for fintech, healthcare, and enterprise clients, including multi-agent orchestration systems and LangGraph deployments on Azure and AWS.
Sources & References
- 1.LangGraph Documentation - LangChain
- 2.LangGraph Conceptual Guide - LangChain
- 3.LangSmith Observability - LangChain
About the Author
Praveen Jha is the Director of AI Product Strategy, Development, Sales & Business Development at Ortem Technologies. With deep expertise in technology consulting and enterprise sales, he helps businesses identify the right digital transformation strategies - from mobile and AI solutions to cloud-native platforms. He writes about technology adoption, business growth, and building software partnerships that deliver real ROI.
Frequently Asked Questions
- How is LangGraph different from LangChain? LangGraph is a library built on top of LangChain that adds stateful, graph-based orchestration for AI agents. LangChain provides the building blocks (LLM wrappers, tools, chains). LangGraph adds the control flow: cycles, conditional branches, parallel execution, and persistent state between steps. Use LangChain for simple sequential pipelines; use LangGraph when your agent needs to make decisions, loop, or maintain memory across multiple tool calls.
- How much does it cost to run a LangGraph agent? Cost depends entirely on the LLM you use and call frequency. A GPT-4o-based agent handling 1,000 tasks/day with an average of 5 LLM calls per task at 2,000 input + 500 output tokens per call costs approximately $50/day at GPT-4o pricing ($2.50/1M input, $10/1M output): 10M input tokens and 2.5M output tokens daily come to roughly $25 each. Switching to GPT-4o-mini or Claude 3.5 Haiku for non-critical steps reduces this to $2–5/day. Always profile which nodes consume the most tokens before optimizing.
- What is the Evaluator-Optimizer pattern? It runs a secondary validation agent that checks the primary agent's output against domain rules before returning a response. Instead of trusting a single LLM pass (one-shot, ~74% accuracy in regulated domains), the evaluator checks the output and sends it back for revision if it fails validation. This pattern improves accuracy to 97%+ in regulated domains like finance and healthcare, at the cost of 2–3x more LLM calls.
- Can a LangGraph agent remember users across sessions? Yes, via checkpointers. LangGraph supports in-memory checkpointers (for single-session use), SQLite checkpointers (persistent, local), and Redis/PostgreSQL checkpointers (persistent, production). A PostgreSQL checkpointer stores the complete graph state — conversation history, tool call results, user preferences — and restores it at the start of each session using a thread_id. This is how multi-session agents remember user context.
- How do you monitor a LangGraph agent in production? LangSmith (by LangChain) is the primary observability platform — it traces every node execution, LLM call, and tool invocation with latency and token counts. For self-hosted observability, Langfuse is an open-source alternative. For production alerting, instrument LangGraph with OpenTelemetry and send traces to Grafana or Datadog. At minimum, log: node name, execution duration, LLM model used, input/output token counts, and any errors.
- How should a production agent handle failures? Production agent failures fall into three categories: LLM API errors (handle with retry logic + exponential backoff), tool call failures (add an error handling node that decides whether to retry, skip, or escalate to human review), and logic errors (incorrect LLM outputs that pass silently — catch with an evaluator node or output validation schema using Pydantic). Never let a tool call failure crash the graph; always route to an error handler node first.

