Ortem Technologies
    AI Engineering

    How to Build a Production-Ready AI Agent with LangGraph in 2026

    Praveen Jha · May 15, 2026 · 16 min read
    Quick Answer

    LangGraph is a framework built on LangChain that enables stateful, multi-actor AI workflows using directed graphs. Unlike single-pass LLM calls, a LangGraph agent maintains state across steps, supports conditional branching, and can call tools, retry on failure, and loop until a stopping condition is met. A production-ready agent requires: typed state, error handling nodes, a human-in-the-loop checkpoint, and persistent memory via a checkpointer.

    Most LangGraph tutorials get you to a working agent in 50 lines of code. Then you try to deploy it and realize you have no idea how to handle a failed tool call, where conversation memory lives, or why the agent loops infinitely when the LLM returns unexpected output.

    This is the guide that covers what comes after the tutorial — the patterns, the failure modes, and the production architecture decisions.

    What LangGraph Actually Is

    LangGraph is a stateful graph execution framework for AI agents. It models an agent's decision-making process as a directed graph where nodes are functions (LLM calls, tool invocations, conditional logic) and edges are the transitions between them — including conditional edges that route based on the previous node's output.

    The critical difference from LangChain chains: LangGraph supports cycles. An agent can call a tool, evaluate the result, decide to call a different tool, and loop back — until a stopping condition is met. This is what makes LangGraph suitable for real agentic behavior, not just pipeline automation.

    Three core concepts:

    • State: A typed Python class (TypedDict or Pydantic BaseModel) that flows through every node. Nodes receive state, modify it, and return the updated state.
    • Nodes: Python functions that accept state and return state updates. Can be LLM calls, tool calls, conditional logic, or human checkpoints.
    • Edges: Connections between nodes. Normal edges are unconditional. Conditional edges call a router function that returns the name of the next node based on current state.
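
    To make these three concepts concrete before the real API appears, here is a dependency-free sketch of the execution model — plain Python imitating how a stateful graph with a cycle runs, not LangGraph itself. Every name in it is illustrative.

```python
from typing import Callable, TypedDict

class State(TypedDict):
    count: int
    done: bool

# Nodes: functions that take state and return partial updates
def increment(state: State) -> dict:
    return {"count": state["count"] + 1}

def check(state: State) -> dict:
    return {"done": state["count"] >= 3}

# Conditional edge: a router that picks the next node from current state
def route(state: State) -> str:
    return "END" if state["done"] else "increment"

nodes: dict[str, Callable[[State], dict]] = {"increment": increment, "check": check}
edges = {"increment": "check"}   # normal edge: unconditional
routers = {"check": route}       # conditional edge: router decides

def run(state: State, entry: str = "increment") -> State:
    current = entry
    while current != "END":
        state = {**state, **nodes[current](state)}  # merge the node's update
        current = routers[current](state) if current in routers else edges[current]
    return state

final = run({"count": 0, "done": False})
# final["count"] == 3 — the graph cycled until the stopping condition held
```

    The cycle (increment → check → increment …) is exactly what LangChain chains cannot express and LangGraph can.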

    The Minimal Production-Ready Structure

    Most tutorials build a single-node agent. Production agents need at minimum:

    1. Typed state class — not a dict, a TypedDict with explicit fields
    2. Tool node — handles tool dispatch and error catching
    3. Error handling node — catches tool failures and decides: retry, skip, or escalate
    4. Evaluator node — validates LLM output before returning it (optional but critical for regulated domains)
    5. Checkpointer — persists state across sessions
    6. Timeout — prevents infinite loops from LLM indecision

    Here is the skeleton:

    from typing import TypedDict, Annotated, Literal
    from langgraph.graph import StateGraph, END
    from langgraph.checkpoint.sqlite import SqliteSaver
    from langchain_core.messages import BaseMessage, HumanMessage, ToolMessage
    from langchain_openai import ChatOpenAI
    import operator

    # Assumes `tools` (a list of LangChain tools) and `tool_registry`
    # (a name -> tool mapping) are defined elsewhere in your project.
    
    # 1. Typed state — explicit fields, not a generic dict
    class AgentState(TypedDict):
        messages: Annotated[list[BaseMessage], operator.add]
        tool_calls_attempted: int
        last_error: str | None
        output_validated: bool
    
    # 2. LLM with bound tools
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    llm_with_tools = llm.bind_tools(tools)
    
    # 3. Core agent node
    def agent_node(state: AgentState) -> AgentState:
        response = llm_with_tools.invoke(state["messages"])
        return {"messages": [response], "last_error": None}
    
    # 4. Tool node with error catching
    def tool_node(state: AgentState) -> AgentState:
        last_message = state["messages"][-1]
        results = []
        last_error = None

        for tool_call in last_message.tool_calls:
            try:
                tool = tool_registry[tool_call["name"]]
                result = tool.invoke(tool_call["args"])
                results.append(ToolMessage(content=str(result), tool_call_id=tool_call["id"]))
            except Exception as e:
                # Record the error as a tool result — don't crash the graph.
                # Keep processing the remaining calls so every tool_call_id
                # gets a matching ToolMessage (providers reject orphaned calls).
                last_error = str(e)
                results.append(ToolMessage(
                    content=f"ERROR: {str(e)}",
                    tool_call_id=tool_call["id"]
                ))

        return {
            "messages": results,
            "last_error": last_error,
            "tool_calls_attempted": state["tool_calls_attempted"] + 1,
        }
    
    # 5. Router — where does the agent go next?
    def route_after_agent(state: AgentState) -> Literal["tools", "validate", "end"]:
        last_message = state["messages"][-1]
    
        # Circuit breaker: stop after 10 tool calls
        if state["tool_calls_attempted"] >= 10:
            return "end"
    
        if hasattr(last_message, "tool_calls") and last_message.tool_calls:
            return "tools"
    
        return "validate"
    
    # 6. Evaluator node — validates output before returning
    def evaluator_node(state: AgentState) -> AgentState:
        last_message = state["messages"][-1]
    
        # Domain-specific validation — customize this
        if not last_message.content or len(last_message.content) < 10:
            # Output too short — likely a failure
            return {
                "messages": [HumanMessage(content="Your response was incomplete. Please provide a complete answer.")],
                "output_validated": False
            }
    
        return {"output_validated": True}
    
    def route_after_evaluate(state: AgentState) -> Literal["agent", "end"]:
        if state.get("output_validated"):
            return "end"
        return "agent"  # Loop back to regenerate
    
    # 7. Build the graph
    def build_agent(checkpointer=None) -> StateGraph:
        workflow = StateGraph(AgentState)
    
        workflow.add_node("agent", agent_node)
        workflow.add_node("tools", tool_node)
        workflow.add_node("validate", evaluator_node)
    
        workflow.set_entry_point("agent")
    
        workflow.add_conditional_edges("agent", route_after_agent, {
            "tools": "tools",
            "validate": "validate",
            "end": END,
        })
        workflow.add_edge("tools", "agent")
        workflow.add_conditional_edges("validate", route_after_evaluate, {
            "agent": "agent",
            "end": END,
        })
    
        return workflow.compile(checkpointer=checkpointer)
    
    # 8. Persistent memory via SQLite checkpointer.
    # Note: the checkpointer is only valid inside the `with` block,
    # so run your invocations before it exits.
    with SqliteSaver.from_conn_string("agent_memory.db") as checkpointer:
        agent = build_agent(checkpointer=checkpointer)
    

    State Management: The Most Common Mistake

    The single most common mistake in LangGraph agents we review: using a dict instead of a typed state class.

    # Wrong — no type safety, no IDE support, silent failures
    state = {"messages": [], "count": 0}
    
    # Right — explicit types, IDE support, clear contracts
    class AgentState(TypedDict):
        messages: Annotated[list[BaseMessage], operator.add]
        tool_calls_attempted: int
        last_error: str | None
        retrieval_context: list[str]
        output_validated: bool
    

    The Annotated[list[BaseMessage], operator.add] pattern is LangGraph-specific: it tells the framework to merge (add) new messages into the list rather than overwrite the entire list. Without this, every node that returns a messages update would erase the conversation history.
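
    The merge behavior is nothing more exotic than list concatenation, which you can see by applying `operator.add` directly — the same function LangGraph calls as the channel's reducer:

```python
import operator

# Simulate how LangGraph applies an annotated reducer: the channel's
# existing value and a node's returned update are combined with it.
history = ["msg_1", "msg_2"]   # current value of the messages channel
node_update = ["msg_3"]        # what a node returned for "messages"

merged = operator.add(history, node_update)  # list concatenation
assert merged == ["msg_1", "msg_2", "msg_3"]

# Without a reducer, the update simply replaces the channel:
overwritten = node_update
assert overwritten == ["msg_3"]  # prior history is gone
```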

    Design principle: Every piece of information the agent needs to make routing decisions should be explicit state fields — not buried in message content that requires parsing.

    The Circuit Breaker Pattern

    Every production agent needs a circuit breaker. LLMs can get stuck in loops when a tool keeps failing or the model keeps requesting tools when it should respond directly.

    def route_after_agent(state: AgentState) -> str:
        # Hard limit on tool calls
        if state["tool_calls_attempted"] >= 10:
            return "error_handler"
    
        # Detect loop: same tool called 3x with same args
        recent_messages = state["messages"][-6:]
        tool_call_counts = {}
        for msg in recent_messages:
            if hasattr(msg, "tool_calls"):
                for tc in msg.tool_calls:
                    key = f"{tc['name']}:{str(tc['args'])}"
                    tool_call_counts[key] = tool_call_counts.get(key, 0) + 1
                    if tool_call_counts[key] >= 3:
                        return "error_handler"
    
        if hasattr(state["messages"][-1], "tool_calls") and state["messages"][-1].tool_calls:
            return "tools"
    
        return "validate"
    

    The error handler node should return a safe fallback response and log the loop pattern for debugging — never let an infinite loop burn tokens silently.
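
    Because the loop-detection logic is pure Python, it can be pulled out and unit-tested without an LLM in the loop. `FakeMessage` and `detect_tool_loop` below are illustrative names, not part of LangGraph:

```python
from dataclasses import dataclass, field

# Minimal stand-in for an AI message carrying tool calls (test stub).
@dataclass
class FakeMessage:
    tool_calls: list = field(default_factory=list)

def detect_tool_loop(messages, window: int = 6, threshold: int = 3) -> bool:
    """Return True if the same tool+args pair appears `threshold`
    times within the last `window` messages."""
    counts = {}
    for msg in messages[-window:]:
        for tc in getattr(msg, "tool_calls", []):
            key = f"{tc['name']}:{tc['args']}"
            counts[key] = counts.get(key, 0) + 1
            if counts[key] >= threshold:
                return True
    return False

# Three identical calls in a row should trip the detector
stuck = [FakeMessage(tool_calls=[{"name": "search", "args": {"q": "x"}}]) for _ in range(3)]
assert detect_tool_loop(stuck) is True

# Varied arguments should not
varied = [FakeMessage(tool_calls=[{"name": "search", "args": {"q": str(i)}}]) for i in range(3)]
assert detect_tool_loop(varied) is False
```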

    Memory and Persistence

    LangGraph's checkpointer system handles multi-session memory. The framework saves complete graph state to a store keyed by thread_id. On the next request with the same thread_id, it restores the full state including message history.

    For production, use PostgreSQL:

    from langgraph.checkpoint.postgres import PostgresSaver  # requires the psycopg driver installed
    
    conn_string = "postgresql://user:password@localhost:5432/agent_db"
    
    with PostgresSaver.from_conn_string(conn_string) as checkpointer:
        checkpointer.setup()  # Creates tables on first run
        agent = build_agent(checkpointer=checkpointer)
    
    # Each user gets their own thread
    config = {"configurable": {"thread_id": f"user_{user_id}_session_{session_id}"}}
    result = agent.invoke({"messages": [HumanMessage(content=user_input)]}, config=config)
    

    Memory management consideration: A thread accumulates messages indefinitely. At scale, long threads become expensive (more tokens per call) and slow. Implement a summarization node that condenses old messages every N turns:

    def summarize_if_needed(state: AgentState) -> AgentState:
        # Caveat: with the plain operator.add reducer shown earlier, a returned
        # messages list is appended, never substituted. To actually drop old
        # messages, switch the field to LangGraph's add_messages reducer and
        # return RemoveMessage objects for the entries you want deleted.
        if len(state["messages"]) < 20:
            return {}

        # Summarize everything older than the last 10 messages
        old_messages = state["messages"][:-10]
        summary_response = llm.invoke([
            HumanMessage(content=f"Summarize this conversation concisely: {str(old_messages)}")
        ])

        return {
            "messages": [HumanMessage(content=f"[Conversation summary: {summary_response.content}]")] + state["messages"][-10:]
        }
    

    The Evaluator-Optimizer Pattern for Regulated Domains

    In healthcare, finance, and legal applications, a single LLM pass is insufficient (in our internal evaluations, one-shot accuracy in these domains hovered around 74%). The Evaluator-Optimizer pattern adds a validation layer:

    import json

    def compliance_evaluator(state: AgentState) -> AgentState:
        """
        Checks agent output against domain rules before returning.
        Returns output_validated=True if output passes, False to trigger retry.
        """
        last_output = state["messages"][-1].content
    
        # Domain-specific rules — customize for your use case
        validation_prompt = f"""
        Review this AI-generated response for compliance issues:
    
        Response: {last_output}
    
        Check for:
        1. Any specific medical advice (flag if present)
        2. Any definitive legal conclusions (flag if present)
        3. Missing required disclaimers
        4. Factual claims that require source citation
    
        Return JSON: {{"passes": true/false, "issues": ["list", "of", "issues"]}}
        """
    
        validator_response = llm.invoke([HumanMessage(content=validation_prompt)])
    
        try:
            result = json.loads(validator_response.content)
            if not result["passes"]:
                correction_prompt = f"Your previous response had these issues: {result['issues']}. Please revise it."
                return {
                    "messages": [HumanMessage(content=correction_prompt)],
                    "output_validated": False
                }
        except json.JSONDecodeError:
            pass  # Validator itself failed — accept output
    
        return {"output_validated": True}
    

    In our deployments, this pattern has raised accuracy to 97%+ at the cost of 2–3x more LLM calls. Worth it for HIPAA/SOX/FINRA-relevant outputs.

    Observability: What to Log

    A production agent with no observability is a black box. At minimum, instrument:

    import logging
    import time
    from functools import wraps
    
    def instrument_node(node_name: str):
        def decorator(func):
            @wraps(func)
            def wrapper(state: AgentState) -> AgentState:
                start = time.time()
                try:
                    result = func(state)
                    logging.info({
                        "event": "node_complete",
                        "node": node_name,
                        "duration_ms": int((time.time() - start) * 1000),
                        "tool_calls": state.get("tool_calls_attempted", 0),
                        "message_count": len(state.get("messages", [])),
                    })
                    return result
                except Exception as e:
                    logging.error({
                        "event": "node_error",
                        "node": node_name,
                        "error": str(e),
                        "duration_ms": int((time.time() - start) * 1000),
                    })
                    raise
            return wrapper
        return decorator
    
    @instrument_node("agent")
    def agent_node(state: AgentState) -> AgentState:
        ...  # wrap the real implementation shown earlier
    

    For LangSmith integration, set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY in your environment — LangGraph will automatically send traces to LangSmith with zero additional code.
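
    If you prefer configuring from Python rather than the shell, set the variables before the first LangChain import. The key below is a placeholder; `LANGCHAIN_PROJECT` optionally groups traces by project in LangSmith.

```python
import os

# Equivalent to exporting these variables in your shell.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"  # placeholder
os.environ["LANGCHAIN_PROJECT"] = "agent-prod"
```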

    Deployment Architecture

    For a production LangGraph service:

    • API layer: FastAPI with async endpoints (LangGraph supports ainvoke and astream)
    • Checkpointer: PostgreSQL for persistent memory, Redis for high-throughput stateless sessions
    • LLM provider: Azure OpenAI or AWS Bedrock for enterprise data residency requirements — see our LLM integration approach
    • Queue: Celery or AWS SQS for async task processing (long-running agents)
    • Observability: LangSmith for agent traces, Prometheus for infrastructure metrics
    A minimal endpoint might look like this (AgentRequest and DATABASE_URL are illustrative names, not fixed by LangGraph):

    from fastapi import FastAPI
    from pydantic import BaseModel
    from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver

    DATABASE_URL = "postgresql://user:password@localhost:5432/agent_db"

    class AgentRequest(BaseModel):
        thread_id: str
        message: str

    app = FastAPI()

    @app.post("/agent/invoke")
    async def invoke_agent(request: AgentRequest):
        async with AsyncPostgresSaver.from_conn_string(DATABASE_URL) as checkpointer:
            agent = build_agent(checkpointer=checkpointer)
            config = {"configurable": {"thread_id": request.thread_id}}

            result = await agent.ainvoke(
                {"messages": [HumanMessage(content=request.message)]},
                config=config
            )

            return {"response": result["messages"][-1].content}
    

    Common Failure Modes and How to Prevent Them

    Failure | Root Cause | Fix
    Infinite tool call loop | No circuit breaker | Add a max tool-call counter plus loop detection
    Lost conversation history | Missing operator.add reducer | Use Annotated[list, operator.add] on the messages field
    Silent tool failures | Exceptions crash the graph | Wrap tool calls in try/except; return the error as a ToolMessage
    Token budget blowout | Unbounded message accumulation | Add a summarization node every N turns
    Inconsistent output | No output validation | Add an evaluator node for regulated domains
    Agent "forgets" context | No checkpointer | Use a PostgreSQL or SQLite checkpointer in production

    What Production Actually Costs

    Based on Ortem Technologies' deployed AI agents for fintech and healthcare clients:

    • Development time: 6–12 weeks for a production-ready agent (not a prototype)
    • Infrastructure: $200–500/month for PostgreSQL + Redis + API hosting at moderate scale
    • LLM costs at 10K tasks/day: $50–200/day depending on model and average task complexity
    • Optimization potential: Switching GPT-4o → GPT-4o-mini for non-critical nodes typically cuts LLM costs 60–80%
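
    These figures are easiest to sanity-check with a back-of-envelope calculator. The prices and token counts below are illustrative placeholders, not current provider rates; plug in your own measurements.

```python
def daily_llm_cost(tasks_per_day: int, calls_per_task: float,
                   tokens_in: int, tokens_out: int,
                   price_in_per_m: float, price_out_per_m: float) -> float:
    """Back-of-envelope daily LLM spend. Prices are per million
    tokens and must come from your provider's current price sheet."""
    calls = tasks_per_day * calls_per_task
    return calls * (tokens_in * price_in_per_m + tokens_out * price_out_per_m) / 1_000_000

# Illustrative numbers only: 10K tasks/day, ~4 LLM calls per task,
# 2K input / 500 output tokens per call, $2.50/$10.00 per M tokens.
cost = daily_llm_cost(tasks_per_day=10_000, calls_per_task=4,
                      tokens_in=2_000, tokens_out=500,
                      price_in_per_m=2.50, price_out_per_m=10.00)
# 40,000 calls at $0.01 each = $400/day
```

    Re-running the same arithmetic with a cheaper model's prices on non-critical nodes is how the 60–80% savings estimate above can be checked against your own workload.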

    The production work — state design, error handling, observability, testing — takes 3–4x longer than building the initial prototype. Plan for it.


    Ortem Technologies builds production-grade AI agents for fintech, healthcare, and enterprise clients — including multi-agent orchestration systems and LangGraph deployments on Azure and AWS.

    About Ortem Technologies

    Ortem Technologies is a premier custom software, mobile app, and AI development company. We serve enterprise and startup clients across the USA, UK, Australia, Canada, and the Middle East. Our cross-industry expertise spans fintech, healthcare, and logistics, enabling us to deliver scalable, secure, and innovative digital solutions worldwide.



    About the Author

    Praveen Jha

    Director – AI Product Strategy, Development, Sales & Business Development, Ortem Technologies

    Praveen Jha is the Director of AI Product Strategy, Development, Sales & Business Development at Ortem Technologies. With deep expertise in technology consulting and enterprise sales, he helps businesses identify the right digital transformation strategies - from mobile and AI solutions to cloud-native platforms. He writes about technology adoption, business growth, and building software partnerships that deliver real ROI.

