How to Build a Production-Ready AI Agent with LangGraph in 2026

LangGraph is a framework built on LangChain that enables stateful, multi-actor AI workflows using directed graphs. Unlike single-pass LLM calls, a LangGraph agent maintains state across steps, supports conditional branching, and can call tools, retry on failure, and loop until a stopping condition is met. A production-ready agent requires: typed state, error handling nodes, a human-in-the-loop checkpoint, and persistent memory via a checkpointer.
Most LangGraph tutorials get you to a working agent in 50 lines of code. Then you try to deploy it and realize you have no idea how to handle a failed tool call, where conversation memory lives, or why the agent loops infinitely when the LLM returns unexpected output.
This is the guide that covers what comes after the tutorial — the patterns, the failure modes, and the production architecture decisions.
What LangGraph Actually Is
LangGraph is a stateful graph execution framework for AI agents. It models an agent's decision-making process as a directed graph where nodes are functions (LLM calls, tool invocations, conditional logic) and edges are the transitions between them — including conditional edges that route based on the previous node's output.
The critical difference from LangChain chains: LangGraph supports cycles. An agent can call a tool, evaluate the result, decide to call a different tool, and loop back — until a stopping condition is met. This is what makes LangGraph suitable for real agentic behavior, not just pipeline automation.
Three core concepts:
- State: A typed Python class (TypedDict or Pydantic BaseModel) that flows through every node. Nodes receive state, modify it, and return the updated state.
- Nodes: Python functions that accept state and return state updates. Can be LLM calls, tool calls, conditional logic, or human checkpoints.
- Edges: Connections between nodes. Normal edges are unconditional. Conditional edges call a router function that returns the name of the next node based on current state.
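To make these concrete, here is a minimal, hypothetical two-node loop (the names and the stopping condition are arbitrary); it exercises all three concepts, plus the cycles discussed above:
from typing import TypedDict
from langgraph.graph import StateGraph, END
class CounterState(TypedDict):
    count: int
def increment(state: CounterState) -> CounterState:
    # Nodes receive state and return a partial update
    return {"count": state["count"] + 1}
def route(state: CounterState) -> str:
    # Conditional edges route based on current state
    return "increment" if state["count"] < 3 else "done"
graph = StateGraph(CounterState)
graph.add_node("increment", increment)
graph.set_entry_point("increment")
graph.add_conditional_edges("increment", route, {"increment": "increment", "done": END})
print(graph.compile().invoke({"count": 0}))  # {'count': 3}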
The Minimal Production-Ready Structure
Most tutorials build a single-node agent. Production agents need at minimum:
- Typed state class — not a dict, a TypedDict with explicit fields
- Tool node — handles tool dispatch and error catching
- Error handling node — catches tool failures and decides: retry, skip, or escalate
- Evaluator node — validates LLM output before returning it (optional in general, but critical in regulated domains)
- Checkpointer — persists state across sessions
- Timeout — prevents infinite loops from LLM indecision
Here is the skeleton:
from typing import TypedDict, Annotated, Literal
import operator
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.sqlite import SqliteSaver
from langchain_core.messages import BaseMessage, HumanMessage, ToolMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
# 1. Typed state — explicit fields, not a generic dict
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], operator.add]
    tool_calls_attempted: int
    last_error: str | None
    output_validated: bool
# 2. LLM with bound tools (the tool below is a placeholder; swap in your own)
@tool
def search_docs(query: str) -> str:
    """Search internal documentation (placeholder implementation)."""
    return f"No results found for: {query}"
tools = [search_docs]
tool_registry = {t.name: t for t in tools}
llm = ChatOpenAI(model="gpt-4o", temperature=0)
llm_with_tools = llm.bind_tools(tools)
# 3. Core agent node
def agent_node(state: AgentState) -> AgentState:
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response], "last_error": None}
# 4. Tool node with error catching
def tool_node(state: AgentState) -> AgentState:
    last_message = state["messages"][-1]
    results = []
    last_error = None
    for tool_call in last_message.tool_calls:
        try:
            tool_fn = tool_registry[tool_call["name"]]
            result = tool_fn.invoke(tool_call["args"])
            results.append(ToolMessage(content=str(result), tool_call_id=tool_call["id"]))
        except Exception as e:
            # Return error as tool result — don't crash the graph,
            # and keep processing any remaining tool calls
            last_error = str(e)
            results.append(ToolMessage(
                content=f"ERROR: {last_error}",
                tool_call_id=tool_call["id"]
            ))
    return {
        "messages": results,
        "last_error": last_error,
        "tool_calls_attempted": state["tool_calls_attempted"] + 1
    }
# 5. Router — where does the agent go next?
def route_after_agent(state: AgentState) -> Literal["tools", "validate", "end"]:
    last_message = state["messages"][-1]
    # Circuit breaker: stop after 10 tool calls
    if state["tool_calls_attempted"] >= 10:
        return "end"
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return "validate"
# 6. Evaluator node — validates output before returning
def evaluator_node(state: AgentState) -> AgentState:
    last_message = state["messages"][-1]
    # Domain-specific validation — customize this
    if not last_message.content or len(last_message.content) < 10:
        # Output too short — likely a failure
        return {
            "messages": [HumanMessage(content="Your response was incomplete. Please provide a complete answer.")],
            "output_validated": False
        }
    return {"output_validated": True}
def route_after_evaluate(state: AgentState) -> Literal["agent", "end"]:
    if state.get("output_validated"):
        return "end"
    return "agent"  # Loop back to regenerate
# 7. Build the graph
def build_agent(checkpointer=None):
    workflow = StateGraph(AgentState)
    workflow.add_node("agent", agent_node)
    workflow.add_node("tools", tool_node)
    workflow.add_node("validate", evaluator_node)
    workflow.set_entry_point("agent")
    workflow.add_conditional_edges("agent", route_after_agent, {
        "tools": "tools",
        "validate": "validate",
        "end": END,
    })
    workflow.add_edge("tools", "agent")
    workflow.add_conditional_edges("validate", route_after_evaluate, {
        "agent": "agent",
        "end": END,
    })
    return workflow.compile(checkpointer=checkpointer)
# 8. Persistent memory via SQLite checkpointer
with SqliteSaver.from_conn_string("agent_memory.db") as checkpointer:
    agent = build_agent(checkpointer=checkpointer)
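A quick smoke test of the compiled agent (the thread_id and question here are arbitrary). Note that the non-message state fields need initial values, since the nodes above read them before writing them:
with SqliteSaver.from_conn_string("agent_memory.db") as checkpointer:
    agent = build_agent(checkpointer=checkpointer)
    config = {"configurable": {"thread_id": "demo-thread-1"}}
    result = agent.invoke(
        {
            "messages": [HumanMessage(content="What do our docs say about refunds?")],
            "tool_calls_attempted": 0,
            "last_error": None,
            "output_validated": False,
        },
        config=config,
    )
    print(result["messages"][-1].content)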
State Management: The Most Common Mistake
The single most common mistake in the LangGraph agents we review: using a plain dict instead of a typed state class.
# Wrong — no type safety, no IDE support, silent failures
state = {"messages": [], "count": 0}
# Right — explicit types and clear contracts (swap TypedDict for a
# Pydantic BaseModel if you also want runtime validation)
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], operator.add]
    tool_calls_attempted: int
    last_error: str | None
    retrieval_context: list[str]
    output_validated: bool
The Annotated[list[BaseMessage], operator.add] pattern is LangGraph-specific: it tells the framework to merge (append) new messages into the list rather than overwrite the entire list. Without this, every node that returns a messages update would erase the conversation history. LangGraph also ships a purpose-built reducer for message lists, add_messages (from langgraph.graph.message), which additionally supports deleting entries via RemoveMessage; the summarization pattern later in this guide relies on it.
Design principle: Every piece of information the agent needs to make routing decisions should be explicit state fields — not buried in message content that requires parsing.
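For example, a hypothetical router that needs to know whether retrieval has already run should read a dedicated field (retrieval_context, from the state class above) instead of scanning message text:
def route_after_retrieval_check(state: AgentState) -> str:
    # Reads an explicit, testable field rather than parsing message content
    # (e.g., searching the last message for "no documents found" breaks
    # whenever the model changes its wording)
    if not state["retrieval_context"]:
        return "retrieve"
    return "agent"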
The Circuit Breaker Pattern
Every production agent needs a circuit breaker. LLMs can get stuck in loops when a tool keeps failing or the model keeps requesting tools when it should respond directly.
def route_after_agent(state: AgentState) -> str:
    # Hard limit on tool calls
    if state["tool_calls_attempted"] >= 10:
        return "error_handler"
    # Detect loop: same tool called 3x with same args
    recent_messages = state["messages"][-6:]
    tool_call_counts = {}
    for msg in recent_messages:
        if hasattr(msg, "tool_calls"):
            for tc in msg.tool_calls:
                key = f"{tc['name']}:{str(tc['args'])}"
                tool_call_counts[key] = tool_call_counts.get(key, 0) + 1
                if tool_call_counts[key] >= 3:
                    return "error_handler"
    if hasattr(state["messages"][-1], "tool_calls") and state["messages"][-1].tool_calls:
        return "tools"
    return "validate"
The error handler node should return a safe fallback response and log the loop pattern for debugging — never let an infinite loop burn tokens silently.
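A minimal sketch of such a node, assuming AIMessage from langchain_core.messages and the stdlib logging module (adapt the fallback wording and routing to your product):
from langchain_core.messages import AIMessage
import logging
def error_handler_node(state: AgentState) -> AgentState:
    # Log the failure pattern for debugging; never fail silently
    logging.error({
        "event": "circuit_breaker_tripped",
        "tool_calls_attempted": state["tool_calls_attempted"],
        "last_error": state["last_error"],
    })
    fallback = AIMessage(content="I wasn't able to complete this request reliably. It has been flagged for review.")
    return {"messages": [fallback], "output_validated": True}
# Register it and route it straight to END:
# workflow.add_node("error_handler", error_handler_node)
# workflow.add_edge("error_handler", END)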
Memory and Persistence
LangGraph's checkpointer system handles multi-session memory. The framework saves complete graph state to a store keyed by thread_id. On the next request with the same thread_id, it restores the full state including message history.
For production, use PostgreSQL:
from langgraph.checkpoint.postgres import PostgresSaver  # requires the psycopg driver installed
conn_string = "postgresql://user:password@localhost:5432/agent_db"
with PostgresSaver.from_conn_string(conn_string) as checkpointer:
    checkpointer.setup()  # Creates tables on first run
    agent = build_agent(checkpointer=checkpointer)
    # Each user gets their own thread
    config = {"configurable": {"thread_id": f"user_{user_id}_session_{session_id}"}}
    result = agent.invoke({"messages": [HumanMessage(content=user_input)]}, config=config)
Memory management consideration: A thread accumulates messages indefinitely. At scale, long threads become expensive (more tokens per call) and slow. Implement a summarization node that condenses old messages every N turns:
from langchain_core.messages import RemoveMessage
# Note: deleting old messages requires the add_messages reducer on the
# messages field (operator.add is append-only and cannot remove entries)
def summarize_if_needed(state: AgentState) -> AgentState:
    if len(state["messages"]) < 20:
        return {}
    # Summarize messages older than the last 10, then delete them
    old_messages = state["messages"][:-10]
    summary_response = llm.invoke([
        HumanMessage(content=f"Summarize this conversation concisely: {str(old_messages)}")
    ])
    return {
        "messages": [RemoveMessage(id=m.id) for m in old_messages]
        + [HumanMessage(content=f"[Conversation summary: {summary_response.content}]")]
    }
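One way to wire this into the graph built earlier: run it at the start of every turn, before the agent node (this replaces set_entry_point("agent")):
workflow.add_node("summarize", summarize_if_needed)
workflow.set_entry_point("summarize")
workflow.add_edge("summarize", "agent")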
The Evaluator-Optimizer Pattern for Regulated Domains
In healthcare, finance, and legal applications, a single LLM pass is insufficient — one-shot accuracy in these domains is approximately 74%. The Evaluator-Optimizer pattern adds a validation layer:
import json
def compliance_evaluator(state: AgentState) -> AgentState:
    """
    Checks agent output against domain rules before returning.
    Sets output_validated=True if output passes, False to trigger retry.
    """
    last_output = state["messages"][-1].content
    # Domain-specific rules — customize for your use case
    validation_prompt = f"""
    Review this AI-generated response for compliance issues:
    Response: {last_output}
    Check for:
    1. Any specific medical advice (flag if present)
    2. Any definitive legal conclusions (flag if present)
    3. Missing required disclaimers
    4. Factual claims that require source citation
    Return JSON: {{"passes": true/false, "issues": ["list", "of", "issues"]}}
    """
    validator_response = llm.invoke([HumanMessage(content=validation_prompt)])
    try:
        result = json.loads(validator_response.content)
        if not result["passes"]:
            correction_prompt = f"Your previous response had these issues: {result['issues']}. Please revise it."
            return {
                "messages": [HumanMessage(content=correction_prompt)],
                "output_validated": False
            }
    except json.JSONDecodeError:
        pass  # Validator itself failed — accept output
    return {"output_validated": True}
This pattern raises accuracy to 97%+ in regulated domains at the cost of 2–3x more LLM calls. Worth it for HIPAA/SOX/FINRA-relevant outputs.
Observability: What to Log
A production agent with no observability is a black box. At minimum, instrument:
import logging
import time
from functools import wraps
def instrument_node(node_name: str):
    def decorator(func):
        @wraps(func)
        def wrapper(state: AgentState) -> AgentState:
            start = time.time()
            try:
                result = func(state)
                logging.info({
                    "event": "node_complete",
                    "node": node_name,
                    "duration_ms": int((time.time() - start) * 1000),
                    "tool_calls": state.get("tool_calls_attempted", 0),
                    "message_count": len(state.get("messages", [])),
                })
                return result
            except Exception as e:
                logging.error({
                    "event": "node_error",
                    "node": node_name,
                    "error": str(e),
                    "duration_ms": int((time.time() - start) * 1000),
                })
                raise
        return wrapper
    return decorator
@instrument_node("agent")
def agent_node(state: AgentState) -> AgentState:
    # ... node implementation
    ...
For LangSmith integration, set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY in your environment — LangGraph will automatically send traces to LangSmith with zero additional code.
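If you would rather set these from Python, for example in a local dev script, something like this works (the key and project name are placeholders):
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls-..."  # placeholder: your LangSmith API key
os.environ["LANGCHAIN_PROJECT"] = "my-agent"  # optional: groups traces by project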
Deployment Architecture
For a production LangGraph service:
- API layer: FastAPI with async endpoints (LangGraph supports ainvoke and astream)
- Checkpointer: PostgreSQL for persistent memory, Redis for high-throughput stateless sessions
- LLM provider: Azure OpenAI or AWS Bedrock for enterprise data residency requirements
- Queue: Celery or AWS SQS for async task processing (long-running agents)
- Observability: LangSmith for agent traces, Prometheus for infrastructure metrics
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
DATABASE_URL = "postgresql://user:password@localhost:5432/agent_db"
class AgentRequest(BaseModel):
    thread_id: str
    message: str
app = FastAPI()
@app.post("/agent/invoke")
async def invoke_agent(request: AgentRequest):
    async with AsyncPostgresSaver.from_conn_string(DATABASE_URL) as checkpointer:
        agent = build_agent(checkpointer=checkpointer)
        config = {"configurable": {"thread_id": request.thread_id}}
        result = await agent.ainvoke(
            {"messages": [HumanMessage(content=request.message)]},
            config=config
        )
        return {"response": result["messages"][-1].content}
Common Failure Modes and How to Prevent Them
| Failure | Root Cause | Fix |
|---|---|---|
| Infinite tool call loop | No circuit breaker | Add max_tool_calls counter + loop detection |
| Lost conversation history | Missing Annotated operator.add | Use Annotated[list, operator.add] on messages field |
| Silent tool failures | Exceptions crash graph | Wrap all tool calls in try/except, return error as ToolMessage |
| Token budget blowout | Unbounded message accumulation | Add summarization node every N turns |
| Inconsistent output | No output validation | Add evaluator node for regulated domains |
| Agent "forgets" context | No checkpointer | Use PostgreSQL or SQLite checkpointer in production |
What Production Actually Costs
Based on Ortem Technologies' deployed AI agents for fintech and healthcare clients:
- Development time: 6–12 weeks for a production-ready agent (not a prototype)
- Infrastructure: $200–500/month for PostgreSQL + Redis + API hosting at moderate scale
- LLM costs at 10K tasks/day: $50–200/day depending on model and average task complexity
- Optimization potential: Switching GPT-4o → GPT-4o-mini for non-critical nodes typically cuts LLM costs 60–80%
The production work — state design, error handling, observability, testing — takes 3–4x longer than building the initial prototype. Plan for it.
Ortem Technologies builds production-grade AI agents for fintech, healthcare, and enterprise clients, including multi-agent orchestration systems and LangGraph deployments on Azure and AWS.
Sources & References
- 1.LangGraph Documentation - LangChain
- 2.LangGraph Conceptual Guide - LangChain
- 3.LangSmith Observability - LangChain
About the Author
Praveen Jha is the Director of AI Product Strategy, Development, Sales & Business Development at Ortem Technologies. With deep expertise in technology consulting and enterprise sales, he helps businesses identify the right digital transformation strategies - from mobile and AI solutions to cloud-native platforms. He writes about technology adoption, business growth, and building software partnerships that deliver real ROI.
Frequently Asked Questions
- How is LangGraph different from LangChain? LangGraph is a library built on top of LangChain that adds stateful, graph-based orchestration for AI agents. LangChain provides the building blocks (LLM wrappers, tools, chains). LangGraph adds the control flow: cycles, conditional branches, parallel execution, and persistent state between steps. Use LangChain for simple sequential pipelines; use LangGraph when your agent needs to make decisions, loop, or maintain memory across multiple tool calls.
- How much does it cost to run a LangGraph agent? Cost depends entirely on the LLM you use and call frequency. A GPT-4o-based agent handling 1,000 tasks/day with an average of 5 LLM calls per task at 2,000 input + 500 output tokens per call costs approximately $50/day at GPT-4o pricing ($2.50/1M input, $10/1M output): 10M input tokens and 2.5M output tokens daily come to roughly $25 each. Switching to GPT-4o-mini or Claude 3.5 Haiku for non-critical steps reduces this to $2–5/day. Always profile which nodes consume the most tokens before optimizing.
- What is the Evaluator-Optimizer pattern? It runs a secondary validation agent that checks the primary agent's output against domain rules before returning a response. Instead of trusting a single LLM pass (one-shot, ~74% accuracy in regulated domains), the evaluator checks the output and sends it back for revision if it fails validation. This pattern improves accuracy to 97%+ in regulated domains like finance and healthcare, at the cost of 2–3x more LLM calls.
- Can a LangGraph agent remember users across sessions? Yes, via checkpointers. LangGraph supports in-memory checkpointers (for single-session use), SQLite checkpointers (persistent, local), and Redis/PostgreSQL checkpointers (persistent, production). A PostgreSQL checkpointer stores the complete graph state — conversation history, tool call results, user preferences — and restores it at the start of each session using a thread_id. This is how multi-session agents remember user context.
- How do you monitor a LangGraph agent in production? LangSmith (by LangChain) is the primary observability platform — it traces every node execution, LLM call, and tool invocation with latency and token counts. For self-hosted observability, Langfuse is an open-source alternative. For production alerting, instrument LangGraph with OpenTelemetry and send traces to Grafana or Datadog. At minimum, log: node name, execution duration, LLM model used, input/output token counts, and any errors.
- How should a production agent handle failures? Production agent failures fall into three categories: LLM API errors (handle with retry logic + exponential backoff), tool call failures (add an error handling node that decides whether to retry, skip, or escalate to human review), and logic errors (incorrect LLM outputs that pass silently — catch with an evaluator node or output validation schema using Pydantic). Never let a tool call failure crash the graph; always route to an error handler node first.

