Agent Frameworks in Python — Deep Dive

Build and compare production AI agents in Python using LangGraph state machines, CrewAI multi-agent systems, custom ReAct loops, tool governance, and reliability patterns for real-world deployment.

Agent frameworks in Python abstract the loop of reasoning, tool use, and observation. But production agents need more than a loop — they need reliability, observability, cost control, and graceful failure handling. This guide covers how to build agents that work in the real world.

1) Building a custom ReAct agent

Before using a framework, understand the core pattern:

from openai import OpenAI
import json

client = OpenAI()

def react_agent(query: str, tools: dict, system_prompt: str, max_steps: int = 8) -> str:
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query},
    ]
    tool_defs = [
        {"type": "function", "function": {"name": name, **spec["schema"]}}
        for name, spec in tools.items()
    ]

    for step in range(max_steps):
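        # Only pass `tools` when at least one is defined; an explicit None can be rejected.
        extra = {"tools": tool_defs} if tool_defs else {}
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            **extra,
        )
        msg = response.choices[0].message
        # Drop None-valued fields so the echoed assistant message stays a minimal payload.
        messages.append(msg.model_dump(exclude_none=True))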

        if not msg.tool_calls:
            return msg.content  # Final answer

        for tc in msg.tool_calls:
            tool = tools.get(tc.function.name)
            if not tool:
                result = f"Error: unknown tool {tc.function.name}"
            else:
                try:
                    args = json.loads(tc.function.arguments)
                    result = str(tool["fn"](**args))
                except Exception as e:
                    result = f"Error: {e}"
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": result,
            })

    return "Agent reached maximum steps without completing the task."

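This roughly 40-line agent handles the core loop. The tools argument maps each tool name to a dict with two keys: fn, the Python callable, and schema, which holds the function's description and JSON Schema parameters. Everything frameworks add (memory, state, multi-agent) builds on this foundation.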
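The shape of the tools argument can be read off the loop above. The sketch below builds one entry and reproduces the dispatch step without calling the API. The names add_numbers and build_tool_defs are hypothetical and exist only for illustration.

```python
import json

def add_numbers(a: float, b: float) -> float:
    """Hypothetical example tool: add two numbers."""
    return a + b

# Shape expected by react_agent: name -> {"fn": callable, "schema": {...}}
tools = {
    "add_numbers": {
        "fn": add_numbers,
        "schema": {
            "description": "Add two numbers and return the sum.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number"},
                    "b": {"type": "number"},
                },
                "required": ["a", "b"],
            },
        },
    },
}

def build_tool_defs(tools: dict) -> list[dict]:
    """Same comprehension as in react_agent, pulled out for inspection."""
    return [
        {"type": "function", "function": {"name": name, **spec["schema"]}}
        for name, spec in tools.items()
    ]

tool_defs = build_tool_defs(tools)

# Simulate how the loop dispatches a tool call: arguments arrive as a JSON string.
raw_arguments = '{"a": 2, "b": 3}'
result = str(tools["add_numbers"]["fn"](**json.loads(raw_arguments)))
```

Keeping the schema next to the callable means a tool cannot be advertised to the model without an implementation behind it.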

2) LangGraph: agents as state machines

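LangGraph models agents as directed graphs where nodes are functions that update a shared state and edges are fixed or conditional transitions. In the example below, search_tool, generate_draft, and evaluate_quality are placeholders for your own implementations: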

from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
from operator import add

class AgentState(TypedDict):
    messages: Annotated[list, add]
    current_step: str
    research_results: list[str]
    draft: str

def research_node(state: AgentState) -> dict:
    """Gather information using search tools."""
    query = state["messages"][-1]["content"]
    results = search_tool(query)
    return {"research_results": results, "current_step": "draft"}

def draft_node(state: AgentState) -> dict:
    """Write a draft based on research."""
    context = "\n".join(state["research_results"])
    draft = generate_draft(context, state["messages"][-1]["content"])
    return {"draft": draft, "current_step": "review"}

def review_node(state: AgentState) -> dict:
    """Review the draft and decide if it needs more research."""
    quality = evaluate_quality(state["draft"])
    if quality < 0.7:
        return {"current_step": "research"}  # Loop back
    return {"current_step": "done"}

def should_continue(state: AgentState) -> str:
    if state["current_step"] == "done":
        return END
    return state["current_step"]

graph = StateGraph(AgentState)
graph.add_node("research", research_node)
graph.add_node("draft", draft_node)
graph.add_node("review", review_node)
graph.add_edge("research", "draft")
graph.add_edge("draft", "review")
graph.add_conditional_edges("review", should_continue)
graph.set_entry_point("research")

agent = graph.compile()

LangGraph’s explicit state management makes complex workflows debuggable. With a checkpointer configured, you can persist state after each step, resume or replay from a saved checkpoint, and pause for human-in-the-loop approval before or after specific nodes.
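The Annotated[list, add] annotation on messages declares a reducer. When a node returns a value for that key, it is combined with the existing value using operator.add instead of replacing it. Keys without a reducer are overwritten. The plain-Python sketch below imitates that merge rule to show the effect. The names apply_update and REDUCERS are hypothetical, and this is not LangGraph's actual implementation.

```python
from operator import add

# Keys with a reducer are combined; all other keys are overwritten.
REDUCERS = {"messages": add}

def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state (illustrative only)."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return merged

state = {
    "messages": [{"role": "user", "content": "Explain vector databases"}],
    "current_step": "research",
    "research_results": [],
    "draft": "",
}

# A node appends one message and moves the workflow forward.
state = apply_update(state, {
    "messages": [{"role": "assistant", "content": "Found 3 sources"}],
    "current_step": "draft",
})
```

This is why nodes return only the keys they change: the message list grows, while current_step is simply replaced.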

3) Multi-agent patterns with CrewAI

When a task needs different skills, multiple agents can collaborate:

from crewai import Agent, Task, Crew

researcher = Agent(
    role="Research Analyst",
    goal="Find accurate, recent data on the topic",
    backstory="Senior analyst at a research firm with 10 years of experience",
    tools=[search_tool, web_scraper],
    llm="gpt-4o",
)

writer = Agent(
    role="Technical Writer",
    goal="Write clear, accurate content based on research",
    backstory="Technical writer who specializes in making complex topics accessible",
    tools=[],
    llm="gpt-4o",
)

research_task = Task(
    description="Research {topic} and compile key findings with sources",
    agent=researcher,
    expected_output="Bullet-pointed research findings with URLs",
)

writing_task = Task(
    description="Write a 500-word article based on the research findings",
    agent=writer,
    expected_output="Published-quality article with introduction, body, and conclusion",
    context=[research_task],
)

crew = Crew(agents=[researcher, writer], tasks=[research_task, writing_task])
result = crew.kickoff(inputs={"topic": "quantum computing advances in 2026"})

Multi-agent systems shine when tasks naturally decompose into roles. They struggle when roles overlap or when agents need tight coordination.

4) Tool governance and safety

Production agents need guardrails on tool access:

import time
from dataclasses import dataclass
from enum import Enum

class ToolPermission(Enum):
    READ = "read"
    WRITE = "write"
    EXECUTE = "execute"

@dataclass
class ToolPolicy:
    name: str
    permissions: set[ToolPermission]
    rate_limit: int  # max calls per minute
    requires_approval: bool = False
    allowed_arguments: dict | None = None  # allowlist

class GovernedToolRegistry:
    def __init__(self):
        self.tools: dict[str, dict] = {}
        self.policies: dict[str, ToolPolicy] = {}
        self.call_log: list[dict] = []

    def register(self, name: str, fn, schema: dict, policy: ToolPolicy):
        self.tools[name] = {"fn": fn, "schema": schema}
        self.policies[name] = policy

    def execute(self, name: str, arguments: dict, context: dict) -> str:
        policy = self.policies.get(name)
        if not policy:
            return "Error: tool not registered"

        # Rate limit check
        recent_calls = sum(
            1 for log in self.call_log
            if log["tool"] == name and time.time() - log["time"] < 60
        )
        if recent_calls >= policy.rate_limit:
            return f"Rate limit exceeded for {name}"

        # Approval check
        if policy.requires_approval:
            return f"APPROVAL_REQUIRED: {name}({arguments})"

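        # Log before executing so failed calls still count toward the
        # rate limit and show up in the audit trail.
        self.call_log.append({"tool": name, "args": arguments, "time": time.time()})
        try:
            result = self.tools[name]["fn"](**arguments)
        except Exception as e:
            return f"Error: {e}"
        return str(result)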

Key governance rules:

Read operations are generally safe. Write operations need higher scrutiny.
Financial transactions and data deletions should require human approval.
Rate-limit all external API calls to prevent runaway costs.
Log every tool call for audit trails.

5) Memory architecture

Production agents need layered memory:

import time

class AgentMemory:
    def __init__(self, vector_store, kv_store):
        self.conversation_history: list[dict] = []  # short-term
        self.vector_store = vector_store  # semantic long-term
        self.kv_store = kv_store  # exact long-term

    def add_message(self, role: str, content: str):
        self.conversation_history.append({"role": role, "content": content})
        # Also store in long-term for future sessions
        self.vector_store.add(content, metadata={"role": role, "time": time.time()})

    def recall(self, query: str, k: int = 5) -> list[str]:
        """Semantic recall from long-term memory."""
        return self.vector_store.search(query, top_k=k)

    def get_fact(self, key: str) -> str | None:
        """Exact recall of stored facts."""
        return self.kv_store.get(key)

    def store_fact(self, key: str, value: str):
        """Store a specific fact for exact recall."""
        self.kv_store.set(key, value)

    def get_context_window(self, max_tokens: int = 4000) -> list[dict]:
        """Return recent history that fits in the context window."""
        selected = []
        token_count = 0
        for msg in reversed(self.conversation_history):
            msg_tokens = len(msg["content"]) // 4  # rough estimate: ~4 characters per token
            if token_count + msg_tokens > max_tokens:
                break
            selected.append(msg)
            token_count += msg_tokens
        selected.reverse()  # restore chronological order
        return selected

6) Reliability patterns

Agents fail in production. Build resilience:

Timeout budgets — allocate a total time budget and per-step limits. If research takes too long, skip to drafting with available data.
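A minimal sketch of a timeout budget, assuming each step accepts a timeout argument. The names TimeBudget, run_research, and fake_fetch are hypothetical.

```python
import time

class TimeBudget:
    """Tracks a total deadline and caps each step at a per-step limit."""
    def __init__(self, total_seconds: float, per_step_seconds: float):
        self.deadline = time.monotonic() + total_seconds
        self.per_step_seconds = per_step_seconds

    def remaining(self) -> float:
        return max(0.0, self.deadline - time.monotonic())

    def step_timeout(self) -> float:
        """Timeout to hand to the next step: the smaller of the two limits."""
        return min(self.per_step_seconds, self.remaining())

    def exhausted(self) -> bool:
        return self.remaining() <= 0.0


def run_research(sources: list[str], budget: TimeBudget, fetch) -> list[str]:
    """Collect results until sources run out or the budget is spent."""
    results = []
    for source in sources:
        if budget.exhausted():
            break  # Move on to drafting with whatever was gathered.
        results.append(fetch(source, timeout=budget.step_timeout()))
    return results


# Hypothetical fetcher that just echoes its input.
def fake_fetch(source: str, timeout: float) -> str:
    return f"summary of {source}"

partial = run_research(["a", "b"], TimeBudget(0.0, 5.0), fake_fetch)
full = run_research(["a", "b"], TimeBudget(30.0, 5.0), fake_fetch)
```

Using time.monotonic rather than wall-clock time keeps the budget stable if the system clock is adjusted mid-run.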

Fallback chains — if the primary model fails, fall back to a simpler model or a pre-computed response.
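A fallback chain can be sketched as an ordered list of callables tried in turn. The handlers below are hypothetical stand-ins for real model clients, and call_with_fallback is an illustrative name.

```python
from typing import Callable

def call_with_fallback(
    prompt: str, handlers: list[tuple[str, Callable[[str], str]]]
) -> tuple[str, str]:
    """Try each handler in order; return (handler_name, response) from the first success."""
    errors = []
    for name, handler in handlers:
        try:
            return name, handler(prompt)
        except Exception as exc:  # In practice, catch the provider's specific error types.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All fallbacks failed: " + "; ".join(errors))


# Hypothetical handlers standing in for real model clients.
def primary_model(prompt: str) -> str:
    raise TimeoutError("primary model timed out")

def smaller_model(prompt: str) -> str:
    return f"[small] answer to: {prompt}"

def canned_response(prompt: str) -> str:
    return "Sorry, the service is busy. Please try again shortly."

used, answer = call_with_fallback(
    "Summarize the report",
    [("primary", primary_model), ("small", smaller_model), ("canned", canned_response)],
)
```

Returning the handler name alongside the answer makes it possible to log how often the fallback path is taken.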

Checkpointing — save agent state after each step. On failure, resume from the last checkpoint instead of starting over.
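The sketch below shows one way to checkpoint, assuming the agent state is JSON-serializable and steps run in a fixed order. The function run_with_checkpoints and the two example steps are hypothetical.

```python
import json
import os
import tempfile

def run_with_checkpoints(steps, state: dict, path: str) -> dict:
    """Run steps in order, saving state after each; resume if a checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
    start = state.get("next_step", 0)
    for index in range(start, len(steps)):
        state = steps[index](state)
        state["next_step"] = index + 1
        # Write to a temp file, then rename, so a crash never leaves a half-written checkpoint.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, path)
    return state

# Hypothetical steps; drafting fails on its first attempt only.
attempts = {"research": 0, "draft": 0}

def research(state: dict) -> dict:
    attempts["research"] += 1
    return {**state, "notes": ["fact A", "fact B"]}

def draft(state: dict) -> dict:
    attempts["draft"] += 1
    if attempts["draft"] == 1:
        raise RuntimeError("model unavailable")
    return {**state, "draft": f"Draft using {len(state['notes'])} notes"}

checkpoint_path = os.path.join(tempfile.mkdtemp(), "agent_state.json")
try:
    run_with_checkpoints([research, draft], {}, checkpoint_path)
except RuntimeError:
    pass  # First run fails during drafting; the research result is already saved.

final_state = run_with_checkpoints([research, draft], {}, checkpoint_path)
```

On the second run the research step is skipped because the saved next_step points past it.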

Dead letter queues — when an agent fails after all retries, save the task to a queue for human review rather than losing it.
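A dead letter queue can be sketched with a retry loop that parks the task once retries are exhausted. Here a plain list stands in for a durable queue, and process_with_dlq is a hypothetical name.

```python
import time

def process_with_dlq(task: dict, handler, dead_letters: list,
                     max_retries: int = 3, base_delay: float = 0.0):
    """Retry a task; after the final failure, park it in dead_letters for review."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return handler(task)
        except Exception as exc:
            last_error = exc
            if attempt < max_retries:
                time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    dead_letters.append({
        "task": task,
        "error": str(last_error),
        "attempts": max_retries,
        "failed_at": time.time(),
    })
    return None

# A plain list stands in for a durable queue here.
dead_letters: list[dict] = []

def always_fails(task: dict) -> str:
    raise ValueError(f"cannot handle {task['id']}")

outcome = process_with_dlq({"id": "task-42"}, always_fails, dead_letters)
```

Storing the original task together with the last error gives a reviewer enough context to retry or discard it.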

7) Cost management

Agents are expensive. A complex task might make 10-20 LLM calls with tool results in context. Control costs by:

Setting maximum step limits per task.
Using cheaper models for planning and routing, expensive models only for final generation.
Caching tool results within a session.
Monitoring per-task cost and alerting on outliers.
Implementing budget caps per user or per task type.
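A per-task budget cap can be sketched as a small tracker that raises once estimated spend passes a limit. The class names are hypothetical, and the prices are placeholders rather than real rates.

```python
class BudgetExceeded(Exception):
    """Raised when a task's estimated spend passes its cap."""

class CostTracker:
    """Accumulates estimated spend for one task and enforces a cap."""
    def __init__(self, cap_usd: float, price_per_1k_input: float, price_per_1k_output: float):
        self.cap_usd = cap_usd
        self.price_in = price_per_1k_input
        self.price_out = price_per_1k_output
        self.spent_usd = 0.0
        self.calls = 0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one LLM call's cost; raise if the cap is exceeded."""
        cost = (input_tokens / 1000) * self.price_in + (output_tokens / 1000) * self.price_out
        self.spent_usd += cost
        self.calls += 1
        if self.spent_usd > self.cap_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.4f} of ${self.cap_usd:.2f} cap after {self.calls} calls"
            )
        return cost

# Placeholder prices, not real rates.
tracker = CostTracker(cap_usd=0.05, price_per_1k_input=0.01, price_per_1k_output=0.03)
tracker.record(input_tokens=2000, output_tokens=500)  # 0.02 + 0.015 = 0.035, under the cap

stopped = False
try:
    tracker.record(input_tokens=2000, output_tokens=500)  # cumulative 0.07 exceeds 0.05
except BudgetExceeded:
    stopped = True
```

In the ReAct loop from section 1, the token counts could come from response.usage after each completion, with the exception ending the loop early.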

8) When not to use agent frameworks

Agent frameworks add complexity. Avoid them when:

A single LLM call solves the problem reliably.
The task has a fixed, known sequence of steps (use prompt chaining instead).
Latency requirements are under 2 seconds (agents typically take 10-60 seconds).
The cost of errors is very high and you need deterministic behavior.
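For the fixed-sequence case, prompt chaining is just ordinary function composition. The sketch below assumes an llm callable that takes a prompt and returns text. The names run_chain and stub_llm are hypothetical, and the stub lets the control flow be checked offline.

```python
from typing import Callable

def run_chain(llm: Callable[[str], str], text: str) -> dict:
    """Fixed three-step pipeline: extract facts, summarize, write a headline."""
    facts = llm(f"List the key facts in this text:\n{text}")
    summary = llm(f"Write a two-sentence summary from these facts:\n{facts}")
    headline = llm(f"Write a headline for this summary:\n{summary}")
    return {"facts": facts, "summary": summary, "headline": headline}

# Stub model that records prompts instead of calling an API.
calls: list[str] = []

def stub_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"output-{len(calls)}"

result = run_chain(stub_llm, "Quarterly revenue rose 12%.")
```

The number of calls and their order are known in advance, which makes latency and cost predictable in a way an agent loop is not.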

Start with the simplest approach that works. Add agent capabilities incrementally as you identify specific tasks that benefit from dynamic tool selection and planning.

The one thing to remember: Agent frameworks turn LLMs into autonomous problem solvers with tools and memory — but production agents need governance, reliability patterns, cost controls, and the discipline to use simpler approaches when agents are overkill.
