AI Agents Architecture — Core Concepts

ReAct, Chain-of-Thought planning, tool calling schemas, memory systems, multi-agent orchestration, and the key design patterns behind systems like AutoGPT and Claude's Computer Use.

What Makes an Agent Different From a Chatbot

A chatbot answers. An agent acts.

The difference is agency over tools and state. A chatbot’s response is its final output. An agent’s LLM generates intermediate thoughts that lead to tool calls, which produce observations, which inform further reasoning, until a final answer is reached.

This is the ReAct pattern (Yao et al., 2022): interleave Reasoning and Acting.

ReAct: The Foundation Pattern

ReAct structures the agent’s operation as a loop of Thought → Action → Observation:

Thought: I need to find the current price of AAPL stock.
Action: web_search("AAPL stock price today")
Observation: AAPL is trading at $185.20 as of market close.
Thought: Now I have the price. I'll calculate the user's portfolio value.
Action: calculate(shares=100, price=185.20)
Observation: 100 shares × $185.20 = $18,520
Thought: I have the answer.
Final Answer: Your 100 AAPL shares are worth $18,520.

The LLM generates both Thought and Action. The runtime executes the Action and provides Observation. The LLM then reads the Observation and generates the next Thought.
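The division of labor described above can be sketched in a few lines. This is a minimal illustration, not a production loop: `scripted_llm` is a hypothetical stand-in for a real model call, the two tools return canned or trivially computed values, and the `Action: name("arg")` text format is an assumption made so the output can be parsed with a regular expression.

```python
import re
from typing import Callable, Dict

# Hypothetical tools; real ones would call a search API or a calculator service.
def web_search(query: str) -> str:
    return "AAPL is trading at $185.20 as of market close."

def calculate(expression: str) -> str:
    shares, price = (float(x) for x in expression.split("*"))
    return f"{shares * price:,.2f}"

TOOLS: Dict[str, Callable[[str], str]] = {"web_search": web_search, "calculate": calculate}

# Assumed text format for actions and final answers.
ACTION_RE = re.compile(r'^Action: (\w+)\("(.*)"\)$', re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer: (.*)$", re.MULTILINE)

def scripted_llm(transcript: str) -> str:
    """Stand-in for a model call: picks the next step by counting observations so far."""
    steps = [
        'Thought: I need the current AAPL price.\nAction: web_search("AAPL stock price today")',
        'Thought: Now I can compute the portfolio value.\nAction: calculate("100 * 185.20")',
        "Thought: I have the answer.\nFinal Answer: Your 100 AAPL shares are worth $18,520.",
    ]
    return steps[transcript.count("Observation:")]

def run_react(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(transcript)          # the LLM writes Thought + Action (or Final Answer)
        transcript += output + "\n"
        final = FINAL_RE.search(output)
        if final:
            return final.group(1)
        action = ACTION_RE.search(output)
        if action is None:
            transcript += "Observation: could not parse an action.\n"
            continue
        name, arg = action.groups()
        tool = TOOLS.get(name)
        # The runtime, not the LLM, executes the tool and writes the Observation.
        observation = tool(arg) if tool else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached."
```

Calling `run_react("What are my 100 AAPL shares worth?", scripted_llm)` walks through two tool calls and returns the final answer string. The step limit is there so that a model that never produces a final answer cannot run forever.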

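In the original paper, ReAct beat chain-of-thought prompting alone on fact verification (FEVER) and was roughly on par with it on multi-hop question answering (HotpotQA); the strongest results came from combining the two. Its advantage is that it can fetch and incorporate fresh information rather than relying entirely on parametric knowledge.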

Tool Calling: Function Schemas

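Modern AI APIs (OpenAI function calling, Anthropic tool use) formalize the tool interface using JSON Schema. The example below uses the OpenAI-style parameters field; Anthropic's tool use carries the same schema under input_schema: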

{
  "name": "web_search",
  "description": "Search the web for current information",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The search query"
      },
      "n_results": {
        "type": "integer",
        "description": "Number of results to return",
        "default": 5
      }
    },
    "required": ["query"]
  }
}

The LLM generates JSON matching this schema when it wants to call the tool. The runtime validates the JSON, executes the function, and returns results.

This structured format has several advantages over free-text tool calls:

Easier to parse reliably
Enables input validation before execution
Documents the interface explicitly
Allows the LLM to understand available tools through their schemas
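To make the "validate, then execute" step concrete, here is a minimal sketch of a runtime that checks a model-generated call against the web_search schema before dispatching it. It is a hand-rolled check covering only required fields, two types, and defaults, not a full JSON Schema validator. The `{"name": ..., "arguments": ...}` call shape, the `dispatch` helper, and the canned `web_search` implementation are assumptions for illustration.

```python
import json
from typing import Any, Dict, List

WEB_SEARCH_SCHEMA: Dict[str, Any] = {
    "name": "web_search",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "n_results": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}

# Maps JSON Schema type names to Python types (only the ones this sketch needs).
PY_TYPES = {"string": str, "integer": int}

def validate_args(schema: Dict[str, Any], args: Dict[str, Any]) -> Dict[str, Any]:
    params = schema["parameters"]
    props = params["properties"]
    for name in params.get("required", []):
        if name not in args:
            raise ValueError(f"missing required argument: {name}")
    unknown = set(args) - set(props)
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    clean: Dict[str, Any] = {}
    for name, spec in props.items():
        if name in args:
            value = args[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, PY_TYPES[spec["type"]]):
                raise ValueError(f"{name} must be of type {spec['type']}")
            clean[name] = value
        elif "default" in spec:
            # The runtime applies defaults itself in this sketch.
            clean[name] = spec["default"]
    return clean

def web_search(query: str, n_results: int) -> List[str]:
    """Hypothetical tool body returning placeholder results."""
    return [f"result {i + 1} for {query!r}" for i in range(n_results)]

def dispatch(raw_call: str) -> List[str]:
    call = json.loads(raw_call)  # raises on malformed JSON
    if call["name"] != WEB_SEARCH_SCHEMA["name"]:
        raise ValueError(f"unknown tool: {call['name']}")
    return web_search(**validate_args(WEB_SEARCH_SCHEMA, call["arguments"]))
```

A call such as `dispatch('{"name": "web_search", "arguments": {"query": "agents"}}')` passes validation and runs with `n_results` filled in as 5, while a call missing `query` is rejected before any tool code executes.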

Memory Systems

Agents need different types of memory for different purposes:

In-context memory: The conversation history and accumulated observations within the current context window, covering everything in the current session. Fast to access but limited by the window size (commonly 8k–200k tokens, depending on the model).

External memory / Knowledge base: A vector database containing past interactions, facts, and documents, retrieved via semantic search. Storage is effectively unbounded; retrieval adds latency (often on the order of 50–100 ms, depending on the index and deployment). Used for remembering facts from past sessions and accessing large document collections.

Episodic memory: Structured storage of past agent runs — what goal was pursued, what actions were taken, what succeeded. Used by agents to learn from experience and avoid repeating failures.

Tool state: External systems that maintain state — a calendar with appointments, a todo list, a code repository. The agent reads/writes these through its tools.
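The contrast between the first two memory types can be sketched with toy classes. Everything here is an assumption for illustration: `ContextBuffer` counts words rather than real model tokens, and `ExternalMemory` ranks texts by cosine similarity over word counts, a crude stand-in for the embedding search a vector database would perform.

```python
import math
import re
from collections import Counter, deque
from typing import Deque, List, Tuple

def tokenize(text: str) -> List[str]:
    """Crude word-level tokenizer; real systems count model tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class ContextBuffer:
    """In-context memory: keeps the most recent messages under a token budget."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.messages: Deque[str] = deque()

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Evict the oldest messages until the total fits the budget again.
        while len(self.messages) > 1 and self._size() > self.max_tokens:
            self.messages.popleft()

    def _size(self) -> int:
        return sum(len(tokenize(m)) for m in self.messages)

class ExternalMemory:
    """External memory: stores texts and retrieves the most similar ones."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, Counter]] = []

    def store(self, text: str) -> None:
        self.items.append((text, Counter(tokenize(text))))

    def search(self, query: str, k: int = 1) -> List[str]:
        q = Counter(tokenize(query))
        ranked = sorted(self.items, key=lambda item: self._cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    @staticmethod
    def _cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0
```

A typical arrangement is to push anything evicted from the context buffer into external memory, then retrieve the top matches for each new user request and prepend them to the prompt.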

Planning Approaches

Single-shot planning: Generate the full plan upfront, then execute. Brittle — if early steps fail, the plan needs to be regenerated.

Chain-of-thought planning: Generate reasoning about each step inline. Flexible but requires the LLM to plan and execute within one context.

Tree of Thoughts (Yao et al., 2023): Explore multiple reasoning paths, evaluate each, backtrack and try alternatives when paths fail. Appropriate for combinatorial problems where there are many possible approaches.

LATS (Language Agent Tree Search, Zhou et al., 2023): Combines Monte Carlo Tree Search (MCTS) with LLMs. The LLM proposes candidate actions and also serves as the value function that scores them, while MCTS uses those scores to decide which branches to expand and which action sequence to select. The authors report that it outperforms ReAct on programming and reasoning benchmarks.
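The propose, evaluate, and prune cycle behind tree-style planning can be illustrated with a toy breadth-first search in the spirit of Tree of Thoughts. This is a sketch under heavy assumptions: the "problem" is reaching a target integer with three arithmetic operations, and `propose` and `make_scorer` are hypothetical stand-ins for what would be LLM calls in a real system.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, Tuple[str, ...]]  # (current value, steps taken so far)

def propose(state: State) -> List[State]:
    """Stand-in for the LLM proposing next thoughts: apply each allowed operation."""
    value, steps = state
    return [
        (value + 3, steps + ("+3",)),
        (value * 2, steps + ("*2",)),
        (value - 1, steps + ("-1",)),
    ]

def make_scorer(target: int) -> Callable[[State], float]:
    """Stand-in for the LLM judging a partial solution: closer to the target is better."""
    return lambda state: -abs(target - state[0])

def tree_search(start: int, target: int, beam_width: int = 2,
                max_depth: int = 5) -> Optional[Tuple[str, ...]]:
    score = make_scorer(target)
    frontier: List[State] = [(start, ())]
    for _ in range(max_depth):
        # Expand every surviving path by one step.
        candidates = [child for state in frontier for child in propose(state)]
        for value, steps in candidates:
            if value == target:
                return steps
        # Prune: keep only the most promising partial paths.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return None  # no path found within the depth limit
```

For example, `tree_search(1, 10)` returns `("+3", "+3", "+3")`. Keeping more than one path alive is what lets the search recover when the highest-scoring branch turns out to be a dead end; single-shot planning has no such fallback.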

Multi-Agent Systems

Complex tasks benefit from specialization. Multi-agent systems assign different agents to different roles:

Orchestrator-worker pattern: A planning agent breaks a task into subtasks and routes them to specialized workers. Workers report back; orchestrator synthesizes results.

Debate / verification pattern: Multiple agents independently generate solutions, then argue about correctness. A judge agent decides. Reduces individual model errors.

Role-playing agents: Each agent is prompted with a specific persona (senior engineer, product manager, QA tester) and reviews the problem from that perspective. AutoGen (Microsoft) and CrewAI implement this.

Example: a software development team

Agent 1 (Product Manager): Understand requirements, write spec
Agent 2 (Engineer): Implement feature
Agent 3 (Tester): Write and run tests
Agent 4 (Code Reviewer): Review code, suggest improvements
Orchestrator: Manages workflow, handles blockers

Claude’s Computer Use (2024): Anthropic’s demonstration of an agent controlling a computer by taking screenshots, clicking, typing, and running programs. It suggests that with a small set of tools (screenshot, click, type), an LLM can in principle operate software through its visual interface, though reliability varies by task.

Common Failure Modes

Infinite loops: Agent cycles through the same actions. Prevention: action deduplication, maximum step limits, loop detection.

Context overflow: Long agent runs fill the context window. Prevention: context compression (summarize older steps), persistent external memory.

Tool call errors: External tools fail or return unexpected results. Prevention: retry logic, graceful degradation, error handling in the agent prompt.

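Hallucinated tool results: Agent “remembers” tool results that never happened, or writes its own observation instead of waiting for the runtime. Prevention: have the runtime, not the model, supply every observation, and verify key facts with actual tool calls rather than relying on the LLM’s parametric knowledge.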
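Several of these preventions are runtime guards rather than prompt changes. The sketch below shows three of them under simple assumptions: an action is represented as a (tool name, arguments) pair, a loop means the same action repeated `window` times in a row, and `next_action` is a hypothetical callback standing in for the LLM's choice of next step.

```python
import time
from typing import Callable, List, Optional, Tuple

Action = Tuple[str, str]  # (tool name, serialized arguments)

def is_looping(history: List[Action], window: int = 3) -> bool:
    """True if the last `window` actions are all identical."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return all(action == recent[0] for action in recent)

def call_with_retry(tool: Callable[[], str], attempts: int = 3,
                    base_delay: float = 0.5) -> str:
    """Retry a failing tool with exponential backoff.

    Returns an error string instead of raising, so the agent can read the
    failure as an observation and degrade gracefully.
    """
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return tool()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    return f"ERROR after {attempts} attempts: {last_error}"

def run_guarded(next_action: Callable[[List[Action]], Action],
                max_steps: int = 10) -> str:
    history: List[Action] = []
    for _ in range(max_steps):
        action = next_action(history)
        if action[0] == "finish":
            return f"done after {len(history)} steps"
        history.append(action)
        if is_looping(history):
            return "aborted: repeated action detected"
    return "aborted: step limit reached"
```

An agent that keeps issuing the same search is stopped after three identical actions, and one that varies its actions without finishing is stopped by the step limit. Exact-match loop detection misses near-duplicates such as reworded queries, so the step limit remains the backstop.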

One thing to remember: The power of AI agents comes from LLMs’ ability to reason about sequences of actions — the tool-calling interface provides the connection to the real world, but the quality of the agent depends entirely on the LLM’s reasoning about what to do, when, and why.
