AI Agents Architecture — Core Concepts
What Makes an Agent Different From a Chatbot
A chatbot answers. An agent acts.
The difference is agency over tools and state. A chatbot’s response is its final output. An agent’s LLM generates intermediate thoughts that lead to tool calls, which produce observations, which inform further reasoning, until a final answer is reached.
This is the ReAct pattern (Yao et al., 2022): interleave Reasoning and Acting.
ReAct: The Foundation Pattern
ReAct structures the agent’s operation as a loop of Thought → Action → Observation:
Thought: I need to find the current price of AAPL stock.
Action: web_search("AAPL stock price today")
Observation: AAPL is trading at $185.20 as of market close.
Thought: Now I have the price. I'll calculate the user's portfolio value.
Action: calculate(shares=100, price=185.20)
Observation: 100 shares × $185.20 = $18,520
Thought: I have the answer.
Final Answer: Your 100 AAPL shares are worth $18,520.
The LLM generates both Thought and Action. The runtime executes the Action and provides Observation. The LLM then reads the Observation and generates the next Thought.
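The division of labor described above can be sketched as a small loop. This is a minimal illustration, not a reference implementation: `scripted_llm` is a hypothetical stand-in for a real model call, the two tools return canned or locally computed values, and the `Action: name("arg")` text format is an assumption chosen to mirror the trace above.

```python
import re
from typing import Callable, Dict

# Hypothetical tools; real ones would call a search API or a pricing service.
def web_search(query: str) -> str:
    return "AAPL is trading at $185.20 as of market close."

def calculate(expression: str) -> str:
    shares, price = (float(x) for x in expression.split("*"))
    return f"{shares * price:,.2f}"

TOOLS: Dict[str, Callable[[str], str]] = {"web_search": web_search, "calculate": calculate}

def scripted_llm(transcript: str) -> str:
    """Stand-in for a model call: picks a reply based on how many observations exist."""
    step = transcript.count("Observation:")
    replies = [
        'Thought: I need the current AAPL price.\nAction: web_search("AAPL stock price today")',
        'Thought: Now I can value the position.\nAction: calculate("100 * 185.20")',
        "Thought: I have the answer.\nFinal Answer: Your 100 AAPL shares are worth $18,520.",
    ]
    return replies[min(step, len(replies) - 1)]

ACTION_RE = re.compile(r'Action: (\w+)\("(.*)"\)')

def run_agent(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)              # the LLM writes Thought + Action
        transcript += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            transcript += "Observation: could not parse an action.\n"
            continue
        name, arg = match.groups()
        tool = TOOLS.get(name)
        # The runtime, not the LLM, executes the tool and writes the Observation.
        observation = tool(arg) if tool else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached."
```

Calling `run_agent("What are my 100 AAPL shares worth?", scripted_llm)` walks through the same three steps as the trace. The `max_steps` cap is a design choice in this sketch so that a model that never produces a final answer cannot run forever.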
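In the original paper, ReAct outperformed chain-of-thought prompting alone on fact verification (FEVER) and was slightly behind it on multi-hop question answering (HotpotQA); the best results on both came from combining ReAct with chain-of-thought. ReAct's advantage is that it can access and incorporate fresh information rather than relying entirely on parametric knowledge.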
Tool Calling: Function Schemas
Modern AI APIs (OpenAI function calling, Anthropic tool use) formalize the tool interface using JSON schemas:
{
  "name": "web_search",
  "description": "Search the web for current information",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The search query"
      },
      "n_results": {
        "type": "integer",
        "description": "Number of results to return",
        "default": 5
      }
    },
    "required": ["query"]
  }
}
The LLM generates JSON matching this schema when it wants to call the tool. The runtime validates the JSON, executes the function, and returns results.
This structured format has several advantages over free-text tool calls:
- Easier to parse reliably
- Enables input validation before execution
- Documents the interface explicitly
- Allows the LLM to understand available tools through their schemas
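The validation step mentioned above might look roughly like the following. It is a hand-written checker that covers only the subset of JSON Schema used in the `web_search` example (string and integer types, `required`, `default`); a production runtime would more likely rely on a full schema validator. The function name `validate_args` is hypothetical, and filling in `default` values is a choice made by this sketch, since JSON Schema itself treats `default` as an annotation.

```python
import json
from typing import Any, Dict

WEB_SEARCH_SCHEMA = {
    "name": "web_search",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "n_results": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}

PY_TYPES = {"string": str, "integer": int}

def validate_args(schema: Dict[str, Any], raw_json: str) -> Dict[str, Any]:
    """Check model-produced JSON against the schema subset above and apply defaults."""
    args = json.loads(raw_json)  # malformed JSON raises a ValueError subclass
    params = schema["parameters"]
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    for name in params["required"]:
        if name not in args:
            raise ValueError(f"missing required argument: {name}")
    for name, value in args.items():
        spec = params["properties"].get(name)
        if spec is None:
            raise ValueError(f"unexpected argument: {name}")
        expected = PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"{name} must be of type {spec['type']}")
    for name, spec in params["properties"].items():
        if "default" in spec:
            args.setdefault(name, spec["default"])
    return args
```

Rejecting bad arguments before execution lets the runtime send a precise error back to the model as an observation instead of running the tool with garbage input.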
Memory Systems
Agents need different types of memory for different purposes:
In-context memory: The conversation history and accumulated observations within the current context window. Fast to access but limited to the window size (typically 8k–200k tokens), and it holds only what has happened in the current session.
External memory / Knowledge base: A vector database containing past interactions, facts, and documents, retrieved via semantic search. Storage is effectively unbounded; retrieval typically adds latency on the order of 50–100 ms, depending on the index and deployment. Used for remembering facts from past sessions and accessing large document collections.
Episodic memory: Structured storage of past agent runs — what goal was pursued, what actions were taken, what succeeded. Used by agents to learn from experience and avoid repeating failures.
Tool state: External systems that maintain state — a calendar with appointments, a todo list, a code repository. The agent reads/writes these through its tools.
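To make the external-memory idea concrete, here is a toy store with similarity search. It is only a sketch: `embed` uses word counts as a stand-in for a real embedding model, and the `ExternalMemory` class is hypothetical rather than the API of any particular vector database.

```python
import math
from collections import Counter
from typing import List, Tuple

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ExternalMemory:
    """Stores texts with their vectors and returns the closest matches to a query."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

memory = ExternalMemory()
memory.add("user prefers metric units")
memory.add("portfolio holds 100 AAPL shares")
top_hit = memory.search("how many AAPL shares")[0]
```

In an agent, the retrieved text would be placed back into the context window, which is how external memory and in-context memory work together.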
Planning Approaches
Single-shot planning: Generate the full plan upfront, then execute. Brittle — if early steps fail, the plan needs to be regenerated.
Chain-of-thought planning: Generate reasoning about each step inline. Flexible but requires the LLM to plan and execute within one context.
Tree of Thoughts (Yao et al., 2023): Explore multiple reasoning paths, evaluate each, backtrack and try alternatives when paths fail. Appropriate for combinatorial problems where there are many possible approaches.
LATS (Language Agent Tree Search; Zhou et al., 2023): Combines Monte Carlo Tree Search (MCTS) with LLMs. The LLM generates candidate actions, the search scores them with a value function, and the highest-scoring action sequence is selected. The authors report that it outperforms ReAct on coding and reasoning benchmarks, at the cost of many more model calls per task.
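The tree-search idea behind Tree of Thoughts can be sketched with a breadth-first search that keeps only the most promising partial paths at each depth. Everything here is a toy assumption: `propose` and `score` are deterministic stand-ins for the LLM calls that would generate and evaluate thoughts, and the task (reach a target number using "+3" or "×2" steps) is chosen only because it is easy to check by hand.

```python
from typing import List, Optional, Tuple

Path = Tuple[int, ...]

def propose(path: Path) -> List[Path]:
    """Stand-in for the LLM proposing next thoughts: extend a path by +3 or *2."""
    last = path[-1]
    return [path + (last + 3,), path + (last * 2,)]

def score(path: Path, target: int) -> float:
    """Stand-in for the LLM evaluator: closer to the target is better."""
    return -abs(target - path[-1])

def tree_search(start: int, target: int, beam_width: int = 2, max_depth: int = 6) -> Optional[Path]:
    frontier: List[Path] = [(start,)]
    for _ in range(max_depth):
        candidates = [child for path in frontier for child in propose(path)]
        for path in candidates:
            if path[-1] == target:
                return path
        # Keep only the most promising partial paths; the rest are abandoned.
        candidates.sort(key=lambda p: score(p, target), reverse=True)
        frontier = candidates[:beam_width]
    return None

solution = tree_search(start=1, target=11)
```

Keeping several paths alive is what separates this from single-path chain-of-thought: a branch that looked good early can be dropped later in favor of an alternative.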
Multi-Agent Systems
Complex tasks benefit from specialization. Multi-agent systems assign different agents to different roles:
Orchestrator-worker pattern: A planning agent breaks a task into subtasks and routes them to specialized workers. Workers report back; orchestrator synthesizes results.
Debate / verification pattern: Multiple agents independently generate solutions, then argue about correctness. A judge agent decides. Reduces individual model errors.
Role-playing agents: Each agent is prompted with a specific persona (senior engineer, product manager, QA tester) and reviews the problem from that perspective. AutoGen (Microsoft) and CrewAI implement this.
Example software development team:
- Agent 1 (Product Manager): Understand requirements, write spec
- Agent 2 (Engineer): Implement feature
- Agent 3 (Tester): Write and run tests
- Agent 4 (Code Reviewer): Review code, suggest improvements
- Orchestrator: Manages workflow, handles blockers
Claude’s Computer Use (2024): Anthropic’s demonstration of an agent controlling a computer by taking screenshots, clicking, typing, and running programs. It suggests that, given a small set of tools (screenshot, click, type), an LLM can in principle operate software through its visual interface, though reliability varies by task.
Common Failure Modes
Infinite loops: Agent cycles through the same actions. Prevention: action deduplication, maximum step limits, loop detection.
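A step limit and repeat detection can be combined in one small guard that the runtime consults before executing each action. The class name `LoopGuard` and its thresholds are assumptions made for this sketch.

```python
from collections import Counter
from typing import Tuple

class LoopGuard:
    """Stops a run after too many steps or too many repeats of one action."""

    def __init__(self, max_steps: int = 20, max_repeats: int = 2) -> None:
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen: Counter = Counter()

    def check(self, tool: str, args: str) -> Tuple[bool, str]:
        """Return (allowed, reason) for the proposed action."""
        self.steps += 1
        if self.steps > self.max_steps:
            return False, "step limit reached"
        key = (tool, args)
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            return False, f"repeated action: {tool}({args})"
        return True, "ok"
```

When `check` returns `False`, the runtime can either stop the run or feed the reason back to the model as an observation so it can try a different approach.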
Context overflow: Long agent runs fill the context window. Prevention: context compression (summarize older steps), persistent external memory.
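Context compression can be as simple as replacing older steps with one summary entry while keeping the most recent steps verbatim. In this sketch `summarize` is a caller-supplied function, in practice probably another LLM call, and `naive_summarize` is a placeholder that just truncates each step.

```python
from typing import Callable, List

def compress_history(
    steps: List[str],
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace all but the last keep_recent steps with one summary entry."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(steps) <= keep_recent:
        return list(steps)
    older, recent = steps[:-keep_recent], steps[-keep_recent:]
    return [f"Summary of {len(older)} earlier steps: {summarize(older)}"] + recent

def naive_summarize(steps: List[str]) -> str:
    """Placeholder summarizer: keeps the first 30 characters of each step."""
    return " | ".join(s[:30] for s in steps)
```

Keeping recent steps verbatim matters because the model usually needs exact details of the last few observations, while older ones can tolerate lossy summaries.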
Tool call errors: External tools fail or return unexpected results. Prevention: retry logic, graceful degradation, error handling in the agent prompt.
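Retry logic for flaky tools is often written as a wrapper with exponential backoff. The helper below is a sketch with assumed defaults; the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retry(
    tool: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call tool(), retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return tool()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller decide how to degrade
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

For graceful degradation, the runtime can catch the final exception and pass a short error message to the model as the observation, so the agent can choose another tool or report the failure.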
Hallucinated tool results: Agent “remembers” tool results that never happened. Prevention: have the runtime, not the LLM, write every Observation, and verify key facts with real tool calls rather than relying on the LLM’s parametric knowledge.
One thing to remember: The power of AI agents comes from LLMs’ ability to reason about sequences of actions — the tool-calling interface provides the connection to the real world, but the quality of the agent depends entirely on the LLM’s reasoning about what to do, when, and why.