AI Agents
What Is an AI Agent
An AI agent is an LLM augmented with:
- Tools — ability to take actions (search, code execution, API calls)
- Memory — persistence across interactions
- Planning — breaking complex tasks into steps
- Reflection — evaluating and correcting its own outputs
The core loop:
Observe → Think → Act → Observe → Think → Act → ...
Unlike a simple LLM (stateless question → answer), an agent maintains state and takes actions to accomplish goals over multiple steps.
Why Agents Matter
Language models are powerful oracles — they answer questions. Agents are actors — they accomplish tasks.
The gap: A language model can tell you how to book a flight. An agent can actually book one:
- Search for flights
- Compare options
- Fill in forms
- Confirm booking
- Send confirmation email
This requires multiple tool calls, error handling, and state management — not just text generation.
The Agent Loop
Sequential Architecture
```
User Request
      │
      ▼
┌───────────────────────────────────────────────┐
│  LLM (with tools)                             │
│                                               │
│  1. Reason about current state                │
│  2. Decide: use a tool or respond to user?    │
│  3. If tool: call function, get result        │
│  4. Loop until task complete or max iterations│
└───────────────────────────────────────────────┘
      │
      ▼
Final Response
```
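The loop above can be sketched in a few lines. `call_llm`, `get_time`, and the message format are illustrative stand-ins, not a real model API:

```python
def call_llm(messages):
    # Stub "model": requests a tool on the first turn, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_time", "args": {}}
    return {"answer": f"The time is {messages[-1]['content']}."}

def get_time():
    return "12:00"

TOOLS = {"get_time": get_time}

def run_agent(user_request, max_iterations=5):
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_iterations):           # 4. loop until done or capped
        decision = call_llm(messages)         # 1-2. reason and decide
        if "tool" in decision:                # 3. call the tool, get result
            result = TOOLS[decision["tool"]](**decision["args"])
            messages.append({"role": "tool", "content": result})
        else:
            return decision["answer"]         # final response to the user
    return "Stopped: max iterations reached."

answer = run_agent("What time is it?")
```

The `max_iterations` cap matters in practice: without it, a confused agent can loop on the same failing tool call indefinitely.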
Key Component: The Tool Interface
Modern agents use structured function calling (not raw text):
```python
# OpenAI function calling example
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"enum": ["celsius", "fahrenheit"]}
                },
                "required": ["city"]
            }
        }
    }]
)

# Model decides to call the function
if response.choices[0].finish_reason == "tool_calls":
    tool_call = response.choices[0].message.tool_calls[0]
    # Tool call: get_weather({"city": "Tokyo", "unit": "celsius"})
```

Reasoning Patterns
ReAct (Reason + Act)
Yao et al. (2022) — interleaves reasoning and acting:
```
Thought: I need to find the population of Estonia.
Action: search("Estonia population 2024")
Observation: 1,326,855 (2024 estimate)
Thought: Now I have the answer.
Action: respond("Estonia has approximately 1.33 million people.")
```
The key insight: thinking out loud (`Thought:`) before acting creates a trace that the model can self-correct from.
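A minimal parser for this trace format (the `Thought:`/`Action:` layout is the one shown above; a real ReAct loop would execute the action and feed the Observation back into the next model call):

```python
import re

def parse_react_step(model_output):
    """Extract the Thought and Action from one ReAct-style step."""
    thought = re.search(r"Thought: (.+)", model_output).group(1)
    action = re.search(r"Action: (\w+)\((.*)\)", model_output)
    name, arg = action.group(1), action.group(2).strip('"')
    return thought, name, arg

thought, name, arg = parse_react_step(
    'Thought: I need to find the population of Estonia.\n'
    'Action: search("Estonia population 2024")'
)
# name is the tool to call, arg its input; the result becomes the Observation
```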
Plan-and-Execute
Decompose the task upfront, then execute step by step:
```
Task: Plan a trip to Tokyo

Plan:
1. Search for flights from [origin] to Tokyo
2. Find accommodation in central Tokyo
3. Research visa requirements
4. Create itinerary with daily activities

Execute Step 1 → Execute Step 2 → ...
```
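A sketch of the pattern, with stub `plan` and `execute` functions standing in for the LLM calls:

```python
def plan(task):
    # Stub planner: a real version would ask an LLM to decompose the task.
    return ["search flights", "find accommodation",
            "check visa requirements", "draft itinerary"]

def execute(step):
    # Stub executor: a real version would run tools / an LLM per step.
    return f"done: {step}"

def plan_and_execute(task):
    results = []
    for step in plan(task):            # decompose upfront
        results.append(execute(step))  # then execute step by step
    return results

results = plan_and_execute("Plan a trip to Tokyo")
```

Separating planning from execution lets a cheaper model handle individual steps, and makes the plan itself inspectable before anything runs.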
Tree of Thought (ToT)
Explores multiple reasoning branches:
```
Question: Solve this puzzle

Branch A: Try approach 1
  → A1: step 1 → A2: step 2 → A3: success/failure
Branch B: Try approach 2
  → B1: step 1 → B2: step 2 → B3: success/failure
Branch C: Try approach 3
  → ...

Evaluate: which branch produced the best result?
```
Uses search (BFS or DFS) over the reasoning tree. Good for complex planning, puzzles, strategic reasoning.
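A breadth-first sketch with a beam over branches; `expand` and `score` are toy stand-ins for LLM-generated candidate thoughts and LLM-based branch evaluation:

```python
def expand(state):
    # Stub generator: propose candidate next thoughts for a partial chain.
    return [state + [c] for c in ("a", "b")]

def score(state):
    # Stub evaluator (here: count of "a" thoughts); a real version
    # would ask an LLM to rate how promising the branch looks.
    return sum(1 for s in state if s == "a")

def tree_of_thought(depth=3, beam=2):
    frontier = [[]]                       # start from an empty thought chain
    for _ in range(depth):                # breadth-first over the tree
        candidates = [c for s in frontier for c in expand(s)]
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam]      # keep only the best `beam` branches
    return max(frontier, key=score)       # best complete branch

best = tree_of_thought()
```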
Reflexion
Shinn et al. (2023) — agents that reflect on failures:
```
Attempt 1: Write code → Execute → Error
Reflection: "The error was a null pointer. I forgot to check if the list was empty."
Attempt 2: Write better code with null check → Execute → Success
```
Reflection uses the failed trace as additional context for the next attempt.
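A sketch of that loop: the stub `attempt` and `reflect` stand in for LLM calls, and a single "unit test" (the empty-list case) plays the role of the execution environment:

```python
def attempt(task, reflections):
    # Stub "coder": writes buggy code until a reflection mentions the
    # empty-list case, then writes the guarded version.
    if any("empty" in r for r in reflections):
        return lambda xs: max(xs) if xs else None
    return lambda xs: max(xs)  # buggy: crashes on empty input

def reflect(error):
    # Stub reflection: a real version would ask the LLM to analyze the trace.
    return f"Previous attempt failed with {error}: handle the empty list."

def reflexion_loop(task, max_attempts=3):
    reflections = []
    for _ in range(max_attempts):
        fn = attempt(task, reflections)
        try:
            fn([])                             # execute against the test case
            return fn, reflections             # success
        except ValueError as e:
            reflections.append(reflect(e))     # failed trace feeds next attempt
    return None, reflections

fn, notes = reflexion_loop("max of a list")
```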
Memory Systems
Short-Term Memory
The conversation history. Each message (user, assistant, tool result) is appended to context.
Limitation: context window size (128K tokens for GPT-4o, 200K for Claude 3.5).
Long-Term Memory
Persists across sessions. Implemented as a vector database:
```
Experience → Embed → Store in Vector DB
Query      → Embed → Semantic search → Retrieve relevant memories
```
Key insight: not everything should be in context. Memories that are semantically similar to the current task are retrieved and injected into context.
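A toy version of this store-and-retrieve flow, using a bag-of-words "embedding" and cosine similarity in place of a real embedding model and vector database:

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: word counts. Real systems use a learned embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class MemoryStore:
    def __init__(self):
        self.items = []                             # (embedding, text) pairs

    def store(self, text):
        self.items.append((embed(text), text))

    def retrieve(self, query, k=2):
        q = embed(query)                            # embed the query...
        ranked = sorted(self.items,                 # ...and rank memories
                        key=lambda it: cosine(q, it[0]), reverse=True)
        return [text for _, text in ranked[:k]]    # top-k go into context

memory = MemoryStore()
memory.store("User prefers aisle seats when booking a flight")
memory.store("User is allergic to peanuts")
memory.store("Project deadline is March 3")
relevant = memory.retrieve("book a flight for the user")
```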
Architecture Pattern
```
User Input
       │
       ▼
┌──────────────┐
│ Embed + Store│  ← Current input stored for future
└──────────────┘
       │
       ▼
┌──────────────┐
│ Retrieve     │  ← Find relevant memories
│ (Vector DB)  │
└──────────────┘
       │
       ▼
┌──────────────┐
│ LLM          │  ← Context: memories + current input
│ (Think)      │
└──────────────┘
       │
       ▼
┌──────────────┐
│ Take Action  │  ← Tool calls or final response
└──────────────┘
```
Multi-Agent Systems
Multiple agents collaborating, each with specialized roles:
```
Manager Agent
      │
      ├── Research Agent  (web search, data retrieval)
      ├── Coder Agent     (write/execute code)
      ├── Writer Agent    (draft content)
      └── Reviewer Agent  (evaluate outputs)
```
CrewAI Pattern
```python
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Find relevant information",
    backstory="Expert researcher with web access",
    tools=[search, browse]
)

writer = Agent(
    role="Writer",
    goal="Write clear summaries",
    backstory="Technical writer",
    tools=[]
)

crew = Crew(agents=[researcher, writer], tasks=[...])
crew.kickoff()
```

AutoGen Pattern
Microsoft’s conversational agent framework:
```python
from autogen import ConversableAgent

assistant = ConversableAgent(
    name="assistant",
    system_message="You are a helpful assistant.",
    llm_config={"model": "gpt-4o"}
)

user_proxy = ConversableAgent(
    name="user_proxy",
    code_execution_config={"work_dir": "coding", "use_docker": False},  # can execute code
    human_input_mode="NEVER"
)

chat = user_proxy.initiate_chat(
    assistant,
    message="Write and execute Python code for 2+2"
)
```

Agent Protocols
MCP (Model Context Protocol)
Anthropic’s open standard (November 2024) for connecting agents to tools:
```
┌─────────────┐      MCP      ┌─────────────┐
│    Agent    │◄─────────────►│  MCP Server │
│  (any LLM)  │   JSON-RPC    │ (tool impl) │
└─────────────┘               └─────────────┘
                                     │
                                     ▼
                              ┌─────────────┐
                              │  Resources  │
                              │  Tools      │
                              │  Prompts    │
                              └─────────────┘
```
Key features:
- JSON-RPC 2.0 over stdio or HTTP
- Tool definitions as JSON schemas
- Resources (documents, files) as a separate concept
- Prompts (canned agent configurations)
Ecosystem: Donated to Linux Foundation’s Agent Alliance (Dec 2025). SDKs for Python, TypeScript, Java, Go, Rust.
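On the wire, a tool invocation is a plain JSON-RPC 2.0 request. A sketch of the `tools/call` message shape (the tool name and arguments here are illustrative, not from a real server):

```python
import json

# JSON-RPC 2.0 request asking an MCP server to invoke one of its tools.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",             # illustrative tool name
        "arguments": {"city": "Tokyo"},
    },
}
wire = json.dumps(request)  # sent over stdio or HTTP to the MCP server
```

The server's tool result comes back as a JSON-RPC response with the same `id`, which is how clients match responses to in-flight requests.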
A2A (Agent-to-Agent Protocol)
Google’s protocol (April 2025) for agent interoperability:
- MCP connects agents to tools/data (vertical)
- A2A connects agents to other agents (horizontal)
```
Agent A                             Agent B
   │                                   │
   │──── A2A: Task Request ───────────►│
   │◄─── A2A: Task Response ───────────│
   │                                   │
   │──── A2A: Status Update ──────────►│
```
Designed for enterprise workflows where different teams build different agents.
Frameworks
| Framework | Type | Best For |
|---|---|---|
| LangGraph | DAG-based state machine | Complex, branching workflows |
| OpenAI Agents SDK | Lightweight + guardrails | Production agents with safety |
| Google ADK | Python-first, A2A+MCP | Google ecosystem |
| Claude Agent SDK | MCP-native | Anthropic models |
| CrewAI | Role-based multi-agent | Team workflows |
| AutoGen | Conversational | Multi-agent chat |
| DSPy | Compilation | LLM pipelines as programs |
| Pydantic AI | Type-safe | Strong typing advocates |
Key Challenges
Reliability
Agents fail in compounding ways:
- Tool call with wrong parameters → error
- Error not handled → agent stuck
- Wrong conclusion from error → cascading mistakes
Mitigations:
- Guardrails (structured outputs, output validation)
- Retry logic with backoff
- Human-in-the-loop for critical actions
- Task decomposition (smaller steps = less error surface)
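Retry with backoff is the simplest of these to sketch. Here `flaky_tool` simulates transient failures; a real agent would wrap each tool call this way and escalate to a human or replanner on final failure:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=0.1):
    """Retry a flaky tool call with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise                          # give up: escalate
            time.sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, ...

calls = {"n": 0}
def flaky_tool():
    calls["n"] += 1
    if calls["n"] < 3:                         # fail the first two attempts
        raise TimeoutError("tool unavailable")
    return "ok"

result = with_retries(flaky_tool)
```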
Cost
Each step in an agent loop is one LLM call, and the growing conversation (including accumulated tool results) is resent with every call, so a 10-step task means 10 API calls over an ever-longer context.
Mitigations:
- Caching (repeated tool calls can be cached)
- Smaller models for simple steps
- Planning first, then execution (avoid redundant steps)
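Caching repeated tool calls can be as simple as memoizing on the tool name and its arguments (a sketch; real systems also need expiry for time-sensitive tools):

```python
import json

_cache = {}

def cached_tool_call(name, args, call_fn):
    """Memoize tool calls on (name, canonicalized args)."""
    key = (name, json.dumps(args, sort_keys=True))  # order-insensitive key
    if key not in _cache:
        _cache[key] = call_fn(name, args)           # only pay once
    return _cache[key]

counter = {"calls": 0}
def expensive_call(name, args):
    counter["calls"] += 1                           # stands in for a paid API
    return f"result of {name}"

cached_tool_call("search", {"q": "tokyo"}, expensive_call)
cached_tool_call("search", {"q": "tokyo"}, expensive_call)  # served from cache
```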
Safety
Agents with tool access can:
- Delete files, send emails, make purchases
- Expose sensitive data through tools
- Be manipulated through prompt injection
Mitigations:
- Sandboxed execution environments
- Permission models for tools
- Output validation before action
- Audit logging
Persistence and State
Managing agent state across sessions is non-trivial:
- What to remember? (not everything)
- When to update memory?
- How to handle conflicting memories?
Key Papers
- Yao et al. (2022) — ReAct — https://arxiv.org/abs/2210.03629
- Shinn et al. (2023) — Reflexion — https://arxiv.org/abs/2303.11366
- Wei et al. (2022) — Chain-of-Thought — https://arxiv.org/abs/2201.11903
- Yao et al. (2023) — Tree of Thoughts — https://arxiv.org/abs/2305.10601
Links
- Tool Use and Function Calling — How agents call functions
- Retrieval Augmented Generation — Long-term memory implementation
- Prompt Engineering — Effective prompting for agents
- RLHF and Alignment — Safety and alignment considerations
- Modern AI Techniques — Where agents fit in modern AI landscape