How to Build an AI Agent with Claude: Complete Guide (2026)
Claude isn't just a chatbot. With the right architecture, it becomes an autonomous agent that researches, decides, and executes — while you sleep. This guide shows you exactly how to build one, from first API call to production deployment.
We run multiple Claude-powered agents in production 24/7. One manages our content pipeline. Another monitors databases and fixes issues autonomously. A third handles research and analysis. This guide is based on what actually works, not what sounds impressive in a demo.
Why Claude for AI Agents?
Not all LLMs are equal when it comes to agentic behavior. Claude has specific advantages:
- 200K context window — fits entire codebases, long documents, and extensive conversation history without summarization hacks
- Native tool use — Claude's tool calling is structured, reliable, and supports parallel tool execution
- Extended thinking — Claude can reason through complex multi-step problems before acting
- System prompt adherence — Claude follows system prompt instructions more consistently than most models, critical for agent personality and boundaries
- Safety-first architecture — built-in refusal for harmful actions means fewer guardrails you need to build yourself
- MCP support — Model Context Protocol lets Claude connect to any data source or tool server
Architecture: The 3-File Framework
Every production Claude agent we run uses the same core architecture. Three files that define everything:
1. SOUL.md — Identity & Personality
This is who your agent is. Not what it does — who it is. Personality drives trust, and trust drives results.
# SOUL.md — Market Research Agent
## Identity
Name: Scout
Role: Market research analyst for B2B SaaS
Personality: Precise, data-driven, slightly skeptical
## Communication Style
- Lead with data, not opinions
- Always cite sources
- Flag uncertainty explicitly: "Low confidence: ..."
- Use tables for comparisons, never walls of text
## Boundaries
- Never fabricate statistics or data points
- Never present estimates as facts
- Always distinguish between primary and secondary sources
- Ask for clarification rather than guessing intent
2. AGENTS.md — Operational Rules
This defines how your agent operates. Autonomy levels, retry logic, tool permissions.
# AGENTS.md
## Every Session
1. Read SOUL.md (who you are)
2. Read USER.md (who you're helping)
3. Check memory/today.md for context
## Autonomy Levels
### Do Freely (no permission needed)
- Web searches, file reads, data analysis
- Draft creation, research compilation
- Internal calculations and comparisons
### Ask First (needs human approval)
- Sending emails or messages
- Publishing content
- Making purchases or commitments
- Modifying production systems
## Retry Logic
Minimum 3 attempts before asking for help:
1. Try the direct approach
2. Try with different parameters
3. Try an alternative method
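The escalation ladder above can be sketched as a small helper. This is a sketch, not part of any SDK: `attempts` is an ordered list of strategy functions you supply (direct, different parameters, alternative method), and `escalate` is your hand-off to a human.

```javascript
// Try each strategy in order; escalate to a human only after all fail.
// attempts: array of async functions, ordered direct -> alternative.
async function withRetries(attempts, escalate) {
  const errors = [];
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (err) {
      errors.push(err);
    }
  }
  // All strategies exhausted: hand off with the full error history
  return escalate(errors);
}
```

Keeping the error history matters: when the agent finally asks for help, it can say what it already tried instead of just "it didn't work".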
3. USER.md — Human Context
Everything your agent needs to know about the human it serves. Preferences, business context, communication style.
# USER.md
## About
Name: Sarah Chen
Role: Head of Product, TechCorp
Timezone: PST (UTC-8)
## Preferences
- Prefers bullet points over paragraphs
- Likes data visualized, not described
- Morning person — schedule important updates before 10am
- Hates jargon. Explain like a smart non-expert.
## Business Context
- B2B SaaS, 50-person startup
- Series A, $8M raised
- Main competitor: AcmeCo
- Key metric: Monthly Active Users (MAU)
Setting Up Claude as an Agent
Step 1: API Setup
Install the Anthropic SDK and set up your first agent loop:
npm install @anthropic-ai/sdk
import Anthropic from '@anthropic-ai/sdk';
import fs from 'fs';

const client = new Anthropic();

// Load the 3-file framework
const soul = fs.readFileSync('./SOUL.md', 'utf8');
const agents = fs.readFileSync('./AGENTS.md', 'utf8');
const user = fs.readFileSync('./USER.md', 'utf8');

const systemPrompt = `${soul}\n\n${agents}\n\n${user}`;

async function agentLoop(userMessage) {
  const messages = [{ role: 'user', content: userMessage }];

  while (true) {
    const response = await client.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 8096,
      system: systemPrompt,
      tools: getTools(),
      messages,
    });

    // Add assistant response to history
    messages.push({ role: 'assistant', content: response.content });

    // Check if Claude wants to use tools
    const toolUses = response.content.filter(b => b.type === 'tool_use');
    if (toolUses.length === 0) {
      // No more tool calls — agent is done
      const text = response.content.find(b => b.type === 'text');
      return text?.text || '';
    }

    // Execute each tool and return results
    const toolResults = [];
    for (const toolUse of toolUses) {
      const result = await executeTool(toolUse.name, toolUse.input);
      toolResults.push({
        type: 'tool_result',
        tool_use_id: toolUse.id,
        content: JSON.stringify(result),
      });
    }
    messages.push({ role: 'user', content: toolResults });
  }
}
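The loop above runs tool calls one at a time. Since Claude can emit several independent tool_use blocks in a single turn, you can run them concurrently to cut round-trip time. A sketch: `execute` stands in for whatever dispatcher you use (e.g. the `executeTool` the loop calls), passed as a parameter here so the helper stays self-contained.

```javascript
// Run all tool calls from one assistant turn concurrently.
// execute(name, input) is your tool dispatcher.
async function executeToolsInParallel(toolUses, execute) {
  return Promise.all(
    toolUses.map(async (toolUse) => ({
      type: 'tool_result',
      tool_use_id: toolUse.id,
      content: JSON.stringify(await execute(toolUse.name, toolUse.input)),
    }))
  );
}
```

Only do this when the tools are independent; if one tool's output feeds another, keep the sequential loop.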
Step 2: Define Tools
Tools are what make Claude an agent instead of a chatbot. Define them with clear descriptions — Claude decides when and how to use them:
function getTools() {
  return [
    {
      name: 'web_search',
      description: 'Search the web for current information. Use for facts, news, data, and research.',
      input_schema: {
        type: 'object',
        properties: {
          query: { type: 'string', description: 'Search query' },
          num_results: { type: 'number', description: 'Number of results (1-10)', default: 5 },
        },
        required: ['query'],
      },
    },
    {
      name: 'read_file',
      description: 'Read a file from the workspace. Use for accessing documents, data, and configuration.',
      input_schema: {
        type: 'object',
        properties: {
          path: { type: 'string', description: 'File path relative to workspace' },
        },
        required: ['path'],
      },
    },
    {
      name: 'write_file',
      description: 'Write content to a file. Creates the file if it does not exist.',
      input_schema: {
        type: 'object',
        properties: {
          path: { type: 'string', description: 'File path' },
          content: { type: 'string', description: 'Content to write' },
        },
        required: ['path', 'content'],
      },
    },
    {
      name: 'run_command',
      description: 'Execute a shell command. Use for data processing, API calls, and system operations.',
      input_schema: {
        type: 'object',
        properties: {
          command: { type: 'string', description: 'Shell command to execute' },
        },
        required: ['command'],
      },
    },
  ];
}
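The agent loop calls `executeTool`, which the SDK does not provide: you write the dispatcher yourself. A minimal sketch covering the four tools above (the `web_search` branch is a stub to wire up to whichever search API you use); errors are returned as data rather than thrown, so Claude can see the failure and retry:

```javascript
import fs from 'fs';
import { exec } from 'child_process';
import { promisify } from 'util';

const execAsync = promisify(exec);

// Dispatch a tool call from Claude to its implementation.
async function executeTool(name, input) {
  try {
    switch (name) {
      case 'read_file':
        return { content: fs.readFileSync(input.path, 'utf8') };
      case 'write_file':
        fs.writeFileSync(input.path, input.content);
        return { ok: true };
      case 'run_command': {
        const { stdout, stderr } = await execAsync(input.command);
        return { stdout, stderr };
      }
      case 'web_search':
        return { error: 'web_search not wired up yet' }; // stub
      default:
        return { error: `Unknown tool: ${name}` };
    }
  } catch (err) {
    // Return errors as data so Claude can adjust and retry
    return { error: err.message };
  }
}
```

Note that `run_command` as written executes arbitrary shell commands; in production, gate it behind the "Ask First" autonomy level or an allowlist.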
Step 3: Add Memory
Memory is what separates agents from chatbots. Here's a simple but effective memory system:
import fs from 'fs';
import path from 'path';

class AgentMemory {
  constructor(memoryDir = './memory') {
    this.memoryDir = memoryDir;
    if (!fs.existsSync(memoryDir)) fs.mkdirSync(memoryDir, { recursive: true });
  }

  // Daily notes — what happened today
  getTodayFile() {
    const date = new Date().toISOString().split('T')[0];
    return path.join(this.memoryDir, `${date}.md`);
  }

  log(entry) {
    const file = this.getTodayFile();
    const time = new Date().toLocaleTimeString('en-US', { hour12: false });
    const line = `\n- [${time}] ${entry}`;
    fs.appendFileSync(file, line);
  }

  // Load recent context (today + yesterday)
  getRecentContext() {
    const today = new Date();
    const yesterday = new Date(today);
    yesterday.setDate(today.getDate() - 1);
    let context = '';
    for (const date of [yesterday, today]) {
      const file = path.join(this.memoryDir, `${date.toISOString().split('T')[0]}.md`);
      if (fs.existsSync(file)) {
        context += fs.readFileSync(file, 'utf8') + '\n\n';
      }
    }
    return context;
  }

  // Long-term memory — curated important facts
  addToLongTermMemory(fact) {
    const file = path.join(this.memoryDir, 'MEMORY.md');
    fs.appendFileSync(file, `\n- ${fact}`);
  }
}

// Add memory to the system prompt
const memory = new AgentMemory();
const recentContext = memory.getRecentContext();
const longTermMemory = fs.existsSync('./memory/MEMORY.md')
  ? fs.readFileSync('./memory/MEMORY.md', 'utf8')
  : '';

const systemPrompt = `${soul}\n\n${agents}\n\n${user}

## Recent Memory
${recentContext}

## Long-term Memory
${longTermMemory}`;
Making It Autonomous: The Cron Loop
A real agent doesn't wait for you to talk to it. It runs on a schedule, picks up tasks, and executes:
import cron from 'node-cron';

// Run every hour
cron.schedule('0 * * * *', async () => {
  const memory = new AgentMemory();

  // Load pending tasks
  const tasks = fs.readFileSync('./tasks.md', 'utf8');

  const result = await agentLoop(`
You are running autonomously. Current time: ${new Date().toISOString()}

## Your Tasks
${tasks}

## Instructions
1. Pick the highest priority uncompleted task
2. Execute it using your tools
3. Log what you did in memory
4. Update the task list
5. Report results

Work for at least 15 minutes of productive output.
`);

  memory.log(`Autonomous run completed: ${result.substring(0, 200)}`);
});
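The loop reads `./tasks.md`, whose format is up to you: the agent just needs priorities and completion state it can parse and update. One hypothetical layout, using checkboxes so "update the task list" is a simple file edit:

```markdown
# Tasks

## P1 (High)
- [ ] Compile competitor pricing changes for the weekly report
- [x] Verify the MAU dashboard numbers against the database

## P2 (Normal)
- [ ] Draft outreach notes for the AcmeCo comparison page
```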
Advanced: Extended Thinking
For complex reasoning tasks, enable Claude's extended thinking. This lets the model "think out loud" before responding:
const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 16000,
  thinking: {
    type: 'enabled',
    budget_tokens: 10000, // Let Claude think for up to 10K tokens
  },
  system: systemPrompt,
  tools: getTools(),
  messages,
});

// The response includes thinking blocks
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log('Claude is reasoning:', block.thinking);
  }
}
Extended thinking is especially useful for:
- Multi-step planning (breaking complex tasks into subtasks)
- Debugging (analyzing error patterns before suggesting fixes)
- Decision making (weighing pros and cons before recommending an action)
- Code generation (planning architecture before writing code)
Advanced: MCP (Model Context Protocol)
MCP lets Claude connect to external tools and data sources through a standardized protocol. Instead of building custom tool integrations, you expose an MCP server:
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({
  name: 'my-business-tools',
  version: '1.0.0',
});

// Expose a tool via MCP
server.tool(
  'get_customer_data',
  'Look up customer information by email or company name',
  {
    query: z.string().describe('Customer email or company name'),
    fields: z.array(z.string()).optional().describe('Specific fields to return'),
  },
  async ({ query, fields }) => {
    // `db` is your own data layer; swap in whatever store you use
    const customer = await db.customers.findOne({ email: query });
    return {
      content: [{
        type: 'text',
        text: JSON.stringify(customer, null, 2),
      }],
    };
  }
);

// Connect via stdio transport
const transport = new StdioServerTransport();
await server.connect(transport);
MCP gives you a clean separation between your agent logic and your tool implementations. Claude Desktop, Claude Code, and other MCP clients can all use the same server.
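To use this server from Claude Desktop, register it in `claude_desktop_config.json` under `mcpServers`. The server name matches the one above; the command and script path are assumptions, so point them at wherever your server actually lives:

```json
{
  "mcpServers": {
    "my-business-tools": {
      "command": "node",
      "args": ["/path/to/mcp-server.js"]
    }
  }
}
```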
Claude vs GPT vs Open Source for Agents
| Feature | Claude | GPT-4 | Open Source (Llama) |
|---|---|---|---|
| Context window | 200K tokens | 128K tokens | 8-128K tokens |
| Tool use reliability | Excellent | Good | Variable |
| System prompt adherence | Excellent | Good | Poor-Fair |
| Extended thinking | Native | Via o1/o3 | No |
| MCP support | Native | Limited | Community |
| Cost (per 1M tokens) | $3-15 | $2.50-10 | Free (infra costs) |
| Best for | Complex reasoning, long tasks | General purpose | Simple, high-volume tasks |
Production Patterns That Work
Pattern 1: The Heartbeat
Run a lightweight check every 15 minutes. Only do heavy work when needed:
// Heartbeat: quick check, heavy action only when needed
async function heartbeat() {
  const status = await checkAllSystems();
  if (status.issues.length === 0) return 'HEARTBEAT_OK';

  // Only spin up the full agent for real issues
  return agentLoop(`Issues detected: ${JSON.stringify(status.issues)}. Investigate and fix.`);
}
Pattern 2: Sub-Agent Spawning
For complex tasks, spawn specialized sub-agents:
// Main agent delegates to specialists
const tools = [{
  name: 'spawn_researcher',
  description: 'Spawn a research sub-agent for deep investigation',
  input_schema: {
    type: 'object',
    properties: {
      topic: { type: 'string' },
      depth: { type: 'string', enum: ['quick', 'thorough', 'exhaustive'] },
    },
    required: ['topic'],
  },
}];

async function spawnResearcher(topic, depth) {
  // Sub-agent runs the same loop with its specialized instructions
  // prepended to the task (agentLoop takes a single message argument)
  const researchPrompt = `You are a research specialist. Your ONLY job is to
research "${topic}" at ${depth} depth. Use web_search extensively.
Return structured findings with sources.`;
  return agentLoop(`${researchPrompt}\n\nResearch: ${topic}`);
}
Pattern 3: Memory Consolidation
End-of-day: have Claude review daily notes and extract important patterns:
// Nightly consolidation
async function consolidateMemory() {
  const todayNotes = fs.readFileSync(memory.getTodayFile(), 'utf8');
  const longTermFile = './memory/MEMORY.md';
  // Guard the read: long-term memory may not exist yet on early runs
  const longTermMemory = fs.existsSync(longTermFile)
    ? fs.readFileSync(longTermFile, 'utf8')
    : '';

  const result = await agentLoop(`
Review today's notes and extract anything worth remembering long-term.

## Today's Notes
${todayNotes}

## Current Long-term Memory
${longTermMemory}

Rules:
- Only add genuinely new, important information
- Remove outdated entries from long-term memory
- Keep it concise — each entry should be one line
- Focus on patterns, preferences, and decisions
`);
  return result;
}
Common Mistakes (and How to Avoid Them)
- Too much autonomy too fast. Start with read-only tools. Add write access gradually as you build trust.
- No memory system. Without persistent memory, your agent relearns everything every session. That's a chatbot, not an agent.
- Ignoring cost. A Claude Opus agent running every 15 minutes with 200K context will cost $500+/month. Use Sonnet for routine tasks, Opus for complex reasoning.
- Monolithic system prompts. The 3-file framework exists for a reason. Separate identity, operations, and user context so you can update each independently.
- No logging. If your agent does something wrong at 3am, you need to know what happened and why. Log everything.
- Skipping the thinking budget. For complex tasks, extended thinking dramatically improves quality. Don't skip it to save tokens.
- Not testing tool descriptions. Bad descriptions = bad tool selection. Test with edge cases.
Cost Optimization
Running Claude agents 24/7 gets expensive fast. Here's how we keep costs manageable:
- Tiered model selection: Use Haiku for simple checks, Sonnet for standard tasks, Opus for complex reasoning
- Context window management: Don't stuff 200K tokens every call. Load only relevant memory and context
- Caching: Anthropic's prompt caching can save 90% on repeated system prompts
- Smart scheduling: Run full agent loops hourly, not every minute. Use lightweight heartbeats in between
- Exit early: If the agent determines no action is needed, return immediately instead of burning tokens on a detailed "nothing to do" response
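The caching bullet maps to the `cache_control` field of the Messages API: pass `system` as an array of content blocks and mark the large, stable prompt as `ephemeral`, so repeated calls within the cache window bill it at the cached-input rate. A sketch; `buildCachedRequest` is a hypothetical helper, not an SDK function:

```javascript
// Build Messages API params with the big, stable system prompt marked
// cacheable; only the short user message varies from call to call.
function buildCachedRequest(systemPrompt, userMessage) {
  return {
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    system: [
      {
        type: 'text',
        text: systemPrompt, // the SOUL.md + AGENTS.md + USER.md block
        cache_control: { type: 'ephemeral' },
      },
    ],
    messages: [{ role: 'user', content: userMessage }],
  };
}

// Usage: await client.messages.create(buildCachedRequest(systemPrompt, 'Status check'));
```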
Build Your Agent's Personality
Use our free SOUL.md Generator to create a production-ready personality file for your Claude agent in 5 minutes.
Generate Your SOUL.md
Deployment Checklist
Before you deploy your Claude agent to production:
- ☐ SOUL.md, AGENTS.md, USER.md are complete and tested
- ☐ All tools tested individually with edge cases
- ☐ Memory system persists across restarts
- ☐ Autonomy boundaries clearly defined (whitelist, not blacklist)
- ☐ Logging captures every tool call and response
- ☐ Error handling with retry logic (min 3 attempts)
- ☐ Cost monitoring and alerts set up
- ☐ Kill switch — ability to stop the agent immediately
- ☐ Regular human review of agent logs (at least weekly)
- ☐ Separate API keys for dev and production
What's Next
Once your Claude agent is running, the real learning begins. Watch how it handles edge cases. Read the logs. Adjust the system prompt based on real behavior, not theory.
The best agents aren't built in a day. They're iterated over weeks and months, getting more capable and more trusted over time. Start simple, ship fast, and improve continuously.
Go deeper with the AI Employee Playbook
The complete system: 3-file framework, memory architecture, autonomy levels, and 15 production templates.
Get the Playbook — €29