How to Build an AI Agent with Claude: Complete Guide (2026)

Claude isn't just a chatbot. With the right architecture, it becomes an autonomous agent that researches, decides, and executes — while you sleep. This guide shows you exactly how to build one, from first API call to production deployment.

We run multiple Claude-powered agents in production 24/7. One manages our content pipeline. Another monitors databases and fixes issues autonomously. A third handles research and analysis. This guide is based on what actually works, not what sounds impressive in a demo.

By the numbers: 4 Claude agents in production, always-on 24/7 operation, a 200K context window, and a maximum of 128 tool calls per turn.

Why Claude for AI Agents?

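Not all LLMs are equal when it comes to agentic behavior. Claude's specific advantages are a large context window, reliable tool use, strong system prompt adherence, native extended thinking, and native MCP support, all of which are compared against the alternatives later in this guide.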

Quick reality check: An AI agent is not a chatbot with extra steps. A real agent has memory that persists, tools it can use autonomously, and boundaries that keep it safe. If your "agent" forgets everything between conversations, it's still just a chatbot.

Architecture: The 3-File Framework

Every production Claude agent we run uses the same core architecture. Three files that define everything:

1. SOUL.md — Identity & Personality

This is who your agent is. Not what it does — who it is. Personality drives trust, and trust drives results.

# SOUL.md — Market Research Agent

## Identity
Name: Scout
Role: Market research analyst for B2B SaaS
Personality: Precise, data-driven, slightly skeptical

## Communication Style
- Lead with data, not opinions
- Always cite sources
- Flag uncertainty explicitly: "Low confidence: ..."
- Use tables for comparisons, never walls of text

## Boundaries
- Never fabricate statistics or data points
- Never present estimates as facts
- Always distinguish between primary and secondary sources
- Ask for clarification rather than guessing intent

2. AGENTS.md — Operational Rules

This defines how your agent operates. Autonomy levels, retry logic, tool permissions.

# AGENTS.md

## Every Session
1. Read SOUL.md (who you are)
2. Read USER.md (who you're helping)
3. Check memory/today.md for context

## Autonomy Levels
### Do Freely (no permission needed)
- Web searches, file reads, data analysis
- Draft creation, research compilation
- Internal calculations and comparisons

### Ask First (needs human approval)
- Sending emails or messages
- Publishing content
- Making purchases or commitments
- Modifying production systems

## Retry Logic
Minimum 3 attempts before asking for help:
1. Try the direct approach
2. Try with different parameters
3. Try an alternative method
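
The autonomy levels and retry rules above are instructions that Claude reads, but it can help to back them with code as well. The sketch below is one possible shape, not part of the framework itself: the tool names in `AUTONOMY_LEVELS` and the helpers `checkAutonomy` and `withRetries` are hypothetical names chosen for illustration.

```javascript
// Hypothetical mapping from tool name to the autonomy level in AGENTS.md.
const AUTONOMY_LEVELS = new Map([
  ['web_search', 'free'],
  ['read_file', 'free'],
  ['send_email', 'ask_first'],
  ['publish_content', 'ask_first'],
]);

// Decide whether a tool call may run without human approval.
// Tools missing from the map fall back to the restrictive level.
function checkAutonomy(toolName) {
  const level = AUTONOMY_LEVELS.get(toolName) ?? 'ask_first';
  return { toolName, level, allowed: level === 'free' };
}

// Try each strategy in order (direct, different parameters, alternative method)
// and only report failure after all of them have thrown.
async function withRetries(strategies) {
  const errors = [];
  for (let i = 0; i < strategies.length; i++) {
    try {
      const value = await strategies[i]();
      return { ok: true, value, attempts: i + 1 };
    } catch (err) {
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  return { ok: false, errors, attempts: strategies.length };
}
```

Defaulting unknown tools to `ask_first` means a newly added tool cannot run unattended until someone classifies it on purpose.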

3. USER.md — Human Context

Everything your agent needs to know about the human it serves. Preferences, business context, communication style.

# USER.md

## About
Name: Sarah Chen
Role: Head of Product, TechCorp
Timezone: PST (UTC-8)

## Preferences
- Prefers bullet points over paragraphs
- Likes data visualized, not described
- Morning person — schedule important updates before 10am
- Hates jargon. Explain like a smart non-expert.

## Business Context
- B2B SaaS, 50-person startup
- Series A, $8M raised
- Main competitor: AcmeCo
- Key metric: Monthly Active Users (MAU)

Setting Up Claude as an Agent

Step 1: API Setup

Install the Anthropic SDK and set up your first agent loop:

npm install @anthropic-ai/sdk

import Anthropic from '@anthropic-ai/sdk';
import fs from 'fs';

const client = new Anthropic();

// Load the 3-file framework
const soul = fs.readFileSync('./SOUL.md', 'utf8');
const agents = fs.readFileSync('./AGENTS.md', 'utf8');
const user = fs.readFileSync('./USER.md', 'utf8');

const systemPrompt = `${soul}\n\n${agents}\n\n${user}`;

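async function agentLoop(userMessage) {
  // Cap the number of round trips so a confused agent cannot loop forever
  const maxTurns = 25;
  const messages = [{ role: 'user', content: userMessage }];

  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await client.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 8096,
      system: systemPrompt,
      tools: getTools(),
      messages,
    });

    // Add assistant response to history
    messages.push({ role: 'assistant', content: response.content });

    // Check if Claude wants to use tools
    const toolUses = response.content.filter(b => b.type === 'tool_use');

    if (toolUses.length === 0) {
      // No more tool calls: the agent is done, so join every text block
      return response.content
        .filter(b => b.type === 'text')
        .map(b => b.text)
        .join('\n');
    }

    // Execute each tool and return results.
    // executeTool(name, input) is your own dispatcher that runs the named tool.
    const toolResults = [];
    for (const toolUse of toolUses) {
      try {
        const result = await executeTool(toolUse.name, toolUse.input);
        toolResults.push({
          type: 'tool_result',
          tool_use_id: toolUse.id,
          content: JSON.stringify(result ?? null),
        });
      } catch (err) {
        // Report the failure to Claude instead of crashing the loop
        toolResults.push({
          type: 'tool_result',
          tool_use_id: toolUse.id,
          content: String(err?.message ?? err),
          is_error: true,
        });
      }
    }

    messages.push({ role: 'user', content: toolResults });
  }

  throw new Error(`agentLoop stopped after ${maxTurns} turns without a final answer`);
}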

Step 2: Define Tools

Tools are what make Claude an agent instead of a chatbot. Define them with clear descriptions — Claude decides when and how to use them:

function getTools() {
  return [
    {
      name: 'web_search',
      description: 'Search the web for current information. Use for facts, news, data, and research.',
      input_schema: {
        type: 'object',
        properties: {
          query: { type: 'string', description: 'Search query' },
          num_results: { type: 'number', description: 'Number of results (1-10)', default: 5 },
        },
        required: ['query'],
      },
    },
    {
      name: 'read_file',
      description: 'Read a file from the workspace. Use for accessing documents, data, and configuration.',
      input_schema: {
        type: 'object',
        properties: {
          path: { type: 'string', description: 'File path relative to workspace' },
        },
        required: ['path'],
      },
    },
    {
      name: 'write_file',
      description: 'Write content to a file. Creates the file if it does not exist.',
      input_schema: {
        type: 'object',
        properties: {
          path: { type: 'string', description: 'File path' },
          content: { type: 'string', description: 'Content to write' },
        },
        required: ['path', 'content'],
      },
    },
    {
      name: 'run_command',
      description: 'Execute a shell command. Use for data processing, API calls, and system operations.',
      input_schema: {
        type: 'object',
        properties: {
          command: { type: 'string', description: 'Shell command to execute' },
        },
        required: ['command'],
      },
    },
  ];
}
Pro tip: Tool descriptions matter more than you think. Claude uses them to decide when to call a tool. Vague descriptions = wrong tool choices. Be specific about what each tool is good at.
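
The agent loop calls `executeTool(name, input)`, which this guide does not define. A minimal sketch of one way to build it follows, assuming you keep a plain object of handler functions keyed by tool name. `createToolExecutor` and the stub handler are hypothetical; a real `read_file` handler would also need to confine paths to the workspace.

```javascript
// Build an executeTool(name, input) function from tool definitions and handlers.
// `handlers` maps each tool name to a function that receives the tool input.
function createToolExecutor(toolDefs, handlers) {
  const defsByName = new Map(toolDefs.map(def => [def.name, def]));

  return async function executeTool(name, input) {
    const def = defsByName.get(name);
    const handler = handlers[name];
    if (!def || typeof handler !== 'function') {
      throw new Error(`Unknown tool: ${name}`);
    }

    // Reject calls that omit a field the schema marks as required.
    const required = def.input_schema.required ?? [];
    const missing = required.filter(key => input == null || input[key] === undefined);
    if (missing.length > 0) {
      throw new Error(`Missing required input for ${name}: ${missing.join(', ')}`);
    }

    return handler(input);
  };
}

// Example wiring with a stub handler (replace with a real implementation).
const exampleDefs = [{
  name: 'read_file',
  description: 'Read a file from the workspace.',
  input_schema: {
    type: 'object',
    properties: { path: { type: 'string' } },
    required: ['path'],
  },
}];

const executeTool = createToolExecutor(exampleDefs, {
  read_file: async ({ path }) => ({ path, content: '(stub content)' }),
});
```

Requiring both a definition and a handler means the agent can only run tools that you have described to Claude and implemented yourself.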

Step 3: Add Memory

Memory is what separates agents from chatbots. Here's a simple but effective memory system:

import fs from 'fs';
import path from 'path';

class AgentMemory {
  constructor(memoryDir = './memory') {
    this.memoryDir = memoryDir;
    if (!fs.existsSync(memoryDir)) fs.mkdirSync(memoryDir, { recursive: true });
  }

  // Daily notes — what happened today
  getTodayFile() {
    const date = new Date().toISOString().split('T')[0];
    return path.join(this.memoryDir, `${date}.md`);
  }

  log(entry) {
    const file = this.getTodayFile();
    const time = new Date().toLocaleTimeString('en-US', { hour12: false });
    const line = `\n- [${time}] ${entry}`;
    fs.appendFileSync(file, line);
  }

  // Load recent context (today + yesterday)
  getRecentContext() {
    const today = new Date();
    const yesterday = new Date(today);
    yesterday.setDate(today.getDate() - 1);

    let context = '';
    for (const date of [yesterday, today]) {
      const file = path.join(this.memoryDir, `${date.toISOString().split('T')[0]}.md`);
      if (fs.existsSync(file)) {
        context += fs.readFileSync(file, 'utf8') + '\n\n';
      }
    }
    return context;
  }

  // Long-term memory — curated important facts
  addToLongTermMemory(fact) {
    const file = path.join(this.memoryDir, 'MEMORY.md');
    fs.appendFileSync(file, `\n- ${fact}`);
  }
}

// Rebuild the system prompt with memory included.
// This replaces the earlier `systemPrompt` definition; keep only one in your file.
const memory = new AgentMemory();
const recentContext = memory.getRecentContext();
const longTermMemory = fs.existsSync('./memory/MEMORY.md')
  ? fs.readFileSync('./memory/MEMORY.md', 'utf8')
  : '';

const systemPrompt = [
  soul,
  agents,
  user,
  `## Recent Memory\n${recentContext}`,
  `## Long-term Memory\n${longTermMemory}`,
].join('\n\n');
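
Because the memory files are pasted into the system prompt on every call, they make each request larger as the notes grow. One way to bound that is sketched here, on the assumption that the newest entries matter most. `trimToRecentLines` is a hypothetical helper, and the character budget is only a rough stand-in for a token budget.

```javascript
// Keep the most recent non-empty lines of a memory file that fit in maxChars.
function trimToRecentLines(text, maxChars) {
  const lines = text.split('\n').filter(line => line.trim() !== '');
  const kept = [];
  let used = 0;

  // Walk backwards from the newest entry until the budget is spent.
  for (let i = lines.length - 1; i >= 0; i--) {
    const cost = lines[i].length + 1; // +1 for the newline
    if (used + cost > maxChars) break;
    kept.unshift(lines[i]);
    used += cost;
  }
  return kept.join('\n');
}

const notes = '- [09:00] Checked dashboards\n- [10:30] Drafted report\n- [11:15] Sent summary';
console.log(trimToRecentLines(notes, 50));
// - [10:30] Drafted report
// - [11:15] Sent summary
```

Applying this to the recent-context string before it goes into the prompt keeps a busy day from crowding out the rest of the context window.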

Making It Autonomous: The Cron Loop

A real agent doesn't wait for you to talk to it. It runs on a schedule, picks up tasks, and executes:

import cron from 'node-cron';
import fs from 'fs';

// Run at the top of every hour
cron.schedule('0 * * * *', async () => {
  const memory = new AgentMemory();

  try {
    // Load pending tasks (an empty list if the file does not exist yet)
    const tasks = fs.existsSync('./tasks.md')
      ? fs.readFileSync('./tasks.md', 'utf8')
      : '';

    const result = await agentLoop(`
      You are running autonomously. Current time: ${new Date().toISOString()}

      ## Your Tasks
      ${tasks}

      ## Instructions
      1. Pick the highest priority uncompleted task
      2. Execute it using your tools
      3. Log what you did in memory
      4. Update the task list
      5. Report results

      Work for at least 15 minutes of productive output.
    `);

    memory.log(`Autonomous run completed: ${result.substring(0, 200)}`);
  } catch (err) {
    // Record failures so a crashed run is visible in the daily notes
    memory.log(`Autonomous run failed: ${err.message}`);
  }
});
Safety first: Autonomous agents need guardrails. Always define what your agent can and cannot do without human approval. Start with a restrictive whitelist and expand as you build trust.
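
One guardrail that is easy to miss with a scheduled agent is overlap: if a run takes longer than the interval, the next tick starts a second agent against the same task list. A minimal sketch of a guard follows. `singleFlight` is a hypothetical helper, not part of node-cron, and it only protects against overlap within a single process.

```javascript
// Wrap an async task so that a new call is skipped while a previous one is running.
function singleFlight(task) {
  let running = false;

  return async function run(...args) {
    if (running) return { skipped: true };
    running = true;
    try {
      return { skipped: false, result: await task(...args) };
    } finally {
      // Always release the flag, even if the task throws.
      running = false;
    }
  };
}
```

Wrapping the scheduled callback in `singleFlight` before handing it to the scheduler means a slow run causes the next tick to be skipped rather than doubled.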

Advanced: Extended Thinking

For complex reasoning tasks, enable Claude's extended thinking. This lets the model "think out loud" before responding:

const response = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 16000,
  thinking: {
    type: 'enabled',
    budget_tokens: 10000, // Let Claude think for up to 10K tokens
  },
  system: systemPrompt,
  tools: getTools(),
  messages,
});

// The response includes thinking blocks
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log('Claude is reasoning:', block.thinking);
  }
}
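
Two details are easy to get wrong with the request above. As far as I know, the API expects `budget_tokens` to be at least 1,024 and smaller than `max_tokens`, and the final answer arrives in `text` blocks alongside the `thinking` blocks. The helpers below are a hypothetical sketch (`buildThinkingParams` and `splitThinking` are not SDK functions) that checks the first and separates the second; verify the limits against the current API documentation.

```javascript
// Build `max_tokens` and `thinking` together so the two values stay consistent.
function buildThinkingParams(maxTokens, budgetTokens) {
  if (budgetTokens < 1024) {
    throw new RangeError('budget_tokens must be at least 1024');
  }
  if (budgetTokens >= maxTokens) {
    throw new RangeError('budget_tokens must be less than max_tokens');
  }
  return {
    max_tokens: maxTokens,
    thinking: { type: 'enabled', budget_tokens: budgetTokens },
  };
}

// Separate the reasoning from the final answer in a response's content array.
function splitThinking(content) {
  const thinking = content.filter(b => b.type === 'thinking').map(b => b.thinking);
  const answer = content.filter(b => b.type === 'text').map(b => b.text);
  return { thinking: thinking.join('\n'), answer: answer.join('\n') };
}
```

The result of `buildThinkingParams(16000, 10000)` can be spread into the `client.messages.create` call in place of the hand-written `max_tokens` and `thinking` fields.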

Extended thinking is especially useful when a task needs several steps of reasoning before the agent commits to a tool call or an answer.

Advanced: MCP (Model Context Protocol)

MCP lets Claude connect to external tools and data sources through a standardized protocol. Instead of building custom tool integrations, you expose an MCP server:

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({
  name: 'my-business-tools',
  version: '1.0.0',
});

// Expose a tool via MCP
server.tool(
  'get_customer_data',
  'Look up customer information by email or company name',
  {
    query: z.string().describe('Customer email or company name'),
    fields: z.array(z.string()).optional().describe('Specific fields to return'),
  },
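  async ({ query, fields }) => {
    // `db` stands in for your own data layer; it is not defined in this snippet.
    // This lookup matches on email only. Extend the filter to cover company
    // names, or narrow the tool description so Claude does not expect them.
    const customer = await db.customers.findOne({ email: query });
    if (!customer) {
      return { content: [{ type: 'text', text: `No customer found for "${query}"` }] };
    }

    // Honor the optional `fields` argument by returning only the requested keys
    const data = fields?.length
      ? Object.fromEntries(fields.filter(f => f in customer).map(f => [f, customer[f]]))
      : customer;

    return {
      content: [{
        type: 'text',
        text: JSON.stringify(data, null, 2),
      }],
    };
  }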
);

// Connect via stdio transport
const transport = new StdioServerTransport();
await server.connect(transport);

MCP gives you a clean separation between your agent logic and your tool implementations. Claude Desktop, Claude Code, and other MCP clients can all use the same server.
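
If you want the hand-written agent loop from earlier to call tools that live on an MCP server, the tool listings have to be translated, because MCP describes a tool's parameters under `inputSchema` while the Messages API expects `input_schema`. A small sketch of that mapping follows. `mcpToolsToAnthropic` is a hypothetical helper, and it assumes you have already fetched the tool list with an MCP client.

```javascript
// Convert MCP tool descriptors into the shape the Messages API `tools` array expects.
function mcpToolsToAnthropic(mcpTools) {
  return mcpTools.map(tool => ({
    name: tool.name,
    description: tool.description ?? '',
    input_schema: tool.inputSchema,
  }));
}

// Example descriptor in the shape an MCP server reports for the tool above.
const listed = [{
  name: 'get_customer_data',
  description: 'Look up customer information by email or company name',
  inputSchema: {
    type: 'object',
    properties: { query: { type: 'string' } },
    required: ['query'],
  },
}];

console.log(mcpToolsToAnthropic(listed)[0].input_schema.required); // [ 'query' ]
```

When Claude then asks for one of these tools, the loop would forward the call to the MCP client rather than to a local handler.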

Claude vs GPT vs Open Source for Agents

| Feature | Claude | GPT-4 | Open Source (Llama) |
|---|---|---|---|
| Context window | 200K tokens | 128K tokens | 8-128K tokens |
| Tool use reliability | Excellent | Good | Variable |
| System prompt adherence | Excellent | Good | Poor-Fair |
| Extended thinking | Native | Via o1/o3 | No |
| MCP support | Native | Limited | Community |
| Cost (per 1M tokens) | $3-15 | $2.50-10 | Free (infra costs) |
| Best for | Complex reasoning, long tasks | General purpose | Simple, high-volume tasks |

Production Patterns That Work

Pattern 1: The Heartbeat

Run a lightweight check every 15 minutes. Only do heavy work when needed:

// Heartbeat: quick check, heavy action only when needed.
// checkAllSystems() is your own monitoring function; it should resolve to { issues: [...] }.
async function heartbeat() {
  const status = await checkAllSystems();
  if (status.issues.length === 0) return 'HEARTBEAT_OK';

  // Only spin up the full agent for real issues
  return agentLoop(`Issues detected: ${JSON.stringify(status.issues)}. Investigate and fix.`);
}
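
The heartbeat relies on a `checkAllSystems()` function that is left to you. One possible shape is sketched below, assuming each check is a function that resolves when healthy and throws when something is wrong. `createSystemChecker` and the check names are hypothetical.

```javascript
// Build a checkAllSystems() function from a map of named health checks.
function createSystemChecker(checks) {
  return async function checkAllSystems() {
    const names = Object.keys(checks);

    // The async wrapper turns synchronous throws into rejections as well,
    // so one failing check never prevents the others from running.
    const results = await Promise.allSettled(names.map(async name => checks[name]()));

    const issues = [];
    results.forEach((result, i) => {
      if (result.status === 'rejected') {
        issues.push({
          check: names[i],
          error: String(result.reason?.message ?? result.reason),
        });
      }
    });
    return { issues };
  };
}

// Example wiring with placeholder checks.
const checkAllSystems = createSystemChecker({
  database: async () => 'ok',
  disk: () => { throw new Error('disk 95% full'); },
});
```

Because the checks are plain code with no model call, the common case where nothing is wrong costs no tokens.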

Pattern 2: Sub-Agent Spawning

For complex tasks, spawn specialized sub-agents:

// Main agent delegates to specialists
const tools = [{
  name: 'spawn_researcher',
  description: 'Spawn a research sub-agent for deep investigation',
  input_schema: {
    type: 'object',
    properties: {
      topic: { type: 'string' },
      depth: { type: 'string', enum: ['quick', 'thorough', 'exhaustive'] },
    },
    required: ['topic'],
  },
}];

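async function spawnResearcher(topic, depth = 'thorough') {
  // `depth` is optional in the tool schema, so it needs a default here.
  // agentLoop() takes a single user message, so the specialist brief is sent as
  // that message. Extend agentLoop to accept its own system prompt if the
  // sub-agent should not inherit the main agent's identity.
  const researchPrompt = `You are a research specialist. Your ONLY job is to
  research "${topic}" at ${depth} depth. Use web_search extensively.
  Return structured findings with sources.`;

  return agentLoop(researchPrompt);
}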

Pattern 3: Memory Consolidation

End-of-day: have Claude review daily notes and extract important patterns:

// Nightly consolidation
async function consolidateMemory() {
  // Either file may be missing on a quiet day or on the first run
  const readIfExists = file => (fs.existsSync(file) ? fs.readFileSync(file, 'utf8') : '');
  const todayNotes = readIfExists(memory.getTodayFile());
  const longTermMemory = readIfExists('./memory/MEMORY.md');

  if (!todayNotes.trim()) return 'Nothing to consolidate';

  const result = await agentLoop(`
    Review today's notes and extract anything worth remembering long-term.

    ## Today's Notes
    ${todayNotes}

    ## Current Long-term Memory
    ${longTermMemory}

    Rules:
    - Only add genuinely new, important information
    - Remove outdated entries from long-term memory
    - Keep it concise: each entry should be one line
    - Focus on patterns, preferences, and decisions
    - Save the updated long-term memory to memory/MEMORY.md with write_file
  `);

  return result;
}
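
Claude decides what is worth keeping, but the mechanical part of consolidation (avoiding duplicate lines in MEMORY.md) can be done in plain code. The helper below is a hypothetical sketch that assumes memory entries are one-line bullets, as in the `AgentMemory` class earlier.

```javascript
// Merge new entries into existing long-term memory text, dropping duplicates.
function mergeMemoryEntries(existingText, newEntries) {
  // Strip a leading bullet marker and surrounding whitespace.
  const normalize = line => line.trim().replace(/^[-*]\s*/, '').trim();

  const seen = new Set();
  const merged = [];
  for (const line of [...existingText.split('\n'), ...newEntries]) {
    const entry = normalize(line);
    const key = entry.toLowerCase(); // compare case-insensitively
    if (entry === '' || seen.has(key)) continue;
    seen.add(key);
    merged.push(`- ${entry}`);
  }
  return merged.join('\n');
}

const existing = '\n- Sarah prefers bullet points\n- Main competitor: AcmeCo';
console.log(mergeMemoryEntries(existing, ['sarah prefers bullet points', 'Key metric: MAU']));
// - Sarah prefers bullet points
// - Main competitor: AcmeCo
// - Key metric: MAU
```

This only catches exact repeats; spotting two differently worded versions of the same fact is still a job for the model.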

Common Mistakes (and How to Avoid Them)

  1. Too much autonomy too fast. Start with read-only tools. Add write access gradually as you build trust.
  2. No memory system. Without persistent memory, your agent relearns everything every session. That's a chatbot, not an agent.
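  3. Ignoring cost. A Claude Opus agent that runs every 15 minutes with a full 200K context makes about 2,880 calls and sends roughly 576M input tokens a month. At around $15 per million input tokens (check current pricing), that is thousands of dollars a month, far beyond $500. Use Sonnet for routine tasks, Opus for complex reasoning.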
  4. Monolithic system prompts. The 3-file framework exists for a reason. Separate identity, operations, and user context so you can update each independently.
  5. No logging. If your agent does something wrong at 3am, you need to know what happened and why. Log everything.
  6. Skipping the thinking budget. For complex tasks, extended thinking can noticeably improve quality. Don't skip it just to save tokens.
  7. Not testing tool descriptions. Bad descriptions = bad tool selection. Test with edge cases.

Cost Optimization

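Running Claude agents 24/7 gets expensive fast. Two techniques from this guide keep costs manageable: run a lightweight heartbeat and start the full agent only when it finds a real issue, and use Sonnet for routine tasks while reserving Opus for complex reasoning.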
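
It helps to put numbers on a design before deploying it. The estimator below is a rough sketch: `estimateMonthlyCost` is a hypothetical helper, the prices are passed in as parameters, and the $15 and $75 per million tokens used in the example are assumed Opus list prices that you should check against current pricing. It ignores prompt caching and other discounts.

```javascript
// Estimate monthly API spend from run frequency, token counts, and prices.
// Prices are in dollars per million tokens.
function estimateMonthlyCost({
  runsPerDay,
  inputTokensPerRun,
  outputTokensPerRun,
  inputPricePerMTok,
  outputPricePerMTok,
  days = 30,
}) {
  const runs = runsPerDay * days;
  const inputCost = (runs * inputTokensPerRun * inputPricePerMTok) / 1_000_000;
  const outputCost = (runs * outputTokensPerRun * outputPricePerMTok) / 1_000_000;
  return { runs, inputCost, outputCost, total: inputCost + outputCost };
}

// Worst case: every 15 minutes (96 runs a day) with a full 200K context.
const worstCase = estimateMonthlyCost({
  runsPerDay: 96,
  inputTokensPerRun: 200_000,
  outputTokensPerRun: 1_000,
  inputPricePerMTok: 15,  // assumed price
  outputPricePerMTok: 75, // assumed price
});

console.log(worstCase.runs);  // 2880
console.log(worstCase.total); // 8856
```

Almost all of that total is input tokens, which is why sending less context per run usually saves more than shortening the responses.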

Deployment Checklist

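Before you deploy your Claude agent to production, confirm that each of the common mistakes above is covered: tools start read-only, memory persists between sessions, monthly cost is estimated, every action is logged, and tool descriptions are tested against edge cases.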

What's Next

Once your Claude agent is running, the real learning begins. Watch how it handles edge cases. Read the logs. Adjust the system prompt based on real behavior, not theory.

The best agents aren't built in a day. They're iterated over weeks and months, getting more capable and more trusted over time. Start simple, ship fast, and improve continuously.
