How to Build an AI Agent with OpenAI (GPT-4): Complete Guide

OpenAI's GPT-4 powers more AI agents than any other model. But most people use it wrong — they build fancy chatbots and call them agents. This guide shows you how to build a real agent with GPT-4: one that uses tools, remembers context, and works autonomously.

We'll cover two approaches: building from scratch with the Chat Completions API (more control), and using the Assistants API (faster to ship). Both get you to production.

- 128K: GPT-4 context window
- 128: max tools per request
- 2: API approaches
- $2.50: per 1M input tokens (GPT-4o)

Two Paths: Chat Completions vs Assistants API

OpenAI gives you two ways to build agents. Here's when to use each:

| Feature | Chat Completions | Assistants API |
| --- | --- | --- |
| Control | Full control over everything | OpenAI manages threads & state |
| Memory | You build it | Built-in thread history |
| File handling | You implement | Built-in file search & code interpreter |
| Streaming | Native support | Streaming runs |
| Cost | Pay per token | Pay per token + storage |
| Best for | Custom agents, full autonomy | Quick prototypes, file-heavy workflows |

Our recommendation: Start with Chat Completions for production agents. You need full control over memory, tool execution, and error handling. The Assistants API is great for prototyping but limits you when you need custom behavior.

Path 1: Building from Scratch (Chat Completions)

Step 1: Project Setup

npm install openai

import OpenAI from 'openai';
import fs from 'fs';

// Reads OPENAI_API_KEY from the environment by default
const client = new OpenAI();

// Load the 3-file framework
const soul = fs.readFileSync('./SOUL.md', 'utf8');
const agents = fs.readFileSync('./AGENTS.md', 'utf8');
const user = fs.readFileSync('./USER.md', 'utf8');

const systemPrompt = `${soul}\n\n${agents}\n\n${user}`;

Step 2: The Agent Loop

The core pattern: send a message, check if GPT wants to use tools, execute them, send results back. Repeat until done.

async function agentLoop(userMessage) {
  const messages = [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: userMessage },
  ];

  while (true) {
    const response = await client.chat.completions.create({
      model: 'gpt-4o',
      messages,
      tools: getTools(),
      tool_choice: 'auto',
    });

    const choice = response.choices[0];
    messages.push(choice.message);

    // If no tool calls, we're done
    if (!choice.message.tool_calls || choice.message.tool_calls.length === 0) {
      return choice.message.content;
    }

    // Execute each tool call
    for (const toolCall of choice.message.tool_calls) {
      const args = JSON.parse(toolCall.function.arguments);
      const result = await executeTool(toolCall.function.name, args);

      messages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      });
    }
  }
}

Step 3: Define Tools (Function Calling)

OpenAI's function calling uses JSON Schema to describe tools. Be precise — the descriptions directly affect when GPT chooses to use each tool:

function getTools() {
  return [
    {
      type: 'function',
      function: {
        name: 'web_search',
        description: 'Search the web for current information. Returns titles, URLs, and snippets.',
        parameters: {
          type: 'object',
          properties: {
            query: {
              type: 'string',
              description: 'The search query. Be specific for better results.',
            },
            num_results: {
              type: 'number',
              description: 'Number of results to return (1-10)',
              default: 5,
            },
          },
          required: ['query'],
        },
      },
    },
    {
      type: 'function',
      function: {
        name: 'read_file',
        description: 'Read a file from the workspace. Supports text, JSON, CSV, and markdown.',
        parameters: {
          type: 'object',
          properties: {
            path: { type: 'string', description: 'File path relative to workspace root' },
          },
          required: ['path'],
        },
      },
    },
    {
      type: 'function',
      function: {
        name: 'write_file',
        description: 'Write content to a file. Creates parent directories if needed.',
        parameters: {
          type: 'object',
          properties: {
            path: { type: 'string', description: 'File path' },
            content: { type: 'string', description: 'File content' },
          },
          required: ['path', 'content'],
        },
      },
    },
    {
      type: 'function',
      function: {
        name: 'execute_code',
        description: 'Execute Python code for data analysis, calculations, or file processing.',
        parameters: {
          type: 'object',
          properties: {
            code: { type: 'string', description: 'Python code to execute' },
          },
          required: ['code'],
        },
      },
    },
  ];
}

async function executeTool(name, args) {
  switch (name) {
    case 'web_search':
      return await searchWeb(args.query, args.num_results);
    case 'read_file':
      return { content: fs.readFileSync(args.path, 'utf8') };
    case 'write_file':
      fs.writeFileSync(args.path, args.content);
      return { success: true, path: args.path };
    case 'execute_code':
      return await runPython(args.code);
    default:
      return { error: `Unknown tool: ${name}` };
  }
}

Step 4: Add Memory

GPT-4 forgets everything between API calls. You need to build persistence:

class AgentMemory {
  constructor(dir = './memory') {
    this.dir = dir;
    if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
  }

  // Daily log — append-only
  log(entry) {
    const date = new Date().toISOString().split('T')[0];
    const time = new Date().toTimeString().split(' ')[0];
    const file = `${this.dir}/${date}.md`;
    fs.appendFileSync(file, `\n- [${time}] ${entry}`);
  }

  // Load recent context for the system prompt
  getContext(days = 2) {
    let context = '';
    for (let i = days - 1; i >= 0; i--) {
      const date = new Date();
      date.setDate(date.getDate() - i);
      const file = `${this.dir}/${date.toISOString().split('T')[0]}.md`;
      if (fs.existsSync(file)) {
        context += `\n## ${date.toISOString().split('T')[0]}\n`;
        context += fs.readFileSync(file, 'utf8');
      }
    }
    return context;
  }

  // Long-term memory — important facts only
  remember(fact) {
    fs.appendFileSync(`${this.dir}/MEMORY.md`, `\n- ${fact}`);
  }

  getLongTermMemory() {
    const file = `${this.dir}/MEMORY.md`;
    return fs.existsSync(file) ? fs.readFileSync(file, 'utf8') : '';
  }
}

// Inject memory into every agent call
const memory = new AgentMemory();

function buildSystemPrompt() {
  const recentContext = memory.getContext(2);
  const longTerm = memory.getLongTermMemory();
  return `${soul}\n\n${agents}\n\n${user}
\n## Recent Activity\n${recentContext}
\n## Long-term Memory\n${longTerm}`;
}

Memory management tip: Don't dump your entire memory into every API call. GPT-4's 128K window is large but not infinite. Summarize old days, keep recent days detailed. Budget ~10K tokens for memory, leaving room for the actual conversation.
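
Here's a minimal sketch of that budgeting, reusing the AgentMemory class above. The summarize() helper is hypothetical, not part of any SDK; swap in whatever condenser you like:

// Sketch: token-budgeted memory loading. Assumes the `client` and
// AgentMemory from above. summarize() is a hypothetical helper.
async function summarize(text) {
  const res = await client.chat.completions.create({
    model: 'gpt-4o-mini', // a cheap model is fine for condensing logs
    messages: [
      { role: 'system', content: 'Condense this agent activity log. Keep decisions, results, and open tasks.' },
      { role: 'user', content: text },
    ],
  });
  return res.choices[0].message.content;
}

async function getBudgetedContext(memory, budgetTokens = 10000) {
  const recent = memory.getContext(2);
  // Rough estimate: ~4 characters per token for English text
  if (recent.length / 4 <= budgetTokens) return recent;
  return await summarize(recent); // over budget: compress instead of dumping raw logs
}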

Path 2: The Assistants API

If you want built-in thread management, file search, and code interpretation, the Assistants API handles the plumbing:

import OpenAI from 'openai';

const client = new OpenAI();

// Create an assistant (do this once)
async function createAssistant() {
  return await client.beta.assistants.create({
    name: 'Research Agent',
    instructions: systemPrompt,
    model: 'gpt-4o',
    tools: [
      { type: 'file_search' },      // Built-in RAG
      { type: 'code_interpreter' },  // Built-in Python execution
      {
        type: 'function',
        function: {
          name: 'web_search',
          description: 'Search the web for current information',
          parameters: {
            type: 'object',
            properties: {
              query: { type: 'string' },
            },
            required: ['query'],
          },
        },
      },
    ],
  });
}

// Run a conversation
async function chat(assistantId, userMessage) {
  // Create a thread (or reuse one for ongoing conversations)
  const thread = await client.beta.threads.create();

  // Add the user message
  await client.beta.threads.messages.create(thread.id, {
    role: 'user',
    content: userMessage,
  });

  // Run the assistant
  let run = await client.beta.threads.runs.create(thread.id, {
    assistant_id: assistantId,
  });

  // Poll for completion (or use streaming)
  while (['queued', 'in_progress', 'requires_action'].includes(run.status)) {
    if (run.status === 'requires_action') {
      // Handle function calls
      const toolOutputs = [];
      for (const call of run.required_action.submit_tool_outputs.tool_calls) {
        const args = JSON.parse(call.function.arguments);
        const result = await executeTool(call.function.name, args);
        toolOutputs.push({
          tool_call_id: call.id,
          output: JSON.stringify(result),
        });
      }
      run = await client.beta.threads.runs.submitToolOutputs(
        thread.id, run.id, { tool_outputs: toolOutputs }
      );
    } else {
      await new Promise(r => setTimeout(r, 1000));
      run = await client.beta.threads.runs.retrieve(thread.id, run.id);
    }
  }

  // Get the response
  const messages = await client.beta.threads.messages.list(thread.id);
  return messages.data[0].content[0].text.value;
}

Structured Outputs: Reliable Tool Calls

One of GPT-4's strengths for agents is structured outputs: with strict mode enabled, the response is guaranteed to match your JSON Schema:

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages,
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'task_plan',
      strict: true, // required for guaranteed schema compliance
      schema: {
        type: 'object',
        properties: {
          tasks: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                id: { type: 'number' },
                action: { type: 'string' },
                tool: { type: 'string' },
                priority: { type: 'string', enum: ['high', 'medium', 'low'] },
                // Strict mode requires every property to be listed in
                // `required`; a nullable type stands in for "optional"
                estimated_minutes: { type: ['number', 'null'] },
              },
              required: ['id', 'action', 'tool', 'priority', 'estimated_minutes'],
              additionalProperties: false,
            },
          },
          reasoning: { type: 'string' },
        },
        required: ['tasks', 'reasoning'],
        additionalProperties: false,
      },
    },
  },
});

// Guaranteed valid JSON matching the schema
const plan = JSON.parse(response.choices[0].message.content);

This is extremely useful for agent planning — you can force GPT to output a structured execution plan before it starts working.
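
For example, you can feed each planned task straight into the agentLoop() from Path 1. A sketch, assuming the plan object parsed above:

// Sketch: run a structured plan through the agent loop from Path 1.
const order = { high: 0, medium: 1, low: 2 };
const tasks = [...plan.tasks].sort((a, b) => order[a.priority] - order[b.priority]);
for (const task of tasks) {
  await agentLoop(`Execute task ${task.id}: ${task.action} (suggested tool: ${task.tool})`);
}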

Advanced: The Reasoning Models (o1/o3)

For complex multi-step reasoning, OpenAI's o1 and o3 models think before they respond. Use them for planning, not execution:

// Use o3-mini for planning, GPT-4o for execution
async function planAndExecute(task) {
  // Step 1: Plan with the reasoning model. Note: json_object mode
  // requires the word "JSON" somewhere in the prompt, and the model
  // needs to be told the shape you expect back.
  const plan = await client.chat.completions.create({
    model: 'o3-mini',
    messages: [
      { role: 'user', content: `Break this task into concrete steps with specific tool calls. Respond in JSON as {"steps": [...]}:\n\n${task}` },
    ],
    response_format: { type: 'json_object' },
  });

  const steps = JSON.parse(plan.choices[0].message.content);

  // Step 2: Execute each step with GPT-4o (cheaper, faster)
  for (const step of steps.steps) {
    await agentLoop(`Execute this step: ${JSON.stringify(step)}`);
  }
}

Cost optimization: Don't use o1/o3 for everything. Use them for planning and complex decisions, then execute with GPT-4o-mini. Given the pricing below, this can cut costs by roughly 90% while keeping quality high for the parts that matter.
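
A minimal routing sketch; the complex flag is a hypothetical field you would set in your own task metadata:

// Sketch: route by task complexity. `task.complex` is hypothetical —
// replace it with whatever signal your task queue carries.
function pickModel(task) {
  if (task.complex) return 'o3-mini'; // planning, multi-step reasoning
  return 'gpt-4o-mini';               // routine execution, high volume
}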

Making It Autonomous

Same pattern as any agent — cron-based autonomy:

import cron from 'node-cron';

const memory = new AgentMemory();

// Hourly autonomous run
cron.schedule('0 * * * *', async () => {
  const systemPrompt = buildSystemPrompt();
  const tasks = fs.readFileSync('./tasks.md', 'utf8');

  try {
    const result = await agentLoop(`
      Autonomous run. Time: ${new Date().toISOString()}

      ## Pending Tasks
      ${tasks}

      Pick the highest-priority task and execute it.
      Log everything you do. Update the task list when done.
    `);

    memory.log(`Autonomous run: ${result.substring(0, 200)}`);
  } catch (error) {
    memory.log(`ERROR in autonomous run: ${error.message}`);
    // Alert human if critical
    if (error.message.includes('rate_limit') || error.message.includes('quota')) {
      await sendAlert('Agent hit rate limit — needs attention');
    }
  }
});

OpenAI-Specific Gotchas

  1. Rate limits are aggressive. GPT-4 has strict TPM (tokens per minute) limits. Build exponential backoff into every API call. Don't learn this the hard way at 3am.
  2. Function call arguments aren't always valid JSON. Despite improvements, GPT-4 occasionally generates malformed function arguments. Always wrap JSON.parse() in try/catch (see the sketch after this list).
  3. The Assistants API charges for storage. Threads, files, and vector stores all cost money. Clean up old threads regularly or your bill will surprise you.
  4. Parallel function calls can be inconsistent. When GPT calls multiple tools simultaneously, the order of execution matters. If Tool B depends on Tool A's result, you need to handle this in your executor.
  5. System messages get diluted in long conversations. GPT-4 pays less attention to the system prompt as conversations grow longer. Reinforce critical instructions periodically by injecting reminders.
  6. Token counting is non-trivial. Use tiktoken to count tokens before sending. Exceeding the context window silently truncates your messages — you won't get an error, just worse results.
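
A minimal guard for gotcha #2, reusing the executeTool() from Path 1:

// Sketch: defend against malformed function-call arguments.
// Returning the parse error to the model lets it correct itself and retry.
async function safeExecuteToolCall(toolCall) {
  let args;
  try {
    args = JSON.parse(toolCall.function.arguments);
  } catch (e) {
    return { error: `Invalid JSON in arguments: ${e.message}` };
  }
  return await executeTool(toolCall.function.name, args);
}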

Error Handling for Production

import OpenAI from 'openai';

const client = new OpenAI();

async function resilientCall(messages, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({
        model: 'gpt-4o',
        messages,
        tools: getTools(),
      });
    } catch (error) {
      if (error.status === 429) {
        // Rate limited — exponential backoff
        const wait = Math.pow(2, attempt) * 1000;
        console.log(`Rate limited. Waiting ${wait}ms (attempt ${attempt}/${maxRetries})`);
        await new Promise(r => setTimeout(r, wait));
      } else if (error.status === 500 || error.status === 503) {
        // Server error — retry with backoff
        await new Promise(r => setTimeout(r, attempt * 2000));
      } else {
        throw error; // Don't retry client errors
      }
    }
  }
  throw new Error('Max retries exceeded');
}

// Token counting to prevent context overflow
import { encoding_for_model } from 'tiktoken';

function countTokens(messages) {
  const enc = encoding_for_model('gpt-4o');
  let total = 0;
  for (const msg of messages) {
    total += enc.encode(typeof msg.content === 'string' ? msg.content : JSON.stringify(msg.content)).length;
    total += 4; // rough per-message overhead
  }
  enc.free(); // the WASM tiktoken encoder must be freed explicitly
  return total;
}

// Trim oldest messages if context gets too long
function trimMessages(messages, maxTokens = 100000) {
  while (countTokens(messages) > maxTokens && messages.length > 2) {
    // Drop the oldest non-system message. Caveat: if this removes an
    // assistant message containing tool_calls, remove its matching tool
    // results too; the API rejects orphaned 'tool' messages.
    messages.splice(1, 1);
  }
  return messages;
}

Cost Breakdown: What to Expect

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best use in agents |
| --- | --- | --- | --- |
| GPT-4o | $2.50 | $10.00 | Main agent model, good balance |
| GPT-4o-mini | $0.15 | $0.60 | Simple tasks, high volume, classification |
| o3-mini | $1.10 | $4.40 | Planning, complex reasoning |
| o1 | $15.00 | $60.00 | Critical decisions only |

Typical monthly costs for a production agent running hourly with GPT-4o:
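
As a rough, assumption-heavy sketch: at ~5K input and ~1K output tokens per run, 24 runs a day works out to about 3.6M input and 0.72M output tokens per month, or roughly $9 + $7.20 ≈ $16/month at the GPT-4o rates above. Memory injection and tool results can easily multiply those token counts, so measure your own runs before budgeting.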

OpenAI vs Claude for Agents: Honest Take

Having built agents on both platforms, the honest answer is that the difference is smaller than the hype suggests.

For most agents, either works. Pick based on your specific needs, not hype. If you need large context or strong personality adherence, go Claude. If you need structured outputs and a bigger ecosystem, go OpenAI.

Build Your Agent's Personality

Whether you use OpenAI or Claude, every agent needs a SOUL.md. Use our free generator to create one in 5 minutes.

Generate Your SOUL.md

Deployment Checklist

Before you ship, make sure you have:

- Exponential backoff on every API call (rate limits will bite)
- try/catch around every JSON.parse() of tool arguments
- Token counting with tiktoken, plus message trimming before each call
- A memory budget (~10K tokens) with summarization for older logs
- Regular cleanup of Assistants API threads, files, and vector stores
- Logging for every autonomous run, with alerts on rate-limit and quota errors

Go deeper with the AI Employee Playbook

The complete system: 3-file framework, memory architecture, autonomy levels, and 15 production templates.

Get the Playbook — €29