How to Build an AI Agent with OpenAI (GPT-4): Complete Guide
OpenAI's GPT-4 models power a huge share of the AI agents in production today. But most people use them wrong: they build fancy chatbots and call them agents. This guide shows you how to build a real agent with GPT-4: one that uses tools, remembers context, and works autonomously.
We'll cover two approaches: building from scratch with the Chat Completions API (more control), and using the Assistants API (faster to ship). Both get you to production.
Two Paths: Chat Completions vs Assistants API
OpenAI gives you two ways to build agents. Here's when to use each:
| Feature | Chat Completions | Assistants API |
|---|---|---|
| Control | Full control over everything | OpenAI manages threads & state |
| Memory | You build it | Built-in thread history |
| File handling | You implement | Built-in file search & code interpreter |
| Streaming | Native support | Streaming runs |
| Cost | Pay per token | Pay per token + storage |
| Best for | Custom agents, full autonomy | Quick prototypes, file-heavy workflows |
Path 1: Building from Scratch (Chat Completions)
Step 1: Project Setup
npm install openai
import OpenAI from 'openai';
import fs from 'fs';
const client = new OpenAI(); // reads OPENAI_API_KEY from the environment
// Load the 3-file framework
const soul = fs.readFileSync('./SOUL.md', 'utf8');
const agents = fs.readFileSync('./AGENTS.md', 'utf8');
const user = fs.readFileSync('./USER.md', 'utf8');
const systemPrompt = `${soul}\n\n${agents}\n\n${user}`;
Step 2: The Agent Loop
The core pattern: send a message, check if GPT wants to use tools, execute them, send results back. Repeat until done.
async function agentLoop(userMessage, system = systemPrompt) {
  const messages = [
    { role: 'system', content: system },
    { role: 'user', content: userMessage },
  ];
while (true) {
const response = await client.chat.completions.create({
model: 'gpt-4o',
messages,
tools: getTools(),
tool_choice: 'auto',
});
const choice = response.choices[0];
messages.push(choice.message);
// If no tool calls, we're done
if (!choice.message.tool_calls || choice.message.tool_calls.length === 0) {
return choice.message.content;
}
// Execute each tool call
for (const toolCall of choice.message.tool_calls) {
const args = JSON.parse(toolCall.function.arguments);
const result = await executeTool(toolCall.function.name, args);
messages.push({
role: 'tool',
tool_call_id: toolCall.id,
content: JSON.stringify(result),
});
}
}
}
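Calling the loop is one line. Note that while (true) runs until the model stops requesting tools; consider adding an iteration cap before you deploy. A usage sketch (the task string is illustrative):

const answer = await agentLoop('Summarize ./notes.md and list any open questions');
console.log(answer);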
Step 3: Define Tools (Function Calling)
OpenAI's function calling uses JSON Schema to describe tools. Be precise — the descriptions directly affect when GPT chooses to use each tool:
function getTools() {
return [
{
type: 'function',
function: {
name: 'web_search',
description: 'Search the web for current information. Returns titles, URLs, and snippets.',
parameters: {
type: 'object',
properties: {
query: {
type: 'string',
description: 'The search query. Be specific for better results.',
},
num_results: {
type: 'number',
description: 'Number of results to return (1-10)',
default: 5,
},
},
required: ['query'],
},
},
},
{
type: 'function',
function: {
name: 'read_file',
description: 'Read a file from the workspace. Supports text, JSON, CSV, and markdown.',
parameters: {
type: 'object',
properties: {
path: { type: 'string', description: 'File path relative to workspace root' },
},
required: ['path'],
},
},
},
{
type: 'function',
function: {
name: 'write_file',
description: 'Write content to a file. Creates parent directories if needed.',
parameters: {
type: 'object',
properties: {
path: { type: 'string', description: 'File path' },
content: { type: 'string', description: 'File content' },
},
required: ['path', 'content'],
},
},
},
{
type: 'function',
function: {
name: 'execute_code',
description: 'Execute Python code for data analysis, calculations, or file processing.',
parameters: {
type: 'object',
properties: {
code: { type: 'string', description: 'Python code to execute' },
},
required: ['code'],
},
},
},
];
}
async function executeTool(name, args) {
switch (name) {
case 'web_search':
return await searchWeb(args.query, args.num_results);
case 'read_file':
return { content: fs.readFileSync(args.path, 'utf8') };
case 'write_file':
fs.writeFileSync(args.path, args.content);
return { success: true, path: args.path };
case 'execute_code':
return await runPython(args.code);
default:
return { error: `Unknown tool: ${name}` };
}
}
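executeTool leans on two helpers the guide doesn't define: searchWeb and runPython. Here's a hedged sketch of both, using the Brave Search API (any search provider works; the BRAVE_API_KEY variable is an assumption) and a local python3 process, which is fine for prototyping but should be sandboxed in production:

import { spawn } from 'child_process';

// Search helper: swap in whichever search API you actually use
async function searchWeb(query, numResults = 5) {
  const url = `https://api.search.brave.com/res/v1/web/search?q=${encodeURIComponent(query)}&count=${numResults}`;
  const res = await fetch(url, {
    headers: { 'X-Subscription-Token': process.env.BRAVE_API_KEY },
  });
  if (!res.ok) return { error: `Search failed with status ${res.status}` };
  const data = await res.json();
  return (data.web?.results ?? []).map(r => ({ title: r.title, url: r.url, snippet: r.description }));
}

// Runs Python locally with a timeout: sandbox this before production use
function runPython(code) {
  return new Promise((resolve) => {
    const proc = spawn('python3', ['-c', code], { timeout: 30_000 });
    let stdout = '';
    let stderr = '';
    proc.stdout.on('data', (d) => (stdout += d));
    proc.stderr.on('data', (d) => (stderr += d));
    proc.on('close', (exitCode) => resolve({ stdout, stderr, exitCode }));
  });
}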
Step 4: Add Memory
GPT-4 forgets everything between API calls. You need to build persistence:
class AgentMemory {
constructor(dir = './memory') {
this.dir = dir;
if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
}
// Daily log — append-only
log(entry) {
const date = new Date().toISOString().split('T')[0];
const time = new Date().toTimeString().split(' ')[0];
const file = `${this.dir}/${date}.md`;
fs.appendFileSync(file, `\n- [${time}] ${entry}`);
}
// Load recent context for the system prompt
getContext(days = 2) {
let context = '';
for (let i = days - 1; i >= 0; i--) {
const date = new Date();
date.setDate(date.getDate() - i);
const file = `${this.dir}/${date.toISOString().split('T')[0]}.md`;
if (fs.existsSync(file)) {
context += `\n## ${date.toISOString().split('T')[0]}\n`;
context += fs.readFileSync(file, 'utf8');
}
}
return context;
}
// Long-term memory — important facts only
remember(fact) {
fs.appendFileSync(`${this.dir}/MEMORY.md`, `\n- ${fact}`);
}
getLongTermMemory() {
const file = `${this.dir}/MEMORY.md`;
return fs.existsSync(file) ? fs.readFileSync(file, 'utf8') : '';
}
}
// Inject memory into every agent call
const memory = new AgentMemory();
function buildSystemPrompt() {
const recentContext = memory.getContext(2);
const longTerm = memory.getLongTermMemory();
return `${soul}\n\n${agents}\n\n${user}
\n## Recent Activity\n${recentContext}
\n## Long-term Memory\n${longTerm}`;
}
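With agentLoop accepting an optional system prompt (see the loop above), wiring memory in is a one-liner. A sketch; the task string is illustrative:

// Rebuild the prompt before each run so the model sees fresh memory
const result = await agentLoop("Review yesterday's notes and plan today", buildSystemPrompt());
memory.log(`Planning run: ${result.substring(0, 200)}`);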
Path 2: The Assistants API
If you want built-in thread management, file search, and code execution, the Assistants API handles the plumbing:
import OpenAI from 'openai';
const client = new OpenAI();
// Create an assistant (do this once)
async function createAssistant() {
return await client.beta.assistants.create({
name: 'Research Agent',
instructions: systemPrompt,
model: 'gpt-4o',
tools: [
{ type: 'file_search' }, // Built-in RAG
{ type: 'code_interpreter' }, // Built-in Python execution
{
type: 'function',
function: {
name: 'web_search',
description: 'Search the web for current information',
parameters: {
type: 'object',
properties: {
query: { type: 'string' },
},
required: ['query'],
},
},
},
],
});
}
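Assistants persist server-side, so create one once and cache the id rather than recreating it on every boot. A minimal sketch; the .assistant_id file is an arbitrary choice of store:

import fs from 'fs';

async function getAssistantId() {
  const idFile = './.assistant_id';
  if (fs.existsSync(idFile)) return fs.readFileSync(idFile, 'utf8').trim();
  const assistant = await createAssistant();
  fs.writeFileSync(idFile, assistant.id);
  return assistant.id;
}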
// Run a conversation
async function chat(assistantId, userMessage) {
// Create a thread (or reuse one for ongoing conversations)
const thread = await client.beta.threads.create();
// Add the user message
await client.beta.threads.messages.create(thread.id, {
role: 'user',
content: userMessage,
});
// Run the assistant
let run = await client.beta.threads.runs.create(thread.id, {
assistant_id: assistantId,
});
// Poll for completion (or use streaming)
while (['queued', 'in_progress', 'requires_action'].includes(run.status)) {
if (run.status === 'requires_action') {
// Handle function calls
const toolOutputs = [];
for (const call of run.required_action.submit_tool_outputs.tool_calls) {
const args = JSON.parse(call.function.arguments);
const result = await executeTool(call.function.name, args);
toolOutputs.push({
tool_call_id: call.id,
output: JSON.stringify(result),
});
}
run = await client.beta.threads.runs.submitToolOutputs(
thread.id, run.id, { tool_outputs: toolOutputs }
);
} else {
await new Promise(r => setTimeout(r, 1000));
run = await client.beta.threads.runs.retrieve(thread.id, run.id);
}
}
// Get the response
const messages = await client.beta.threads.messages.list(thread.id);
return messages.data[0].content[0].text.value;
}
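For ongoing conversations, persist thread.id and reuse it instead of creating a fresh thread per message. And instead of polling, recent openai-node versions ship streaming helpers; a sketch, assuming those helpers are available in your SDK version:

// Reuse an existing thread and stream the reply token by token
async function chatStreaming(assistantId, threadId, userMessage) {
  await client.beta.threads.messages.create(threadId, { role: 'user', content: userMessage });
  const stream = client.beta.threads.runs.stream(threadId, { assistant_id: assistantId });
  stream.on('textDelta', (delta) => process.stdout.write(delta.value ?? ''));
  return await stream.finalRun();
}

If the final run status is requires_action, submit tool outputs exactly as in the polling version above.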
Structured Outputs: Reliable Tool Calls
One of GPT-4's strengths for agents is structured outputs: set strict: true and the response is guaranteed to match your JSON Schema. Strict mode also requires additionalProperties: false on every object and every property listed in required (use a null type union for optional fields):
const response = await client.chat.completions.create({
model: 'gpt-4o',
messages,
response_format: {
type: 'json_schema',
json_schema: {
name: 'task_plan',
strict: true,
schema: {
type: 'object',
properties: {
tasks: {
type: 'array',
items: {
type: 'object',
properties: {
id: { type: 'number' },
action: { type: 'string' },
tool: { type: 'string' },
priority: { type: 'string', enum: ['high', 'medium', 'low'] },
estimated_minutes: { type: ['number', 'null'] }, // null stands in for optional under strict mode
},
required: ['id', 'action', 'tool', 'priority', 'estimated_minutes'],
additionalProperties: false,
},
},
reasoning: { type: 'string' },
},
required: ['tasks', 'reasoning'],
additionalProperties: false,
},
},
},
});
// Guaranteed valid JSON matching the schema
const plan = JSON.parse(response.choices[0].message.content);
This is extremely useful for agent planning — you can force GPT to output a structured execution plan before it starts working.
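A natural next step is to feed that plan straight into the agent loop, highest priority first:

// Execute the structured plan in priority order
const order = { high: 0, medium: 1, low: 2 };
const sorted = [...plan.tasks].sort((a, b) => order[a.priority] - order[b.priority]);
for (const task of sorted) {
  await agentLoop(`Execute task ${task.id} using ${task.tool}: ${task.action}`);
}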
Advanced: The Reasoning Models (o1/o3)
For complex multi-step reasoning, OpenAI's o1 and o3 models think before they respond. Use them for planning, not execution:
// Use o3-mini for planning, GPT-4o for execution
async function planAndExecute(task) {
// Step 1: Plan with reasoning model
const plan = await client.chat.completions.create({
model: 'o3-mini',
messages: [
{ role: 'user', content: `Break this task into concrete steps with specific tool calls. Respond as JSON shaped like {"steps": [...]}.\n\n${task}` }, // json_object mode requires the word "JSON" in the prompt
],
response_format: { type: 'json_object' },
});
const steps = JSON.parse(plan.choices[0].message.content);
// Step 2: Execute each step with GPT-4o (cheaper, faster)
for (const step of steps.steps) {
await agentLoop(`Execute this step: ${JSON.stringify(step)}`);
}
}
Making It Autonomous
Same pattern as any agent — cron-based autonomy:
import cron from 'node-cron';
const memory = new AgentMemory();
// Hourly autonomous run
cron.schedule('0 * * * *', async () => {
const systemPrompt = buildSystemPrompt();
const tasks = fs.readFileSync('./tasks.md', 'utf8');
try {
const result = await agentLoop(`
Autonomous run. Time: ${new Date().toISOString()}
## Pending Tasks
${tasks}
Pick the highest-priority task and execute it.
Log everything you do. Update the task list when done.
`, systemPrompt);
memory.log(`Autonomous run: ${result.substring(0, 200)}`);
} catch (error) {
memory.log(`ERROR in autonomous run: ${error.message}`);
// Alert a human if critical (sendAlert is sketched below)
if (error.message.includes('rate_limit') || error.message.includes('quota')) {
await sendAlert('Agent hit rate limit — needs attention');
}
}
});
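sendAlert is left undefined above. A minimal sketch using a Slack incoming webhook; the SLACK_WEBHOOK_URL variable is an assumption, and any notification channel works:

// Alert hook: posts to a Slack incoming webhook
async function sendAlert(text) {
  await fetch(process.env.SLACK_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
}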
OpenAI-Specific Gotchas
- Rate limits are aggressive. GPT-4 has strict TPM (tokens per minute) limits. Build exponential backoff into every API call. Don't learn this the hard way at 3am.
- Function call arguments aren't always valid JSON. Despite improvements, GPT-4 occasionally generates malformed function arguments. Always wrap JSON.parse() in try/catch (see the sketch after this list).
- The Assistants API charges for storage. Threads, files, and vector stores all cost money. Clean up old threads regularly or your bill will surprise you.
- Parallel function calls can be inconsistent. When GPT calls multiple tools simultaneously, the order of execution matters. If Tool B depends on Tool A's result, you need to handle this in your executor.
- System messages get diluted in long conversations. GPT-4 pays less attention to the system prompt as conversations grow longer. Reinforce critical instructions periodically by injecting reminders (also sketched after this list).
- Token counting is non-trivial. Use tiktoken to count tokens before sending. The Chat Completions API rejects over-limit requests with a context_length_exceeded error, and the Assistants API quietly truncates old thread messages, so count and trim proactively rather than reacting to failures.
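Two of these gotchas are easy to code against. A minimal sketch, assuming the reminder text and the every-10-turns cadence are your own choices:

// Defensive parsing for tool-call arguments
function safeToolArgs(raw) {
  try {
    return { ok: true, args: JSON.parse(raw) };
  } catch (e) {
    return { ok: false, error: `Malformed tool arguments: ${e.message}` };
  }
}

// Re-inject critical instructions every N user turns so they don't get diluted
function withReminder(messages, everyN = 10) {
  const userTurns = messages.filter(m => m.role === 'user').length;
  if (userTurns > 0 && userTurns % everyN === 0) {
    messages.push({
      role: 'system',
      content: 'Reminder: follow SOUL.md. Log every action. Ask before destructive operations.',
    });
  }
  return messages;
}

Call safeToolArgs() on every toolCall.function.arguments, and withReminder(messages) at the top of each loop iteration.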
Error Handling for Production
import OpenAI from 'openai';
const client = new OpenAI();
async function resilientCall(messages, maxRetries = 3) {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
return await client.chat.completions.create({
model: 'gpt-4o',
messages,
tools: getTools(),
});
} catch (error) {
if (error.status === 429) {
// Rate limited — exponential backoff
const wait = Math.pow(2, attempt) * 1000;
console.log(`Rate limited. Waiting ${wait}ms (attempt ${attempt}/${maxRetries})`);
await new Promise(r => setTimeout(r, wait));
} else if (error.status === 500 || error.status === 503) {
// Server error — retry with backoff
await new Promise(r => setTimeout(r, attempt * 2000));
} else {
throw error; // Don't retry client errors
}
}
}
throw new Error('Max retries exceeded');
}
// Token counting to prevent context overflow
import { encoding_for_model } from 'tiktoken';
function countTokens(messages) {
  const enc = encoding_for_model('gpt-4o');
  let total = 0;
  for (const msg of messages) {
    total += enc.encode(typeof msg.content === 'string' ? msg.content : JSON.stringify(msg.content)).length;
    total += 4; // approximate per-message overhead
  }
  enc.free(); // the WASM encoder must be freed, or you leak memory on every call
  return total;
}
// Trim oldest messages if context gets too long
function trimMessages(messages, maxTokens = 100000) {
  while (countTokens(messages) > maxTokens && messages.length > 2) {
    // Keep the system message and recent messages; drop the oldest turn.
    // Caution: if the dropped message is an assistant message with tool_calls,
    // drop its paired tool results too, or the API will reject the sequence.
    messages.splice(1, 1);
  }
  return messages;
}
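Putting the two together, guard every call: trim first, then send with retries:

const response = await resilientCall(trimMessages(messages));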
Cost Breakdown: What to Expect
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best use in agents |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | Main agent model — good balance |
| GPT-4o-mini | $0.15 | $0.60 | Simple tasks, high volume, classification |
| o3-mini | $1.10 | $4.40 | Planning, complex reasoning |
| o1 | $15.00 | $60.00 | Critical decisions only |
Typical monthly costs for a production agent running hourly with GPT-4o:
- Light usage (short conversations): $30-80/month
- Medium usage (research + tool calls): $100-300/month
- Heavy usage (long context, many tools): $300-800/month
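A back-of-the-envelope estimator using the prices above; the per-run token counts are assumptions to replace with your own measurements:

// Rough monthly cost for an hourly GPT-4o agent
const INPUT_PER_M = 2.50;   // GPT-4o input price from the table
const OUTPUT_PER_M = 10.00; // GPT-4o output price from the table
const runsPerMonth = 24 * 30;       // hourly runs
const inputTokensPerRun = 8_000;    // assumption: prompt + memory + tool results
const outputTokensPerRun = 1_500;   // assumption
const monthly =
  (runsPerMonth * inputTokensPerRun / 1e6) * INPUT_PER_M +
  (runsPerMonth * outputTokensPerRun / 1e6) * OUTPUT_PER_M;
console.log(`~$${monthly.toFixed(2)}/month`); // about $25/month at these assumptions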
OpenAI vs Claude for Agents: Honest Take
Having built agents on both platforms, here's the real difference:
- GPT-4o wins on: ecosystem (more integrations, bigger community), structured outputs (guaranteed schema), cost (cheaper for standard tasks), multimodal capabilities
- Claude wins on: context window (200K vs 128K), system prompt adherence, extended thinking, MCP protocol, long document analysis
- Both are good at: tool calling, code generation, reasoning, instruction following
For most agents, either works. Pick based on your specific needs, not hype. If you need large context or strong personality adherence, go Claude. If you need structured outputs and a bigger ecosystem, go OpenAI.
Build Your Agent's Personality
Whether you use OpenAI or Claude, every agent needs a SOUL.md. Use our free generator to create one in 5 minutes.
Generate Your SOUL.md
Deployment Checklist
- ☐ API key stored in environment variables (never in code)
- ☐ Rate limiting with exponential backoff implemented
- ☐ Token counting to prevent context overflow
- ☐ Error handling for all OpenAI API error codes
- ☐ Memory system persists across process restarts
- ☐ Logging captures every API call, tool execution, and error
- ☐ Cost monitoring with daily/weekly budget alerts
- ☐ Graceful degradation (fallback to mini when the main model is down; see the sketch after this list)
- ☐ Kill switch for autonomous runs
- ☐ Human review workflow for high-stakes actions
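For the graceful-degradation item, a hedged sketch: retry the primary model, then fall back to the mini model if it keeps failing:

// Try gpt-4o with retries, degrade to gpt-4o-mini on persistent failure
async function callWithFallback(messages) {
  try {
    return await resilientCall(messages); // defined in the error-handling section
  } catch (err) {
    console.warn(`Primary model failed (${err.message}); falling back to gpt-4o-mini`);
    return await client.chat.completions.create({
      model: 'gpt-4o-mini',
      messages,
      tools: getTools(),
    });
  }
}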
Go deeper with the AI Employee Playbook
The complete system: 3-file framework, memory architecture, autonomy levels, and 15 production templates.
Get the Playbook — €29