AI Agent Memory: How to Give Your AI Agent Long-Term Memory
Your AI agent is brilliant for exactly one conversation — then forgets everything. Every session starts from zero. No context about the user, no memory of past decisions, no learning from mistakes.
This is the single biggest gap between demo agents and production agents. Memory is what makes an AI agent actually useful over time.
In this guide, you'll build five progressively powerful memory systems — from a simple conversation buffer to a self-organizing knowledge graph. Each one comes with production Python code you can copy directly into your project.
📑 What You'll Learn
- Why AI Agents Need Memory
- The 4-Layer Memory Architecture
- Layer 1: Conversation Buffer
- Layer 2: Fact Store (Entity Memory)
- Layer 3: Semantic Memory (Vector Search)
- Layer 4: Episodic Memory (Experience Replay)
- Advanced: Self-Organizing Memory
- Memory Consolidation & Forgetting
- Tools & Vector DBs Compared
- Production Patterns
- 7 Common Memory Mistakes
- 60-Minute Quickstart
Why AI Agents Need Memory
Without memory, every agent interaction is stateless. The user says "use the same format as last time" and the agent has no idea what "last time" means.
Here's what memory unlocks:
| Capability | Without Memory | With Memory |
|---|---|---|
| Personalization | Generic responses every time | Adapts to user preferences |
| Context | Repeats questions | Remembers past conversations |
| Learning | Makes same mistakes | Improves from feedback |
| Relationships | Treats users as strangers | Builds ongoing rapport |
| Task continuity | Can't resume work | Picks up where it left off |
The 4-Layer Memory Architecture
Don't build one giant memory blob. Production agents use layered memory, just like the human brain:
┌─────────────────────────────────┐
│ Layer 4: Episodic Memory │ ← Past experiences & outcomes
│ (what happened, what worked) │
├─────────────────────────────────┤
│ Layer 3: Semantic Memory │ ← Searchable knowledge
│ (vector embeddings + search) │
├─────────────────────────────────┤
│ Layer 2: Fact Store │ ← Structured entities
│ (user prefs, entities, facts) │
├─────────────────────────────────┤
│ Layer 1: Conversation Buffer │ ← Recent context
│ (last N messages, sliding) │
└─────────────────────────────────┘
Each layer serves a different purpose. You don't need all four from day one — start with Layer 1 and add layers as your agent matures.
Layer 1: Conversation Buffer
The simplest and most essential memory. Keep the last N messages in context so the agent can reference what was just said.
# memory/conversation.py
from dataclasses import dataclass, field
from datetime import datetime
import json
from pathlib import Path
@dataclass
class Message:
role: str
content: str
timestamp: str = field(default_factory=lambda: datetime.now().isoformat())
class ConversationBuffer:
"""Sliding window conversation memory."""
def __init__(self, max_messages: int = 50, persist_path: str = None):
self.max_messages = max_messages
self.messages: list[Message] = []
self.persist_path = Path(persist_path) if persist_path else None
if self.persist_path and self.persist_path.exists():
self._load()
def add(self, role: str, content: str):
self.messages.append(Message(role=role, content=content))
# Sliding window: drop oldest when full
if len(self.messages) > self.max_messages:
self.messages = self.messages[-self.max_messages:]
if self.persist_path:
self._save()
def get_context(self, last_n: int = None) -> list[dict]:
"""Return messages formatted for LLM context."""
msgs = self.messages[-(last_n or self.max_messages):]
return [{"role": m.role, "content": m.content} for m in msgs]
def get_summary_prompt(self) -> str:
"""Generate a summarization prompt for old messages."""
old = self.messages[:len(self.messages)//2]
text = "\n".join(f"{m.role}: {m.content}" for m in old)
return f"Summarize this conversation concisely:\n{text}"
def _save(self):
self.persist_path.parent.mkdir(parents=True, exist_ok=True)
data = [{"role": m.role, "content": m.content,
"timestamp": m.timestamp} for m in self.messages]
self.persist_path.write_text(json.dumps(data, indent=2))
def _load(self):
data = json.loads(self.persist_path.read_text())
self.messages = [Message(**m) for m in data]
Smart Summarization
A 50-message buffer works for short interactions. For longer sessions, summarize old messages before dropping them:
class SummarizingBuffer(ConversationBuffer):
    """Summarizes old messages instead of dropping them."""
    def __init__(self, llm_client, **kwargs):
        super().__init__(**kwargs)
        self.llm = llm_client
        self.summary: str = ""
    def add(self, role: str, content: str):
        # Override the sliding window: keep everything and let compress()
        # fold the older half into the summary, instead of letting the
        # base class silently drop messages before they're summarized.
        self.messages.append(Message(role=role, content=content))
        if self.persist_path:
            self._save()
async def compress(self):
"""Summarize first half, keep second half."""
if len(self.messages) < self.max_messages:
return
prompt = self.get_summary_prompt()
response = await self.llm.generate(prompt)
self.summary = response.text
# Keep only recent messages
self.messages = self.messages[len(self.messages)//2:]
def get_context(self, last_n=None):
msgs = super().get_context(last_n)
if self.summary:
msgs.insert(0, {
"role": "system",
"content": f"Previous conversation summary: {self.summary}"
})
return msgs
Layer 2: Fact Store (Entity Memory)
The conversation buffer captures flow. The fact store captures knowledge — structured facts about users, preferences, and entities that the agent extracts from conversations.
# memory/facts.py
import json
from datetime import datetime
from pathlib import Path
class FactStore:
"""Structured entity and fact memory."""
def __init__(self, persist_path: str = "data/facts.json"):
self.path = Path(persist_path)
self.facts: dict = self._load()
def set(self, entity: str, key: str, value: str, source: str = "conversation"):
"""Store a fact about an entity."""
if entity not in self.facts:
self.facts[entity] = {}
self.facts[entity][key] = {
"value": value,
"source": source,
"updated": datetime.now().isoformat(),
"confidence": 1.0
}
self._save()
def get(self, entity: str, key: str = None) -> dict | None:
"""Retrieve facts about an entity."""
if entity not in self.facts:
return None
if key:
return self.facts[entity].get(key)
return self.facts[entity]
def get_user_profile(self, user_id: str) -> str:
"""Format user facts for LLM context injection."""
facts = self.get(f"user:{user_id}")
if not facts:
return "No known preferences."
lines = [f"- {k}: {v['value']}" for k, v in facts.items()]
return "Known user preferences:\n" + "\n".join(lines)
def search(self, query: str) -> list[tuple[str, dict]]:
"""Simple keyword search across all facts."""
results = []
q = query.lower()
for entity, facts in self.facts.items():
for key, data in facts.items():
if q in key.lower() or q in str(data["value"]).lower():
results.append((f"{entity}.{key}", data))
return results
def _load(self) -> dict:
if self.path.exists():
return json.loads(self.path.read_text())
return {}
def _save(self):
self.path.parent.mkdir(parents=True, exist_ok=True)
self.path.write_text(json.dumps(self.facts, indent=2))
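The schema the store writes is worth internalizing: entity → key → record, where every record carries provenance and a timestamp. A standalone sketch of that shape (hypothetical sample data; a temp file stands in for data/facts.json):

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

# Standalone sketch of the fact schema: entity -> key -> record.
store: dict = {}

def set_fact(entity: str, key: str, value: str):
    store.setdefault(entity, {})[key] = {
        "value": value,
        "source": "conversation",
        "updated": datetime.now().isoformat(),
        "confidence": 1.0,
    }

set_fact("user:123", "preferred_language", "Python")
set_fact("user:123", "company", "Acme Corp")

# Persistence is just a JSON round-trip, so the schema must stay
# JSON-serializable: strings, numbers, and nested dicts only.
path = Path(tempfile.mkdtemp()) / "facts.json"
path.write_text(json.dumps(store, indent=2))
assert json.loads(path.read_text()) == store
```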
Auto-Extract Facts from Conversation
The real power comes from automatically extracting facts during conversation — no user action required:
EXTRACTION_PROMPT = """Analyze this message and extract any facts worth remembering.
Message: {message}
Existing facts about this user: {existing_facts}
Extract structured facts in this JSON format:
[
{{"entity": "user:123", "key": "preferred_language", "value": "Python"}},
{{"entity": "user:123", "key": "company", "value": "Acme Corp"}}
]
Rules:
- Only extract clear, stated facts (not assumptions)
- Update existing facts if new info contradicts them
- Ignore small talk and filler
- Return [] if nothing worth remembering
JSON:"""
async def auto_extract_facts(llm, message: str, user_id: str,
fact_store: FactStore):
existing = fact_store.get_user_profile(user_id)
prompt = EXTRACTION_PROMPT.format(
message=message, existing_facts=existing
)
response = await llm.generate(prompt)
try:
facts = json.loads(response.text)
for f in facts:
fact_store.set(f["entity"], f["key"], f["value"])
except json.JSONDecodeError:
pass # Extraction failed, skip silently
Run extraction as a background task with asyncio.create_task(); the user doesn't need to wait for memory writes.
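Here's that fire-and-forget pattern in isolation: return the reply first, then let the memory write finish on its own. A minimal self-contained sketch (extract_facts here is a stand-in for the real extraction call; the sleep simulates LLM latency):

```python
import asyncio

# Extracted facts land here; in the real system this is the FactStore.
stored: list[dict] = []

async def extract_facts(message: str) -> list[dict]:
    await asyncio.sleep(0.1)  # simulated LLM round-trip
    return [{"entity": "user:123", "key": "topic", "value": message[:20]}]

async def handle_message(message: str) -> str:
    reply = f"echo: {message}"  # respond immediately

    async def write_memory():
        stored.extend(await extract_facts(message))

    asyncio.create_task(write_memory())  # runs after we return the reply
    return reply

async def main():
    reply = await handle_message("I prefer Python")
    await asyncio.sleep(0.3)  # give the background task time to finish
    print(reply, len(stored))

asyncio.run(main())
```

One caveat with bare create_task(): keep a reference to the task (or use a task group) in long-running services, or it can be garbage-collected before completing.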
Layer 3: Semantic Memory (Vector Search)
Keyword search breaks down when users ask "what was that thing about deployment?" The fact store can't find it because the word "deployment" wasn't in the stored fact.
Semantic memory uses embeddings to find relevant memories by meaning, not exact keywords.
# memory/semantic.py
import numpy as np
from datetime import datetime
class SemanticMemory:
"""Vector-based memory with semantic search."""
def __init__(self, embed_fn, persist_path: str = "data/memories.json"):
self.embed = embed_fn # async fn(text) -> list[float]
self.memories: list[dict] = []
self.vectors: list[list[float]] = []
self.persist_path = persist_path
self._load()
async def store(self, content: str, metadata: dict = None):
"""Store a memory with its embedding."""
vector = await self.embed(content)
memory = {
"content": content,
"metadata": metadata or {},
"timestamp": datetime.now().isoformat(),
"access_count": 0
}
self.memories.append(memory)
self.vectors.append(vector)
self._save()
async def search(self, query: str, top_k: int = 5,
min_score: float = 0.7) -> list[dict]:
"""Find semantically similar memories."""
if not self.memories:
return []
query_vec = await self.embed(query)
scores = [
self._cosine_sim(query_vec, vec)
for vec in self.vectors
]
# Rank by relevance
ranked = sorted(
enumerate(scores), key=lambda x: x[1], reverse=True
)
results = []
        for idx, score in ranked[:top_k]:
            if score >= min_score:
                # Touch the stored record once, then copy for the caller
                self.memories[idx]["access_count"] += 1
                mem = self.memories[idx].copy()
                mem["relevance_score"] = round(score, 3)
                results.append(mem)
return results
def get_context_string(self, results: list[dict]) -> str:
"""Format search results for LLM injection."""
if not results:
return ""
lines = []
for r in results:
lines.append(f"[{r['relevance_score']:.0%} relevant] {r['content']}")
return "Relevant memories:\n" + "\n".join(lines)
@staticmethod
def _cosine_sim(a, b):
a, b = np.array(a), np.array(b)
return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
def _save(self):
import json
from pathlib import Path
Path(self.persist_path).parent.mkdir(parents=True, exist_ok=True)
data = {"memories": self.memories, "vectors": self.vectors}
Path(self.persist_path).write_text(json.dumps(data))
def _load(self):
import json
from pathlib import Path
p = Path(self.persist_path)
if p.exists():
data = json.loads(p.read_text())
self.memories = data.get("memories", [])
self.vectors = data.get("vectors", [])
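The `_cosine_sim` helper above is the entire ranking function, so it's worth seeing in isolation. The same computation in pure stdlib form (dot product divided by the product of the vector lengths):

```python
import math

# Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
def cosine_sim(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_sim([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_sim([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(cosine_sim([1.0, 0.0], [2.0, 0.0]))  # 1.0 (magnitude is ignored)
```

The last line is the key property: only direction matters, so a long memory and a short query can still match perfectly.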
Embedding Functions
You need an embedding function to convert text to vectors. Here are the top options:
# Option 1: OpenAI (solid default, $0.02/1M tokens)
from openai import AsyncOpenAI
client = AsyncOpenAI()
async def embed_openai(text: str) -> list[float]:
response = await client.embeddings.create(
model="text-embedding-3-small",
input=text
)
return response.data[0].embedding
# Option 2: Voyage AI (strong on code and technical text)
import voyageai
vc = voyageai.AsyncClient()
async def embed_voyage(text: str) -> list[float]:
result = await vc.embed([text], model="voyage-3-lite")
return result.embeddings[0]
# Option 3: Local (free, no API calls, ~90% quality)
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
async def embed_local(text: str) -> list[float]:
return model.encode(text).tolist()
| Provider | Model | Dimensions | Cost | Quality |
|---|---|---|---|---|
| OpenAI | text-embedding-3-small | 1536 | $0.02/1M tok | ⭐⭐⭐⭐ |
| OpenAI | text-embedding-3-large | 3072 | $0.13/1M tok | ⭐⭐⭐⭐⭐ |
| Voyage | voyage-3 | 1024 | $0.06/1M tok | ⭐⭐⭐⭐⭐ |
| Cohere | embed-v4.0 | 1024 | $0.10/1M tok | ⭐⭐⭐⭐ |
| Local | all-MiniLM-L6-v2 | 384 | Free | ⭐⭐⭐ |
| Local | nomic-embed-text | 768 | Free | ⭐⭐⭐⭐ |
Layer 4: Episodic Memory (Experience Replay)
Semantic memory stores what the agent knows. Episodic memory stores what happened — complete experiences with context, actions, and outcomes.
This is how agents learn from their own history.
# memory/episodic.py
from dataclasses import dataclass
from datetime import datetime
import json
from pathlib import Path
@dataclass
class Episode:
id: str
situation: str # What was the context?
action: str # What did the agent do?
outcome: str # What happened?
success: bool # Did it work?
lesson: str # What to remember
timestamp: str = ""
tags: list[str] = None
def __post_init__(self):
if not self.timestamp:
self.timestamp = datetime.now().isoformat()
if self.tags is None:
self.tags = []
class EpisodicMemory:
"""Learn from past experiences."""
def __init__(self, persist_path: str = "data/episodes.json"):
self.path = Path(persist_path)
self.episodes: list[Episode] = self._load()
def record(self, situation: str, action: str, outcome: str,
success: bool, lesson: str, tags: list[str] = None):
"""Record an experience."""
ep = Episode(
id=f"ep_{len(self.episodes)}",
situation=situation,
action=action,
outcome=outcome,
success=success,
lesson=lesson,
tags=tags or []
)
self.episodes.append(ep)
self._save()
return ep
def recall_similar(self, situation: str, top_k: int = 3) -> list[Episode]:
"""Find relevant past experiences (keyword-based)."""
words = set(situation.lower().split())
scored = []
for ep in self.episodes:
ep_words = set(ep.situation.lower().split())
overlap = len(words & ep_words) / max(len(words), 1)
scored.append((overlap, ep))
scored.sort(key=lambda x: x[0], reverse=True)
        return [ep for score, ep in scored[:top_k] if score > 0]
def get_lessons(self, tags: list[str] = None) -> str:
"""Get accumulated lessons, optionally filtered by tags."""
episodes = self.episodes
if tags:
episodes = [e for e in episodes
if any(t in e.tags for t in tags)]
successes = [e for e in episodes if e.success]
failures = [e for e in episodes if not e.success]
lines = []
if failures:
lines.append("Lessons from past mistakes:")
for e in failures[-5:]:
lines.append(f" ❌ {e.lesson}")
if successes:
lines.append("What worked well:")
for e in successes[-5:]:
lines.append(f" ✅ {e.lesson}")
return "\n".join(lines)
def _save(self):
self.path.parent.mkdir(parents=True, exist_ok=True)
data = [vars(e) for e in self.episodes]
self.path.write_text(json.dumps(data, indent=2))
def _load(self) -> list[Episode]:
if self.path.exists():
data = json.loads(self.path.read_text())
return [Episode(**e) for e in data]
return []
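The scoring inside recall_similar boils down to word overlap: what fraction of the query's words also appear in the stored situation. A standalone sketch of just that scoring rule:

```python
def overlap_score(query: str, situation: str) -> float:
    # Fraction of the query's words that also appear in the episode.
    q = set(query.lower().split())
    s = set(situation.lower().split())
    return len(q & s) / max(len(q), 1)

print(overlap_score("deploy the api to staging",
                    "deploy api failed on staging"))  # 0.6
```

This keyword matching is deliberately cheap. Once you have Layer 3 in place, you can swap it for embedding similarity over episode situations and get semantic recall of experiences for free.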
Advanced: Self-Organizing Memory
The most sophisticated pattern: memory that organizes itself. Instead of flat lists, the agent builds a knowledge graph that evolves over time.
# memory/knowledge_graph.py
from datetime import datetime
import json
from pathlib import Path
class KnowledgeGraph:
"""Self-organizing memory as a knowledge graph."""
def __init__(self, persist_path: str = "data/graph.json"):
self.path = Path(persist_path)
self.nodes: dict[str, dict] = {}
self.edges: list[dict] = []
self._load()
def add_node(self, id: str, type: str, properties: dict):
"""Add or update a knowledge node."""
if id in self.nodes:
self.nodes[id]["properties"].update(properties)
self.nodes[id]["updated"] = datetime.now().isoformat()
self.nodes[id]["access_count"] = \
self.nodes[id].get("access_count", 0) + 1
else:
self.nodes[id] = {
"type": type,
"properties": properties,
"created": datetime.now().isoformat(),
"updated": datetime.now().isoformat(),
"access_count": 1
}
self._save()
def add_edge(self, source: str, target: str, relation: str,
weight: float = 1.0):
"""Connect two nodes with a typed relationship."""
# Update existing edge or create new
for edge in self.edges:
if (edge["source"] == source and
edge["target"] == target and
edge["relation"] == relation):
edge["weight"] = min(edge["weight"] + 0.1, 2.0)
edge["updated"] = datetime.now().isoformat()
self._save()
return
self.edges.append({
"source": source,
"target": target,
"relation": relation,
"weight": weight,
"created": datetime.now().isoformat(),
"updated": datetime.now().isoformat()
})
self._save()
def query(self, node_id: str, depth: int = 1) -> dict:
"""Get a node and its connected neighborhood."""
if node_id not in self.nodes:
return {}
result = {"node": self.nodes[node_id], "connections": []}
visited = {node_id}
def traverse(nid, d):
if d <= 0:
return
for edge in self.edges:
neighbor = None
if edge["source"] == nid:
neighbor = edge["target"]
elif edge["target"] == nid:
neighbor = edge["source"]
if neighbor and neighbor not in visited:
visited.add(neighbor)
result["connections"].append({
"node_id": neighbor,
"node": self.nodes.get(neighbor, {}),
"relation": edge["relation"],
"weight": edge["weight"]
})
traverse(neighbor, d - 1)
traverse(node_id, depth)
return result
def to_context_string(self, node_id: str) -> str:
"""Format graph neighborhood for LLM context."""
data = self.query(node_id, depth=2)
if not data:
return ""
lines = [f"Knowledge about {node_id}:"]
node = data["node"]
for k, v in node["properties"].items():
lines.append(f" {k}: {v}")
for conn in data["connections"]:
lines.append(
f" → {conn['relation']} → {conn['node_id']}"
)
return "\n".join(lines)
def _save(self):
self.path.parent.mkdir(parents=True, exist_ok=True)
self.path.write_text(json.dumps(
{"nodes": self.nodes, "edges": self.edges}, indent=2
))
def _load(self):
if self.path.exists():
data = json.loads(self.path.read_text())
self.nodes = data.get("nodes", {})
            self.edges = data.get("edges", [])
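The heart of query() is a depth-limited walk that treats edges as bidirectional. Here it is in isolation, on plain dicts with no persistence (hypothetical sample graph):

```python
# Edges are undirected for traversal purposes: owning and being owned
# both count as a connection.
edges = [
    {"source": "user:123", "target": "project:site", "relation": "owns"},
    {"source": "project:site", "target": "lang:python", "relation": "uses"},
]

def neighborhood(start: str, depth: int) -> set[str]:
    visited = {start}
    frontier = [start]
    for _ in range(depth):  # breadth-first, one hop per round
        next_frontier = []
        for nid in frontier:
            for e in edges:
                if e["source"] == nid:
                    nb = e["target"]
                elif e["target"] == nid:
                    nb = e["source"]
                else:
                    continue
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return visited - {start}

print(neighborhood("user:123", depth=1))  # direct connections only
print(neighborhood("user:123", depth=2))  # two hops out
```

Depth 2 is usually the sweet spot for context injection: depth 1 misses useful context, while depth 3+ pulls in half the graph.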
Memory Consolidation & Forgetting
Real memory systems don't grow forever. They need consolidation (compressing old memories) and forgetting (dropping irrelevant ones).
from datetime import datetime
from memory.semantic import SemanticMemory
from memory.facts import FactStore

class MemoryConsolidator:
    """Periodically consolidate and prune memories."""
def __init__(self, semantic_memory: SemanticMemory,
fact_store: FactStore,
llm_client):
self.semantic = semantic_memory
self.facts = fact_store
self.llm = llm_client
async def consolidate(self, max_memories: int = 500):
"""Merge similar memories and drop stale ones."""
if len(self.semantic.memories) <= max_memories:
return
# 1. Find and merge duplicates
to_remove = set()
for i, mem_a in enumerate(self.semantic.memories):
if i in to_remove:
continue
for j in range(i + 1, len(self.semantic.memories)):
if j in to_remove:
continue
sim = SemanticMemory._cosine_sim(
self.semantic.vectors[i],
self.semantic.vectors[j]
)
if sim > 0.92:
# Merge: keep the newer one
to_remove.add(i)
break
# 2. Score remaining by recency + access frequency
scored = []
now = datetime.now()
for i, mem in enumerate(self.semantic.memories):
if i in to_remove:
continue
age_days = (now - datetime.fromisoformat(
mem["timestamp"])).days
access = mem.get("access_count", 0)
# Decay: old + unused = low score
score = access / (1 + age_days * 0.1)
scored.append((i, score))
# 3. Keep top memories, summarize the rest
scored.sort(key=lambda x: x[1], reverse=True)
keep_indices = {idx for idx, _ in scored[:max_memories]}
# Summarize dropped memories into a single entry
dropped = [self.semantic.memories[idx]
for idx, _ in scored[max_memories:]]
if dropped:
texts = [m["content"] for m in dropped[:20]]
summary = await self.llm.generate(
f"Summarize these memories into key facts:\n" +
"\n".join(texts)
)
await self.semantic.store(
summary.text,
{"type": "consolidated", "source_count": len(dropped)}
)
# 4. Rebuild memory store
new_memories = [self.semantic.memories[i] for i in sorted(keep_indices)]
new_vectors = [self.semantic.vectors[i] for i in sorted(keep_indices)]
self.semantic.memories = new_memories
self.semantic.vectors = new_vectors
self.semantic._save()
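Step 2's decay rule is the part worth tuning, so here it is on its own. Frequently accessed, recent memories score high; old, unused ones decay toward zero (the 0.1/day decay factor is the same assumption as above, not a magic constant):

```python
def retention_score(access_count: int, age_days: int) -> float:
    # Same decay rule as step 2: access frequency divided by
    # a penalty that grows with age.
    return access_count / (1 + age_days * 0.1)

print(retention_score(10, 1))    # fresh and frequently accessed
print(retention_score(10, 365))  # frequently accessed but a year old
print(retention_score(0, 30))    # never accessed: always 0.0
```

If memories in your domain stay relevant for months, shrink the decay factor; if they go stale in days (e.g. support tickets), raise it.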
Tools & Vector DBs Compared
| Tool | Type | Best For | Pricing |
|---|---|---|---|
| ChromaDB | Vector DB (local) | Prototyping, small datasets | Free / open source |
| Pinecone | Vector DB (cloud) | Production scale, serverless | Free tier → $70/mo |
| Qdrant | Vector DB (hybrid) | Self-hosted + cloud option | Free / from $25/mo |
| Supabase pgvector | Postgres extension | Full-stack apps (DB + vectors) | Free tier → $25/mo |
| Weaviate | Vector DB (cloud) | Hybrid search, multi-modal | Free tier → $25/mo |
| Turbopuffer | Vector DB (serverless) | Massive scale, low latency | Pay per query |
| Redis Stack | Vector + cache | Low latency, existing Redis users | Free / from $7/mo |
| Mem0 | Memory layer | Drop-in agent memory SDK | Free tier → $99/mo |
| Zep | Memory service | Conversation memory as a service | Open source / cloud |
| LangMem | Memory toolkit | LangChain ecosystem | Free / open source |
Our Recommendation
Prototype: ChromaDB for fast local iteration. Production build: Mem0 or Zep for managed memory, Pinecone or Qdrant for custom vector search, and Redis for caching hot memories.
Production Patterns
Pattern 1: The Memory Pipeline
Wire all four layers together into a single memory manager:
from memory.conversation import SummarizingBuffer
from memory.facts import FactStore
from memory.semantic import SemanticMemory
from memory.episodic import EpisodicMemory

class AgentMemory:
    """Unified memory manager combining all layers."""
    def __init__(self, user_id: str, embed_fn, llm_client):
        self.user_id = user_id
        base = f"data/users/{user_id}"
        self.conversation = SummarizingBuffer(
            llm_client,
            max_messages=50,
            persist_path=f"{base}/conversation.json"
        )
self.facts = FactStore(
persist_path=f"{base}/facts.json"
)
self.semantic = SemanticMemory(
embed_fn=embed_fn,
persist_path=f"{base}/semantic.json"
)
self.episodic = EpisodicMemory(
persist_path=f"{base}/episodes.json"
)
async def build_context(self, current_message: str) -> str:
"""Build complete memory context for LLM."""
parts = []
# 1. User profile from facts
profile = self.facts.get_user_profile(self.user_id)
if profile:
parts.append(profile)
# 2. Relevant semantic memories
results = await self.semantic.search(current_message, top_k=3)
context = self.semantic.get_context_string(results)
if context:
parts.append(context)
# 3. Relevant lessons
lessons = self.episodic.get_lessons()
if lessons:
parts.append(lessons)
return "\n\n".join(parts)
async def after_response(self, user_msg: str, agent_msg: str):
"""Post-response memory operations (fire-and-forget)."""
self.conversation.add("user", user_msg)
self.conversation.add("assistant", agent_msg)
await self.semantic.store(
f"User: {user_msg}\nAgent: {agent_msg}",
{"type": "conversation"}
)
Pattern 2: Memory-Aware System Prompt
SYSTEM_PROMPT = """You are a helpful assistant with persistent memory.
{memory_context}
Instructions:
- Reference what you know about the user naturally
- Don't mention "my memory" or "I remember" explicitly
- If you're unsure about a memory, ask to confirm
- Correct memories when the user corrects you
"""
async def chat(user_message: str, memory: AgentMemory):
context = await memory.build_context(user_message)
messages = [
{"role": "system",
"content": SYSTEM_PROMPT.format(memory_context=context)},
*memory.conversation.get_context(last_n=20),
{"role": "user", "content": user_message}
]
response = await llm.generate(messages)
# Update memory asynchronously
asyncio.create_task(
memory.after_response(user_message, response.text)
)
return response.text
Pattern 3: Privacy-Respecting Memory
SENSITIVE_PATTERNS = [
r'\b\d{3}[-.]?\d{2}[-.]?\d{4}\b', # SSN
r'\b\d{16}\b', # Credit card
r'\b[A-Za-z0-9+/]{40,}\b', # API keys
]
def sanitize_for_memory(text: str) -> str:
"""Strip sensitive data before storing."""
import re
for pattern in SENSITIVE_PATTERNS:
text = re.sub(pattern, '[REDACTED]', text)
return text
# User controls
class MemoryControls:
@staticmethod
    async def forget(memory: AgentMemory, what: str):
        """Let users delete specific memories."""
        results = await memory.semantic.search(what, top_k=5)
        # search() returns copies, so match the stored originals
        # by content + timestamp instead of identity
        for r in results:
            for idx, m in enumerate(memory.semantic.memories):
                if (m["content"] == r["content"]
                        and m["timestamp"] == r["timestamp"]):
                    memory.semantic.memories.pop(idx)
                    memory.semantic.vectors.pop(idx)
                    break
        memory.semantic._save()
@staticmethod
def export(memory: AgentMemory) -> dict:
"""GDPR: export all stored data."""
return {
"conversation": memory.conversation.get_context(),
"facts": memory.facts.facts,
"episodes": [vars(e) for e in memory.episodic.episodes]
}
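To sanity-check the redaction patterns, here's the sanitizer restated as a standalone snippet with made-up sample values:

```python
import re

SENSITIVE_PATTERNS = [
    r'\b\d{3}[-.]?\d{2}[-.]?\d{4}\b',  # SSN-like numbers
    r'\b\d{16}\b',                      # bare 16-digit card numbers
    r'\b[A-Za-z0-9+/]{40,}\b',          # long tokens / API keys
]

def sanitize_for_memory(text: str) -> str:
    """Strip sensitive data before storing."""
    for pattern in SENSITIVE_PATTERNS:
        text = re.sub(pattern, '[REDACTED]', text)
    return text

# Sample values are fabricated; both get redacted before storage.
print(sanitize_for_memory("My SSN is 123-45-6789, card 4242424242424242"))
```

Note these regexes are a floor, not a ceiling: real PII detection (names, addresses, emails) needs a dedicated library or an LLM-based scrubbing pass.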
7 Common Memory Mistakes
❌ 1. Storing Everything
Problem: Every message goes into memory. Context window fills with noise.
Fix: Use extraction — only store facts and insights, not raw conversations. Less is more.
❌ 2. No Relevance Filtering
Problem: Search returns 20 memories and you inject all of them into the prompt.
Fix: Set a minimum relevance threshold (0.7+) and limit to top 3-5 results.
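That fix is a few lines of post-processing on search results (filter_relevant is a hypothetical helper; it assumes the relevance_score key used by the semantic layer above):

```python
def filter_relevant(results: list[dict], min_score: float = 0.7,
                    top_k: int = 3) -> list[dict]:
    # Drop weak matches first, then cap how many reach the prompt.
    kept = [r for r in results if r.get("relevance_score", 0) >= min_score]
    kept.sort(key=lambda r: r["relevance_score"], reverse=True)
    return kept[:top_k]

hits = [{"content": "a", "relevance_score": 0.91},
        {"content": "b", "relevance_score": 0.55},
        {"content": "c", "relevance_score": 0.74},
        {"content": "d", "relevance_score": 0.88},
        {"content": "e", "relevance_score": 0.71}]
print([r["content"] for r in filter_relevant(hits)])  # ['a', 'd', 'c']
```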
❌ 3. Blocking on Memory Operations
Problem: User waits 2 seconds while the agent embeds and stores memory.
Fix: All memory writes should be fire-and-forget. Use asyncio.create_task().
❌ 4. Never Forgetting
Problem: Memory grows to 10,000 entries. Search gets slow and noisy.
Fix: Implement consolidation. Run monthly. Drop low-access old memories.
❌ 5. No User Controls
Problem: User says "forget that" and the agent says "sure!" but doesn't actually delete anything.
Fix: Implement real forget/export commands. GDPR requires this in the EU.
❌ 6. Trusting Extracted Facts Blindly
Problem: LLM extracts "user hates Python" from a sarcastic comment. Agent avoids Python forever.
Fix: Add confidence scores. For important facts, confirm with the user before storing.
❌ 7. One Memory System for Everything
Problem: Cramming conversations, facts, and experiences into one flat store.
Fix: Use layered architecture. Each layer has different access patterns and lifecycle.
60-Minute Quickstart
Build a memory-powered agent in one hour:
Minutes 0-15: Setup
pip install anthropic chromadb
mkdir -p agent/{memory,data}
touch agent/{main.py,memory/__init__.py}
Minutes 15-30: Core Memory
# agent/memory/__init__.py
import chromadb
from datetime import datetime
client = chromadb.PersistentClient(path="data/chroma")
collection = client.get_or_create_collection("agent_memory")
def remember(content: str, metadata: dict = None):
"""Store a memory."""
collection.add(
documents=[content],
ids=[f"mem_{datetime.now().timestamp()}"],
metadatas=[metadata or {"type": "general"}]
)
def recall(query: str, n_results: int = 3) -> list[str]:
"""Find relevant memories."""
results = collection.query(
query_texts=[query],
n_results=n_results
)
return results["documents"][0] if results["documents"] else []
def forget(content: str):
"""Remove memories matching content."""
results = collection.query(query_texts=[content], n_results=5)
if results["ids"][0]:
collection.delete(ids=results["ids"][0])
Minutes 30-50: Agent with Memory
# agent/main.py
import anthropic
from memory import remember, recall
client = anthropic.Anthropic()
def chat(user_message: str, history: list) -> str:
# Recall relevant memories
memories = recall(user_message)
memory_context = ""
if memories:
memory_context = "Relevant context from past conversations:\n"
memory_context += "\n".join(f"- {m}" for m in memories)
messages = history + [{"role": "user", "content": user_message}]
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
system=f"""You are a helpful assistant with long-term memory.
{memory_context}
After each interaction, naturally reference what you know about the user.
If the user shares a preference or important fact, acknowledge it.""",
messages=messages
)
reply = response.content[0].text
# Store the exchange as memory
remember(
f"User said: {user_message[:200]}. "
f"I responded about: {reply[:200]}",
{"type": "conversation"}
)
return reply
# Run it
if __name__ == "__main__":
history = []
print("Chat with a memory-powered agent (type 'quit' to exit)")
while True:
msg = input("\nYou: ")
if msg.lower() == "quit":
break
reply = chat(msg, history)
history.append({"role": "user", "content": msg})
history.append({"role": "assistant", "content": reply})
print(f"\nAgent: {reply}")
Minutes 50-60: Test
# Session 1
You: I prefer Python over TypeScript
Agent: Got it! I'll keep that in mind...
You: My company is called Acme Corp
Agent: Nice! What does Acme Corp do?
# Session 2 (new session — memory persists!)
You: What programming language should I use for this project?
Agent: Since you prefer Python, I'd suggest starting there...
What's Next?
Memory is the foundation that makes every other agent capability better. Once your agent remembers, you can build:
- Personalized workflows — agent adapts processes to each user
- Self-improving agents — learn from mistakes via episodic memory
- Multi-agent shared memory — agents that collaborate with shared context
- Proactive agents — trigger actions based on remembered patterns
The AI Employee Playbook includes memory system templates, production patterns, and copy-paste code for building agents that actually remember.
Get the Playbook — €29