AI Agent Memory: How to Give Your AI Agent Long-Term Memory

Your AI agent is brilliant for exactly one conversation — then forgets everything. Every session starts from zero. No context about the user, no memory of past decisions, no learning from mistakes.

This is the single biggest gap between demo agents and production agents. Memory is what makes an AI agent actually useful over time.

In this guide, you'll build five progressively more powerful memory systems — from a simple conversation buffer to a self-organizing knowledge graph. Each comes with production-ready Python code you can copy directly into your project.

Why AI Agents Need Memory

Without memory, every agent interaction is stateless. The user says "use the same format as last time" and the agent has no idea what "last time" means.

Here's what memory unlocks:

| Capability | Without Memory | With Memory |
| --- | --- | --- |
| Personalization | Generic responses every time | Adapts to user preferences |
| Context | Repeats questions | Remembers past conversations |
| Learning | Makes same mistakes | Improves from feedback |
| Relationships | Treats users as strangers | Builds ongoing rapport |
| Task continuity | Can't resume work | Picks up where it left off |

The memory gap is real: in production, agents with persistent memory see 40-60% higher user retention than stateless agents. Users come back when the agent actually knows them.

The 4-Layer Memory Architecture

Don't build one giant memory blob. Production agents use layered memory, just like the human brain:

┌─────────────────────────────────┐
│   Layer 4: Episodic Memory      │  ← Past experiences & outcomes
│   (what happened, what worked)  │
├─────────────────────────────────┤
│   Layer 3: Semantic Memory      │  ← Searchable knowledge
│   (vector embeddings + search)  │
├─────────────────────────────────┤
│   Layer 2: Fact Store           │  ← Structured entities
│   (user prefs, entities, facts) │
├─────────────────────────────────┤
│   Layer 1: Conversation Buffer  │  ← Recent context
│   (last N messages, sliding)    │
└─────────────────────────────────┘

Each layer serves a different purpose. You don't need all four from day one — start with Layer 1 and add layers as your agent matures.

Layer 1: Conversation Buffer

The simplest and most essential memory. Keep the last N messages in context so the agent can reference what was just said.

# memory/conversation.py
from dataclasses import dataclass, field
from datetime import datetime
import json
from pathlib import Path

@dataclass
class Message:
    role: str
    content: str
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

class ConversationBuffer:
    """Sliding window conversation memory."""

    def __init__(self, max_messages: int = 50, persist_path: str | None = None):
        self.max_messages = max_messages
        self.messages: list[Message] = []
        self.persist_path = Path(persist_path) if persist_path else None
        if self.persist_path and self.persist_path.exists():
            self._load()

    def add(self, role: str, content: str):
        self.messages.append(Message(role=role, content=content))
        # Sliding window: drop oldest when full
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]
        if self.persist_path:
            self._save()

    def get_context(self, last_n: int | None = None) -> list[dict]:
        """Return messages formatted for LLM context."""
        msgs = self.messages[-(last_n or self.max_messages):]
        return [{"role": m.role, "content": m.content} for m in msgs]

    def get_summary_prompt(self) -> str:
        """Generate a summarization prompt for old messages."""
        old = self.messages[:len(self.messages)//2]
        text = "\n".join(f"{m.role}: {m.content}" for m in old)
        return f"Summarize this conversation concisely:\n{text}"

    def _save(self):
        self.persist_path.parent.mkdir(parents=True, exist_ok=True)
        data = [{"role": m.role, "content": m.content,
                 "timestamp": m.timestamp} for m in self.messages]
        self.persist_path.write_text(json.dumps(data, indent=2))

    def _load(self):
        data = json.loads(self.persist_path.read_text())
        self.messages = [Message(**m) for m in data]
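
A quick usage sketch (the path is illustrative; omit persist_path for a purely in-memory buffer):

buffer = ConversationBuffer(max_messages=50, persist_path="data/conversation.json")
buffer.add("user", "Use the same format as last time, please.")
buffer.add("assistant", "Sure, same table layout as before.")
print(buffer.get_context(last_n=10))
# [{'role': 'user', 'content': ...}, {'role': 'assistant', 'content': ...}]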

Smart Summarization

A 50-message buffer works for short interactions. For longer sessions, summarize old messages before dropping them:

class SummarizingBuffer(ConversationBuffer):
    """Summarizes old messages instead of dropping them."""

    def __init__(self, llm_client, **kwargs):
        super().__init__(**kwargs)
        self.llm = llm_client
        self.summary: str = ""

    async def compress(self):
        """Summarize first half, keep second half."""
        if len(self.messages) < self.max_messages:
            return

        prompt = self.get_summary_prompt()
        response = await self.llm.generate(prompt)
        self.summary = response.text

        # Keep only recent messages
        self.messages = self.messages[len(self.messages)//2:]

    def get_context(self, last_n=None):
        msgs = super().get_context(last_n)
        if self.summary:
            msgs.insert(0, {
                "role": "system",
                "content": f"Previous conversation summary: {self.summary}"
            })
        return msgs
✅ When to use: Every agent needs this. It's the minimum viable memory. Start here and add layers only when you need them.
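
One way to schedule compression without blocking replies, as a minimal sketch. It assumes you call it from inside an async request handler, so an event loop is already running:

import asyncio

def schedule_compression(buffer: SummarizingBuffer):
    # compress() returns early until the buffer is full, so it's safe
    # to fire after every turn as a background task
    asyncio.create_task(buffer.compress())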

Layer 2: Fact Store (Entity Memory)

The conversation buffer captures flow. The fact store captures knowledge — structured facts about users, preferences, and entities that the agent extracts from conversations.

# memory/facts.py
import json
from datetime import datetime
from pathlib import Path

class FactStore:
    """Structured entity and fact memory."""

    def __init__(self, persist_path: str = "data/facts.json"):
        self.path = Path(persist_path)
        self.facts: dict = self._load()

    def set(self, entity: str, key: str, value: str, source: str = "conversation"):
        """Store a fact about an entity."""
        if entity not in self.facts:
            self.facts[entity] = {}
        self.facts[entity][key] = {
            "value": value,
            "source": source,
            "updated": datetime.now().isoformat(),
            "confidence": 1.0
        }
        self._save()

    def get(self, entity: str, key: str | None = None) -> dict | None:
        """Retrieve facts about an entity."""
        if entity not in self.facts:
            return None
        if key:
            return self.facts[entity].get(key)
        return self.facts[entity]

    def get_user_profile(self, user_id: str) -> str:
        """Format user facts for LLM context injection."""
        facts = self.get(f"user:{user_id}")
        if not facts:
            return "No known preferences."
        lines = [f"- {k}: {v['value']}" for k, v in facts.items()]
        return "Known user preferences:\n" + "\n".join(lines)

    def search(self, query: str) -> list[tuple[str, dict]]:
        """Simple keyword search across all facts."""
        results = []
        q = query.lower()
        for entity, facts in self.facts.items():
            for key, data in facts.items():
                if q in key.lower() or q in str(data["value"]).lower():
                    results.append((f"{entity}.{key}", data))
        return results

    def _load(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

    def _save(self):
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.facts, indent=2))
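
A quick sketch of the API (entity keys like "user:123" are just a naming convention):

store = FactStore("data/facts.json")
store.set("user:123", "preferred_language", "Python")
store.set("user:123", "company", "Acme Corp")

print(store.get_user_profile("123"))
# Known user preferences:
# - preferred_language: Python
# - company: Acme Corp

print(store.search("acme"))  # keyword search across all facts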

Auto-Extract Facts from Conversation

The real power comes from automatically extracting facts during conversation — no user action required:

EXTRACTION_PROMPT = """Analyze this message and extract any facts worth remembering.

Message: {message}
Existing facts about this user: {existing_facts}

Extract structured facts in this JSON format:
[
  {{"entity": "user:123", "key": "preferred_language", "value": "Python"}},
  {{"entity": "user:123", "key": "company", "value": "Acme Corp"}}
]

Rules:
- Only extract clear, stated facts (not assumptions)
- Update existing facts if new info contradicts them
- Ignore small talk and filler
- Return [] if nothing worth remembering

JSON:"""

import json

async def auto_extract_facts(llm, message: str, user_id: str,
                              fact_store: FactStore):
    existing = fact_store.get_user_profile(user_id)
    prompt = EXTRACTION_PROMPT.format(
        message=message, existing_facts=existing
    )
    response = await llm.generate(prompt)
    try:
        facts = json.loads(response.text)
        for f in facts:
            fact_store.set(f["entity"], f["key"], f["value"])
    except json.JSONDecodeError:
        pass  # Extraction failed, skip silently
💡 Pro tip: Run fact extraction asynchronously — don't block the response. Fire-and-forget with asyncio.create_task(). The user doesn't need to wait for memory writes.
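
Here's a sketch of that wiring, reusing auto_extract_facts from above (call it from inside your async handler so an event loop is running):

import asyncio

def schedule_fact_extraction(llm, message: str, user_id: str,
                             fact_store: FactStore):
    # Fire-and-forget: the user's reply is never delayed by extraction
    asyncio.create_task(
        auto_extract_facts(llm, message, user_id, fact_store)
    )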

Layer 3: Semantic Memory (Vector Search)

Keyword search breaks down when users ask "what was that thing about deployment?" The fact store can't find it because the word "deployment" wasn't in the stored fact.

Semantic memory uses embeddings to find relevant memories by meaning, not exact keywords.

# memory/semantic.py
import json
from datetime import datetime
from pathlib import Path

import numpy as np

class SemanticMemory:
    """Vector-based memory with semantic search."""

    def __init__(self, embed_fn, persist_path: str = "data/memories.json"):
        self.embed = embed_fn  # async fn(text) -> list[float]
        self.memories: list[dict] = []
        self.vectors: list[list[float]] = []
        self.persist_path = persist_path
        self._load()

    async def store(self, content: str, metadata: dict | None = None):
        """Store a memory with its embedding."""
        vector = await self.embed(content)
        memory = {
            "content": content,
            "metadata": metadata or {},
            "timestamp": datetime.now().isoformat(),
            "access_count": 0
        }
        self.memories.append(memory)
        self.vectors.append(vector)
        self._save()

    async def search(self, query: str, top_k: int = 5,
                     min_score: float = 0.7) -> list[dict]:
        """Find semantically similar memories."""
        if not self.memories:
            return []

        query_vec = await self.embed(query)
        scores = [
            self._cosine_sim(query_vec, vec)
            for vec in self.vectors
        ]

        # Rank by relevance
        ranked = sorted(
            enumerate(scores), key=lambda x: x[1], reverse=True
        )

        results = []
        for idx, score in ranked[:top_k]:
            if score >= min_score:
                # Count the access once on the stored memory, then copy
                self.memories[idx]["access_count"] += 1
                mem = self.memories[idx].copy()
                mem["relevance_score"] = round(score, 3)
                results.append(mem)

        return results

    def get_context_string(self, results: list[dict]) -> str:
        """Format search results for LLM injection."""
        if not results:
            return ""
        lines = []
        for r in results:
            lines.append(f"[{r['relevance_score']:.0%} relevant] {r['content']}")
        return "Relevant memories:\n" + "\n".join(lines)

    @staticmethod
    def _cosine_sim(a, b):
        a, b = np.array(a), np.array(b)
        denom = float(np.linalg.norm(a) * np.linalg.norm(b))
        return float(np.dot(a, b)) / denom if denom else 0.0

    def _save(self):
        Path(self.persist_path).parent.mkdir(parents=True, exist_ok=True)
        data = {"memories": self.memories, "vectors": self.vectors}
        Path(self.persist_path).write_text(json.dumps(data))

    def _load(self):
        p = Path(self.persist_path)
        if p.exists():
            data = json.loads(p.read_text())
            self.memories = data.get("memories", [])
            self.vectors = data.get("vectors", [])

Embedding Functions

You need an embedding function to convert text to vectors. Here are the top options:

# Option 1: OpenAI (strong quality, $0.02/1M tokens for 3-small)
from openai import AsyncOpenAI
client = AsyncOpenAI()

async def embed_openai(text: str) -> list[float]:
    response = await client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Option 2: Voyage AI (strong for code/technical text)
import voyageai
vc = voyageai.AsyncClient()

async def embed_voyage(text: str) -> list[float]:
    result = await vc.embed([text], model="voyage-3-lite")
    return result.embeddings[0]

# Option 3: Local (free, no API calls, ~90% quality)
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")

async def embed_local(text: str) -> list[float]:
    return model.encode(text).tolist()
| Provider | Model | Dimensions | Cost | Quality |
| --- | --- | --- | --- | --- |
| OpenAI | text-embedding-3-small | 1536 | $0.02/1M tok | ⭐⭐⭐⭐ |
| OpenAI | text-embedding-3-large | 3072 | $0.13/1M tok | ⭐⭐⭐⭐⭐ |
| Voyage | voyage-3 | 1024 | $0.06/1M tok | ⭐⭐⭐⭐⭐ |
| Cohere | embed-v4.0 | 1024 | $0.10/1M tok | ⭐⭐⭐⭐ |
| Local | all-MiniLM-L6-v2 | 384 | Free | ⭐⭐⭐ |
| Local | nomic-embed-text | 768 | Free | ⭐⭐⭐⭐ |
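
To tie the pieces together, here's a sketch wiring the local embedder into SemanticMemory from above (similarity scores vary by embedding model, so min_score is loosened here):

import asyncio

async def demo():
    memory = SemanticMemory(embed_fn=embed_local,
                            persist_path="data/memories.json")
    await memory.store("User prefers blue-green deployments, done on Fridays",
                       {"type": "preference"})
    hits = await memory.search("what was that thing about deployment?",
                               min_score=0.3)
    print(memory.get_context_string(hits))

asyncio.run(demo())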

Layer 4: Episodic Memory (Experience Replay)

Semantic memory stores what the agent knows. Episodic memory stores what happened — complete experiences with context, actions, and outcomes.

This is how agents learn from their own history.

# memory/episodic.py
from dataclasses import dataclass
from datetime import datetime
import json
from pathlib import Path

@dataclass
class Episode:
    id: str
    situation: str       # What was the context?
    action: str          # What did the agent do?
    outcome: str         # What happened?
    success: bool        # Did it work?
    lesson: str          # What to remember
    timestamp: str = ""
    tags: list[str] | None = None

    def __post_init__(self):
        if not self.timestamp:
            self.timestamp = datetime.now().isoformat()
        if self.tags is None:
            self.tags = []

class EpisodicMemory:
    """Learn from past experiences."""

    def __init__(self, persist_path: str = "data/episodes.json"):
        self.path = Path(persist_path)
        self.episodes: list[Episode] = self._load()

    def record(self, situation: str, action: str, outcome: str,
               success: bool, lesson: str, tags: list[str] | None = None):
        """Record an experience."""
        ep = Episode(
            id=f"ep_{len(self.episodes)}",
            situation=situation,
            action=action,
            outcome=outcome,
            success=success,
            lesson=lesson,
            tags=tags or []
        )
        self.episodes.append(ep)
        self._save()
        return ep

    def recall_similar(self, situation: str, top_k: int = 3) -> list[Episode]:
        """Find relevant past experiences (keyword overlap)."""
        words = set(situation.lower().split())
        scored = []
        for ep in self.episodes:
            ep_words = set(ep.situation.lower().split())
            overlap = len(words & ep_words) / max(len(words), 1)
            scored.append((overlap, ep))
        scored.sort(key=lambda x: x[0], reverse=True)
        return [ep for score, ep in scored[:top_k] if score > 0]

    def get_lessons(self, tags: list[str] | None = None) -> str:
        """Get accumulated lessons, optionally filtered by tags."""
        episodes = self.episodes
        if tags:
            episodes = [e for e in episodes
                       if any(t in e.tags for t in tags)]

        successes = [e for e in episodes if e.success]
        failures = [e for e in episodes if not e.success]

        lines = []
        if failures:
            lines.append("Lessons from past mistakes:")
            for e in failures[-5:]:
                lines.append(f"  ❌ {e.lesson}")
        if successes:
            lines.append("What worked well:")
            for e in successes[-5:]:
                lines.append(f"  ✅ {e.lesson}")
        return "\n".join(lines)

    def _save(self):
        self.path.parent.mkdir(parents=True, exist_ok=True)
        data = [vars(e) for e in self.episodes]
        self.path.write_text(json.dumps(data, indent=2))

    def _load(self) -> list[Episode]:
        if self.path.exists():
            data = json.loads(self.path.read_text())
            return [Episode(**e) for e in data]
        return []
When to record episodes: After tool failures, after user corrections, after successful multi-step tasks, and after any unexpected outcomes. Don't record every interaction — only the ones worth learning from.
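
For example, recording a tool failure and recalling the lesson later:

em = EpisodicMemory("data/episodes.json")
em.record(
    situation="User asked for a CSV export of monthly sales",
    action="Exported with plain UTF-8 encoding",
    outcome="File opened garbled in the user's Excel",
    success=False,
    lesson="Use utf-8-sig for CSV exports opened in Excel",
    tags=["export", "csv"],
)
print(em.get_lessons(tags=["csv"]))
# Lessons from past mistakes:
#   ❌ Use utf-8-sig for CSV exports opened in Excel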

Advanced: Self-Organizing Memory

The most sophisticated pattern: memory that organizes itself. Instead of flat lists, the agent builds a knowledge graph that evolves over time.

# memory/knowledge_graph.py
from datetime import datetime
import json
from pathlib import Path

class KnowledgeGraph:
    """Self-organizing memory as a knowledge graph."""

    def __init__(self, persist_path: str = "data/graph.json"):
        self.path = Path(persist_path)
        self.nodes: dict[str, dict] = {}
        self.edges: list[dict] = []
        self._load()

    def add_node(self, node_id: str, node_type: str, properties: dict):
        """Add or update a knowledge node."""
        if node_id in self.nodes:
            self.nodes[node_id]["properties"].update(properties)
            self.nodes[node_id]["updated"] = datetime.now().isoformat()
            self.nodes[node_id]["access_count"] = \
                self.nodes[node_id].get("access_count", 0) + 1
        else:
            self.nodes[node_id] = {
                "type": node_type,
                "properties": properties,
                "created": datetime.now().isoformat(),
                "updated": datetime.now().isoformat(),
                "access_count": 1
            }
        self._save()

    def add_edge(self, source: str, target: str, relation: str,
                 weight: float = 1.0):
        """Connect two nodes with a typed relationship."""
        # Update existing edge or create new
        for edge in self.edges:
            if (edge["source"] == source and
                edge["target"] == target and
                edge["relation"] == relation):
                edge["weight"] = min(edge["weight"] + 0.1, 2.0)
                edge["updated"] = datetime.now().isoformat()
                self._save()
                return

        self.edges.append({
            "source": source,
            "target": target,
            "relation": relation,
            "weight": weight,
            "created": datetime.now().isoformat(),
            "updated": datetime.now().isoformat()
        })
        self._save()

    def query(self, node_id: str, depth: int = 1) -> dict:
        """Get a node and its connected neighborhood."""
        if node_id not in self.nodes:
            return {}

        result = {"node": self.nodes[node_id], "connections": []}
        visited = {node_id}

        def traverse(nid, d):
            if d <= 0:
                return
            for edge in self.edges:
                neighbor = None
                if edge["source"] == nid:
                    neighbor = edge["target"]
                elif edge["target"] == nid:
                    neighbor = edge["source"]
                if neighbor and neighbor not in visited:
                    visited.add(neighbor)
                    result["connections"].append({
                        "node_id": neighbor,
                        "node": self.nodes.get(neighbor, {}),
                        "relation": edge["relation"],
                        "weight": edge["weight"]
                    })
                    traverse(neighbor, d - 1)

        traverse(node_id, depth)
        return result

    def to_context_string(self, node_id: str) -> str:
        """Format graph neighborhood for LLM context."""
        data = self.query(node_id, depth=2)
        if not data:
            return ""
        lines = [f"Knowledge about {node_id}:"]
        node = data["node"]
        for k, v in node["properties"].items():
            lines.append(f"  {k}: {v}")
        for conn in data["connections"]:
            lines.append(
                f"  → {conn['relation']} → {conn['node_id']}"
            )
        return "\n".join(lines)

    def _save(self):
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(
            {"nodes": self.nodes, "edges": self.edges}, indent=2
        ))

    def _load(self):
        if self.path.exists():
            data = json.loads(self.path.read_text())
            self.nodes = data.get("nodes", {})
            self.edges = data.get("edges", [])
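
A quick sketch of how the graph accumulates structure (all IDs are illustrative):

kg = KnowledgeGraph("data/graph.json")
kg.add_node("user:123", "person", {"name": "Dana"})
kg.add_node("acme", "company", {"industry": "logistics"})
kg.add_edge("user:123", "acme", "works_at")

print(kg.to_context_string("user:123"))
# Knowledge about user:123:
#   name: Dana
#   → works_at → acme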

Memory Consolidation & Forgetting

Real memory systems don't grow forever. They need consolidation (compressing old memories) and forgetting (dropping irrelevant ones).

from datetime import datetime

class MemoryConsolidator:
    """Periodically consolidate and prune memories."""

    def __init__(self, semantic_memory: SemanticMemory,
                 fact_store: FactStore,
                 llm_client):
        self.semantic = semantic_memory
        self.facts = fact_store
        self.llm = llm_client

    async def consolidate(self, max_memories: int = 500):
        """Merge similar memories and drop stale ones."""
        if len(self.semantic.memories) <= max_memories:
            return

        # 1. Find and merge duplicates
        to_remove = set()
        for i, mem_a in enumerate(self.semantic.memories):
            if i in to_remove:
                continue
            for j in range(i + 1, len(self.semantic.memories)):
                if j in to_remove:
                    continue
                sim = SemanticMemory._cosine_sim(
                    self.semantic.vectors[i],
                    self.semantic.vectors[j]
                )
                if sim > 0.92:
                    # Merge: keep the newer one
                    to_remove.add(i)
                    break

        # 2. Score remaining by recency + access frequency
        scored = []
        now = datetime.now()
        for i, mem in enumerate(self.semantic.memories):
            if i in to_remove:
                continue
            age_days = (now - datetime.fromisoformat(
                mem["timestamp"])).days
            access = mem.get("access_count", 0)
            # Decay: old + unused = low score
            score = access / (1 + age_days * 0.1)
            scored.append((i, score))

        # 3. Keep top memories; collect the rest for summarization
        scored.sort(key=lambda x: x[1], reverse=True)
        keep_indices = {idx for idx, _ in scored[:max_memories]}
        dropped = [self.semantic.memories[idx]
                   for idx, _ in scored[max_memories:]]

        # 4. Rebuild the store first, so the consolidated summary
        #    added below isn't immediately dropped by the rebuild
        self.semantic.memories = [self.semantic.memories[i]
                                  for i in sorted(keep_indices)]
        self.semantic.vectors = [self.semantic.vectors[i]
                                 for i in sorted(keep_indices)]
        self.semantic._save()

        # 5. Summarize dropped memories into a single entry
        if dropped:
            texts = [m["content"] for m in dropped[:20]]
            summary = await self.llm.generate(
                "Summarize these memories into key facts:\n" +
                "\n".join(texts)
            )
            await self.semantic.store(
                summary.text,
                {"type": "consolidated", "source_count": len(dropped)}
            )
⚠️ Don't skip forgetting. Agents with unlimited memory get slower over time (more to search) and dumber (irrelevant context dilutes useful context). Budget memory like you budget tokens.
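
A minimal scheduling sketch (assumes a long-running process; in real deployments swap in cron or a task queue):

import asyncio

async def consolidation_loop(consolidator: MemoryConsolidator):
    # Run during quiet hours; daily is plenty for most agents
    while True:
        await consolidator.consolidate(max_memories=500)
        await asyncio.sleep(24 * 60 * 60)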

Tools & Vector DBs Compared

| Tool | Type | Best For | Pricing |
| --- | --- | --- | --- |
| ChromaDB | Vector DB (local) | Prototyping, small datasets | Free / open source |
| Pinecone | Vector DB (cloud) | Production scale, serverless | Free tier → $70/mo |
| Qdrant | Vector DB (hybrid) | Self-hosted + cloud option | Free / from $25/mo |
| Supabase pgvector | Postgres extension | Full-stack apps (DB + vectors) | Free tier → $25/mo |
| Weaviate | Vector DB (cloud) | Hybrid search, multi-modal | Free tier → $25/mo |
| Turbopuffer | Vector DB (serverless) | Massive scale, low latency | Pay per query |
| Redis Stack | Vector + cache | Low latency, existing Redis users | Free / from $7/mo |
| Mem0 | Memory layer | Drop-in agent memory SDK | Free tier → $99/mo |
| Zep | Memory service | Conversation memory as a service | Open source / cloud |
| LangMem | Memory toolkit | LangChain ecosystem | Free / open source |

Our Recommendation

Budget build: JSON files + local embeddings → ChromaDB when you outgrow files → Supabase pgvector when you need a real database too.

Production build: Mem0 or Zep for managed memory → Pinecone or Qdrant for custom vector search → Redis for caching hot memories.

Production Patterns

Pattern 1: The Memory Pipeline

Wire all four layers together into a single memory manager:

class AgentMemory:
    """Unified memory manager combining all layers."""

    def __init__(self, user_id: str, embed_fn):
        self.user_id = user_id
        base = f"data/users/{user_id}"
        self.conversation = SummarizingBuffer(
            max_messages=50,
            persist_path=f"{base}/conversation.json"
        )
        self.facts = FactStore(
            persist_path=f"{base}/facts.json"
        )
        self.semantic = SemanticMemory(
            embed_fn=embed_fn,
            persist_path=f"{base}/semantic.json"
        )
        self.episodic = EpisodicMemory(
            persist_path=f"{base}/episodes.json"
        )

    async def build_context(self, current_message: str) -> str:
        """Build complete memory context for LLM."""
        parts = []

        # 1. User profile from facts
        profile = self.facts.get_user_profile(self.user_id)
        if profile:
            parts.append(profile)

        # 2. Relevant semantic memories
        results = await self.semantic.search(current_message, top_k=3)
        context = self.semantic.get_context_string(results)
        if context:
            parts.append(context)

        # 3. Relevant lessons
        lessons = self.episodic.get_lessons()
        if lessons:
            parts.append(lessons)

        return "\n\n".join(parts)

    async def after_response(self, user_msg: str, agent_msg: str):
        """Post-response memory operations (fire-and-forget)."""
        self.conversation.add("user", user_msg)
        self.conversation.add("assistant", agent_msg)
        await self.semantic.store(
            f"User: {user_msg}\nAgent: {agent_msg}",
            {"type": "conversation"}
        )

Pattern 2: Memory-Aware System Prompt

import asyncio

SYSTEM_PROMPT = """You are a helpful assistant with persistent memory.

{memory_context}

Instructions:
- Reference what you know about the user naturally
- Don't mention "my memory" or "I remember" explicitly
- If you're unsure about a memory, ask to confirm
- Correct memories when the user corrects you
"""

async def chat(user_message: str, memory: AgentMemory):
    context = await memory.build_context(user_message)
    messages = [
        {"role": "system",
         "content": SYSTEM_PROMPT.format(memory_context=context)},
        *memory.conversation.get_context(last_n=20),
        {"role": "user", "content": user_message}
    ]
    # `llm` is the same async client used throughout this guide
    response = await llm.generate(messages)

    # Update memory asynchronously
    asyncio.create_task(
        memory.after_response(user_message, response.text)
    )
    return response.text

Pattern 3: Privacy-Respecting Memory

SENSITIVE_PATTERNS = [
    r'\b\d{3}[-.]?\d{2}[-.]?\d{4}\b',  # SSN
    r'\b\d{16}\b',                       # Credit card
    r'\b[A-Za-z0-9+/]{40,}\b',          # API keys
]

def sanitize_for_memory(text: str) -> str:
    """Strip sensitive data before storing."""
    import re
    for pattern in SENSITIVE_PATTERNS:
        text = re.sub(pattern, '[REDACTED]', text)
    return text

# User controls
class MemoryControls:
    @staticmethod
    async def forget(memory: AgentMemory, what: str):
        """Let users delete specific memories."""
        results = await memory.semantic.search(what, top_k=5)
        # search() returns copies, so match originals by content + timestamp
        for r in results:
            for idx, mem in enumerate(memory.semantic.memories):
                if (mem["content"] == r["content"]
                        and mem["timestamp"] == r["timestamp"]):
                    memory.semantic.memories.pop(idx)
                    memory.semantic.vectors.pop(idx)
                    break
        memory.semantic._save()

    @staticmethod
    def export(memory: AgentMemory) -> dict:
        """GDPR: export all stored data."""
        return {
            "conversation": memory.conversation.get_context(),
            "facts": memory.facts.facts,
            "episodes": [vars(e) for e in memory.episodic.episodes]
        }
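
To make sanitization non-optional, route every write through it. A sketch wrapping after_response from Pattern 1:

async def safe_after_response(memory: AgentMemory,
                              user_msg: str, agent_msg: str):
    # Redact before anything touches disk
    await memory.after_response(
        sanitize_for_memory(user_msg),
        sanitize_for_memory(agent_msg),
    )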

7 Common Memory Mistakes

❌ 1. Storing Everything

Problem: Every message goes into memory. Context window fills with noise.

Fix: Use extraction — only store facts and insights, not raw conversations. Less is more.

❌ 2. No Relevance Filtering

Problem: Search returns 20 memories and you inject all of them into the prompt.

Fix: Set a minimum relevance threshold (0.7+) and limit to top 3-5 results.

❌ 3. Blocking on Memory Operations

Problem: User waits 2 seconds while the agent embeds and stores memory.

Fix: All memory writes should be fire-and-forget. Use asyncio.create_task().

❌ 4. Never Forgetting

Problem: Memory grows to 10,000 entries. Search gets slow and noisy.

Fix: Implement consolidation. Run monthly. Drop low-access old memories.

❌ 5. No User Controls

Problem: User says "forget that" and the agent says "sure!" but doesn't actually delete anything.

Fix: Implement real forget/export commands. GDPR requires this in the EU.

❌ 6. Trusting Extracted Facts Blindly

Problem: LLM extracts "user hates Python" from a sarcastic comment. Agent avoids Python forever.

Fix: Add confidence scores. For important facts, confirm with the user before storing.

❌ 7. One Memory System for Everything

Problem: Cramming conversations, facts, and experiences into one flat store.

Fix: Use layered architecture. Each layer has different access patterns and lifecycle.

60-Minute Quickstart

Build a memory-powered agent in one hour:

Minutes 0-15: Setup

pip install anthropic chromadb

mkdir -p agent/{memory,data}
touch agent/{main.py,memory/__init__.py}

Minutes 15-30: Core Memory

# agent/memory/__init__.py
import chromadb
from datetime import datetime

client = chromadb.PersistentClient(path="data/chroma")
collection = client.get_or_create_collection("agent_memory")

def remember(content: str, metadata: dict = None):
    """Store a memory."""
    collection.add(
        documents=[content],
        ids=[f"mem_{datetime.now().timestamp()}"],
        metadatas=[metadata or {"type": "general"}]
    )

def recall(query: str, n_results: int = 3) -> list[str]:
    """Find relevant memories."""
    results = collection.query(
        query_texts=[query],
        n_results=n_results
    )
    return results["documents"][0] if results["documents"] else []

def forget(content: str):
    """Remove memories matching content."""
    results = collection.query(query_texts=[content], n_results=5)
    if results["ids"][0]:
        collection.delete(ids=results["ids"][0])

Minutes 30-50: Agent with Memory

# agent/main.py
import anthropic
from memory import remember, recall

client = anthropic.Anthropic()

def chat(user_message: str, history: list) -> str:
    # Recall relevant memories
    memories = recall(user_message)
    memory_context = ""
    if memories:
        memory_context = "Relevant context from past conversations:\n"
        memory_context += "\n".join(f"- {m}" for m in memories)

    messages = history + [{"role": "user", "content": user_message}]

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system=f"""You are a helpful assistant with long-term memory.

{memory_context}

After each interaction, naturally reference what you know about the user.
If the user shares a preference or important fact, acknowledge it.""",
        messages=messages
    )

    reply = response.content[0].text

    # Store the exchange as memory
    remember(
        f"User said: {user_message[:200]}. "
        f"I responded about: {reply[:200]}",
        {"type": "conversation"}
    )

    return reply

# Run it
if __name__ == "__main__":
    history = []
    print("Chat with a memory-powered agent (type 'quit' to exit)")
    while True:
        msg = input("\nYou: ")
        if msg.lower() == "quit":
            break
        reply = chat(msg, history)
        history.append({"role": "user", "content": msg})
        history.append({"role": "assistant", "content": reply})
        print(f"\nAgent: {reply}")

Minutes 50-60: Test

# Session 1
You: I prefer Python over TypeScript
Agent: Got it! I'll keep that in mind...

You: My company is called Acme Corp
Agent: Nice! What does Acme Corp do?

# Session 2 (new session — memory persists!)
You: What programming language should I use for this project?
Agent: Since you prefer Python, I'd suggest starting there...
✅ That's it! In 60 minutes you have a working agent with persistent memory. Upgrade path: add fact extraction (Layer 2), switch ChromaDB to Pinecone for scale, add episodic memory for learning.

What's Next?

Memory is the foundation that makes every other agent capability better. Once your agent remembers, capabilities like planning, tool use, and personalization all build on that foundation.

🚀 Build Production AI Agents Faster

The AI Employee Playbook includes memory system templates, production patterns, and copy-paste code for building agents that actually remember.


🚀 Get the AI Employee Playbook — €29