A company deploying an AI assistant for B2B customer service may have 300-800 active client accounts in a single system instance. If long-term memory isn't isolated per tenant, an agent serving a healthcare client might pull context from conversations with a financial-sector client. This isn't a theoretical scenario: in practice, misconfigured filtering in a vector database causes exactly such leaks, often before anyone notices.
Three Layers of Agent Memory
Every production-grade agent operates on at least three memory layers, each with different lifecycles and requirements.
Session memory is the content of the model's context window for the current conversation. It lives from the first message until the session ends and isn't persisted unless explicitly saved. It contains the system instruction, the entire exchange history of the current session, and any retrieved RAG snippets. Its size is limited by the number of tokens the model can handle in one window, typically from about 8,000 to 128,000 tokens or more depending on the model.
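A minimal sketch of how the window limit shapes session memory, assuming a crude estimate of about four characters per token (a real system would use the model's own tokenizer). The names `Message`, `estimate_tokens`, and `build_context` are hypothetical: the system instruction and RAG snippets are always kept, and only the newest history messages that still fit the remaining budget are added.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "system", "user" or "assistant"
    text: str


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Use the model's tokenizer in practice.
    return max(1, len(text) // 4)


def build_context(system: Message, rag_snippets: list[str],
                  history: list[Message], max_tokens: int) -> list[Message]:
    """Keep the system instruction and RAG snippets, then add the newest
    history messages that still fit into the remaining token budget."""
    fixed = [system] + [Message("system", s) for s in rag_snippets]
    budget = max_tokens - sum(estimate_tokens(m.text) for m in fixed)
    if budget < 0:
        raise ValueError("system instruction and snippets alone exceed the window")
    kept: list[Message] = []
    for msg in reversed(history):  # walk from newest to oldest
        cost = estimate_tokens(msg.text)
        if cost > budget:
            break  # this message and everything older is dropped
        kept.append(msg)
        budget -= cost
    return fixed + kept[::-1]  # restore chronological order
```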
Vector memory (long-term) consists of persisted embeddings of conversation history snippets, client documents, and facts the agent should remember between sessions. You store it in a vector database (e.g., Qdrant, Weaviate, or PostgreSQL with pgvector). On every new query, semantic search retrieves relevant snippets from this layer as context for the response, following the RAG pattern.
User/tenant profile contains structured data: preferences, roles, case history, granted consents, retention dates. Store it in a relational database (not vector-based) because you need precise queries, updates, and cascading deletes. This record serves as proof of consent and the source of truth for the right to be forgotten.
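The profile layer can be sketched with SQLite from the Python standard library, standing in for whatever relational database you use. The schema is hypothetical: consents and the user_id → point_id mapping reference the user row with ON DELETE CASCADE, and the point IDs are read out before the profile is deleted so the vector store can be cleaned afterwards.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id   TEXT PRIMARY KEY,
    tenant_id TEXT NOT NULL
);
CREATE TABLE consents (
    user_id    TEXT NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
    purpose    TEXT NOT NULL,
    granted_at TEXT NOT NULL,
    PRIMARY KEY (user_id, purpose)
);
-- Mapping user_id -> vector point IDs, needed later to erase embeddings.
CREATE TABLE vector_points (
    user_id  TEXT NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
    point_id TEXT NOT NULL PRIMARY KEY
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def delete_user(conn: sqlite3.Connection, user_id: str) -> list[str]:
    """Collect the user's point IDs first, then delete the profile; consents
    and mapping rows are removed by ON DELETE CASCADE."""
    point_ids = [row[0] for row in conn.execute(
        "SELECT point_id FROM vector_points WHERE user_id = ?", (user_id,))]
    conn.execute("DELETE FROM users WHERE user_id = ?", (user_id,))
    conn.commit()
    return point_ids  # hand these to the vector database deletion step
```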
Context Isolation: How to Avoid Mixing Client Data
The most common architectural mistake is a vector database without tenant ID filtering. The secure pattern looks like this.
Every snippet saved to the vector database gets tenant_id metadata (and optionally user_id). Every query includes the filter { tenant_id: { eq: current_tenant } } as a mandatory condition that restricts the candidate set before similarity is computed, not as a filter applied to the results afterwards. Without this filter, the search can return a snippet from a completely different client simply because it is semantically similar.
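The sketch below shows that ordering in plain Python with an in-memory list of points: candidates are restricted to the current tenant first, and only then ranked by cosine similarity. It illustrates the principle rather than a vector database; in Qdrant the same idea is expressed by passing a payload filter such as `{"must": [{"key": "tenant_id", "match": {"value": "..."}}]}` with the search request. `Point` and `search` are hypothetical names, and the function refuses to run without a tenant_id instead of silently searching everything.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Point:
    point_id: str
    vector: list[float]
    payload: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(points: list[Point], query: list[float], tenant_id: str,
           top_k: int = 3) -> list[tuple[str, float]]:
    if not tenant_id:
        raise ValueError("tenant_id is mandatory")  # fail closed, never search globally
    # 1) Restrict candidates to the current tenant BEFORE any scoring...
    candidates = [p for p in points if p.payload.get("tenant_id") == tenant_id]
    # 2) ...then rank only those candidates by cosine similarity.
    scored = [(p.point_id, cosine(query, p.vector)) for p in candidates]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top_k]
```

Filtering before ranking matters: if you took the global top-k first and filtered afterwards, another tenant's near-duplicate could push the current tenant's snippets out of the results, and a forgotten post-filter would leak it outright.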
Additional isolation layers worth implementing:
Per-tenant namespaces in the vector database: Qdrant supports either a separate collection per tenant or a shared collection partitioned by a tenant_id payload filter. Separate collections give stronger isolation than metadata filtering alone, but with hundreds of tenants they cost noticeably more resources.
Session ID as context boundary: Each session gets a unique identifier, and history snippets saved to vector memory carry session_id as metadata. This lets you delete an entire session's history without touching the rest of the tenant's data.
Guardrails on output: Before the agent returns a response, a guardrails layer checks whether it contains another client's PII. This is a safety net, not the primary line of defense. Multi-step agent architecture is covered in more detail in the article about multi-step agents.
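A minimal output guardrail, assuming you maintain a per-tenant registry of known identifiers such as company names and contract numbers (the `tenant_identifiers` dictionary and both function names are hypothetical). It only catches identifiers it already knows, which is why it stays a safety net behind the tenant filter.

```python
import re


def find_foreign_identifiers(response: str, current_tenant: str,
                             tenant_identifiers: dict[str, set[str]]) -> list[str]:
    """Return identifiers of OTHER tenants that appear in the response."""
    hits = []
    for tenant, identifiers in tenant_identifiers.items():
        if tenant == current_tenant:
            continue
        for ident in identifiers:
            # Case-insensitive match that is not part of a longer word.
            pattern = r"(?<!\w)" + re.escape(ident) + r"(?!\w)"
            if re.search(pattern, response, re.IGNORECASE):
                hits.append(ident)
    return sorted(hits)


def guard(response: str, current_tenant: str,
          tenant_identifiers: dict[str, set[str]]) -> str:
    """Pass the response through, or block it if it mentions another tenant."""
    leaked = find_foreign_identifiers(response, current_tenant, tenant_identifiers)
    if leaked:
        # Block rather than redact; the error message avoids echoing the identifiers.
        raise PermissionError(f"{len(leaked)} identifier(s) of another tenant detected")
    return response
```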
Table: Memory Type, Use Case, and GDPR Requirement
| Memory Type | Use Case | Where to Store | GDPR Requirement |
|---|---|---|---|
| Session (in-memory) | Current conversation history, query context | RAM / temporary buffer | Do not persist without consent; deleted after session ends |
| Vector (history embeddings) | Retrieving context between sessions | Vector database with tenant_id filter | Retention limited to purpose, deletion on request (cascade by embedding ID) |
| User profile | Preferences, consents, roles, case history | Relational database (SQL) | Legal basis + TTL + right to be forgotten (erasure within one month of the request) |
| Conversation logs (audit) | Debugging, GDPR Art. 5 accountability | Secure logs, separate from vectors | Retention max 12-24 months, PII pseudonymization |
| Semantic cache | Answers to repetitive questions | Vector database with TTL | Key per-tenant, do not cache responses with PII |
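
The semantic-cache row can be sketched as a small in-memory class, assuming the caller supplies the question embedding and a `contains_pii` flag from an upstream PII detector. `TenantSemanticCache`, the 0.95 similarity threshold, and the injectable clock are illustrative choices, not recommendations: entries live in a per-tenant partition, expire after a TTL, and personalized answers are never stored.

```python
import math
import time


class TenantSemanticCache:
    """Semantic cache partitioned by tenant, with a TTL and a no-PII rule."""

    def __init__(self, ttl_seconds: float, threshold: float = 0.95, clock=time.time):
        self.ttl = ttl_seconds
        self.threshold = threshold
        self.clock = clock
        # tenant_id -> list of (embedding, answer, expires_at)
        self._entries: dict[str, list[tuple[list[float], str, float]]] = {}

    @staticmethod
    def _cosine(a, b) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.hypot(*a) * math.hypot(*b)
        return dot / norm if norm else 0.0

    def put(self, tenant_id: str, embedding, answer: str, contains_pii: bool) -> bool:
        if contains_pii:
            return False  # personalized answers are never cached
        expires = self.clock() + self.ttl
        self._entries.setdefault(tenant_id, []).append((list(embedding), answer, expires))
        return True

    def get(self, tenant_id: str, embedding):
        now = self.clock()
        # Look only in this tenant's partition and drop expired entries on the way.
        live = [e for e in self._entries.get(tenant_id, []) if e[2] > now]
        self._entries[tenant_id] = live
        best = max(live, key=lambda e: self._cosine(embedding, e[0]), default=None)
        if best is not None and self._cosine(embedding, best[0]) >= self.threshold:
            return best[1]
        return None
```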
Retention and the Right to Be Forgotten
GDPR Art. 17 grants data subjects the right to request deletion of their data. For agent memory, this means a cascading operation across at least four resources.
Step 1: Delete the profile from the relational database. This is usually a simple DELETE WHERE user_id = X operation, but first collect all identifiers linked to that user, including the point_id mapping needed in Step 2, because it disappears together with the profile.
Step 2: Delete embeddings from the vector database. Qdrant can delete points either by a metadata (payload) filter, e.g. DELETE WHERE user_id = X AND tenant_id = Y, or by an explicit list of point IDs. For the ID-based route, and to verify that the filter missed nothing, the relational profile database should store a mapping user_id → [point_id_1, point_id_2, ...].
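The two deletion routes can be combined, sketched here against a plain dictionary of point payloads rather than a real vector database (`delete_by_filter` and `erase_user_vectors` are hypothetical). The filter-based delete removes points tagged with both user_id and tenant_id; a second pass deletes any IDs from the relational mapping that the filter missed, for example snippets written without a user_id payload. In Qdrant itself, both variants go through the same points delete operation, with either a filter or a list of IDs as the selector.

```python
def delete_by_filter(points: dict[str, dict], user_id: str, tenant_id: str) -> set[str]:
    """Delete every point whose payload matches BOTH user_id and tenant_id
    (the equivalent of a filter-based delete). Returns the deleted IDs."""
    doomed = {pid for pid, payload in points.items()
              if payload.get("user_id") == user_id
              and payload.get("tenant_id") == tenant_id}
    for pid in doomed:
        del points[pid]
    return doomed


def erase_user_vectors(points: dict[str, dict], user_id: str, tenant_id: str,
                       mapped_ids: set[str]) -> set[str]:
    deleted = delete_by_filter(points, user_id, tenant_id)
    # Second pass by ID: points listed in the relational mapping that the
    # filter missed (e.g. written without a user_id payload).
    leftovers = {pid for pid in mapped_ids - deleted if pid in points}
    for pid in leftovers:
        del points[pid]
    return deleted | leftovers
```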
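Step 3: Delete or anonymize logs. Audit logs may contain conversation content. Options include pseudonymization (replacing user_id with a keyed hash that cannot be reversed without the key) or full deletion if the processing purpose doesn't justify longer retention. Pseudonymized data still counts as personal data under GDPR (Recital 26), so conversation content stored in the logs must be deleted or anonymized as well.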
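A sketch of keyed pseudonymization for log records, assuming each record is a dictionary with `user_id` and free-text `content` fields (both field names are hypothetical). HMAC-SHA256 from the standard library keeps the pseudonym stable, so events can still be correlated, whereas a plain unsalted hash of an enumerable ID could be reversed simply by hashing candidate IDs. Dropping the `content` field is a design choice for this example, since conversation text usually contains personal data of its own.

```python
import hashlib
import hmac


def pseudonymize_id(user_id: str, key: bytes) -> str:
    # Keyed hash (HMAC-SHA256): the same user_id always maps to the same value,
    # but without the key nobody can recompute it from candidate IDs.
    # Shortened to 16 hex characters for readability in logs.
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def pseudonymize_record(record: dict, key: bytes) -> dict:
    """Return a copy of a log record with user_id pseudonymized and the
    free-text content removed; the input record is left unchanged."""
    out = dict(record)
    out["user_id"] = pseudonymize_id(record["user_id"], key)
    out.pop("content", None)  # conversation text usually holds personal data itself
    return out
```

Store the key separately from the logs: anyone holding both can re-link pseudonyms to users.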
Step 4: Invalidate cache. If the semantic cache stores responses containing this user’s data (rare but possible with personalized answers), delete the associated cache entries.
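Deadline: without undue delay and at the latest within one month of receiving the request (GDPR Art. 12(3)); the period can be extended by two further months for complex or numerous requests. Automate the entire workflow and log each fulfilled request as proof of accountability (GDPR Art. 5(2)). PII anonymization patterns applied before model processing are covered in the article on PII anonymization.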
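The orchestration can be sketched as a function that runs the erasure steps in order and returns an accountability record. The step callables, `run_erasure`, and the record fields are hypothetical. The due date follows GDPR Art. 12(3) (one month from receipt), computed here as "same day next month, clamped to the end of the month"; how the month is counted in your case is a question for your legal team, not for this code. The record stores a request ID rather than the user_id, so the proof of fulfillment does not itself retain the erased identifier.

```python
import calendar
from collections.abc import Callable
from datetime import date, datetime, timezone


def add_one_month(d: date) -> date:
    """Same day next month, clamped to the last day (e.g. 31 Jan 2024 -> 29 Feb 2024)."""
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def run_erasure(request_id: str, user_id: str, received: date,
                steps: list[tuple[str, Callable[[str], int]]]) -> dict:
    """Run erasure steps in order; return an accountability record that
    references the request, not the erased user_id."""
    record = {
        "request_id": request_id,
        "received": received.isoformat(),
        "due": add_one_month(received).isoformat(),
        "steps": [],
    }
    for name, step in steps:
        try:
            removed = step(user_id)  # each step returns how many items it removed
        except Exception as exc:
            # Record only the exception type, in case the message contains personal data.
            record["steps"].append({"step": name, "status": "failed",
                                    "error": type(exc).__name__})
            break  # stop here so the request can be retried from this step
        record["steps"].append({"step": name, "status": "done", "removed": removed})
    record["completed"] = (len(record["steps"]) == len(steps)
                           and all(s["status"] == "done" for s in record["steps"]))
    record["finished_at"] = datetime.now(timezone.utc).isoformat()
    return record
```

A typical call passes the four steps from this section in order, for example `[("profile", ...), ("vectors", ...), ("logs", ...), ("cache", ...)]`.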
Retention as Policy, Not Just Technicalities
Data retention in agent memory isn't just about storage duration. It's a decision on how long the agent "remembers" context and how that affects response quality versus the risk of storing excessive data.
Practical approach: Divide vector memory into segments with different TTLs. Current client project history: TTL 6-12 months or until project closure. General communication preferences: TTL 24 months. Sensitive data (e.g., financial details): TTL 3-6 months or after first use.
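Assigning the expiry at write time keeps such a policy enforceable. The sketch below maps hypothetical segment names to TTLs taken from the upper end of the ranges above (months approximated in days) and accepts an optional earlier end, such as project closure or first use of sensitive data. Unknown segments raise an error, so nothing lands in vector memory without a retention rule.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical segment names; TTLs use the upper end of each range, in days.
SEGMENT_TTL = {
    "project_history": timedelta(days=365),            # 6-12 months
    "communication_preferences": timedelta(days=730),  # 24 months
    "sensitive": timedelta(days=180),                  # 3-6 months
}


def expires_at(segment: str, created: datetime,
               early_end: Optional[datetime] = None) -> datetime:
    """Expiry = created + segment TTL, or an earlier event (project closure,
    first use of sensitive data) if that comes first."""
    if segment not in SEGMENT_TTL:
        raise ValueError(f"no retention rule for segment {segment!r}")
    deadline = created + SEGMENT_TTL[segment]
    return min(deadline, early_end) if early_end is not None else deadline
```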
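Document your retention policy in the DPIA (Data Protection Impact Assessment). If your agent processes personal data systematically and on a large scale (e.g., serves thousands of clients), a DPIA is very likely mandatory before launch: GDPR Art. 35 requires one whenever processing is likely to result in a high risk to individuals' rights and freedoms. The DPIA covers the processing purpose, data categories, retention period, and applied safeguards.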
A retention pattern that works well in enterprise deployments: Every snippet in the vector database has an expires_at field (timestamp), and a daily job deletes expired snippets and updates the point_id mapping in the user profile. Non-personalized corporate knowledge bases can be retained indefinitely, but every document update should trigger reindexing. RAG knowledge-management patterns are covered in the article on knowledge-based corporate GPT.
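A sketch of the daily purge job, run against a plain dictionary of payloads and a user_id → point_id mapping (both hypothetical stand-ins for the vector database and the profile table). Points marked as knowledge-base documents are kept without an expiry; any other point missing expires_at is reported rather than silently kept, since that usually means some write path skipped the retention rule.

```python
from datetime import datetime


def purge_expired(points: dict[str, dict], user_points: dict[str, set[str]],
                  now: datetime) -> tuple[list[str], list[str]]:
    """Delete expired snippets and remove them from the user_id -> point_id
    mapping. Returns (deleted IDs, IDs of personal snippets without expires_at)."""
    expired, missing_ttl = [], []
    for pid, payload in points.items():
        exp = payload.get("expires_at")
        if exp is None:
            # Knowledge-base documents may live indefinitely; anything else
            # without an expiry is a policy violation worth alerting on.
            if payload.get("kind") != "knowledge_base":
                missing_ttl.append(pid)
        elif exp <= now:
            expired.append(pid)
    for pid in expired:  # mutate only after iteration has finished
        owner = points.pop(pid).get("user_id")
        if owner in user_points:
            user_points[owner].discard(pid)
    return sorted(expired), sorted(missing_ttl)
```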
Self-Hosted Architecture and GDPR Compliance
If client data is particularly sensitive (health data, financial data, trade secrets), self-hosting the entire agent memory stack on infrastructure located in the EU/EEA eliminates the risk of transferring data to a third country. The embedding model (e.g., BGE-M3), the vector database (Qdrant), and the LLM (served locally, e.g. with Ollama) can all run on your own infrastructure. Everything stays under your control, so GDPR's rules on transfers to third countries (Chapter V) do not come into play for this stack. Detailed self-hosting patterns for LLMs under GDPR are covered in the article on multi-agent systems in enterprises.
FAQ
How to technically implement the right to be forgotten in a vector database?
Vector databases like Qdrant allow deleting points by metadata filter or by ID. Store the mapping user_id → [point_id] in a relational database. Upon a deletion request: fetch the point_id list from the profile, call DELETE points WHERE id IN (...) in the vector database, delete the profile from SQL, and anonymize logs. The entire workflow should be automated and logged as proof of accountability.
How much time and data are needed for vector memory to provide real value?
First results appear after 20-50 saved snippets per user, typically after 3-5 working sessions. With 100+ snippets, the agent starts accurately retrieving context from previous projects. Quality depends more on chunking accuracy and write policy (what you save) than on data volume.
Can I store session summaries instead of full transcripts?
Yes, this is a good practice for two reasons. First, summaries take up fewer tokens in the context window, so the agent works faster. Second, you can remove PII from summaries before saving, simplifying retention and reducing DPIA scope. Downside: loss of precision for specific facts the agent should remember exactly.
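A minimal sketch of scrubbing a summary before it is saved, using a few regular expressions. The patterns are simplistic assumptions: they catch well-formed email addresses, IBAN-like strings, and space-separated phone numbers, but not names, addresses, or free-form identifiers, so a production system would need a dedicated PII detector in front of or instead of them.

```python
import re

# Deliberately simple patterns (assumption). Order matters: IBANs are replaced
# before phone numbers, because their digit groups would also look like a phone.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("IBAN", re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,4})?\b")),
    ("PHONE", re.compile(r"(?<!\w)\+?\d[\d ]{7,}\d\b")),  # digits separated by spaces
]


def scrub_summary(summary: str) -> str:
    """Replace matched PII with placeholders such as [EMAIL] before saving."""
    for label, pattern in PII_PATTERNS:
        summary = pattern.sub(f"[{label}]", summary)
    return summary
```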
How to avoid situations where the agent confuses data from two clients with similar profiles?
The tenant_id filter in the vector database is mandatory and must be the first query condition before semantic similarity is calculated. Similarity of embeddings alone isn't sufficient for isolation. Additionally, the guardrails layer on output can verify if the response contains identifiers from another tenant (company names, contract numbers).
Does vector memory require a separate database, or can I use PostgreSQL with pgvector?
Both approaches work in production. pgvector (a PostgreSQL extension) is a good solution for fewer than 1-2 million vectors and traffic below a few hundred queries per minute. For larger volumes, dedicated databases (Qdrant, Weaviate) offer better metadata management, multi-attribute filtering, and ANN (approximate nearest neighbor) optimizations. The advantage of pgvector is a single database system for both user profiles and embeddings, which simplifies cascading deletes when fulfilling the right to be forgotten.