The product team updates the pricing on Friday afternoon. The RAG assistant handles customer queries over the weekend, responding based on vectors generated on Tuesday. The customer receives an incorrect price, files a complaint, and the support team spends an hour explaining. This isn’t a hypothetical scenario. It’s a pattern that appears in every RAG deployment lacking a well-thought-out knowledge update layer.
The problem doesn’t lie in the language model itself or the vector search engine. It lies in the assumption that the knowledge base is a state, not a stream. Companies that understand this build the update pipeline alongside the indexing pipeline — not as a separate project for "later."
Why Static Indexing Isn’t Enough
Early RAG deployments often look the same: a one-time indexing of a few hundred documents, impressive results, and the project goes into production. After a month, the first complaints about outdated answers appear. After a quarter, maintaining the knowledge base becomes a full-time job for someone on the team.
The reason is simple. Corporate documents aren’t static. Procedures change, prices are updated, products are discontinued. Every change that doesn’t make it into the index creates a discrepancy between what the organization knows and what the assistant answers. This discrepancy is a risk: incorrect information to a customer, a wrong decision by an employee, an escalation to a consultant that shouldn’t have happened.
Three mechanisms exacerbate the problem as the base grows:
Old vectors don’t disappear automatically. When a document is updated, its previous version still sits in the vector database as a separate record. During semantic search, both vectors may be relevant — the model receives conflicting fragments and either chooses randomly or combines them into an inconsistent response.
Full reindexing is costly. With 10,000 documents and a locally running embedding model, a full reindex takes several minutes; with a cloud-based model, it generates costs proportional to the number of tokens embedded. Running it every night means seven full reindexes a week, and in each of them the vast majority of vectors are recomputed for documents that haven’t changed.
Knowledge drift is invisible without monitoring. The system works, responses come in, operational metrics are green. But the quality of responses gradually declines because more and more queries hit outdated fragments. Without active accuracy measurement, this drift remains invisible until a user reports it.
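To make the cost argument concrete, here is a back-of-the-envelope calculation comparing a nightly full reindex with incremental reindexing. Only the corpus size comes from the text; the average document length and the daily change rate are hypothetical assumptions, so plug in your own numbers.

```python
# Rough weekly embedding workload: nightly full reindex vs. incremental reindexing.
# Every number below except the corpus size is a hypothetical assumption.

CORPUS_DOCS = 10_000          # corpus size used in the text
AVG_TOKENS_PER_DOC = 1_500    # assumption
DAILY_CHANGE_RATE = 0.02      # assumption: 2% of documents change per day
NIGHTS_PER_WEEK = 7


def weekly_tokens_full(docs: int, avg_tokens: int, nights: int) -> int:
    """Tokens embedded when the whole corpus is re-embedded every night."""
    return docs * avg_tokens * nights


def weekly_tokens_incremental(docs: int, avg_tokens: int, change_rate: float, nights: int) -> int:
    """Tokens embedded when only the documents that changed that day are re-embedded."""
    changed_per_night = round(docs * change_rate)
    return changed_per_night * avg_tokens * nights


full = weekly_tokens_full(CORPUS_DOCS, AVG_TOKENS_PER_DOC, NIGHTS_PER_WEEK)
incremental = weekly_tokens_incremental(
    CORPUS_DOCS, AVG_TOKENS_PER_DOC, DAILY_CHANGE_RATE, NIGHTS_PER_WEEK
)

print(f"nightly full reindex: {full:,} tokens/week")        # 105,000,000
print(f"incremental:          {incremental:,} tokens/week")  # 2,100,000
print(f"redundant share of the full reindex: {1 - incremental / full:.0%}")  # 98%
```

Under these assumptions, 98% of the nightly work re-embeds content that has not changed; the exact share is simply one minus the daily change rate.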
Incremental Reindexing Architecture
The proper solution is event-driven reindexing: every document change triggers reindexing of only that document, not the entire corpus.
Four elements are needed:
Change detector. Every document has a content hash (SHA-256 is sufficient) stored with vector metadata. Before indexing, the pipeline compares the current version’s hash with the stored one. Reindexing occurs only if they differ. When integrating with a CMS, SharePoint, or Git repository, you can instead listen for webhooks or API events — a document change directly triggers an indexing task.
Task queue. Indexing events go into a queue (Redis, Celery, or a dedicated message broker). The queue ensures that a sudden batch of updates (e.g., price changes before a season) doesn’t block the system: tasks are processed gradually, with the most frequently queried documents handled first.
Atomic vector replacement. When a document is updated, the new vectors are indexed first, then the old ones are marked as deprecated (soft delete with a timestamp), and only then are they removed. Never the other way around: removing the old vectors before the new ones are ready leaves a gap in which the document cannot be found at all. The window during which both versions exist in the database lasts a few seconds and is managed by versioning metadata.
Indexing logs. Every operation (new, updated, deleted) is logged with a timestamp, document identifier, and version. The log is independent of the vector database itself and allows for auditing: when a given document was last reindexed, how many times it was changed, and whether the indexing succeeded.
Document and Vector Versioning
Versioning in a RAG system operates on two levels, which are worth separating conceptually.
Document versioning answers the question: which version of the document is currently active? The minimal implementation is a version field or a valid_from / valid_to pair in the metadata of each fragment. At query time, a metadata filter excludes fragments whose valid_to is in the past. This makes it possible to roll back updates: closing the newer version and resetting the previous version’s valid_to restores it without reindexing.
Vector versioning answers the question: does this vector come from the embedding model we’re currently using? Embedding models change. Migrating from one model to another (e.g., upgrading to a newer version of BGE-M3 or changing the vector dimension) means old and new vectors aren’t comparable. Each vector’s metadata should include the model identifier and dimension. During a model migration, old vectors are gradually replaced with new ones — dual-index during the transition period, then removal of the old ones.
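The model identifier and dimension can be enforced with a small guard like the one below. This is a sketch under the assumption that metadata is stored as a plain dictionary per vector; `EmbeddingSpec`, the metadata keys, and the model names such as `"embedder-v1"` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSpec:
    model_id: str  # hypothetical identifiers, e.g. "embedder-v1"
    dim: int


def vector_metadata(spec: EmbeddingSpec, vector: list[float]) -> dict:
    """Metadata stored with every vector: which model produced it and its dimension."""
    if len(vector) != spec.dim:
        raise ValueError(f"{spec.model_id} should produce {spec.dim} dimensions, got {len(vector)}")
    return {"embedding_model": spec.model_id, "embedding_dim": spec.dim}


def comparable(meta: dict, query_spec: EmbeddingSpec) -> bool:
    """A stored vector may only be scored against a query embedded by the same model and dimension."""
    return meta["embedding_model"] == query_spec.model_id and meta["embedding_dim"] == query_spec.dim


def pending_migration(metas: list[dict], target: EmbeddingSpec) -> int:
    """How many stored vectors still come from a model other than the target one."""
    return sum(1 for m in metas if not comparable(m, target))
```

During a dual-index transition, `pending_migration` gives a simple progress metric: once it reaches zero for the new model, the old vectors can be removed.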
| Operation | Scope | Trigger | Relative Cost |
|---|---|---|---|
| Incremental reindexing | Modified documents | Event (webhook, hash diff) | Low |
| Category reindexing | Domain or folder | Schedule or bulk update | Medium |
| Full reindexing | Entire corpus | Embedding model migration | High |
| Soft delete | Retired document | Status change in CMS | Zero (metadata) |
Full reindexing should be an exceptional operation, not a routine one. If you need to do it regularly, it’s a sign that the change detection architecture isn’t working.
Knowledge Drift Detection
Knowledge drift is the growing gap between what users ask about, what is currently true, and what the database contains. Two types are most common.
Content drift: Documents in the database are up to date, but user questions concern topics that aren’t indexed yet. This shows up in metrics as an increasing percentage of "I don’t have that information" responses or a rising number of escalations to consultants.
Quality drift: Documents in the database contain outdated information, but the vectors are still semantically relevant, so the system confidently responds based on incorrect data. This is the harder case because operational metrics don’t signal the problem. It’s only visible through active accuracy measurement or user feedback.
Practical drift detection requires three elements:
Regular accuracy tests on a set of golden questions and expected answers. The set should cover questions from different knowledge domains and be updated along with the database. Once a week is sufficient for most systems.
Alerts for an increasing percentage of "I don’t know" responses. A sudden spike signals that new topics have emerged without coverage in the database — or that documents in that domain were changed and not reindexed.
Version trace in every response. Logging, for every response, the identifier and version of the document behind each fragment it used allows post-hoc verification of whether the response was based on current or retired content. Without this, auditing is impossible. Traceability of this kind also supports the record-keeping obligations the AI Act imposes on systems classified as high-risk, where the decision trail must be reproducible.
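The three drift signals above can be sketched as a handful of functions. This is a minimal illustration under several assumptions: the assistant uses a fixed fallback phrase, a golden answer counts as correct when it contains one expected fact (real evaluations usually need a more robust comparison), and the 50% relative threshold for the fallback alert is arbitrary. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

FALLBACK = "I don't have that information"  # assumption: the assistant's fixed fallback phrase


@dataclass
class GoldenCase:
    question: str
    expected_fact: str  # simplification: a key fact the correct answer must mention


def golden_accuracy(cases: list[GoldenCase], answer: Callable[[str], str]) -> float:
    """Share of golden questions whose answer contains the expected fact."""
    hits = sum(1 for c in cases if c.expected_fact.lower() in answer(c.question).lower())
    return hits / len(cases)


def fallback_rate(responses: list[str]) -> float:
    """Share of responses that are the "I don't know" fallback."""
    return sum(1 for r in responses if FALLBACK.lower() in r.lower()) / len(responses)


def fallback_alert(current: float, baseline: float, max_relative_increase: float = 0.5) -> bool:
    """Alert when the fallback rate exceeds the baseline by more than the given relative margin."""
    return current > baseline * (1 + max_relative_increase)


def version_trace(response_id: str, fragments: list[dict]) -> dict:
    """Record which version of which document each fragment in a response came from."""
    return {"response_id": response_id,
            "sources": [{"doc_id": f["doc_id"], "version": f["version"]} for f in fragments]}
```

In practice `golden_accuracy` would run on the weekly schedule against the live assistant, `fallback_rate` over a rolling window of production responses, and `version_trace` once per response, with its output written to the same log as the response itself.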
Update Priorities: Not All Documents Are Equal
With limited indexing resources, it’s worth prioritizing. Not every document has the same impact on response quality.
High priority goes to the documents the system uses most often to generate responses. Logging the document identifier each time a document is used (in compliance with the GDPR, with PII anonymized) makes it possible to build a popularity ranking. Documents at the top of the ranking should be checked for freshness whenever something changes in their domain.
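A popularity ranking can be built from a usage log that stores only identifiers. The sketch below assumes each log entry is a dictionary with hypothetical `response_id` and `doc_id` keys, and counts a document at most once per response even when several of its chunks were retrieved.

```python
from collections import Counter


def popularity_ranking(usage_log: list[dict], top_n: int = 10) -> list[tuple[str, int]]:
    """Rank documents by the number of responses that used them.

    Entries are assumed to look like {"response_id": ..., "doc_id": ...}:
    identifiers only, no query text or user data.
    """
    # Deduplicate (response, document) pairs so multi-chunk hits count once per response.
    pairs = {(e["response_id"], e["doc_id"]) for e in usage_log}
    return Counter(doc_id for _, doc_id in pairs).most_common(top_n)
```

Counting distinct responses rather than raw retrievals keeps long documents, which produce many chunks, from dominating the ranking simply because of their size.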
High priority also goes to documents with a short lifecycle: pricing, SLAs, regulations, schedules. Their metadata should include a valid_to field with a defined expiration date. The system should automatically mark them as outdated after that date and require confirmation from the document owner before further use.
Low priority goes to conceptual and historical documents that don’t change often. They can be reindexed once a quarter or only upon explicit updates.
GDPR, Retention, and Knowledge Deletion
A RAG knowledge base may contain personal data: call transcripts, emails, project notes. The right to be forgotten (Article 17 of the GDPR) applies not only to the document database but also to vectors.
Deleting a document from the repository doesn’t automatically remove the vectors generated from it. Explicit deletion from the vector database with confirmation in the log is required. The procedure should be documented and testable: after deletion, no query should return fragments from the deleted document.
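A testable erasure procedure can be sketched in a few lines. This assumes a hypothetical in-memory store with delete-by-document and a toy keyword search standing in for semantic search; against a real vector database, the same three steps apply: delete by document identifier, run probe queries, and write the outcome to an audit log.

```python
import json
import time


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database that supports deleting by a metadata field."""

    def __init__(self) -> None:
        self.points: list[dict] = []  # {"id": ..., "doc_id": ..., "text": ...}

    def add(self, point_id: int, doc_id: str, text: str) -> None:
        self.points.append({"id": point_id, "doc_id": doc_id, "text": text})

    def delete_by_doc(self, doc_id: str) -> int:
        before = len(self.points)
        self.points = [p for p in self.points if p["doc_id"] != doc_id]
        return before - len(self.points)

    def search(self, query: str) -> list[dict]:
        """Toy keyword match standing in for semantic search."""
        return [p for p in self.points if query.lower() in p["text"].lower()]


def erase_document(store: InMemoryVectorStore, doc_id: str,
                   probe_queries: list[str], audit_log: list[str]) -> bool:
    """Delete all vectors of a document, verify with probe queries, and log the outcome."""
    removed = store.delete_by_doc(doc_id)
    # Verification: no probe query may return a fragment of the deleted document.
    leaked = [p for q in probe_queries for p in store.search(q) if p["doc_id"] == doc_id]
    audit_log.append(json.dumps({"ts": time.time(), "action": "erasure", "doc_id": doc_id,
                                 "vectors_removed": removed, "verified": not leaked}))
    return not leaked
```

Keeping the verification and the log entry inside the same function makes the procedure repeatable in an automated test, which is what "documented and testable" requires in practice.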
For sensitive data (e.g., transcripts containing customer data), the standard approach is PII masking before indexing. The indexed fragment then doesn’t contain personal data, and deletion requests only apply to the original documents, not vectors. We describe this pattern in detail in the article on PII anonymization before AI.
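To illustrate where masking sits in the pipeline, here is a deliberately simple regex-based sketch that runs before chunking and embedding. The patterns are hypothetical and far from complete: production masking typically needs NER-based detection of names and addresses plus locale-specific rules, and these regexes will also catch some harmless digit runs.

```python
import re

# Deliberately simple, hypothetical patterns. Real masking needs NER-based detection and
# locale-specific rules (names, addresses, national ID numbers); the PHONE pattern will
# also catch some non-PII digit runs, such as dates written as 2024-01-15.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("PHONE", re.compile(r"\+?\d[\d\s-]{7,}\d")),
]


def mask_pii(text: str) -> str:
    """Replace detected PII with typed placeholders before the text is chunked and embedded."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```

For example, `mask_pii("Contact jan.kowalski@example.com or +48 600 123 456.")` returns `"Contact [EMAIL] or [PHONE]."`. Typed placeholders keep the fragment meaningful for retrieval while leaving nothing personal in the vectors.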
Where sensitive data or regulatory requirements are involved, the entire stack (embeddings, vector database, and model) should run locally. Self-hosting eliminates questions about data transfers to the cloud and simplifies the DPIA (data protection impact assessment).
FAQ
How often should I reindex the RAG knowledge base?
The answer depends on the pace of document changes, not on an arbitrary schedule. For documents that change weekly or more often, the proper pattern is event-driven reindexing: a webhook or hash diff triggers reindexing immediately after a change. For static or rarely changed documents, a weekly or monthly schedule is sufficient. Full reindexing of the entire corpus should be an exception, not a routine: you do it during an embedding model migration or a major database reorganization.
What happens if old and new vectors of the same document are in the database simultaneously?
The system will return both as candidates for the response. The language model will then receive conflicting fragments and behave unpredictably: it may pick one at random, combine the conflicting information, or point out the discrepancy. The proper pattern is atomic replacement: the new vectors are indexed, and the old ones are marked as deprecated and removed in a single transaction. The inconsistency window should last seconds, not minutes.
How can I detect that the knowledge base is outdated before a user reports it?
Three signals are worth monitoring. First: an increasing percentage of "I don’t have that information" responses compared to the baseline. Second: a rise in escalations to consultants for questions that were previously handled automatically. Third: regular accuracy tests on a set of golden questions with expected answers. Combining these three provides early warning before the problem becomes visible to customers. We describe the measurement pattern in detail in the article on AI agent quality monitoring.
Does the right to be forgotten (GDPR) apply to vectors as well?
Yes. A vector generated from a document containing personal data is a derivative of that data and is subject to the same obligations as the original. Deleting the document from the source repository isn’t enough: the vectors must be explicitly deleted from the vector database, and the deletion confirmed in a log with a timestamp. The recommended alternative is PII masking before indexing: the indexed fragment then doesn’t contain personal data, and the right to deletion applies only to the original documents.
How do I manage migration to a new embedding model?
Old and new vectors aren’t geometrically comparable, so you can’t mix vectors from different models in one collection. The safe migration procedure is dual-index: the old collection (which production keeps answering from) and the new one (which receives the reindexed corpus and incremental updates) are maintained in parallel. Once the entire corpus has been reindexed, traffic switches to the new collection and the old one is deleted. Migration time depends on the corpus size and available computational resources; with self-hosting and a local model, it usually takes several to a dozen hours.
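The dual-index procedure can be sketched as a router that points queries at one collection while writes go to a list of targets. Everything here is hypothetical (`Collection`, `Router`, the toy embedders) and stands in for whatever alias or collection-switching mechanism your vector database offers. One deliberate choice differs slightly from the answer above: incremental updates go to both collections during the transition, so the old collection that production still answers from does not go stale.

```python
from dataclasses import dataclass, field
from typing import Callable

Embedder = Callable[[str], list[float]]


@dataclass
class Collection:
    name: str
    model_id: str
    vectors: dict[str, list[float]] = field(default_factory=dict)


@dataclass
class Router:
    """Hypothetical alias: queries go to `live`, incremental updates go to every write target."""
    live: Collection
    write_targets: list[Collection] = field(default_factory=list)

    def index(self, doc_id: str, text: str, embedders: dict[str, Embedder]) -> None:
        for c in self.write_targets:
            c.vectors[doc_id] = embedders[c.model_id](text)


def migrate(router: Router, new: Collection, corpus: dict[str, str],
            embedders: dict[str, Embedder]) -> bool:
    # 1. From now on, incremental updates also reach the new collection.
    router.write_targets.append(new)
    # 2. Backfill: re-embed the whole corpus with the new model.
    for doc_id, text in corpus.items():
        new.vectors[doc_id] = embedders[new.model_id](text)
    # 3. Switch query traffic only once the new collection covers the entire corpus.
    if not set(corpus) <= set(new.vectors):
        return False
    old, router.live = router.live, new
    router.write_targets = [new]
    # 4. Delete the old collection; vectors from the two models are never queried together.
    old.vectors.clear()
    return True
```

Because queries only ever hit `router.live`, vectors from the two models are never scored against the same query embedding, which is the property the migration has to preserve.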