Most discussions about AI security stop at “don’t paste confidential data into a public chat.” That’s true, but it’s only the first, simplest vector. In production systems — assistants with knowledge bases, agents with CRM access, automation workflows — data leaks in less obvious ways: through context the model retains, through logs enabled “temporarily for debugging,” through the model provider’s licensing terms. This article breaks down four real leak vectors and describes concrete defenses for each. We don’t promise “full security” — in security, such a promise is a red flag. We describe how to reduce risk to a level that can be documented and justified.
Vector 1: Prompt injection extracting context
Prompt injection means planting instructions in content that the model processes as input. In a system with a knowledge base, the model’s context holds its system prompt, retrieved document fragments, and sometimes other users’ data. An attacker who tricks the model into “reading aloud” this context extracts things they should never see: the content of system instructions, fragments of other users’ data, the structure of internal sources.
The most dangerous variant is indirect injection: the malicious instruction isn’t typed into the chat but hidden in a document the system automatically pulls into context (email, PDF, webpage). The model treats it as part of the task. That’s why validating only what the user inputs isn’t enough — you also need to control what the system ingests from external sources.
Defense has three layers. First: the system prompt never contains secrets or data whose disclosure would be harmful — keys and sensitive data are kept outside the model’s context. Second: input guardrails detect attempts to manipulate instructions (patterns like “ignore previous commands,” language-switching as camouflage). Third: per-role and per-tenant isolation, so even a successful injection can’t access data the user isn’t authorized to see. A detailed test checklist is in our article on AI assistant security audits, and permission architecture is covered in AI agent security.
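The second layer can be illustrated with a small screening function. This is a minimal sketch, not a complete guardrail: the pattern list and the names `INJECTION_PATTERNS`, `flag_injection`, and `screen_inputs` are hypothetical, and the regexes cover only English phrasing, so language-switching camouflage would need additional handling. The point it shows is that the same check runs over the user message and over every document pulled in from outside.

```python
import re

# Hypothetical patterns for illustration only. A real deployment needs a
# broader, regularly updated set and should not rely on regexes alone.
INJECTION_PATTERNS = [
    re.compile(
        r"ignore (all |any )?(the )?(previous|prior|above) (instructions|commands)",
        re.I,
    ),
    re.compile(
        r"(reveal|print|repeat|show) (your |the )?(system prompt|instructions)",
        re.I,
    ),
]


def flag_injection(text: str) -> list[str]:
    """Return the patterns that matched, so the caller can block or escalate."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(text)]


def screen_inputs(user_message: str, retrieved_docs: list[str]) -> dict[str, list[str]]:
    """Screen the user message and every externally ingested document."""
    findings: dict[str, list[str]] = {}
    hits = flag_injection(user_message)
    if hits:
        findings["user_message"] = hits
    for i, doc in enumerate(retrieved_docs):
        hits = flag_injection(doc)
        if hits:
            findings[f"doc_{i}"] = hits
    return findings
```

A flagged document would typically be dropped from the context or routed for review rather than passed to the model. Pattern matching of this kind catches only known phrasings, which is why the third layer (isolation) still has to hold when screening misses.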
Vector 2: PII in prompts and logs
This is the most common leak we see in practice, and the least spectacular. Personal data enters the system entirely legally: a customer provides their name and case number, an employee pastes a contract excerpt. The problem starts later, when this data travels where no one intended: to an external model API in plaintext, to application logs, to a “conversation history” table without a retention policy.
Logs are particularly insidious. Observability is needed for diagnosing issues, so someone enables full query logging “temporarily,” and it stays that way for months. After six months, you have a PII repository with no legal basis, no retention policy, and none of the deletion mechanisms that RODO (the Polish name for GDPR) requires.
Defense means masking PII before the model and before writing to logs. In our approach, the query passes through a pseudonymization layer before reaching the LLM: names, PESEL numbers, emails, and account numbers are replaced with tokens, and originals — if needed at all — stay on your controlled side. Logs record operational metadata (time, status, token cost), not raw content. How to prepare company data so this layer works from the start is covered in preparing data for AI.
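A pseudonymization layer of this kind can be sketched in a few lines. Everything here is an assumption made for illustration: the regexes are simplified (PESEL as any 11-digit number, accounts as an optional `PL` prefix plus 26 digits without spaces), the function names are hypothetical, and names of people are left out because they need NER or a dictionary rather than a regex. The sample values in the usage note are made up.

```python
import re
import time
from itertools import count

# Simplified, hypothetical patterns. ACCOUNT is listed before PESEL so the
# longer number is tokenized first.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ACCOUNT": re.compile(r"\b(?:PL)?\d{26}\b"),
    "PESEL": re.compile(r"\b\d{11}\b"),
}


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace PII with tokens; return masked text and a token -> original map."""
    vault: dict[str, str] = {}
    counter = count(1)
    for label, pattern in PII_PATTERNS.items():
        def repl(match: re.Match, label: str = label) -> str:
            token = f"<{label}_{next(counter)}>"
            vault[token] = match.group(0)
            return token
        text = pattern.sub(repl, text)
    return text, vault


def restore(text: str, vault: dict[str, str]) -> str:
    """Re-insert originals on the controlled side, after the model has answered."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text


def log_entry(status: str, tokens_used: int) -> dict:
    """Operational metadata only: no prompt or response content is recorded."""
    return {"ts": time.time(), "status": status, "tokens_used": tokens_used}
```

For example, `pseudonymize("Contact jan.kowalski@example.com, PESEL 44051401359")` returns the text with `<EMAIL_1>` and `<PESEL_2>` in place of the values, plus a vault that never leaves your side. Only the masked text goes to the model, and only `log_entry(...)` output goes to the logs.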
Vector 3: Data in provider training and retention
The third vector is contractual, not technical, and that’s why it’s easy to overlook. When you send a query to an external model API, the key question is: what does the provider do with that data after processing? Is it stored? Can it end up in the training set for the next model version? Is it processed outside the European Economic Area?
These are questions about data residency and retention, and the answers are in the licensing terms, not the technical documentation. The difference between “a business API with zero-retention guarantees” and “a free consumer chat that learns from conversations” is fundamental for RODO compliance and often determines whether you can use that provider for customer data at all.
Here, the choice of architecture makes the biggest difference. Self-hosting the model (running it on your own or controlled infrastructure) eliminates this vector at the source: data never leaves your environment, so the question “what does the provider do with it” disappears. The trade-off is the maintenance burden and, usually, lower raw performance than the best hosted models offer. The trade-offs and decision criteria are broken down in self-hosted LLM and RODO.
Vector 4: Sensitive data disclosed in responses
The fourth vector works in reverse: it’s not about what you feed into the model, but what the model returns to the user. A knowledge-base system might retrieve a document the user isn’t authorized to see and summarize its content, bypassing all access controls built at the application level. Or the model might “complete” the response with data carried over from another user’s conversation in a poorly isolated session.
Defense combines access control with output guardrails. Permission filtering must work at the context-retrieval level (the model only sees fragments the user is authorized for), not just at the response level. On output, a second guardrail layer scans the response for sensitive data patterns before it reaches the user. This same layer ensures the model doesn’t quote secrets or PII even if they somehow ended up in the context.
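The two layers described above can be sketched as a retrieval-time filter followed by an output scan. This is an illustrative sketch under stated assumptions: the `Chunk` structure, the tenant and role fields, and the two output patterns (an 11-digit PESEL-like number and an email address) are hypothetical and far narrower than a production rule set.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrieved fragment tagged with who may see it (hypothetical schema)."""
    text: str
    tenant_id: str
    allowed_roles: frozenset[str]


def authorized_chunks(chunks: list[Chunk], tenant_id: str, role: str) -> list[Chunk]:
    """Filter at retrieval time, so unauthorized text never enters the context."""
    return [
        c for c in chunks
        if c.tenant_id == tenant_id and role in c.allowed_roles
    ]


# Hypothetical output patterns: an 11-digit number or an email address.
SENSITIVE_OUTPUT = re.compile(r"\b\d{11}\b|[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_response(response: str) -> str:
    """Second layer: scrub sensitive patterns before the response is returned."""
    return SENSITIVE_OUTPUT.sub("[REDACTED]", response)
```

The order matters: `authorized_chunks` runs before the prompt is assembled, so the model cannot summarize what it was never given, while `redact_response` is the backstop for anything that reached the context anyway.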
Table: leak vector, mechanism, defense layer
Four vectors require four different defense layers. The table below matches each defense to the vector it covers and shows why a single measure isn’t enough.
| Leak Vector | How Data Escapes | Defense Layer | What’s NOT Enough |
|---|---|---|---|
| Prompt injection (context) | model “reads aloud” system prompt or others’ fragments | input guardrails + per-role isolation + no secrets in prompt | input validation alone |
| PII in prompts and logs | personal data in API and logs in plaintext | masking/pseudonymization before model and logs | “don’t log PII” without technical enforcement |
| Data in provider training/retention | external model stores or learns from data | zero-retention in contract or self-hosting | trusting provider defaults |
| Sensitive data in responses | model returns content outside user permissions | access control at context retrieval + output guardrails | filtering permissions only in UI layer |
No row covers the others. Self-hosting (row 3) doesn’t protect against prompt injection (row 1). PII masking (row 2) won’t stop a response quoting someone else’s document (row 4). Security is the sum of layers — and that’s why audits check each one separately.
How to tie this into a coherent policy
Four technical layers only work if backed by an organizational decision: which data can be sent to the model, in what form, and through which provider. This is the domain of AI data governance: data classification, a flow registry, and assignment of responsibility. Without it, every technical layer is configured ad hoc and drifts over time.
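One way to keep that decision from drifting is to write it down as data the application checks before every model call. The sketch below is only an illustration: the data classes, provider labels, and rules in `POLICY` are hypothetical and do not come from a regulation or a specific product. Your own classification would replace them.

```python
from enum import Enum


class DataClass(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CUSTOMER_PII = 3


# Hypothetical policy table: for each class, which provider types may receive
# the data and whether masking is mandatory before sending.
POLICY = {
    DataClass.PUBLIC: {
        "providers": {"external_api", "external_zero_retention", "self_hosted"},
        "mask": False,
    },
    DataClass.INTERNAL: {
        "providers": {"external_zero_retention", "self_hosted"},
        "mask": False,
    },
    DataClass.CUSTOMER_PII: {
        "providers": {"external_zero_retention", "self_hosted"},
        "mask": True,
    },
}


def may_send(data_class: DataClass, provider: str, masked: bool) -> bool:
    """Return True only if the provider is allowed and masking rules are met."""
    rule = POLICY[data_class]
    if provider not in rule["providers"]:
        return False
    return masked or not rule["mask"]
```

With this table, `may_send(DataClass.CUSTOMER_PII, "external_zero_retention", masked=False)` is rejected, while the same call with `masked=True` passes. Keeping the rules in one reviewed structure gives the audit something concrete to check against.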
For customer data, check whether processing requires a DPIA — a data protection impact assessment. The leak vector audit results (from the table above) naturally feed into such an assessment: they show what risks exist and how they’ve been mitigated. Documentation obligations for 2026, combining RODO with the AI Act, are covered in company obligations under AI Act and RODO.
The practical implementation order is: first, data classification and provider decision (governance), then PII masking and access control (technical layers to build once and well), finally guardrails and audit as the verification layer. Starting with guardrails on a system already sending raw PII to an external API is treating symptoms, not causes.
FAQ
Is it enough to avoid pasting confidential data into an AI chat?
Necessary, but far from sufficient for a production system. When you build an assistant with a knowledge base or an agent with CRM access, data reaches the model by design, not by accident, and that’s where you need PII masking, access control, and retention policies. The “don’t paste confidential data” rule protects against just one of four vectors.
Does self-hosting an LLM solve the data leak problem?
It closes one specific vector: data doesn’t reach an external provider, so the risk of retention and training on your data disappears. However, it doesn’t eliminate prompt injection, PII leaks to logs, or unauthorized data disclosure in responses; those depend on architecture, not hosting location. Self-hosting simplifies RODO compliance, but auditing the remaining layers is necessary regardless of infrastructure choice.
What does PII masking before the model involve?
It’s a layer that intercepts the query before it reaches the LLM and replaces personal data (names, PESEL numbers, emails, account numbers) with placeholder tokens. The model works with pseudonymized content, and the original data, if it is needed at all for the response, stays on your controlled side. This way, even a context leak or a full-content log doesn’t reveal real personal data.
How can I check if the model provider uses my data for training?
The answer is in the service’s licensing terms, not the technical documentation. Look for clauses on data retention, training, and processing location (data residency). Paid business APIs usually offer zero-retention and an opt-out from training; free consumer services often don’t. If you process customer data, treat the absence of an explicit zero-retention guarantee as a lack of permission to use that provider.
How many defense layers do I really need?
As many as you have active vectors. If the system uses an external API and processes customer PII, all four are in play, and then you need masking, guardrails, access control, and a decision on provider retention or self-hosting. Cost and time estimates depend on scale, so we provide ranges only after a data inventory; the starting point is classification, which shows which vectors apply to you at all.