Self-hosted LLM and GDPR: How to use AI without sending data outside

The biggest GDPR challenge with AI isn’t the model itself; it’s the data flow. When a query containing personal data hits a cloud API, it leaves your control: a data processing agreement becomes necessary, and questions arise about where the servers are located and what the provider does with the content. Self-hosting removes that transfer.

What a self-hosted model actually changes

No third-party transfer — data stays on your servers or in your private cloud.
Fewer data processing agreements — you don’t outsource processing to an external LLM provider.
Full retention control — you decide what’s stored and for how long, and can realistically enforce the right to erasure.
Processing location transparency — you know exactly where data resides, without guessing API regions.

The foundation is not only the LLM but also the BGE-M3 embedding server, which lets RAG over company knowledge run locally: semantic search across your documents without sending them outside.

The table below shows what self-hosting changes for each of the main GDPR obligations compared with a cloud API. Self-hosting removes none of the obligations—it only shifts the point of control from the provider to you.

GDPR obligation	Cloud API	Self-hosted LLM
Legal basis for processing	Your obligation + a basis for the processor agreement	Your obligation; no separate basis for transfer to the LLM
Data minimization	Requires filtering the prompt before sending it outside	Data never leaves the organization; minimization still recommended
Retention and deletion	Depends on the provider’s policy and logs	Fully on your side—your own TTLs and deletion procedures
Right to erasure (Art. 17)	Must cover logs and the index on the provider’s side	You delete in-house, including the RAG index
Transfer outside the EEA	Often requires SCCs / a server-location assessment	No transfer if the infrastructure sits in the EEA
Processor agreement (Art. 28)	Required with every provider processing the data	Usually unnecessary toward the model provider—see data processing agreements and AI
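
The retention row is the one that turns most directly into code once the data stays in-house. Below is a minimal sketch, assuming a hypothetical list of stored prompt records with a `created_at` timestamp and a 30-day TTL chosen purely for illustration (the article does not prescribe a period):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention period; use the value your own policy requires.
PROMPT_LOG_TTL = timedelta(days=30)


def purge_expired(records, ttl=PROMPT_LOG_TTL, now=None):
    """Return only the records that are still within the retention period."""
    now = now or datetime.now(timezone.utc)
    return [r for r in records if now - r["created_at"] < ttl]


now = datetime(2026, 1, 31, tzinfo=timezone.utc)
records = [
    {"id": "a", "created_at": now - timedelta(days=45)},  # past the TTL
    {"id": "b", "created_at": now - timedelta(days=2)},   # still retained
]
kept = purge_expired(records, now=now)
print([r["id"] for r in kept])  # ['b']
```

In a real deployment the same rule would run as a scheduled job against the prompt log, the response cache, and any other store that holds personal data.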

Compliance-by-design, not after the fact

We design compliance from the start, not as an afterthought. In practice, this means: data minimization (the model only gets what it needs), masking PII before anything reaches the model, access logging, and clear boundaries on what the system can do with the data.
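
Two of these measures, data minimization and access logging, can be illustrated with a short sketch. It assumes a hypothetical support-ticket record, a hypothetical field allowlist, and a logger named `llm.audit`; none of these names come from a specific product.

```python
import logging

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("llm.audit")  # hypothetical logger name

# Hypothetical allowlist: the only fields this task is allowed to see.
ALLOWED_FIELDS = {"ticket_id", "category", "message"}


def minimize(record: dict, allowed=ALLOWED_FIELDS) -> dict:
    """Drop every field that is not explicitly allowlisted."""
    return {k: v for k, v in record.items() if k in allowed}


def build_prompt(record: dict, user: str) -> str:
    """Minimize the record, log the access, and return the prompt text."""
    slim = minimize(record)
    # Log who accessed which fields, never the field values themselves.
    audit_log.info("user=%s fields=%s", user, sorted(slim))
    return f"Classify this ticket: {slim['message']}"


ticket = {
    "ticket_id": 17,
    "category": "billing",
    "message": "My invoice is wrong.",
    "customer_email": "anna@example.com",  # not needed for classification
}
prompt = build_prompt(ticket, user="agent-7")
print(prompt)  # Classify this ticket: My invoice is wrong.
```

An allowlist is used rather than a blocklist so that a newly added field stays out of the prompt until someone decides the model needs it.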

It’s also worth planning a data protection impact assessment (DPIA) early on—when processing sensitive data or at large scale it may be required regardless of where the model runs. We cover the AI Act and GDPR deadlines and obligations in more detail in our guide to company obligations in 2026 under the AI Act and GDPR.

Hybrid approach: cloud where permitted

Not every workflow requires on-premises hosting. Non-personal or anonymized data can be handled by a more powerful cloud model. A router directs sensitive queries to the local model and the rest to the cloud, masking PII before any external transfer. In practice, such an LLM router follows a simple rule: if the query contains personal data, it goes entirely to the local model; if no personal data is detected, the prompt still passes through masking, and only the anonymized version reaches the cloud. Security and GDPR take priority over any single feature.
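
The routing rule can be sketched in a few lines. This is an illustration under stated assumptions, not a production router: `contains_pii` is a deliberately simplistic stand-in that only recognizes e-mail addresses, and the model and masking callables are hypothetical stubs passed in by the caller.

```python
import re
from typing import Callable

# Deliberately simplistic stand-in detector (e-mail addresses only).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def contains_pii(prompt: str) -> bool:
    return EMAIL.search(prompt) is not None


def route(
    prompt: str,
    local_model: Callable[[str], str],
    cloud_model: Callable[[str], str],
    mask: Callable[[str], str],
) -> tuple[str, str]:
    """Return (target, answer) for one prompt."""
    if contains_pii(prompt):
        # Personal data detected: the whole query stays on the local model.
        return "local", local_model(prompt)
    # Nothing detected: still pass the prompt through masking as a safety net.
    return "cloud", cloud_model(mask(prompt))


# Stub models so the sketch runs without any real backend.
local = lambda p: f"local answer to: {p}"
cloud = lambda p: f"cloud answer to: {p}"
no_op_mask = lambda p: p

print(route("Summarize the complaint from anna@example.com", local, cloud, no_op_mask)[0])  # local
print(route("Explain Art. 28 GDPR in two sentences", local, cloud, no_op_mask)[0])  # cloud
```

The detector fails closed in only one direction: anything it flags stays local. Anything it misses depends on the masking step, which is why the cloud branch never sends the raw prompt.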

A sample path for an input containing PII looks like this: detect entities (name, national ID, address) → mask or route to the local model → process → restore context locally in the response. Only queries without personal data go to the more powerful cloud model.
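
The detect → mask → process → restore path can be sketched as follows. The regular expressions are assumptions made for the example and cover only e-mail addresses and phone numbers; detecting names, national IDs, or addresses would realistically need a dedicated entity recognizer. `fake_model` is a hypothetical stub standing in for whichever model processes the masked text.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d \-]{7,}\d"),
}


def mask(text):
    """Replace detected entities with placeholders; return text and mapping."""
    mapping = {}
    for label, pattern in PATTERNS.items():
        def repl(match, label=label):
            token = f"[{label}_{len(mapping) + 1}]"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(repl, text)
    return text, mapping


def restore(text, mapping):
    """Put the original values back; runs locally, after the model call."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


def fake_model(prompt):
    # Stub: a real model would receive only the masked prompt.
    return f"Draft reply based on: {prompt}"


original = "Write to anna@example.com or call +48 600 100 200."
masked, mapping = mask(original)
answer = restore(fake_model(masked), mapping)
print(masked)  # Write to [EMAIL_1] or call [PHONE_2].
```

The mapping never leaves the local process, so the placeholders are meaningless to whatever system handles the masked prompt.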

FAQ

Does a self-hosted LLM guarantee full GDPR compliance?

No. It removes the hardest part, data leaving your organization, but you remain responsible for the legal basis, minimization, retention, and data subject rights. Self-hosting gives you full control over how you meet them.

Do I need an expensive GPU cluster to host a model locally?

Not necessarily. For many use cases—classification, extraction, RAG over company documents—a smaller model on a single GPU is enough. Only complex reasoning, long context, or a high number of concurrent queries justify a more powerful machine or a cluster. We treat hardware as a fixed, amortized cost spread over time rather than a per-call fee—so we match the setup to your real workload and budget. Predictable cost matters more than maxed-out hardware.
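
The difference between an amortized fixed cost and a per-call fee comes down to simple arithmetic. The sketch below uses placeholder figures only (hardware price, amortization period, running cost, and cloud price are all invented for illustration and are not quotes or benchmarks):

```python
def monthly_fixed_cost(hardware_price, amortization_months, monthly_running_cost):
    """Hardware spread evenly over its amortization period, plus running costs."""
    return hardware_price / amortization_months + monthly_running_cost


def break_even_calls(monthly_cost, cloud_price_per_call):
    """Monthly call volume at which the fixed cost equals the per-call bill."""
    return monthly_cost / cloud_price_per_call


# Placeholder figures, chosen only to keep the arithmetic easy to follow.
fixed = monthly_fixed_cost(hardware_price=12_000, amortization_months=24,
                           monthly_running_cost=100)
calls = break_even_calls(fixed, cloud_price_per_call=0.02)

print(fixed)            # 600.0
print(f"{calls:,.0f}")  # 30,000
```

Below the break-even volume the per-call fee is cheaper on paper. Above it, the fixed setup wins, and the monthly bill no longer moves with usage.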

What about data that still goes to the cloud?

We mask PII before sending, limit data to the bare minimum, and route sensitive workflows to the local model. This hybrid approach means: local where necessary, cloud where permitted.

Does self-hosting eliminate the processor agreement entirely?

Not always. The processor agreement with the model provider goes away, because that provider stops processing the data. But if the model runs in someone else’s private cloud and a third party maintains the infrastructure, a processor agreement (Art. 28 GDPR) may still be needed toward that hosting provider. We cover the details in our article on data processing agreements and AI.

What about the right to erasure in the RAG index?

The vector index also contains personal data, so it falls under the right to erasure. With self-hosting you have full control over this: you delete the source document and then remove the corresponding vectors and chunks from the index (and from the response cache, if one exists). It’s worth linking the document ID to its embeddings up front, so deletion is a single, predictable step rather than a manual search through the database.
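
Linking the document ID to its embeddings can be sketched with a toy in-memory store. `LocalRagStore` and its methods are hypothetical names for this example; a real vector database would expose its own delete-by-metadata mechanism, which this sketch does not try to reproduce.

```python
from collections import defaultdict


class LocalRagStore:
    """Toy in-memory stand-in for a vector index plus a response cache."""

    def __init__(self):
        self.chunks = {}                       # chunk_id -> (embedding, text)
        self.chunks_by_doc = defaultdict(set)  # doc_id -> {chunk_id, ...}
        self.cache = {}                        # query -> (answer, {doc_id, ...})

    def add_chunk(self, doc_id, chunk_id, embedding, text):
        self.chunks[chunk_id] = (embedding, text)
        self.chunks_by_doc[doc_id].add(chunk_id)

    def cache_answer(self, query, answer, source_doc_ids):
        self.cache[query] = (answer, set(source_doc_ids))

    def erase_document(self, doc_id):
        """Remove every chunk and cached answer derived from doc_id."""
        chunk_ids = self.chunks_by_doc.pop(doc_id, set())
        for chunk_id in chunk_ids:
            del self.chunks[chunk_id]
        stale = [q for q, (_, docs) in self.cache.items() if doc_id in docs]
        for query in stale:
            del self.cache[query]
        return len(chunk_ids), len(stale)


store = LocalRagStore()
store.add_chunk("contract-42", "c1", [0.1, 0.3], "Customer name and address")
store.add_chunk("contract-42", "c2", [0.2, 0.1], "Payment terms")
store.add_chunk("handbook", "c3", [0.9, 0.4], "Vacation policy")
store.cache_answer("who signed contract 42?", "cached answer", ["contract-42"])

removed = store.erase_document("contract-42")
print(removed)  # (2, 1)
```

Because the cache records which documents each answer drew on, erasure also reaches responses that quoted the deleted document, not just the vectors.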
