A company deploys an AI assistant for customer service. The model answers questions, cites procedures from an internal knowledge base, and sometimes sees a customer’s name and order number. The legal question arises a week after deployment, not before: Did we sign an agreement with the model provider? The answer "We don’t know" is the most common gap we find during audits. This gap doesn’t exist in systems designed from the first line of code with compliance in mind.
What is a data processing agreement and when is it mandatory
RODO (the Polish acronym for the GDPR) distinguishes between a data controller, which determines the purposes and means of processing, and a data processor, which processes data on the controller’s behalf and only on its instructions. Your company is the data controller for your customers' and employees' data. The AI provider receiving this data to perform a service is the data processor.
Article 28 of RODO states explicitly: every such processing delegation must be documented by an agreement or other legal instrument specifying the subject matter, duration, nature, and purpose of processing, types of data, categories of data subjects, and the obligations and rights of the controller.
The obligation arises when an external entity accesses personal data while acting on your behalf. In the context of AI, this specifically includes:
- Language model providers (cloud APIs) that see prompt content containing user data.
- Automation platforms (n8n in the cloud, Make, Zapier) through which data from CRM or email flows.
- Infrastructure providers (AWS, Azure, GCP) where your AI system runs and conversation logs are stored.
- Implementation partners (like us) with access to data during configuration, testing, and system maintenance.
A DPA is not required when an external entity processes data as a separate controller (e.g., a bank processing payments) or when data never leaves your infrastructure (self-hosted architecture, local models).
What a data processing agreement must include for AI deployments
A standard DPA covers the elements required by Article 28 of RODO. AI deployments need additional clauses that pre-LLM templates do not contain.
| Agreement Element | RODO Standard (Art. 28) | AI-Specific Clauses |
|---|---|---|
| Subject and purpose of processing | Mandatory | Precise: inference (queries), fine-tuning (training), RAG (indexing), logs |
| Sub-processors | List with notification obligation | Which base models, GPU providers, CDNs, monitoring services |
| Data subject rights | Support obligation | Mechanism for deleting data from model cache, vectors, logs |
| Technical security | Encryption, access control | PII masking before prompts, guardrails, tenant isolation |
| Data transfer | Standard contractual clauses (SCCs) | Processing region, prohibition on training with customer data |
| Retention and deletion | Time and procedure | Zero-retention for models, TTL for conversation logs, vector purge |
| Audit and reporting | Right to inspection | Inference logs, AI security incident reports |
The "prohibition on training with customer data" clause separates serious AI providers from those that absorb your data into a global model. Many cloud API providers include this clause in enterprise agreements, but their default terms often permit training on your data. Always check the API terms before the first production request.
Controller, processor, or joint controller: where’s the line with AI
The AI model provider isn’t always a processor. The boundary depends on whether and how they can use your data for their own purposes.
Pure processor (DPA suffices): The provider runs the model solely on your instructions, doesn’t use data for their own purposes, doesn’t train on it, and doesn’t retain it beyond the agreed TTL.
Joint controller (requires a joint controllership agreement, Art. 26 RODO): The provider has their own interest in the data, e.g., conducts their own analytics on queries, builds products on them, or profiles your users. This is rarer with AI models but possible with SaaS platforms with built-in AI.
Separate controller: The provider processes data for their own independent purposes, not covered by your agreement. In this case, RODO requires you to inform data subjects that their data is transferred to this entity, which sets its own purposes.
In practice, the vast majority of enterprise-mode LLM APIs are processors—which is why DPAs have been a standard IT tender requirement since 2023.
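The three roles described above can be sketched as a first-pass triage. This is a minimal sketch, not a legal determination: the `ProviderProfile` fields and the `classify` function are hypothetical names introduced here, and the flags simply mirror the criteria listed in this section. The result is a candidate role to hand to legal review.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PROCESSOR = "processor"
    JOINT_CONTROLLER = "joint controller"
    SEPARATE_CONTROLLER = "separate controller"


# Instrument each role calls for, as described in the section above.
REQUIRED_INSTRUMENT = {
    Role.PROCESSOR: "DPA (Art. 28)",
    Role.JOINT_CONTROLLER: "joint controllership agreement (Art. 26)",
    Role.SEPARATE_CONTROLLER: "no DPA; inform data subjects about the recipient",
}


@dataclass
class ProviderProfile:
    """Hypothetical questionnaire filled in from the provider's terms."""
    acts_only_on_instructions: bool
    uses_data_for_own_purposes: bool   # own analytics, profiling, product work
    trains_on_customer_data: bool
    retains_beyond_ttl: bool
    purposes_outside_agreement: bool   # independent purposes not covered by your contract


def classify(p: ProviderProfile) -> Role:
    """Return a candidate role; anything other than PROCESSOR needs legal review."""
    pure_processor = (
        p.acts_only_on_instructions
        and not p.uses_data_for_own_purposes
        and not p.trains_on_customer_data
        and not p.retains_beyond_ttl
    )
    if pure_processor:
        return Role.PROCESSOR
    if p.purposes_outside_agreement:
        return Role.SEPARATE_CONTROLLER
    return Role.JOINT_CONTROLLER


enterprise_api = ProviderProfile(
    acts_only_on_instructions=True,
    uses_data_for_own_purposes=False,
    trains_on_customer_data=False,
    retains_beyond_ttl=False,
    purposes_outside_agreement=False,
)
role = classify(enterprise_api)
print(role.value, "->", REQUIRED_INSTRUMENT[role])
```

The ordering is deliberate: the processor check comes first because all four conditions must hold, so a single "yes" on training or own-purpose use moves the provider out of the DPA-only path.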
Data transfers outside the EEA: what this means for AI in practice
Language models often run on servers outside Europe. Transferring personal data outside the European Economic Area is permitted but requires a legal basis: standard contractual clauses (SCCs), an adequacy decision, or binding corporate rules (BCRs).
In practice, when deploying AI, you must:
- Identify where data is physically processed (the API provider’s server region).
- Verify whether the provider processes data inside the EEA, operates in a country with an adequacy decision (e.g., UK, Japan), or offers SCCs.
- Document the transfer basis in the record of processing activities (RoPA; RCP in Polish).
- Consider whether data residency in the EU (enforced contractually or via self-hosting) eliminates the transfer issue.
A self-hosted architecture with a local model, a local embedding model (e.g., BGE-M3), and a local vector database (e.g., Qdrant) removes the transfer question entirely, as personal data never leaves your infrastructure. This is particularly critical for companies handling sensitive data: law firms, medical entities, fintech.
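The transfer check from this section can be sketched as a small lookup that yields the basis to write into the record of processing activities. It is a minimal sketch under stated assumptions: `transfer_basis` is a hypothetical helper, and `ADEQUACY_EXAMPLES` holds only the two examples named above (UK, Japan), not the full list of adequacy decisions, which must be checked against the European Commission's current list.

```python
from typing import Optional

# EEA = 27 EU member states + Iceland, Liechtenstein, Norway (ISO 3166-1 alpha-2).
EEA = {
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE", "IS", "LI", "NO",
}

# Deliberately partial: only the examples mentioned in the text.
ADEQUACY_EXAMPLES = {"GB", "JP"}


def transfer_basis(
    country: str, *, has_sccs: bool = False, has_bcrs: bool = False
) -> Optional[str]:
    """Return the transfer basis to document, or None if no basis was found."""
    country = country.upper()
    if country in EEA:
        return "none needed (processing inside the EEA)"
    if country in ADEQUACY_EXAMPLES:
        return "adequacy decision"
    if has_sccs:
        return "standard contractual clauses"
    if has_bcrs:
        return "binding corporate rules"
    return None


for region, sccs in [("PL", False), ("JP", False), ("IN", True), ("IN", False)]:
    basis = transfer_basis(region, has_sccs=sccs)
    print(region, "->", basis or "NO BASIS: do not send personal data")
```

A `None` result is the case to treat as a deployment blocker rather than a warning.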
What RODO-compliant architecture looks like technically
Legal compliance and technical compliance are two sides of the same coin. The DPA provides the legal framework, but the system must enforce it technically. Specifically:
PII masking before the model. Before text with personal data reaches the LLM, our router masks sensitive variables (names, numbers, emails). The model sees [NAME] instead of "Jan Kowalski." If the response doesn’t require this data, the model doesn’t receive it.
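A minimal sketch of such a masking step, using only the standard library. It is not our production router: the regex patterns, the `ORD-` order-number format, and the numbered placeholders (`[NAME_1]` rather than a bare `[NAME]`) are assumptions made for illustration. Names are matched against a list the caller already knows (for example, from the CRM record), because regular expressions alone cannot reliably detect names.

```python
import re

# Hypothetical patterns: tune these to the identifiers in your own data.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("ORDER_NO", re.compile(r"\bORD-\d{6}\b")),
    ("PHONE", re.compile(r"(?<!\w)(?:\+48[ -]?)?\d{3}[ -]?\d{3}[ -]?\d{3}(?!\w)")),
]


def mask_pii(text: str, known_names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace PII with placeholders; return masked text and the mapping."""
    mapping: dict[str, str] = {}
    counters: dict[str, int] = {}

    def placeholder(label: str, value: str) -> str:
        # Reuse the same placeholder when a value appears more than once.
        for key, existing in mapping.items():
            if existing == value:
                return key
        counters[label] = counters.get(label, 0) + 1
        key = f"[{label}_{counters[label]}]"
        mapping[key] = value
        return key

    for name in known_names:
        if name in text:
            text = text.replace(name, placeholder("NAME", name))
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(
            lambda m, label=label: placeholder(label, m.group(0)), text
        )
    return text, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Restore original values in the model's response, inside your own system."""
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text


prompt = "Jan Kowalski (jan.kowalski@example.com, tel. 600 700 800) asks about ORD-123456."
masked, mapping = mask_pii(prompt, ["Jan Kowalski"])
print(masked)
```

The mapping never leaves your infrastructure: only the masked text goes to the model, and `unmask` is applied to the response afterwards if the reply needs the real values.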
Zero-retention on the model side. For cloud APIs, we select endpoints with a confirmed zero-retention policy for prompts. In local architecture, data doesn’t leave the server. Guardrails block responses containing data the model shouldn’t be queried about.
TTL on logs. Conversation logs (needed for debugging and quality audits) have a defined lifespan and purge procedure. Data subjects can request deletion of their data—the system handles this end-to-end, including vectors in the database and logs.
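The two operations described here, a scheduled TTL purge and an erasure request that reaches both logs and vectors, can be sketched with in-memory stores. This is a sketch under stated assumptions: `Stores`, `LogEntry`, and the 30-day `LOG_TTL` are hypothetical, and a real system would run the same logic against its log storage and vector database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

LOG_TTL = timedelta(days=30)  # assumed value; take the real one from your DPA


@dataclass
class LogEntry:
    subject_id: str
    created_at: datetime
    text: str


@dataclass
class Stores:
    logs: list[LogEntry] = field(default_factory=list)
    vectors: dict[str, str] = field(default_factory=dict)  # vector id -> subject id


def purge_expired_logs(stores: Stores, now: datetime) -> int:
    """Drop log entries older than LOG_TTL; return how many were removed."""
    cutoff = now - LOG_TTL
    before = len(stores.logs)
    stores.logs = [e for e in stores.logs if e.created_at >= cutoff]
    return before - len(stores.logs)


def erase_subject(stores: Stores, subject_id: str) -> dict[str, int]:
    """Handle a deletion request across every store; return counts for the audit trail."""
    log_count = sum(1 for e in stores.logs if e.subject_id == subject_id)
    stores.logs = [e for e in stores.logs if e.subject_id != subject_id]
    vector_ids = [vid for vid, sid in stores.vectors.items() if sid == subject_id]
    for vid in vector_ids:
        del stores.vectors[vid]
    return {"logs": log_count, "vectors": len(vector_ids)}


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
stores = Stores(
    logs=[
        LogEntry("c1", now - timedelta(days=45), "old conversation"),
        LogEntry("c1", now - timedelta(days=1), "recent conversation"),
        LogEntry("c2", now - timedelta(days=2), "other customer"),
    ],
    vectors={"v1": "c1", "v2": "c2"},
)
purged = purge_expired_logs(stores, now)
erased = erase_subject(stores, "c1")
print(purged, erased)
```

Returning counts from `erase_subject` gives you evidence for the response to the data subject, and it only works if every vector carries the subject identifier as metadata from the moment it is indexed.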
Human gate for irreversible actions. AI agents that can send emails, create documents, or modify data require human confirmation. No action affecting personal data is fully autonomous without an explicitly defined scope. This is a technical requirement reinforcing human oversight from the AI Act.
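A minimal sketch of such a gate, assuming the agent proposes actions as structured objects before anything is executed. The action names, `ProposedAction`, `ApprovalRequired`, and `execute` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Assumed scope: actions that must never run without a named human approver.
IRREVERSIBLE_ACTIONS = {"send_email", "create_document", "modify_record"}


@dataclass
class ProposedAction:
    name: str
    payload: dict[str, Any]


class ApprovalRequired(Exception):
    """Raised when an irreversible action arrives without human approval."""


def execute(
    action: ProposedAction,
    handlers: dict[str, Callable[[dict[str, Any]], Any]],
    approved_by: Optional[str] = None,
) -> Any:
    """Run the action only if it is in scope and, where needed, approved."""
    if action.name not in handlers:
        # Anything outside the explicitly defined scope is rejected.
        raise ValueError(f"action {action.name!r} is outside the agent's scope")
    if action.name in IRREVERSIBLE_ACTIONS and approved_by is None:
        raise ApprovalRequired(f"{action.name} needs human confirmation")
    return handlers[action.name](action.payload)


handlers = {
    "search_kb": lambda p: "3 results",
    "send_email": lambda p: f"sent to {p['to']}",
}

print(execute(ProposedAction("search_kb", {}), handlers))
email = ProposedAction("send_email", {"to": "x@example.com"})
try:
    execute(email, handlers)
except ApprovalRequired as exc:
    print("blocked:", exc)
print(execute(email, handlers, approved_by="anna"))
```

Keeping the gate in the executor rather than in the prompt means the model cannot talk its way past it: an unapproved irreversible action fails regardless of what the model outputs.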
More on the security architecture of AI agents in the article AI agent security.
AI Act and DPA: how they overlap
The AI Act and RODO are two distinct regimes, but they overlap in the data domain. The AI Act adds further elements to agreements with AI providers:
- Technical documentation of the system: providers of high-risk systems must make it available. For limited-risk systems, transparency obligations suffice.
- Registry—high-risk systems require registration in the EU database and risk management documentation.
- DPIA—when an AI system profiles, evaluates, or makes automated decisions about people, RODO requires an impact assessment, and the AI Act requires a risk analysis. In practice: one document covering both requirements.
In your agreement with the AI provider, it’s worth explicitly stating which AI Act risk level the system represents, who is responsible for technical documentation, and how AI security incidents are handled. Without this, the controller (you) will struggle to demonstrate compliance during a potential audit.
A detailed discussion of the AI Act for companies deploying AI in Poland can be found in the article AI Act and RODO 2026.
Practical checklist before signing an agreement with an AI provider
Before signing an agreement with a model, platform, or infrastructure provider:
- Does the provider offer a DPA compliant with Article 28 RODO? (If not—walk away or process data anonymously.)
- Does the agreement include a clause prohibiting training on your data?
- Are processing regions and the basis for transfers outside the EEA specified?
- Is there a list of sub-processors with an obligation to notify of changes?
- Are TTL for logs and purge procedures precisely defined?
- Is it specified how the provider handles data subject requests (access, deletion)?
- In case of an incident, must the provider notify you without undue delay, early enough for you to meet your own 72-hour deadline for notifying the supervisory authority (Art. 33 RODO)?
If your agreement includes all these points, you have a solid foundation. If not—address this before production launch, not after the first inquiry from the Polish Data Protection Authority (UODO).
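The checklist above can also be kept as data next to the deployment, so that a missing clause blocks a release instead of surfacing in an audit. A minimal sketch: the keys in `CHECKLIST` and the `missing_items` helper are hypothetical names, and the answers come from whoever reviewed the agreement.

```python
# One entry per checklist question above; keys are hypothetical identifiers.
CHECKLIST = {
    "dpa_art_28": "DPA compliant with Article 28 RODO",
    "no_training_clause": "Clause prohibiting training on your data",
    "regions_and_transfer_basis": "Processing regions and transfer basis specified",
    "subprocessor_list": "Sub-processor list with change notification",
    "log_ttl_and_purge": "Log TTL and purge procedures defined",
    "data_subject_requests": "Handling of data subject requests specified",
    "incident_reporting": "Incident reporting terms (72-hour deadline)",
}


def missing_items(answers: dict[str, bool]) -> list[str]:
    """Return the checklist items that are unanswered or answered 'no'."""
    return [label for key, label in CHECKLIST.items() if not answers.get(key, False)]


review = {
    "dpa_art_28": True,
    "no_training_clause": True,
    "regions_and_transfer_basis": True,
    "subprocessor_list": False,
    "log_ttl_and_purge": True,
    "data_subject_requests": True,
}
gaps = missing_items(review)
for gap in gaps:
    print("MISSING:", gap)
```

An unanswered item counts as missing on purpose: "we don't know" is treated the same as "no".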
You can preliminarily assess your organization’s data and compliance readiness with the AI readiness assessment or calculate the cost of proper architecture with the ROI calculator. We’ll discuss architecture details in a free pilot.
FAQ
When exactly is a DPA mandatory for AI?
The obligation arises when an external entity comes into contact with personal data and processes it solely on your instructions. In AI practice: when a language model API, automation platform, or hosting provider sees content containing your customers' or employees' data. The absence of a DPA in this situation is a violation of Article 28 RODO, regardless of whether an incident occurred.
Do popular AI model APIs have ready-made DPAs?
Most major API providers offer DPAs in their enterprise programs. The issue is that default consumer or free-tier terms often don’t include a DPA or contain clauses allowing training on your data. Before using any API in production with personal data, obtain and sign the provider’s DPA; don’t assume the standard ToS are sufficient.
Does a self-hosted architecture eliminate the need for a DPA?
Yes, for the model and infrastructure that are entirely under your control. If the language model, vector database, and conversation logs run on your servers and no personal data leaves your network, there is no external processor to whom you entrust data. A DPA is still needed with the implementation partner that accesses the system during installation and maintenance. More on self-hosting architecture.
What happens to data in the model after a conversation ends?
It depends on the architecture. Generative models don’t "remember" conversations between sessions, but API providers may retain prompt logs for 30 days or longer for debugging and security. The DPA should specify the TTL for logs and their deletion mechanism. With a local LLM and zero-retention on the model side, conversation data doesn’t leave your infrastructure. Vectors in the embedding database are a separate resource requiring their own retention and purge policy.
Do I need to conduct a DPIA when deploying an AI assistant?
Not always. A DPIA is required when processing may pose a high risk to individuals' rights: large-scale profiling, processing sensitive data, or automated decisions about people. An assistant that only answers questions from your knowledge base and doesn’t profile users typically doesn’t require a DPIA. An assistant that evaluates customer sentiment, categorizes them, or directs them to different paths based on their profile likely does. The boundary is determined by a lawyer, not the AI vendor.