A Polish e-commerce company is entering the Czech, Slovak, and Romanian markets. The customer service department receives queries in four languages. Hiring native speakers for each language is costly, and translating queries via Google Translate before responding through a Polish agent introduces delays and stylistic errors that customers immediately notice.
This isn’t a problem exclusive to e-commerce. It affects any company serving customers from different countries, employing a multinational team, or simply offering products in multiple language versions. A multilingual AI assistant solves this problem differently than the classic approach of separate bots per language.
Why one index instead of multiple bots
#The intuitive approach is to build a separate assistant in Polish, another in English, and translate the entire knowledge base into each language. Such a system quickly becomes an operational nightmare. Every content update requires updates in N languages, consistency between versions is hard to maintain, and scaling costs grow linearly with the number of languages.
Architectures based on multilingual embedding models solve this problem from the vector representation side. Instead of storing content in each language, the system indexes documents once (usually in the base language, often English or Polish), and multilingual models ensure that a query in Romanian and a semantically similar query in Polish land in a similar area of the vector space.
BGE-M3 supports over 100 languages with a single model and is available locally via Ollama, meaning customer queries don’t leave the infrastructure before the search stage. This is critical for personal data in query content.
Three conditions that must be met for this architecture to work in practice:
- Base content quality: Documents in the index must be precise. Multilingual embedding transfers the quality—and errors—of the source content to all languages.
- Generative model with multilingual support: The LLM must not only understand the query in a given language but also generate a correct response. Not all models handle Central European languages as well as English.
- Guardrails in every language: Injection filters, confidence thresholds, and topic scope must work regardless of the input language. An assistant that refuses out-of-scope questions in English but accepts them in Czech has a security gap.
Language detection and response routing
#The core mechanism of a multilingual assistant is language detection at the input and maintaining that language throughout the response flow. It’s worth distinguishing several layers:
Query language detection happens deterministically before invoking the LLM. Libraries like langdetect or lingua-py work locally, without network latency, and classify language with over 95% confidence for texts longer than 15 characters. For short queries ("status," "help," "price"), confidence drops, and the system should either ask for clarification or default to the session language.
Maintaining language in conversation context is crucial for assistants where the conversation spans multiple turns. A customer who starts in English and then switches to Polish should receive a consistent response. Simple scheme: the language of the first query in a session becomes the default, later changes are respected, but the system doesn’t switch for a single word in another language (prevents switching due to accidental anglicisms).
Routing to the model: Not every generative model supports all languages equally well. The pattern used in production systems is an LLM router, which selects a model from a matrix based on the detected language. Languages with very good coverage in training data (English, Spanish, French, Chinese, Arabic) can go to general models. Central European languages (Polish, Czech, Slovak, Romanian, Hungarian) may require models with better coverage or explicitly trained on those languages.
Multilingual RAG architecture: production schema
#Customer query (any language)
│
▼
[Language detection — local, deterministic]
│
▼
[PII masking — local, before leaving infrastructure]
│
▼
[BGE-M3 embedding — local, multilingual, 1024-dim]
│
▼
[Vector search — shared index, hybrid search]
│
▼
[Reranking — optional for top-k contexts]
│
▼
[LLM router — model selection by language and task]
│
▼
[Response generation in query language]
│
▼
[Guardrails — scope, confidence, injection check]
│
▼
Response or human-handoff
The key element is the shared index: documents are indexed once, hybrid search combines semantic and lexical (BM25) search, and reranking improves precision for queries with linguistic nuances. More on index architecture in the article semantic search and embeddings in business.
Approach comparison: single model vs. model router
#| Approach | Advantages | Limitations | When to use |
|---|---|---|---|
| Single multilingual model (e.g., Qwen, Mistral) | Simple architecture, single maintenance point | Uneven quality across languages, especially for rare languages | 2-4 languages with good model coverage |
| Per-language model router | Optimal quality per language, specialized models | Higher infrastructure cost, routing latency | 5+ languages or when response quality is critical |
| Base model + per-language fine-tuning | High precision for specialized languages (e.g., legal PL) | Training and maintenance cost, requires per-language data | Industries with very specific vocabulary |
| Translation to one language before LLM | Compatibility with any English model | Translation errors propagate, PII in external API | Rarely justified in 2026 with available multilingual models |
For most companies with 3-6 European languages and standard customer service, one good multilingual model with a router to a specialized model for Central European languages is the right cost-quality choice.
Multilingual guardrails: what breaks without them
#Guardrails that work in only one language create a false sense of security. Several failure patterns emerge in systems where guardrails were designed solely with the dominant language in mind:
Injections via language switching: A user writes normally in English, then injects an instruction in Ukrainian or Arabic, hoping the filter doesn’t support that language. Protection against prompt injection must work regardless of input language, meaning either detection at the structural pattern level (attempt to change role, system instruction) or multilingual detection patterns.
Per-language topic scope: If the guardrail "don’t answer questions outside customer service scope" is defined by a list of Polish words, it won’t work for Hungarian queries. Scope should be defined at the semantic level: if the cosine similarity of the query to the topic space is below the threshold, the system refuses regardless of language.
Confidence threshold and language: Models have lower response confidence for languages with less coverage in training data. A fixed confidence threshold that works well for English may be too strict for Polish or too lenient for Romanian. Thresholds should be calibrated per language or per language family.
Human-handoff in the customer’s language: When the assistant escalates an issue to a human, the escalation message must be in the customer’s language, not the system’s language. "I’m transferring you to a consultant" instead of technical logs in English.
Patterns for building guardrails in complex systems are discussed in the article AI agent security.
GDPR and personal data in a multilingual context
#Multilingualism doesn’t change GDPR requirements but adds layers of complexity. Customers from different EU countries are subject to the same regulation, but local supervisory authorities (CNIL in France, BfDI in Germany, UODO in Poland) may have different detailed guidelines.
Four technical requirements regardless of language:
- PII masking before embedding: Names, email addresses, phone numbers, order numbers with personal data are masked locally before the query reaches the embedding layer or an external LLM. BGE-M3 running locally doesn’t require this step, but most production systems combine local and cloud models.
- Data residency: If processing data from German customers, ensure data doesn’t leave EU servers. Many cloud providers offer EU regions, but self-hosting local models provides certainty without analyzing contracts.
- Per-language conversation logs: Logs for GDPR audit purposes should include the session language. For a request for access or data deletion from a Francophone customer, the response should be in French, and the scope of deleted data must be clear.
- Consent for multilingual personalization: If the assistant uses conversation history for response personalization, consent for data processing for this purpose must be given and stored per user, regardless of the interface language.
For systems serving customers in healthcare, finance, or children, a DPIA should be conducted before deployment. Legal obligations in 2026 are discussed in the article AI Act and GDPR 2026.
Language quality: how to measure and what to improve
#A multilingual assistant tends to hide quality issues. The system may appear correct for languages someone on the team understands, but for languages no one knows natively, errors can be systematic and go unnoticed for weeks.
Three metrics worth tracking per language:
- Escalation rate per language: What percentage of conversations in a given language end with a transfer to a human. A high rate for a specific language signals low response quality or poor knowledge base coverage.
- User rating per language: A simple post-conversation survey (thumbs up/down or 1-5 scale). Comparing ratings for English vs. Polish vs. Romanian reveals quality disparities.
- Latency per language: Multilingual models may have different generation times for different languages due to tokenization. Languages with rich morphology (Polish, Czech, Hungarian) generate more tokens from the same text, which can affect latency.
Good agent quality monitoring should break down metrics per language from day one of production, not after complaints arise.
Cost and ROI
#The cost of a multilingual assistant depends on the number of languages, conversation volume, and chosen architecture (local vs. cloud API). A pilot with one additional language beyond the base typically takes 2-4 weeks and includes guardrail expansion, response quality testing, and threshold calibration. Each subsequent language with a stabilized architecture requires less work since the infrastructure is already in place.
The ROI calculator lets you input real query volumes per language, hourly rates, and current handling time to see payback time without "guesstimating." For companies with 15%+ queries in non-base languages, a multilingual assistant usually pays off faster than hiring a separate agent or manually translating each query.
Estimated deployment scopes:
| Scope | Languages | Condition | Pilot time |
|---|---|---|---|
| Bilingual assistant (PL + EN) | 2 | Knowledge base in one language, RAG index ready | 2-3 weeks |
| Expansion to 4-6 EU languages | 4-6 | Quality verification per language, guardrail testing | 4-8 weeks |
| Full multilingualism (10+ languages) | 10+ | Model router, per-language monitoring, DPIA | 2-4 months |
Pilots always start with the language having the highest query volume beyond the base, not the most technically challenging.
Try it live
#Describe your current language scope and type of customer queries, and the model will suggest an architecture and guardrails tailored to your case (playground: PII masked, zero retention):
FAQ
#Does a multilingual AI assistant require separate knowledge bases per language?
#No, when using multilingual embedding models like BGE-M3, a single knowledge base in the base language suffices. The model maps queries from different languages into a common vector space, so semantic search works correctly regardless of the query language. Translating the base content into each language is optional and justified only when documents contain idiomatic expressions hard to translate via embedding or when the base contains highly specialized content in a given language.
How does an AI assistant handle Central European languages like Polish or Czech?
#Fusional languages like Polish or Czech are more challenging for models due to rich morphology and less coverage in training data compared to English. In practice, this means a higher risk of grammatical errors in responses and lower retrieval confidence for queries with rare word inflections. The production pattern is calibrating confidence thresholds separately for these languages and a higher escalation rate to humans initially, which decreases as the knowledge base expands. Models like Bielik (Polish) or multilingual Mistral and Qwen with good Central European coverage are better choices than models optimized solely for English.
What to do when the assistant responds in the wrong language?
#The first cause is language detection error on very short queries. Solution: with low detection confidence, the assistant asks for language preference or uses the language defined by the user profile or session location. The second cause is a model that can’t maintain language throughout the conversation. Solution: the language detected at the start of the session is sent as a system instruction to the LLM in every call. The third cause is lack of context: if retrieved fragments are exclusively in one language, the model may follow that language instead of the query language. The system instruction should explicitly require a response in the user’s language regardless of source language.
Is a multilingual assistant subject to the AI Act?
#The AI Act doesn’t impose special requirements solely due to multilingualism. Requirements depend on the risk of the application: a customer service assistant in e-commerce is usually a low or limited-risk system, primarily requiring transparency (the customer must know they’re talking to AI) and the option to escalate to a human. Systems assessing creditworthiness, recruiting, or making decisions about access to basic services are classified as high-risk regardless of language. A detailed review of obligations is in the article AI Act and GDPR 2026. If your assistant serves customers from multiple EU countries and processes sensitive data, it’s worth conducting a DPIA before production deployment.
Where to start implementing a multilingual assistant?
#Start by measuring query volume per language over the last 3 months. If one language accounts for more than 15% of queries, there’s a business case for a pilot. The next step is assessing the quality of the current knowledge base as a candidate for the RAG index. Fragmented, inconsistent, or outdated content in the base language will yield poor results in all languages. How to prepare company data for AI covers this stage in detail. The full implementation methodology from audit to pilot is in the article where to start AI implementation.