Over the past two years, the same pattern has repeated again and again: a company invests several months and a budget in an AI project, launches a pilot, and after eight weeks stagnation sets in. The model “works,” but business results don’t improve. The team doesn’t know what to fix. The project ends up in a drawer.
The cause was almost never the model. It was the process around it.
Mistake #1: No measurable goal before starting
AI projects that begin with “let’s see what we can do with AI” have one inherent problem: there’s no success criterion. Without it, every model demonstration looks good, and every error is “something to refine.”
A measurable goal is a concrete statement: “handling time per request drops from 8 to 3 minutes in 80% of cases” or “the classifier routes 70% of queries to the correct queue without human intervention.” Such a statement also defines when the project is ready to move from pilot to production.
Practical consequence: define the goal before selecting the model, not after. The model is chosen to fit the goal, not the other way around. If you can’t write the goal in one sentence with a number and a time horizon, the project is too vague to start. The automation finder is a tool for an initial assessment of process readiness.
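A minimal sketch of how such a goal could be written down as a checkable object, assuming handling times are logged per request. The `Goal` class and its field names are hypothetical, and the sample measurements are illustrative, not real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """Hypothetical success criterion: a share of requests must finish within a time limit."""
    max_minutes: float      # target handling time per request
    required_share: float   # fraction of requests that must meet the target

    def is_met(self, handling_minutes: list[float]) -> bool:
        if not handling_minutes:
            return False  # no data means the goal cannot be confirmed
        within = sum(1 for m in handling_minutes if m <= self.max_minutes)
        return within / len(handling_minutes) >= self.required_share


# "Handling time per request drops to 3 minutes in 80% of cases"
goal = Goal(max_minutes=3.0, required_share=0.8)
pilot_sample = [2.5, 2.9, 3.0, 2.0, 7.5]  # illustrative measurements
print(goal.is_met(pilot_sample))
```

The same object can serve as the exit criterion for the pilot: once `is_met` holds on production-like data, the move to production can be discussed.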
Mistake #2: Data that doesn’t reflect reality
A model is only as good as the data it works with. The most common scenario: a company prepares a knowledge base from documentation that hasn’t been updated in a year. Or trains the model on historical data that doesn’t include edge cases because those were handled manually outside the system.
Three symptoms of poor input data:
- The model answers questions from the documentation well but struggles with questions customers actually ask.
- Results on the test set are good, but on production data, hallucinations or misclassifications appear regularly.
- The model uses terminology no one in the company uses anymore because it comes from pre-reorganization documents.
Before building a RAG pipeline or fine-tuning a model, conduct a data audit: which documents are current, which are orphaned, which contain contradictions. The article on preparing company data for AI describes this audit step by step.
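A first pass of such an audit might look like the sketch below. It assumes each document carries a last-updated date and an owner; the `Doc` fields and the one-year cutoff are assumptions for illustration. Detecting contradictions between documents is not covered here and usually needs manual review.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Doc:
    title: str
    last_updated: date
    owner: Optional[str]  # None means nobody maintains the document


def audit(docs: list[Doc], today: date, max_age_days: int = 365) -> dict[str, list[str]]:
    """Sort document titles into current, stale, and orphaned buckets."""
    report: dict[str, list[str]] = {"current": [], "stale": [], "orphaned": []}
    cutoff = today - timedelta(days=max_age_days)
    for doc in docs:
        if doc.owner is None:
            report["orphaned"].append(doc.title)
        elif doc.last_updated < cutoff:
            report["stale"].append(doc.title)
        else:
            report["current"].append(doc.title)
    return report


docs = [
    Doc("Returns policy", date(2025, 11, 3), "support"),
    Doc("Old price list", date(2023, 2, 1), "sales"),
    Doc("Legacy onboarding", date(2025, 6, 1), None),
]
print(audit(docs, today=date(2026, 1, 15)))
```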
Mistake #3: Skipping guardrails and no handling of “I don’t know”
A language model without guardrails is like an employee without a job description: it does everything it’s asked, including things outside its competence. In a corporate environment, this means answering questions the model shouldn’t answer or confabulating responses when knowledge isn’t in the database.
Two mechanisms that are mandatory in every production system:
“I don’t know” response with escalation. A model that can’t find a sufficiently confident answer in the knowledge base shouldn’t guess. It should say directly: “I don’t have reliable information on this topic” and suggest contacting a human. Designing this path is covered in the article on monitoring and quality of an AI agent.
Thematic guardrails. A system operating in e-commerce customer service doesn’t answer legal questions, doesn’t provide specific insurance prices, and doesn’t diagnose medical issues. Guardrails are defined as a list of allowed intents or blocked query categories. Every attempt to go beyond the scope is logged and escalated.
The lack of these mechanisms results not only in poor response quality but also legal liability for content the system produces.
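The two mechanisms above can be sketched as a single check that runs before any answer is returned. This is an illustration under assumptions: the intent names, the confidence threshold of 0.7, and the `guard` function are hypothetical, and the intent and confidence values are expected to come from an upstream classifier and retriever.

```python
from dataclasses import dataclass

ALLOWED_INTENTS = {"order_status", "returns", "shipping"}  # hypothetical e-commerce scope
MIN_CONFIDENCE = 0.7  # assumed threshold; tune it on real data

FALLBACK = "I don't have reliable information on this topic. Please contact our support team."


@dataclass(frozen=True)
class Decision:
    reply: str
    escalate: bool
    reason: str  # value to write to the log for every blocked query


def guard(intent: str, retrieval_confidence: float, draft_answer: str) -> Decision:
    """Apply the thematic guardrail first, then the "I don't know" path."""
    if intent not in ALLOWED_INTENTS:
        return Decision(FALLBACK, True, "out_of_scope")
    if retrieval_confidence < MIN_CONFIDENCE:
        return Decision(FALLBACK, True, "low_confidence")
    return Decision(draft_answer, False, "ok")


print(guard("legal_advice", 0.95, "You can sue them."))
print(guard("returns", 0.40, "Returns are accepted within 90 days."))
```

Checking scope before confidence means an out-of-scope question is blocked even when the retriever happens to find a confident-looking match.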
Mistake #4: Ignoring data security and GDPR
AI projects regularly fall into a trap: data is connected, the model responds, but no one checks what exactly goes into the model and where it’s processed. Particularly critical scenarios:
- Customers’ personal data (names, order numbers, addresses) end up in prompts sent to external APIs without pseudonymization.
- Conversation logs contain PII in plain form and are stored without a legal basis.
- The company uses a cloud API for the model, but the provider agreement doesn’t meet the requirements of GDPR Article 28 (data processor).
Minimum security requirements for any AI project processing personal data:
- Masking PII before feeding it to the model (in the router, not the client application).
- A data processing agreement with the model provider or self-hosting in your own infrastructure.
- Retention of conversation data limited to the minimum necessary purpose, with automatic deletion.
- A path for exercising the right to erasure (GDPR Article 17), especially important for embeddings and vector databases.
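As an illustration of the masking step, the sketch below replaces email addresses and phone numbers with placeholders before the text reaches the model. The regular expressions are deliberately simple assumptions, not a complete PII detector: names, addresses, and order numbers typically need a dedicated entity-recognition step or format-specific rules.

```python
import re

# Simplified patterns, assumed for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def mask_pii(text: str) -> str:
    """Replace emails and phone numbers with placeholders before the model call."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


print(mask_pii("Contact jan.kowalski@example.com or +48 600 100 200 about the refund."))
```

Running this in the router rather than in the client application keeps a single enforcement point, so no integration can bypass it.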
For high-risk processes (health, financial, HR data), a DPIA is required before launch. Details on company obligations in 2026 are covered in the article AI Act and GDPR 2026.
Mistake #5: No human-gate for high-risk actions
AI systems that take actions on behalf of the company (sending emails, updating records, approving transactions) require an approval mechanism for irreversible steps. A project without this will sooner or later send a message to the wrong recipient, overwrite an important record, or approve an action based on incorrect classification.
A human-gate isn’t about disabling automation entirely. It’s about stopping the agent before an irreversible step and waiting for explicit operator approval. Approval is logged: who, when, in what task context. In practice, a five-question check works well before every high-risk action:
- Does the action change the system state in a way that’s hard to reverse?
- Does an error in this action have direct consequences for the customer or financial impact?
- Did the model have access to all necessary data for this decision?
- Is the result of the previous step certain (not estimated)?
- Has this action been performed in a similar context successfully before?
A “yes” to either of the first two questions, or a “no” to any of the last three, is a signal to stop and escalate. The detailed architecture of this mechanism is described in the article on the role of humans in the agent loop.
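The five questions can be sketched as a small gate function. This sketch makes one assumption explicit: the first two questions signal risk when answered yes, and the last three signal risk when answered no. The `GateCheck` fields and the shape of the approval record are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class GateCheck:
    hard_to_reverse: bool         # question 1
    direct_customer_impact: bool  # question 2
    had_all_data: bool            # question 3
    previous_step_certain: bool   # question 4
    succeeded_before: bool        # question 5


def needs_human_approval(check: GateCheck) -> bool:
    """Stop and escalate when the action is risky or the agent's footing is uncertain."""
    risky = check.hard_to_reverse or check.direct_customer_impact
    unsure = not (check.had_all_data and check.previous_step_certain and check.succeeded_before)
    return risky or unsure


def approval_record(operator: str, task_id: str) -> dict[str, str]:
    """Log entry capturing who approved, when, and in which task context."""
    return {
        "operator": operator,
        "task_id": task_id,
        "approved_at": datetime.now(timezone.utc).isoformat(),
    }


refund = GateCheck(True, True, True, True, True)
print(needs_human_approval(refund))
```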
Mistake #6: No monitoring after launch
An AI project launches, works well for two weeks, then quality gradually declines. No one notices because there are no metrics. The model still responds, but answers become less accurate, and more customers are routed to humans, not because their questions are difficult but because of system errors.
Quality drift is a systematic phenomenon: company data changes (new products, updated procedures, new regulations), but the model’s knowledge base doesn’t keep up. A system that doesn’t monitor its own quality has no way to detect when it becomes a problem.
Minimum monitoring for a production system:
| Metric | What it measures | Alarm threshold |
|---|---|---|
| Escalation rate to human | Percentage of queries the agent couldn’t handle independently | Increase of >5 pp week-over-week |
| “I don’t know” rate | Percentage of responses without a confident source | Increase of >3 pp |
| p95 response time | Latency for 95% of queries | Exceeding set SLA |
| Quality score (golden set) | Comparison with a reference question set weekly | Accuracy drop of >5 pp |
| Tool error rate (for agents) | Percentage of tool calls with errors | Increase of >2 pp |
The architecture of complete monitoring is described in the article on monitoring and KPIs for AI agents. Evaluating RAG response quality using the golden set method is covered in the article on RAG evaluation.
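The alarm thresholds from the table can be sketched as a weekly comparison. This is a simplified illustration: the metric keys are hypothetical, all rate thresholds are assumed to apply week-over-week, and the p95 latency check is left out because it depends on an SLA value the table does not specify.

```python
# Alarm thresholds in percentage points (pp), taken from the table above.
INCREASE_THRESHOLDS_PP = {
    "escalation_rate": 5.0,
    "idk_rate": 3.0,
    "tool_error_rate": 2.0,
}
GOLDEN_SET_DROP_PP = 5.0


def alarms(previous: dict[str, float], current: dict[str, float]) -> list[str]:
    """Return the metrics whose change since last week crosses its threshold."""
    fired = [
        name
        for name, limit in INCREASE_THRESHOLDS_PP.items()
        if current[name] - previous[name] > limit
    ]
    if previous["golden_accuracy"] - current["golden_accuracy"] > GOLDEN_SET_DROP_PP:
        fired.append("golden_accuracy")
    return fired


last_week = {"escalation_rate": 12.0, "idk_rate": 4.0, "tool_error_rate": 1.0, "golden_accuracy": 91.0}
this_week = {"escalation_rate": 18.5, "idk_rate": 5.0, "tool_error_rate": 1.5, "golden_accuracy": 84.0}
print(alarms(last_week, this_week))
```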
Mistake #7: Choosing the wrong first process
Not every process is suitable for the first AI project. The most common mistake: a company chooses either a process that’s too trivial (FAQ handling that took 10 minutes a day) or too complex (complaint handling requiring expert assessment and negotiation). The first doesn’t return the investment. The second fails because the model can’t reliably replace an expert.
Characteristics of a process suitable for a first AI project:
- Repeatability: at least 50 similar cases per month.
- Definability: each step can be described by a rule or decision scheme.
- Verifiability: the result can be checked programmatically or through simple control.
- Limited decision scope: doesn’t require contextual knowledge outside available data.
- Not a high-risk process under AI Act Annex III (or the company is ready for full compliance).
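The checklist above can be expressed as a simple screening function. The sketch is illustrative only: `ProcessProfile` and its fields are hypothetical, and every answer except the monthly case count is a judgment someone has to make by hand.

```python
from dataclasses import dataclass

MIN_CASES_PER_MONTH = 50  # repeatability threshold from the list above


@dataclass(frozen=True)
class ProcessProfile:
    cases_per_month: int
    steps_describable_by_rules: bool
    result_checkable: bool
    needs_outside_context: bool
    high_risk_annex_iii: bool
    ready_for_full_compliance: bool = False


def failed_criteria(p: ProcessProfile) -> list[str]:
    """List the suitability criteria the process does not meet."""
    failed = []
    if p.cases_per_month < MIN_CASES_PER_MONTH:
        failed.append("repeatability")
    if not p.steps_describable_by_rules:
        failed.append("definability")
    if not p.result_checkable:
        failed.append("verifiability")
    if p.needs_outside_context:
        failed.append("limited decision scope")
    if p.high_risk_annex_iii and not p.ready_for_full_compliance:
        failed.append("AI Act risk class")
    return failed


invoice_routing = ProcessProfile(400, True, True, False, False)
complaint_negotiation = ProcessProfile(30, False, False, True, False)
print(failed_criteria(invoice_routing))
print(failed_criteria(complaint_negotiation))
```

An empty list means the process passes the screen; any entry names the criterion to revisit before committing to it as the first project.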
Before choosing a process, it’s worth going through the automation finder, which assesses the process’s suitability for AI automation based on these criteria. The methodology for selecting the first process is also described in the article where to start with AI implementation.
What a successful AI project looks like: the shadow mode pattern
Among projects that end up in production, the vast majority went through a shadow mode phase. The agent runs in parallel with a human for 2-4 weeks: it processes the same inputs and generates its own responses, but the results aren’t applied. Instead, they’re compared with human decisions.
Shadow mode reveals gaps that no unit tests find: edge cases specific to the industry, terminology used by customers that doesn’t match the documentation, situations where data in the database is contradictory.
Only after shadow mode ends with discrepancies below a set threshold (typically 5-10% of decisions differing from human ones) does the system enter the pilot phase with a human-gate. The full schedule and step-by-step implementation plan are described in the article AI implementation plan.
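The comparison at the heart of shadow mode can be sketched as follows, assuming agent and human decisions are recorded as comparable labels for the same cases. The function names and the 10% default threshold are assumptions chosen from the range mentioned above.

```python
def discrepancy_rate(pairs: list[tuple[str, str]]) -> float:
    """Share of cases where the agent's decision differs from the human's."""
    if not pairs:
        raise ValueError("no shadow-mode decisions recorded")
    differing = sum(1 for agent, human in pairs if agent != human)
    return differing / len(pairs)


def ready_for_pilot(pairs: list[tuple[str, str]], threshold: float = 0.10) -> bool:
    """True when the discrepancy rate is below the agreed threshold."""
    return discrepancy_rate(pairs) < threshold


# 20 illustrative (agent, human) decision pairs, one of which differs.
shadow_log = [("refund", "refund")] * 19 + [("refund", "escalate")]
print(discrepancy_rate(shadow_log), ready_for_pilot(shadow_log))
```

The differing pairs are the most useful output of the phase: reviewing them one by one shows whether the gap comes from the data, the prompt, or an edge case the process description never covered.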
FAQ
Does an AI project always require large amounts of data before starting?
No. RAG (retrieval-augmented generation) works well with a few hundred to a few thousand documents if they’re current and consistent. Fine-tuning requires large datasets, but most first-phase projects don’t need it. What matters is data quality and currency, not quantity. A company with 200 good documents will produce a better system than one with 20,000 documents, half of which are outdated or contradictory. A data audit before the project is described in the article on preparing company data for AI.
How long does a typical AI project pilot take?
For one well-chosen process: 6-10 weeks from contract signing to the decision to move to production. Weeks 1-2: data and guardrails audit. Weeks 3-5: shadow mode. Weeks 6-8: pilot with human-gate and monitoring. Weeks 9-10: decision and potential adjustments. Projects shorten when the company already has prepared data and a defined goal. They lengthen with integration into multiple systems or lack of access to historical data. An estimated pilot cost can be calculated with the ROI calculator.
What if the AI project must comply with AI Act requirements?
Check whether the process qualifies as a high-risk system under AI Act Annex III. This applies to areas like employment (candidate selection, employee evaluation), financial services (credit scoring), education, healthcare, and a few others. For such systems, the following are required: DPIA, technical documentation, system registration in the EU database, human oversight mechanism, and transparency toward users. A detailed description of obligations in 2026 is in the article AI Act and GDPR 2026.
How to check if your AI project is ready for production?
Four control questions before moving from pilot to production: (1) Is the error rate on the golden set below the set threshold? (2) Have the guardrails been tested on real edge cases, not just synthetic ones? (3) Is the escalation path to a human working and monitored? (4) Is the data in the knowledge base current and consistent with the current offer or procedures? Only when all four points are green does production deployment make sense. The readiness assessment is supported by the readiness evaluation tool.
Can you fix an AI project that’s already failing?
Yes, but it requires diagnosing the cause, not just adding more features. The most common fixes: refreshing the knowledge base (when the problem is data drift), adding a guardrails layer and an “I don’t know” path (when the problem is uncontrolled responses), implementing golden set monitoring (when the problem is lack of visibility). Projects stuck due to poor process selection require going back to the selection step and potentially piloting a different process. The first step is always diagnosis, not rewriting from scratch. Contact us with a description of the situation. We’ll analyze it and propose a corrective path.