Every company wanting to "implement AI" faces the same fork in the road: you can spend three months on strategy, workshops, and tenders, or launch a working system on one process within 30 days and start collecting real data. The second approach is harder because it requires decisions—but it delivers results you can measure, not just describe.
Below, you’ll find a concrete plan, week by week. This isn’t a sales pitch—it’s a sequence of steps that actually reduce risk and shorten the path to your first measurable ROI.
Week 1: Process audit and scope selection
The first week isn’t about writing documents; it’s about conversations and numbers. Three questions you must answer by the end of the week:
Which process consumes the most hours and is repetitive? Ticket categorization, invoice approval, answering customer FAQs, extracting data from documents—these are typical candidates. Check the process identification tool to gather data without guessing.
Do you have input data? RAG works on existing knowledge (FAQs, regulations, ticket history). A classifier needs labeled examples. You don’t need everything—but a narrow slice of data for the first process must exist.
Who will own the system post-implementation? Implementations without a defined owner on the company side end up as dead systems after a quarter. Assign one person.
By the end of the week, you should have: one selected process, an estimated number of monthly hours (a number, not “a lot”), a list of existing data, and the owner’s name. Use the readiness assessment to check if the organization is ready for implementation without infrastructure blockers.
Week 2: Data preparation and architecture design
Data is rarely ready. It doesn’t need to be perfect, just sufficient for the pilot.
Practical rule: If you chose FAQ handling, you need at least 50–100 question-answer pairs covering 80% of actual queries. If you chose classification, you need a few hundred labeled examples. If you chose data extraction from documents, you need representative document samples—not necessarily thousands.
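The rule above can be sketched as a pre-flight check run before the build starts. This is a minimal illustration, not a fixed standard: the names `MIN_SAMPLES`, `has_enough_pilot_data`, and `coverage` are hypothetical, the FAQ floor takes the lower end of the 50–100 range, and the classification and extraction counts are assumed stand-ins for “a few hundred” and “representative samples.”

```python
# Minimum sample counts per use case. The FAQ floor comes from the 50-100
# range in the text; the other two numbers are illustrative assumptions.
MIN_SAMPLES = {
    "faq": 50,
    "classification": 300,  # stand-in for "a few hundred"
    "extraction": 20,       # "representative samples"; count is assumed
}


def has_enough_pilot_data(use_case: str, sample_count: int) -> bool:
    """Return True if the sample count meets the pilot minimum."""
    if use_case not in MIN_SAMPLES:
        raise ValueError(f"unknown use case: {use_case!r}")
    return sample_count >= MIN_SAMPLES[use_case]


def coverage(query_topics: list[str], covered_topics: set[str]) -> float:
    """Share of real queries whose topic is covered by the knowledge base."""
    if not query_topics:
        return 0.0
    hits = sum(topic in covered_topics for topic in query_topics)
    return hits / len(query_topics)
```

For FAQ handling, the pilot would go ahead when `has_enough_pilot_data("faq", n)` holds and `coverage(...)` on a sample of real queries reaches roughly 0.8.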
This same week, you decide on the architecture. The most common choice for the first implementation:
| Use case | Architecture | Implementation time |
|---|---|---|
| FAQ and customer query handling | RAG + guardrails | 1–2 weeks |
| Ticket categorization / approval | Classifier + structured output | 1–2 weeks |
| Field extraction from documents | OCR + data extraction | 2–4 weeks |
| Multi-step automation | Agent + human-gate | 3–6 weeks |
Simple selection rule: If the system only needs to respond (not perform actions), RAG is enough. If it needs to do something (save, modify, send), it requires an agent with human-handoff for irreversible actions.
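As a rough sketch, the selection rule fits in a few lines of Python. The class `ArchitectureChoice` and the function `choose_architecture` are hypothetical names introduced for this example; the architecture labels are taken from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchitectureChoice:
    architecture: str
    needs_human_gate: bool  # True when a person must approve actions


def choose_architecture(
    performs_actions: bool, has_irreversible_actions: bool = False
) -> ArchitectureChoice:
    """Apply the rule: respond-only -> RAG, acting -> agent with a human gate."""
    if not performs_actions:
        # The system only answers questions, so retrieval is sufficient.
        return ArchitectureChoice("RAG + guardrails", needs_human_gate=False)
    # The system saves, modifies, or sends something: use an agent, and
    # require human approval whenever an action cannot be undone.
    return ArchitectureChoice(
        "Agent + human-gate", needs_human_gate=has_irreversible_actions
    )
```

A delivery-status FAQ bot maps to `choose_architecture(False)`, while a system that issues refunds maps to `choose_architecture(True, has_irreversible_actions=True)`.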
Choose your tech stack based on data characteristics and data residency requirements—some clients require data to stay on Polish servers. In such cases, self-hosting models is a deployment requirement, not an option.
Week 3: Build and initial tests
Week three is for building. Goal: a working system in a test environment that you can show to the process owner to gather feedback.
A few rules that distinguish good implementations from bad ones:
PII masked before cloud models. If data contains names, customer numbers, or addresses, it must be anonymized before being sent to an LLM. This isn’t optional; it’s a legal requirement under RODO (the Polish name for GDPR). Skipping this step can kill the project, and rightly so.
Guardrails from day one. There’s no point testing a system without guardrails because results won’t reflect production. Minimum: topic scope, confidence threshold (below threshold → escalation to human), and instruction injection blocking.
Observability built-in, not added later. Every model call should log the (anonymized) query, response, latency, and whether it was escalated. Without logs, you don’t know what works and what doesn’t.
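The masking and logging rules can be combined in one small helper, sketched below. This is an assumption-heavy illustration: the two regular expressions catch only e-mail addresses and phone-like digit runs, the function names `mask_pii` and `log_record` are hypothetical, and real PII detection (names, addresses, customer numbers) usually needs a dedicated tool rather than regexes.

```python
import json
import re
import time

# Illustrative patterns only: they cover e-mails and phone-like numbers,
# not names, addresses, or customer numbers.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def mask_pii(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def log_record(query: str, response: str, latency_ms: float, escalated: bool) -> str:
    """Build one JSON log line for a model call, masking text fields first."""
    return json.dumps(
        {
            "ts": time.time(),
            "query": mask_pii(query),
            "response": mask_pii(response),
            "latency_ms": round(latency_ms, 1),
            "escalated": escalated,
        }
    )
```

The same `mask_pii` call would sit in front of the request to the cloud model, so the text that is logged and the text that leaves the company are masked in the same way.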
In practice: A system ready by the end of week 3 handles 60–70% of test cases correctly. The rest goes to escalation. That’s a good result for a pilot—you’re not looking for perfection, just hypothesis validation.
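The escalation path relies on the guardrail minimum named in the rules: topic scope, a confidence threshold, and instruction-injection blocking. A minimal sketch follows, assuming the topic label and confidence score are produced upstream by the model or a classifier. The topic set, the 0.7 threshold, and the marker phrases are made-up example values, and substring matching is a deliberately naive stand-in for real injection detection.

```python
from dataclasses import dataclass

ALLOWED_TOPICS = {"delivery", "returns", "invoices"}  # hypothetical scope
CONFIDENCE_THRESHOLD = 0.7                            # assumed value
INJECTION_MARKERS = (                                 # naive example list
    "ignore previous instructions",
    "system prompt",
)


@dataclass(frozen=True)
class Decision:
    action: str  # "answer", "escalate", or "refuse"
    reason: str


def apply_guardrails(query: str, topic: str, confidence: float) -> Decision:
    """Decide whether the system may answer, must escalate, or must refuse."""
    lowered = query.lower()
    if any(marker in lowered for marker in INJECTION_MARKERS):
        return Decision("refuse", "possible instruction injection")
    if topic not in ALLOWED_TOPICS:
        return Decision("escalate", "out of topic scope")
    if confidence < CONFIDENCE_THRESHOLD:
        return Decision("escalate", "confidence below threshold")
    return Decision("answer", "passed all checks")
```

Counting the `escalate` decisions on the test set gives the escalation share directly, which is the number to compare against the 60–70% figure above.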
Week 4: Production, measurement, and scaling decision
Week four is about going live with limited traffic and collecting the first real data.
Deployment model: Start with 10–20% of traffic or one user group. The rest follows the old (manual) path. After a week, you have a comparison: how many cases the system closed without human intervention, how many it escalated, what the handling time was, and whether errors occurred.
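One way to implement the traffic split is a deterministic hash of the user ID, so the same user always lands on the same path and the two groups stay comparable. The sketch below is an assumption, not a prescribed mechanism: `route_to_ai` and its `ai_share` parameter are hypothetical names, and a feature-flag service or a fixed user group would work just as well.

```python
import hashlib


def route_to_ai(user_id: str, ai_share: float = 0.1) -> bool:
    """Deterministically send roughly ai_share of users down the AI path."""
    if not 0.0 <= ai_share <= 1.0:
        raise ValueError("ai_share must be between 0 and 1")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < ai_share
```

Raising `ai_share` from 0.1 to 0.2 keeps every user who was already on the AI path there and only adds new ones, which keeps the before/after comparison clean.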
Measurable results after 30 days of pilot:
| Metric | How to measure | Acceptable threshold |
|---|---|---|
| % of cases handled automatically | Number closed by AI / total | minimum 40–60% for FAQ |
| Handling time (AI vs manual) | Median time to close | AI should be at least 50% faster |
| Errors requiring correction | Number of escalations due to AI error | below 5% of all cases |
| Cost per case | Infrastructure cost / number of cases | comparable to or lower than manual cost |
If all four metrics are within acceptable ranges, you have a basis for scaling discussions. If not—the diagnosis is built into the metrics: too many escalations point to data gaps, too many errors point to guardrail issues.
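The four checks from the table can be computed from the pilot’s case log. The sketch below assumes a simple hypothetical record type (`Case`) and function (`evaluate_pilot`); the thresholds use the lower bound of the FAQ range (40%), the 50% speed-up, the 5% error ceiling, and cost parity with the manual path.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Case:
    handled_by_ai: bool      # closed without human intervention
    ai_error: bool           # escalated or corrected because the AI was wrong
    handling_seconds: float  # time to close


def evaluate_pilot(
    cases: list[Case],
    manual_seconds: list[float],
    infra_cost: float,
    manual_cost_per_case: float,
) -> dict[str, bool]:
    """Return a pass/fail flag for each of the four pilot metrics."""
    if not cases or not manual_seconds:
        raise ValueError("need at least one pilot case and one manual baseline")
    total = len(cases)
    ai_times = [c.handling_seconds for c in cases if c.handled_by_ai]
    auto_rate = len(ai_times) / total
    error_rate = sum(c.ai_error for c in cases) / total
    # "At least 50% faster" means the AI median is at most half the manual one.
    speed_ok = bool(ai_times) and median(ai_times) <= 0.5 * median(manual_seconds)
    return {
        "auto_rate_ok": auto_rate >= 0.40,
        "speed_ok": speed_ok,
        "error_rate_ok": error_rate < 0.05,
        "cost_ok": infra_cost / total <= manual_cost_per_case,
    }
```

A result with every flag set to `True` is the basis for the scaling discussion; any `False` flag names the metric to diagnose first.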
Calculate ROI with the ROI calculator—it’s deterministic math, not guesswork.
What to do when the pilot underdelivers
Implementations don’t always succeed on the first try, and that’s normal. Typical causes and fixes:
Insufficient data scope. If the system escalates 70% of cases, the knowledge base is incomplete. Remedy: two weeks of data enrichment and retesting—don’t abandon the project.
Too broad a scope for the first process. Instead of “customer service automation,” take “answering questions about delivery status.” Narrower scope = higher success rate = faster ROI.
Missing guardrails. If the model answers out-of-scope questions or hallucinates numbers, guardrails are misconfigured. More in the article on limiting AI hallucinations.
Integration with source systems fails. The agent can’t read the CRM, ERP, or knowledge base in real time. This is an infrastructure problem, not an AI one, and it is solved by integration via n8n or a direct API.
None of these reasons are cause for abandonment. Each is a diagnosis with a concrete remedy. Implementation problems are rarely mysterious—they’re just undiagnosed.
Security and compliance: What you must have before production launch
Before the system goes live, three issues must be resolved, and not just “in the plans” but actually ready:
RODO and data processing. If the system processes customer personal data, you need an information clause, a legal basis for processing, and a data processing agreement with the infrastructure provider. Details in the AI Act and RODO 2026 guide.
AI Act: risk classification. AI systems in high-risk areas (recruitment, credit scoring, health) are subject to additional obligations: DPIA, human oversight, and system registration. Check the classification before implementation, not after.
Transparency. If the system communicates with customers, they must know they’re talking to AI. This is a requirement under AI Act Art. 50, which applies from August 2, 2026 (February 2, 2025 is the date from which the Act’s prohibited-practice and AI literacy provisions apply). Implementation is simple, one sentence in the first message, but omitting it is a violation.
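The transparency rule can be enforced in code so that it can’t be forgotten in a prompt. A minimal sketch, where the disclosure wording and the function name `with_disclosure` are assumptions made for this example and not legally vetted text:

```python
AI_DISCLOSURE = "You are chatting with an AI assistant."  # example wording


def with_disclosure(message: str, is_first_message: bool) -> str:
    """Prepend the AI disclosure to the first message of a conversation."""
    if is_first_message:
        return f"{AI_DISCLOSURE} {message}"
    return message
```

Putting this in the response layer, and not in the model prompt, means the sentence appears even if the model ignores an instruction.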
More on agent security architecture in the article on AI system security.
How to assess readiness before launch
Before implementation, check three areas:
Data: Do you have a knowledge source that can be indexed? Documents, FAQs, ticket history, price lists—anything the agent should know. No data = no context = hallucinations.
Infrastructure: Is the API to source systems (CRM, ERP, knowledge base) available? Even a simple CSV export works for a pilot, but live access is needed for production.
Organization: Is there a designated system owner who will manage knowledge updates and handle escalations? AI systems require maintenance like any other software.
Use the AI readiness assessment—a 10-minute tool that asks about these three areas and gives a concrete score instead of a vague answer.
FAQ
How long does the first AI implementation take?
A pilot on one narrow process typically takes 2–4 weeks from data collection to a working system in a test environment. Full production deployment with system integration and security testing takes 4 to 8 weeks, depending on complexity. We don’t provide fixed timelines because the scope varies significantly between companies.
Do I need a lot of data to start?
No. RAG for customer questions starts with a few dozen FAQ pairs. A classifier needs a few hundred labeled examples. For a pilot, a narrow slice of data from one process is enough, not the entire company database. You’ll iteratively enrich data after each test cycle.
How much does AI implementation cost in the first 30 days?
Cost depends on scope and architecture. A simple RAG pilot for FAQs has a different budget than an agent integrating with CRM and ERP. Calculate your case with the ROI calculator or schedule a call via the contact form. We provide ranges after understanding the specific process, not a price list with starting rates.
Does an AI system have to inform customers it’s a bot?
Yes. AI Act Art. 50 requires any system interacting with people to inform them at the start of the conversation; this transparency obligation applies from August 2, 2026. It applies to systems deployed in the EU regardless of whether the company is based in Poland. Implementation is technically simple: one line in the first message.
What if the pilot doesn’t deliver expected results?
A failed pilot is a diagnosis, not a failure. The most common causes are: too narrow a data scope (remedy: enrich the knowledge base), too broad a process scope (remedy: narrow to a specific use case), or missing guardrails (remedy: configure escalation thresholds). Each of these has a concrete solution, and we’ll discuss them in a post-pilot conversation.