AI implementation plan step by step: first 30 days

Every company wanting to "implement AI" faces the same fork in the road: you can spend three months on strategy, workshops, and tenders, or launch a working system on one process within 30 days and start collecting real data. The second approach is harder because it requires decisions—but it delivers results you can measure, not just describe.

Below, you’ll find a concrete plan, week by week. This isn’t a sales pitch—it’s a sequence of steps that actually reduce risk and shorten the path to your first measurable ROI.

Week 1: Process audit and scope selection#

The first week isn’t about writing documents—it’s about conversations and numbers. Three questions you must answer by the end of the week:

Which process consumes the most hours and is repetitive? Ticket categorization, invoice approval, answering customer FAQs, extracting data from documents—these are typical candidates. Check the process identification tool to gather data without guessing.

Do you have input data? RAG works on existing knowledge (FAQs, regulations, ticket history). A classifier needs labeled examples. You don’t need everything—but a narrow slice of data for the first process must exist.

Who will own the system post-implementation? Implementations without a defined owner on the company side end up as dead systems after a quarter. Assign one person.

By the end of the week, you should have: one selected process, an estimated number of monthly hours (a number, not “a lot”), a list of existing data, and the owner’s name. Use the readiness assessment to check if the organization is ready for implementation without infrastructure blockers.

Week 2: Data preparation and architecture design#

This plan assumes a tailor-made build—RAG, a classifier, or an agent on your data. If an off-the-shelf product (ChatGPT, Copilot, an industry SaaS) covers the process 1:1 and you need neither integration nor control over your data, buy it instead of building—compare the decision in the own assistant vs off-the-shelf decision tree or in the own vs ready-made assistant analysis.

Data is rarely ready. It doesn’t need to be perfect—just sufficient for the pilot.

Practical rule: If you chose FAQ handling, you need at least 50–100 question-answer pairs covering 80% of actual queries. If you chose classification, you need a few hundred labeled examples. If you chose data extraction from documents, you need representative document samples—not necessarily thousands.

This same week, you decide on the architecture. The most common choice for the first implementation:

Use case	Architecture	Implementation time
FAQ and customer query handling	RAG + guardrails	1–2 weeks
Ticket categorization / approval	Classifier + structured output	1–2 weeks
Field extraction from documents	OCR + data extraction	2–4 weeks
Multi-step automation	Agent + human-gate	3–6 weeks

Simple selection rule: If the system only needs to respond (not perform actions), RAG is enough. If it needs to do something (save, modify, send), it requires an agent with human-handoff for irreversible actions.

Choose your tech stack based on data characteristics and data residency requirements—some clients require data to stay on Polish servers. In such cases, self-hosting models is a deployment requirement, not an option.

Week 3: Build and initial tests#

This four-week sequence assumes the simplest architecture from the table above—RAG-FAQ or a classifier (1–2 weeks to build). For document extraction (2–4 weeks) or a multi-step agent (3–6 weeks), Weeks 3–4 are the start of the build and a pilot on a test environment, while going live on limited production traffic shifts accordingly by 1–4 weeks—in line with the ranges in the table and the FAQ.

Week three is for building. Goal: a working system on a test environment that you can show to the process owner and gather feedback.

A few rules that distinguish good implementations from bad ones:

PII masked before cloud models. If data contains names, customer numbers, or addresses—it must be anonymized before being sent to an LLM. This isn’t optional; it’s a legal requirement under GDPR. Skipping this step can kill the project—and rightly so.

Guardrails from day one. There’s no point testing a system without guardrails because results won’t reflect production. Minimum: topic scope, confidence threshold (below threshold → escalation to human), and instruction injection blocking.

Observability built-in, not added later. Every model call should log the (anonymized) query, response, latency, and whether it was escalated. Without logs, you don’t know what works and what doesn’t.

In practice: A system ready by the end of week 3 answers 60–70% of the prepared test set correctly (controlled, representative cases). The rest goes to escalation. That’s a good result at this stage—but remember that on live traffic the share of cases closed fully automatically will be lower than on the test set, because real queries are broader and noisier. You’re not looking for perfection, just hypothesis validation.

Week 4: Production, measurement, and scaling decision#

Week four is about going live with limited traffic and collecting the first real data.

Deployment model: Start with 10–20% of traffic or one user group. The rest follows the old (manual) path. After a week, you have a comparison: how many cases the system closed without human intervention, how many it escalated, what the handling time was, and whether errors occurred.

Measurable results after 30 days of pilot:

Metric	How to measure	Acceptable threshold
% of cases closed without a human (production)	Number closed by AI / total	minimum 40–60% for FAQ
Handling time (AI vs manual)	Median time to close	AI should be at least 50% faster
Errors requiring correction	Number of escalations due to AI error	below 5% of all cases
Cost per case	(inference cost + human handling of escalated cases + amortized maintenance) / number of cases	comparable to or lower than the fully loaded cost of manual work

Count the full cost per case, not just infrastructure: with 30–40% escalation, the savings come from the automated portion, while into the TCO you add the people handling escalations and maintaining the knowledge base (see TCO). Only compare that fully loaded figure against the fully loaded cost of manual work.

If all four metrics are within acceptable ranges, you have a basis for scaling discussions. If not—the diagnosis is built into the metrics: too many escalations point to data gaps, too many errors point to guardrail issues.

Calculate ROI with the ROI calculator—the formula is transparent and repeatable, and the result is only as good as your inputs (number of hours, hourly rate, the realistic % to automate).

What comes after 30 days: the maintenance rhythm#

The pilot isn’t the end—but it isn’t the start of an "eternal cost" either. Once stabilized, maintenance follows a predictable rhythm:

Who: the system owner designated in Week 1 handles day-to-day maintenance, not the project team.
Knowledge updates: the RAG knowledge base or classifier examples are refreshed on a cycle (e.g. monthly, or after any material change to the process or offering)—a few hours of work, not a new project.
Quality monitoring (drift): you keep tracking the same metrics from Week 4 (% automated, % errors, % escalations) from the observability logs; a rising escalation rate is a signal that the data has gone stale.
Guardrail re-testing: after every model or prompt change, rerun the safety test battery (injection, topic scope, confidence threshold) before the change reaches production.
Cost: once stabilized, the dominant cost is the infrastructure and model cost per case (calculable in the ROI calculator) plus the owner’s predictable effort—not a rising "eternal cost," as long as the scope stays narrow.

What to do when the pilot underdelivers#

Implementations don’t always succeed on the first try, and that’s normal. Typical causes and fixes:

Insufficient data scope. If the system escalates ~70% or more of FAQ cases—clearly above the acceptable threshold—the knowledge base is incomplete. Remedy: two weeks of data enrichment and retesting—don’t abandon the project.

Too broad a scope for the first process. Instead of “customer service automation,” take “answering questions about delivery status.” Narrower scope = higher success rate = faster ROI.

Missing guardrails. If the model answers out-of-scope questions or hallucinates numbers, guardrails are misconfigured. More in the article on limiting AI hallucinations.

Integration with source systems fails. The agent can’t read CRM, ERP, or knowledge base in real time. This is an infrastructure problem, not an AI one—solved by integration via n8n or direct API.

None of these reasons are cause for abandonment. Each is a diagnosis with a concrete remedy. Implementation problems are rarely mysterious—they’re just undiagnosed.

Security and compliance: What you must have before production launch#

Before the system goes live, three issues must be resolved—not “in the plans,” but actually ready:

GDPR and data processing. If the system processes customer personal data, you need an information clause, a legal basis for processing, and a data processing agreement with the infrastructure provider. Details in the AI Act and GDPR 2026 guide.

AI Act—risk classification. AI systems in high-risk areas (recruitment, credit scoring, health) are subject to additional obligations: DPIA, human-oversight, and system registration. Check the classification before implementation, not after.

Transparency. If the system communicates with customers, they must know they’re talking to AI. This is a requirement under AI Act Art. 50, which starts to apply from August 2, 2026. It’s worth implementing from the start—the implementation is simple (one sentence in the first message), and omitting it after that date is a violation.

More on agent security architecture in the article on AI system security.

How to assess readiness before launch#

Before implementation, check three areas:

Data: Do you have a knowledge source that can be indexed? Documents, FAQs, ticket history, price lists—anything the agent should know. No data = no context = hallucinations.

Infrastructure: Is the API to source systems (CRM, ERP, knowledge base) available? Even a simple CSV export works for a pilot, but live access is needed for production.

Organization: Is there a designated system owner who will manage knowledge updates and handle escalations? AI systems require maintenance like any other software.

Use the AI readiness assessment—a 10-minute tool that asks about these three areas and gives a concrete score instead of a vague answer.

Try it live#

Describe your process below, and the model will break it down into pilot stages and indicate which steps can be automated in the first 30 days (playground: PII masked, zero retention):

▶Plan your AI implementation for your processsandbox · reasoning

FAQ#

How long does the first AI implementation take?#

A pilot on one narrow process typically takes 2–4 weeks from data collection to a working test environment system. Full production deployment with system integration and security testing—depending on complexity—takes 4 to 8 weeks. We don’t provide fixed timelines because the scope varies significantly between companies.

Do I need a lot of data to start?#

No. RAG for customer questions starts with a few dozen FAQ pairs. A classifier needs a few hundred labeled examples. For a pilot, a narrow slice of data from one process is enough—not the entire company database. You’ll iteratively enrich data after each test cycle.

How much does AI implementation cost in the first 30 days?#

Cost depends on scope and architecture. A simple RAG pilot for FAQs has a different budget than an agent integrating with CRM and ERP. Calculate your case with the ROI calculator or schedule a call via the contact form—we provide ranges after understanding the specific process, not a price list with starting rates.

Does an AI system have to inform customers it’s a bot?#

Yes. AI Act Art. 50 requires any system interacting with people to inform them at the start of the conversation; this obligation applies from August 2, 2026 (on February 2, 2025 the earlier prohibitions under Art. 5 and the AI-literacy requirement under Art. 4 took effect). This applies to systems deployed in the EU regardless of whether the company is based in Poland. Implementation is technically simple—one line in the first message.

What if the pilot doesn’t deliver expected results?#

A failed pilot is a diagnosis, not a failure. The most common causes are: too narrow a data scope (remedy: enrich the knowledge base), too broad a process scope (remedy: narrow to a specific use case), or missing guardrails (remedy: configure escalation thresholds). Each of these has a concrete solution—we’ll discuss them in a post-pilot conversation.