A customer visits a B2B store website, browses electrical equipment, and leaves without making a purchase. The next day, they receive an email with exactly the product they viewed, supplemented with two accessories that other buyers added to the same order. The conversion rate is 18%. Without that email, it would have been 3%. This difference doesn’t come from a salesperson’s intuition—it comes from a recommendation model built on session data and order history.
AI personalization isn’t exclusive to large e-commerce platforms. Companies with a few hundred B2B customers, service portals, and specialty stores are now implementing systems that three years ago required a dedicated data science team. Below, I describe how it works from an architectural perspective, where the pitfalls are, and what it actually costs.
Two different problems: recommendations and personalization
The terms are often used interchangeably, but they describe different mechanisms.
A recommendation engine answers the question: "What else might interest this customer?" Input data includes interaction history (clicks, purchases, time on page) and product features. The output is a ranking of products or content sorted in descending order by predicted relevance for a specific person or segment.
Offer personalization answers the question: "How to tailor the message, price range, or step sequence to the customer profile?" Input data includes the customer segment (industry, company size, purchase funnel stage), contact history, and session context. The output is a modified page layout, section prioritization, email content, or a value proposition tailored to the person’s role.
Both systems can work together: a RAG agent with access to the product catalog personalizes the conversation content while calling the recommendation engine to suggest specific products. I describe such a duo in the architecture section.
Four architectures: from simplest to full agent
There’s a spectrum of solutions differing in cost, capabilities, and maintenance complexity.
| Architecture | How it works | When it’s sufficient | Limitations |
|---|---|---|---|
| Static rules | Manually defined segments, if-then matching | Few products, stable catalog, up to 500 customers | Doesn’t scale, requires manual maintenance |
| Collaborative filtering | Factorization of the user-product interaction matrix (ALS, SVD) | E-commerce with purchase history, thousands of SKUs | Cold start (new customers, new products), no session context |
| Embeddings + semantic search | Products and queries as vectors in BGE-M3 space, hybrid search | Catalogs with descriptions, text search, B2B | Requires vector index, lacks behavioral signals |
| AI Agent with memory | LLM with tool-use, RAG on catalog, customer history in context | Complex configurations, B2B with consultation, bespoke offers | Higher token costs, latency, requires guardrails |
For most Polish B2B companies starting in 2026, the optimal entry point is embeddings with semantic search supplemented by simple collaborative filtering on purchase history. An AI agent with complex memory comes into play when the process requires explaining recommendations to the customer or configuring products in real time.
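The "simple collaborative filtering on purchase history" part can be as small as counting which products land in the same order. Below is a minimal sketch under stated assumptions: the order list, the product IDs, and the function names `build_co_purchase` and `also_bought` are all hypothetical, and a production system would add time decay and minimum-support thresholds.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_purchase(orders):
    """Count how often each pair of products appears in the same order."""
    co_counts = defaultdict(Counter)
    for items in orders:
        # set() so a product listed twice in one order is counted once
        for a, b in combinations(sorted(set(items)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def also_bought(co_counts, product_id, k=2):
    """Return up to k products most often ordered together with product_id."""
    neighbours = co_counts.get(product_id, Counter())
    # Sort by descending count, then by ID so ties are deterministic
    ranked = sorted(neighbours.items(), key=lambda pair: (-pair[1], pair[0]))
    return [pid for pid, _ in ranked[:k]]


# Hypothetical order history: each order is a list of product IDs.
orders = [
    ["breaker-16a", "din-rail", "cable-3x2.5"],
    ["breaker-16a", "din-rail"],
    ["breaker-16a", "cable-3x2.5", "junction-box"],
    ["din-rail", "junction-box"],
]
co_counts = build_co_purchase(orders)
print(also_bought(co_counts, "breaker-16a"))
```

This is the mechanism behind the "two accessories that other buyers added to the same order" from the opening example. It needs no model training, which is why it pairs well with semantic search as an entry point.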
Data: what to collect and how not to violate GDPR
Every recommendation engine is only as good as the data it stands on. At the same time, behavioral data is an area where GDPR and the AI Act have specific requirements.
Explicit data (explicit feedback) includes ratings, "like" clicks, and wish lists. The customer consciously expresses a preference. The legal basis is a contract or legitimate interest, depending on the context.
Implicit data (implicit feedback) includes time on page, scroll depth, and abandoned carts. Here, you need marketing consent or a clearly documented legitimate interest in the processing activities register. Collecting implicit data without a legal basis isn’t just a GDPR risk; it’s also material for an AI Act audit if the system makes decisions affecting price or offer access.
Transactional data (order history) has the strongest legal basis (contract performance) and is the most valuable for collaborative filtering. Remember to anonymize or pseudonymize before the data reaches the model layer, especially if the model runs in the cloud.
Masking PII before sending data to an LLM is mandatory. A customer’s name, email address, or tax ID shouldn’t end up in the prompt generating recommendations. The model only needs a session identifier, segment features, and interaction history. Details on masking are described in the article PII anonymization before AI.
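As a rough illustration of that masking step, here is a minimal sketch. The patterns are assumptions, not a complete solution: the tax ID regex only covers a 10-digit number with optional dashes in a 3-3-2-2 grouping (one common way a Polish NIP is written), and names are masked only when supplied from a hypothetical `known_names` list, for example from the CRM record. Real deployments usually combine patterns like these with a dedicated PII detection tool.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumption: tax IDs are 10 digits, optionally dash-separated as 3-3-2-2.
TAX_ID_RE = re.compile(r"\b\d{3}-?\d{3}-?\d{2}-?\d{2}\b")


def mask_pii(text, known_names=()):
    """Replace emails, tax IDs and known customer names with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = TAX_ID_RE.sub("[TAX_ID]", text)
    for name in known_names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return text


# Hypothetical free-text note from a sales system.
note = "Jan Kowalski (jan.kowalski@example.com, NIP 123-456-32-18) asked about breakers"
masked = mask_pii(note, known_names=["Jan Kowalski"])
print(masked)

# What the prompt receives: identifiers and features only, no raw PII.
prompt_context = {
    "session_id": "s-4821",  # hypothetical pseudonymous identifier
    "segment": {"industry": "construction", "size_band": "50-249"},
    "note": masked,
}
```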
Cold start: what to do when you have no history
Cold start is a situation where a new customer, new product, or new company starts using the system without any history. Collaborative filtering doesn’t work here. Three approaches that work in practice:
Fallback to segment popularity. A new customer from the construction industry receives recommendations based on what other customers in the construction industry of similar company size have purchased. It’s not individually personalized, but it’s more accurate than a general bestseller list.
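A minimal sketch of the segment fallback, assuming each order record carries the buyer's industry and a company-size band. The field names, the size bands, and the extra fallback to global popularity when a segment has no orders are my assumptions for illustration.

```python
from collections import Counter


def segment_popular(orders, industry, size_band, k=3):
    """Top-k products among orders from customers in the same segment."""
    counts = Counter()
    for order in orders:
        if order["industry"] == industry and order["size_band"] == size_band:
            counts.update(set(order["items"]))
    if not counts:
        # Assumption: with no peers in the segment, use global popularity.
        for order in orders:
            counts.update(set(order["items"]))
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [pid for pid, _ in ranked[:k]]


# Hypothetical order records.
orders = [
    {"industry": "construction", "size_band": "50-249",
     "items": ["cable-3x2.5", "junction-box"]},
    {"industry": "construction", "size_band": "50-249",
     "items": ["cable-3x2.5", "conduit-20mm"]},
    {"industry": "retail", "size_band": "1-49",
     "items": ["led-panel", "cable-3x2.5"]},
    {"industry": "retail", "size_band": "1-49", "items": ["led-panel"]},
]
print(segment_popular(orders, "construction", "50-249", k=2))
```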
Onboarding with questions. A few questions at the start (industry, company size, what you plan to solve) build a starting profile without history. The system treats the answers as explicit preference data and immediately narrows the recommendation space.
Content-based filtering on embeddings. If a customer asks about a specific product or enters a phrase, the system looks for products with similar semantic meaning. It doesn’t need history because it relies on description similarity. This approach works from the first session and naturally integrates with semantic search in the catalog.
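Content-based filtering on embeddings boils down to a nearest-neighbour lookup by cosine similarity. The sketch below uses hand-written three-dimensional vectors as stand-ins; in a real system the vectors would come from an embedding model (the table above mentions BGE-M3) and the lookup would run against a vector index rather than a Python dict. The function names and catalog are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_vec, catalog, k=2):
    """Return the k product IDs whose vectors are closest to query_vec."""
    scored = [(cosine(query_vec, vec), pid) for pid, vec in catalog.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pid for _, pid in scored[:k]]


# Toy stand-ins for description embeddings (real ones have hundreds of dims).
catalog = {
    "breaker-16a": [0.9, 0.1, 0.0],
    "breaker-25a": [0.8, 0.2, 0.1],
    "led-panel": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # stand-in for the embedded customer phrase
print(most_similar(query, catalog))
```

No purchase history appears anywhere in this lookup, which is why it works from the first session.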
Guardrails for recommendations: which features are prohibited
A recommendation engine can learn to discriminate if historical data reflects unequal customer treatment. This isn’t a theoretical scenario.
If historical data shows that customers from certain regions rarely received premium offers (because salespeople worked that way), the model will learn and perpetuate that pattern. The AI Act classifies systems that evaluate or differentiate access to products and services as potentially high-risk, especially when using a classifier on demographic features.
Guardrails for recommendation engines include four layers:
- Denylist of protected features. The model cannot use gender, age, nationality, religion, or similar attributes as ranking signals. The list is hardcoded in the configuration and doesn’t depend on the model’s discretion.
- Equality audit of outputs. Every month, check if different demographic segments receive comparable recommendation quality and access to similar offers.
- Explainability on demand. A customer or inspector can ask: "Why was this product recommended to me?" The system must respond with readable premises, not just a similarity vector.
- Human-gate for prices. If personalization affects price (e.g., offer ranges tailored to a segment), every price change in the engine must be approved by a human with a sales role.
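The first layer, the hardcoded denylist, could look roughly like the sketch below. The feature names and both helper functions are hypothetical. Note also that a name-based denylist does not catch proxy features that correlate with a protected attribute, which is one reason the output audit in the second layer is still needed.

```python
# Hardcoded in configuration; not something the model decides.
PROTECTED_FEATURES = frozenset({"gender", "age", "nationality", "religion"})


def strip_protected(features):
    """Drop protected attributes before features reach the ranking model."""
    return {
        name: value
        for name, value in features.items()
        if name.lower() not in PROTECTED_FEATURES
    }


def assert_no_protected(feature_names):
    """Fail fast (e.g. in CI) if a training schema contains a protected feature."""
    leaked = sorted({name.lower() for name in feature_names} & PROTECTED_FEATURES)
    if leaked:
        raise ValueError(f"protected features in ranking schema: {leaked}")


# Hypothetical customer profile.
profile = {
    "industry": "construction",
    "age": 42,
    "Gender": "f",
    "funnel_stage": "evaluation",
}
print(strip_protected(profile))
```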
AI Act and GDPR: what recommendations mean for compliance
The AI Act (in force since August 2024, with obligations phased in from February 2025 and most provisions applying from August 2026) categorizes recommendation and personalization systems depending on context. Most e-commerce and B2B systems aren’t automatically high-risk. Exceptions:
- Personalization in finance (credit scoring, access to financial products) — AI Act Annex III, high risk.
- Recruitment recommendations or employee evaluation — Annex III, high risk.
- Systems influencing the behavior of large user groups with addiction mechanisms (social media, video platforms) — subject to Art. 5 ban on manipulative practices.
For standard B2B e-commerce and service portals, recommendations fall outside the high-risk category but require: disclosure that recommendations are automatic (AI Act Art. 50 transparency), process documentation (DPIA if sensitive data is involved), and a way to object to profiling (GDPR Art. 21), plus the Art. 22 safeguards where a decision is fully automated and has legal or similarly significant effects.
Detailed company obligations for 2026 are described in the article AI Act and GDPR 2026.
Measuring effectiveness: KPIs that matter
Personalization without measurement is an expensive experiment without conclusions. Three metrics worth tracking from the first production day:
Click-through rate (CTR) of recommendations — what percentage of displayed recommendations lead to a click. The benchmark is CTR before personalization or CTR of a control group (A/B test). A 20-40% CTR increase after implementing embeddings is typical for the first phase.
Revenue uplift per session — the difference in basket value between sessions with active personalization and sessions without (or with a control group). This is a number for management. With a proper A/B test and control group, an 8-15% uplift per session is a realistic goal after 6-8 weeks.
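The uplift figure itself is simple arithmetic on per-session basket values from the two A/B arms. A minimal sketch with made-up numbers follows; the function name is hypothetical, and sessions without a purchase are assumed to count as 0.0 so that the metric stays "per session" rather than "per order".

```python
from statistics import mean


def revenue_uplift(control, treatment):
    """Relative difference in mean basket value per session, treatment vs control."""
    base = mean(control)
    if base == 0:
        raise ValueError("control arm has zero revenue; uplift is undefined")
    return (mean(treatment) - base) / base


# Hypothetical basket values per session (0.0 = session without purchase).
control = [100.0, 0.0, 200.0, 100.0]
treatment = [120.0, 0.0, 220.0, 100.0]
print(f"{revenue_uplift(control, treatment):.1%}")
```

With samples this small the number means nothing; before reporting it to management, check that the difference holds up under a significance test on the full A/B data.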
Coverage — the percentage of products in the catalog that appear in recommendations for at least one user within a month. Low coverage (below 30%) signals popularity bias: the model recommends the same bestsellers to everyone, ignoring the long tail of the catalog.
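Coverage follows directly from the definition above. The sketch assumes a recommendation log for one month that maps each user to the product IDs shown to them; the log format, IDs, and function name are hypothetical.

```python
def catalog_coverage(catalog_ids, recommendation_log):
    """Share of catalog products shown to at least one user in the period."""
    catalog = set(catalog_ids)
    if not catalog:
        return 0.0
    shown = set()
    for recommended in recommendation_log.values():
        shown.update(recommended)
    # Intersect so that discontinued products in the log are not counted
    return len(shown & catalog) / len(catalog)


# Hypothetical catalog of 10 products and one month of logged recommendations.
catalog_ids = [f"p{i}" for i in range(10)]
recommendation_log = {
    "u1": ["p0", "p1"],
    "u2": ["p0", "p2"],
    "u3": ["p1", "p0"],
}
coverage = catalog_coverage(catalog_ids, recommendation_log)
print(f"coverage: {coverage:.0%}")
```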
Monitoring a recommendation engine has a similar structure to monitoring an AI agent described in the article on AI agent KPI monitoring — four layers, golden set, and alerts for drift.
FAQ
Is AI personalization suitable for a small B2B company?
Yes, but the entry point should be proportional to the scale. For a company with a few hundred customers and a stable catalog, semantic search on embeddings and simple logic like "Customers in your industry also bought..." is often sufficient. Full collaborative filtering with a ranking model makes sense from several thousand customers or hundreds of thousands of transactions. Assess your starting point with the ROI calculator before committing a budget to the AI layer.
How long does it take to implement a recommendation engine?
It depends on the quality of the initial data and the chosen architecture. A pilot based on embeddings and semantic search on an existing catalog: usually 3-5 weeks from data audit to the first production version. A full system with collaborative filtering, A/B testing, and guardrails: 8-14 weeks. Timelines most often shift due to source data quality (missing product descriptions, inconsistent categories), not because of the models. A step-by-step action plan is described in the article AI implementation plan.
What customer data do I need for personalization?
The most valuable are transaction history (what was purchased, when, in what combination) and behavioral data with consent (clicks, time on page, searches). You can start with just order history and product features, without any behavioral data. Collaborative filtering on transactional data yields surprisingly good results with a relatively sparse input dataset. Before sending anything to the model, plan PII masking according to anonymization guidelines.
How to avoid promoting bestsellers at the expense of the rest of the catalog?
Popularity bias is the most common problem in early implementations. Three fixes: (1) cap the frequency of bestsellers in recommendations per session (e.g., at most 1 product from the top 10 by popularity in a set of 5 recommendations); (2) add a diversity penalty to the ranking function that promotes products from different categories; (3) set an exploration quota that reserves one spot in the set for a product the customer has never viewed but that is semantically similar. Tracking coverage monthly will show whether these mechanisms work.
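Fixes (1) and (3) can be applied as a re-ranking pass over an already sorted candidate list. The sketch below is one possible shape, not a reference implementation: it assumes the candidates arrive sorted by descending relevance (so any candidate is already semantically close), it picks the exploration item from non-bestsellers so the cap is not bypassed, and it leaves out the diversity penalty from fix (2). All names and IDs are hypothetical.

```python
def rerank(candidates, bestsellers, viewed, set_size=5, max_bestsellers=1):
    """Cap bestsellers and reserve the last slot for an unseen product."""
    # Exploration pick: best-ranked candidate never viewed and not a bestseller
    explore = next(
        (p for p in candidates if p not in viewed and p not in bestsellers),
        None,
    )
    slots = set_size - (1 if explore is not None else 0)
    picked = []
    bestseller_count = 0
    for pid in candidates:
        if len(picked) >= slots:
            break
        if pid == explore:
            continue  # already reserved for the exploration slot
        if pid in bestsellers:
            if bestseller_count >= max_bestsellers:
                continue  # cap reached, skip further bestsellers
            bestseller_count += 1
        picked.append(pid)
    if explore is not None:
        picked.append(explore)
    return picked


# Hypothetical inputs: b* are top-10 bestsellers, p* are long-tail products.
candidates = ["b1", "b2", "p1", "b3", "p2", "p3", "p4", "p5"]
bestsellers = {"b1", "b2", "b3"}
viewed = {"p1", "p2"}
result = rerank(candidates, bestsellers, viewed)
print(result)
```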
Can an AI agent replace a classic recommendation engine?
Partially. An LLM agent with access to the catalog via RAG is good at explaining recommendations and configuring products in real time during a conversation. It struggles with processing the hundreds of thousands of behavioral signals needed for collaborative filtering. The optimal architecture combines both: a classic ranking model generates candidates, and the LLM agent selects from them and formulates a customer-readable justification. The article AI agent vs chatbot describes where the capabilities of each approach end.