In logistics projects, we regularly see the same pattern: a company hears that “AI will optimize the entire supply chain,” but once we talk it through, it turns out that people spend most of their time manually entering CMRs into the system and sorting emails from carriers. This isn’t a failure; it’s good news. AI handles these narrow, repetitive tasks reliably and predictably. More challenging areas, like seasonal demand forecasting, only deliver value with a clearly defined margin of error and a human making the final decision. Below, we break down four applications to their core: what works, with what accuracy, and where the model’s role ends.
Demand forecasting: where accuracy begins and ends
Demand forecasting is the area where it’s easiest to overpromise. The model learns from sales history, seasonality, promotions, and calendars to predict demand at the SKU or product group level. For stable, fast-moving items with 18-24 months of clean history, real accuracy (measured as the share of forecasts that land within a tolerance range, e.g., ±15% of actual demand) typically falls between 75-90%. For slow-moving items, new products, or assortments heavily dependent on a few large customers, it drops to 50-70%. No model can fix this, because the signal is simply missing from the data.
That’s why the forecast isn’t a decision but an input to one. The planner receives a number along with a confidence interval and a list of the factors that most influenced the result. This is where human oversight comes in: the model doesn’t know about a competitor’s upcoming promotion or that a major customer is switching suppliers. An honest implementation shows the uncertainty instead of hiding it behind a single neat number.
| Item type | Typical hit rate (within ±15%) | AI’s role | Final decision |
|---|---|---|---|
| Fast-moving, stable SKUs | 75-90% | Full automated forecast | Planner’s acceptance (batch review) |
| Seasonal with 2+ years of history | 65-80% | Forecast + season flag | Planner adjusts for campaigns |
| New products (under 6 months of history) | 50-65% | Analogy to similar SKUs | Planner decides, AI suggests |
| Slow-moving / project-based | 45-65% | Trend signal only | Full human decision |
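The hit-rate metric behind these ranges can be sketched in a few lines. This is a minimal illustration, assuming accuracy is counted as the share of forecasts within ±15% of actual demand; the function name `hit_rate` and the choice to skip zero-demand periods are our assumptions, not a fixed standard.

```python
def hit_rate(forecasts, actuals, tolerance=0.15):
    """Share of forecasts that land within +/- tolerance of the actual value."""
    if len(forecasts) != len(actuals):
        raise ValueError("forecasts and actuals must have the same length")
    hits = 0
    counted = 0
    for forecast, actual in zip(forecasts, actuals):
        if actual == 0:
            # Relative error is undefined for zero demand; this sketch skips
            # such periods (an assumption, other conventions exist).
            continue
        counted += 1
        if abs(forecast - actual) <= tolerance * abs(actual):
            hits += 1
    return hits / counted if counted else float("nan")


# Four periods for one SKU: three forecasts fall inside the band, one misses.
forecasts = [100, 120, 80, 50]
actuals = [110, 100, 82, 50]
print(hit_rate(forecasts, actuals))
```

Computing the metric per item type, rather than across the whole assortment, keeps weak categories such as new products from being hidden behind the strong ones.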
Routing and order prioritization
The second area is streamlining the order flow: which orders to send first, which to combine into a single route, and which to flag as at risk of missing the deadline. Here, AI doesn’t generate routes from scratch (dedicated optimization solvers handle that). Instead, it reads the order context and assigns a priority and category. It works like a multi-label classifier: for each order, it determines urgency, load type, delay risk, and whether human attention is needed.
The model reads the following signals: the declared delivery window relative to the current time, delay history on the route, warehouse status (whether the goods are picked), the customer tier from the system, and the days remaining until the SLA deadline. Based on these, the agent directs the order to the appropriate queue: urgent orders to the on-duty dispatcher, standard ones to automated notification, unclear ones to manual review. The key principle we repeat in every implementation: the cost of misprioritizing an urgent order is much higher than the cost of a false alarm. We therefore set the sensitivity threshold asymmetrically: it is better for a human to dismiss a few overzealous escalations than for the system to miss one truly urgent delivery. We break down this mechanism further in our article on AI classification and request routing.
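One way to make the asymmetry concrete is to derive the escalation threshold from the two error costs. The sketch below assumes the classifier outputs a probability that an order is urgent; the cost values, the `review_band` parameter, and the queue names are hypothetical and only illustrate the idea.

```python
def escalation_threshold(cost_false_alarm: float, cost_missed_urgent: float) -> float:
    """Probability above which escalating has lower expected cost than not escalating.

    Escalate when p * cost_missed_urgent > (1 - p) * cost_false_alarm,
    which rearranges to p > cost_false_alarm / (cost_false_alarm + cost_missed_urgent).
    """
    return cost_false_alarm / (cost_false_alarm + cost_missed_urgent)


def route_order(p_urgent: float, threshold: float, review_band: float = 0.05) -> str:
    """Map an urgency probability to one of three queues."""
    if p_urgent >= threshold:
        return "dispatcher"          # urgent: on-duty dispatcher
    if p_urgent >= threshold - review_band:
        return "manual_review"       # just below the threshold: unclear case
    return "auto_notification"       # standard order


# Assume a missed urgent order costs nine times as much as a false alarm.
threshold = escalation_threshold(cost_false_alarm=1.0, cost_missed_urgent=9.0)
print(round(threshold, 2))            # well below the symmetric 0.5
print(route_order(0.12, threshold))
print(route_order(0.07, threshold))
print(route_order(0.02, threshold))
```

With symmetric costs the threshold would sit at 0.5. The more expensive a missed urgent delivery is relative to a false alarm, the lower the threshold drops and the more escalations the dispatcher sees.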
OCR and data extraction from transport documents
This is usually where ROI arrives fastest in logistics. Waybills, CMRs, carrier invoices, notices, and delivery confirmations arrive as scans, drivers’ phone photos, or PDFs of varying quality. Today, someone has to manually enter numbers, dates, weights, amounts, and order numbers into the system. OCR reads the text from the image, and the data extraction layer converts the raw text into structured fields that go directly into the TMS or ERP.
Real accuracy depends on source quality. On clean, standardized documents (printed invoices, system-generated PDFs), 95-99% of fields are read correctly. With phone photos, handwritten notes, stamps, and crumpled paper, this drops to 80-92%, and this is where a confidence mechanism is needed: every low-confidence field goes to quick human verification instead of silently entering the system with an error. For invoice amounts and CMR weights, one wrong digit has real costs, so financial and quantitative fields are verified more strictly than descriptive ones. Best practice: for critical fields, the extractor never overwrites system data automatically. It suggests a value, and a human approves it with one click.
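The confidence mechanism can be sketched as a per-field triage step. The field names, thresholds, and the three outcome labels below are illustrative assumptions, not values from a specific OCR product or deployment.

```python
from dataclasses import dataclass

# Hypothetical field names and thresholds, chosen only for illustration.
CRITICAL_FIELDS = {"invoice_amount", "gross_weight_kg"}
DEFAULT_MIN_CONFIDENCE = 0.90
CRITICAL_MIN_CONFIDENCE = 0.98


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # extractor's score in [0, 1]


def triage(field: ExtractedField) -> str:
    """Return 'auto_accept', 'confirm' (one-click approval) or 'verify' (check against the scan)."""
    if field.name in CRITICAL_FIELDS:
        # Critical fields are never written automatically: even a confident
        # value is only suggested and waits for a one-click approval.
        return "confirm" if field.confidence >= CRITICAL_MIN_CONFIDENCE else "verify"
    return "auto_accept" if field.confidence >= DEFAULT_MIN_CONFIDENCE else "verify"


fields = [
    ExtractedField("carrier_name", "ACME Transport", 0.95),
    ExtractedField("invoice_amount", "1250.00", 0.99),
    ExtractedField("gross_weight_kg", "1B40", 0.85),
]
for f in fields:
    print(f.name, "->", triage(f))
```

Keeping the critical-field rule separate from the confidence threshold means that tuning the threshold later cannot accidentally enable automatic writes for amounts or weights.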
Classification of requests and operational communication
The fourth area is the operational inbox: emails from carriers, customers, and warehouses that someone reads and routes manually today. “Where’s my shipment,” “notice for tomorrow,” “damaged goods,” “complaint,” “delivery address change”: each category goes to a different team with a different urgency. The classifier reads the content, detects the language, assesses urgency and sentiment, then routes the message onward, sometimes with a ready-made response based on the knowledge base.
Under the hood, a router often selects the model based on task difficulty: simple emails are classified by a cheap, fast model, while ambiguous ones go to a stronger one. This is the same approach we use in customer service automation and in projects for service companies. Here too, a hard boundary applies: damage reports, complaints, and cases with strongly negative sentiment always go to a human, and auto-replies are sent only for a narrow band of routine, verifiable questions (status, hours, procedure).
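The routing logic with its hard boundary might look like the following sketch. It assumes a first-pass model has already produced a category, a confidence score, and a sentiment score; the category labels, cutoffs, and the `escalated` flag are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Hypothetical category labels; a real taxonomy would come from the project.
ALWAYS_HUMAN = {"damaged_goods", "complaint"}
AUTO_REPLY_OK = {"shipment_status", "opening_hours", "procedure"}


@dataclass
class Triage:
    category: str
    confidence: float  # classifier confidence in [0, 1]
    sentiment: float   # -1.0 (very negative) to 1.0 (very positive)


def decide(t: Triage, escalated: bool = False,
           min_confidence: float = 0.85, negative_cutoff: float = -0.6) -> str:
    """Choose the next step for one email.

    `escalated` is True when the result already comes from the stronger model.
    """
    # Hard boundary first: these cases go to a person regardless of confidence.
    if t.category in ALWAYS_HUMAN or t.sentiment <= negative_cutoff:
        return "human"
    if t.confidence < min_confidence:
        # Ambiguous after the cheap model: retry with the stronger one.
        # Still ambiguous after that: hand over to a human.
        return "human" if escalated else "stronger_model"
    if t.category in AUTO_REPLY_OK:
        return "auto_reply"
    return "team_queue"


print(decide(Triage("shipment_status", 0.95, 0.1)))
print(decide(Triage("shipment_status", 0.95, -0.9)))
print(decide(Triage("address_change", 0.50, 0.0)))
```

Checking the hard boundary before anything else is a deliberate ordering: a confident classification never overrides the rule that complaints and angry messages reach a person.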
Where humans must remain in the loop
Across these four areas, the pattern is clear. AI handles reading, classification, and decision preparation well. Humans stay where errors are costly or irreversible and where context goes beyond historical data. Specifically: approving forecasts for high-value items, final decisions on conflicting order priorities, verifying financial and weight fields from documents, and any case marked with low confidence or strong negative sentiment.
Implementation is phased. It starts with shadow mode, in which the model suggests, humans decide, and the two are compared for 4-8 weeks. Only when the metrics (accuracy, correction rate, handling time) are stable do we automate the most confident, lowest-risk categories, leaving the rest to humans. This approach is less flashy than “AI takes over the warehouse,” but it’s honest and delivers sustainable results.
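A shadow-mode comparison can be reduced to two small checks: how often humans corrected the model, and whether that rate has been low and stable for long enough. The limits used below (5% maximum rate, 2 percentage points of spread, 4 weeks) are placeholder assumptions that each project would set for itself.

```python
def correction_rate(suggestions, decisions):
    """Share of cases where the human decision differed from the model's suggestion."""
    if len(suggestions) != len(decisions) or not suggestions:
        raise ValueError("need two non-empty sequences of equal length")
    corrected = sum(1 for s, d in zip(suggestions, decisions) if s != d)
    return corrected / len(suggestions)


def ready_to_automate(weekly_rates, max_rate=0.05, max_spread=0.02, min_weeks=4):
    """True when the last `min_weeks` correction rates are both low and stable."""
    if len(weekly_rates) < min_weeks:
        return False
    recent = weekly_rates[-min_weeks:]
    return max(recent) <= max_rate and (max(recent) - min(recent)) <= max_spread


# One week of shadow mode for a single category: the human overrode 1 of 4 suggestions.
print(correction_rate(["urgent", "standard", "urgent", "standard"],
                      ["urgent", "standard", "standard", "standard"]))

# Correction rates per week; the first week is noisy, the last four are low and stable.
print(ready_to_automate([0.12, 0.04, 0.03, 0.04, 0.03]))
```

Running the check per category rather than globally matches the phased rollout: a category with a stable, low correction rate can be automated while the others stay in shadow mode.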
FAQ
What real accuracy can be achieved in demand forecasting?
For stable, fast-moving items with 18-24 months of history, the real hit rate within a ±15% range is typically 75-90%. For new products and slow-moving items, it drops to 50-70% because the data signal is missing. That’s why we treat the forecast as an input to the planner’s decision, with an explicit confidence interval, not as a finished number for automatic ordering.
Does AI set delivery routes on its own?
Not in the sense of full optimization: dedicated solvers and TMS systems handle route calculation. In this context, AI reads the order context and assigns priority, category, and delay risk; in other words, it organizes the stream before planning. This is the role of a classifier and routing agent, not a replacement for the route planner.
How does OCR perform with poor-quality scans and phone photos?
On clean, printed documents, field accuracy reaches 95-99%, but with phone photos, stamps, and handwritten notes, it drops to 80-92%. That’s why every field gets a confidence score, and low-confidence ones go to quick human verification instead of silently entering the system. Financial and weight fields are verified more strictly than descriptive ones because a single wrong digit has real costs.
Can document entry into TMS be fully automated?
For a narrow band of clean, standardized documents, largely yes. For the entire stream, we don’t recommend full automation of critical fields: data extraction suggests values, and a human approves low-confidence ones with one click. The real goal is reducing the team’s workload by 30-60%, not eliminating human control.
Where must humans remain in the loop with AI in logistics?
Everywhere errors are costly or irreversible, and context goes beyond historical data: approving forecasts for high-value items, resolving order priority conflicts, verifying financial and weight fields, and any case with low confidence or strong negative sentiment. This is human oversight built into the process, not added at the end as a formality.
Related articles: AI classification and request routing, AI customer service automation, AI for service companies. Also check the automation finder tool to identify which of your logistics processes are ready for AI first.