A manufacturing company built a single agent that was supposed to do everything: answer customer questions, generate quotes, schedule sales meetings, and create weekly sales reports. After three months the agent still worked, but its output kept getting worse. Quotes became less precise because the context window was filled with customer queries. Reports were delayed because coordination with the CRM conflicted with the service queue. The cost per query tripled compared to the pilot. The fix was not a better model but splitting the work across four specialized agents coordinated by a router agent.
This is a typical scenario of overloading a single agent. However, not every case requires a multi-agent architecture.
What is multi-agent orchestration
Orchestration is a mechanism that assigns tasks to agents, collects their results, and merges them into a coherent output. We distinguish three patterns:
Router (classifier + dispatch). A coordinator agent analyzes the input and directs it to the appropriate specialist agent. A customer asks about an invoice? It goes to the finance agent. Asks about delivery status? It goes to the logistics agent. The router doesn’t merge results—it just redirects. Simple, cheap, easy to debug.
Pipeline (sequence). The task passes through successive agents like an assembly line: agent A collects data, agent B analyzes it, agent C formats the report. Each agent receives the previous agent’s output as input. Works well for document processing, generating offers, multi-step research.
Mesh (network). Agents can call each other in any order, depending on the task’s needs. This is the most flexible pattern but also the hardest to monitor. The risk of loops is real if there are no hard call limits and no human-handoff mechanism.
Most business implementations start with a router or pipeline. The mesh is reserved for when the two simpler patterns clearly aren’t enough.
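The router pattern can be sketched in a few lines. This is a minimal illustration, not a production design: the agent functions, the keyword table, and the fallback are all hypothetical, and the keyword match stands in for the classifier that a small model would normally provide.

```python
from typing import Callable, Dict

# Hypothetical specialist agents; in a real system each would wrap a model call.
def finance_agent(message: str) -> str:
    return f"[finance] handling: {message}"

def logistics_agent(message: str) -> str:
    return f"[logistics] handling: {message}"

def fallback_agent(message: str) -> str:
    return f"[human-handoff] no specialist matched: {message}"

# Keyword rules stand in for the intent classifier (an assumption for this sketch).
ROUTES: Dict[str, Callable[[str], str]] = {
    "invoice": finance_agent,
    "payment": finance_agent,
    "delivery": logistics_agent,
    "shipment": logistics_agent,
}

def route(message: str) -> str:
    """Classify the message and dispatch it to exactly one specialist."""
    lowered = message.lower()
    for keyword, agent in ROUTES.items():
        if keyword in lowered:
            # The router only redirects; it does not merge results.
            return agent(message)
    return fallback_agent(message)

print(route("Where is my delivery?"))
print(route("I have a question about my invoice"))
```

Because every request takes exactly one path, a wrong answer can be traced to either the classification step or a single specialist, which is what keeps this pattern easy to debug.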
When multi-agent outperforms a single agent
There’s no single threshold here. There are signals that indicate it’s time to break things down:
Context is too small for all roles. When an agent must simultaneously hold pricing policy, customer history, complaint rules, and report format in its context, the context window becomes a bottleneck. Specialization means a smaller system prompt and more accurate responses.
Tasks can run in parallel. Processing ten independent customers sequentially takes about ten times as long as processing one. Dispatching them to specialized agents in parallel can reduce latency by 60-70%, provided the tasks have no dependencies on each other.
Different tasks require different models. Intent classification is a job for a small, cheap model (response in about 200 ms). Synthesizing a multi-page offer requires a larger model with a longer context. An LLM router between agents selects the model per task, which can reduce inference costs by 30-50% while maintaining quality.
Responsibility boundaries have legal significance. When part of the process involves personal data (GDPR, DPIA) and part doesn’t, isolating agents with separate guardrails and logs simplifies the compliance documentation required by the AI Act.
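The parallelism signal is easy to check with a small experiment. The sketch below assumes a hypothetical `specialist_agent` coroutine whose `asyncio.sleep` stands in for model latency; it compares awaiting ten independent calls one by one with dispatching them together through `asyncio.gather`.

```python
import asyncio
import time

async def specialist_agent(customer_id: int) -> str:
    """Hypothetical agent call; the sleep stands in for model latency."""
    await asyncio.sleep(0.05)
    return f"result for customer {customer_id}"

async def run_sequential(customer_ids):
    # Each call waits for the previous one to finish.
    return [await specialist_agent(cid) for cid in customer_ids]

async def run_parallel(customer_ids):
    # gather runs all calls concurrently and returns results in input order.
    return await asyncio.gather(*(specialist_agent(cid) for cid in customer_ids))

def timed(coro_fn, customer_ids):
    start = time.perf_counter()
    results = asyncio.run(coro_fn(customer_ids))
    return results, time.perf_counter() - start

ids = list(range(10))
seq_results, seq_time = timed(run_sequential, ids)
par_results, par_time = timed(run_parallel, ids)
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

The gain only holds when the tasks are truly independent; as soon as one call needs another's output, that part of the work is a pipeline again.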
Decision table: single agent vs. multi-agent system
| Scenario | Single Agent | Multi-Agent System |
|---|---|---|
| One coherent process (e.g., complaint handling) | Yes: lower cost and simpler debugging | Overengineering |
| Several unrelated domains in one bot | Risk of quality degradation after 3-4 domains | Yes: router per domain |
| Tasks that can be parallelized | Sequential, so slower | Yes: parallel dispatch |
| Different models for different steps | Hard to manage | Yes: per-agent model selection |
| Short pilot / PoC | Yes: faster to deploy and test | Too much operational risk at the start |
| Different legal requirements per area | Shared guardrails may be too broad | Yes: log and guardrail isolation |
| Small operational budget | Yes: lower TCO | Each additional agent raises monitoring costs |
How to connect agents: practical architecture
The coordinator (orchestrator) is the heart of the system. Its sole responsibility is to accept a task, decide who handles it, collect the result, and then either return the answer or escalate. The coordinator shouldn’t perform domain tasks itself; otherwise, it becomes the same overloaded agent we were trying to escape.
Practical connection principles:
Contracts between agents. Each specialist agent has a documented input and output schema (structured output). The coordinator doesn’t assume it knows the format—it reads the schema and validates the result before passing it on. Lack of validation is a source of silent errors in the pipeline.
Call limits and timeouts per agent. Each call has a hard timeout (e.g., 30 seconds) and a retry limit (e.g., 2 retries with exponential backoff). After exceeding the limit, the agent returns an error result with context, and the coordinator decides: escalate to a human or fall back to a simpler path.
Shared log registry. Each agent writes to a single logging system with a session ID and call ID. Without this, debugging an event that passed through three agents turns into a multi-hour investigation. This is an observability requirement and the basis for an audit trail compliant with the AI Act.
Human-gate for irreversible actions. Regardless of which agent performs the action (send email, issue invoice, change ERP status), if the action is irreversible, it stops and waits for operator confirmation. This applies to every agent in the network, not just the coordinator.
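Two of the principles above, contracts and call limits, can be combined in a single wrapper. The following is a sketch under stated assumptions: `AgentResult`, `REQUIRED_QUOTE_FIELDS`, `call_with_limits`, and both demo agents are hypothetical names, and the schema check is a plain type check rather than a full schema validator.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class AgentResult:
    ok: bool
    payload: dict
    error: str = ""

# Hypothetical output contract for a quote agent: field name -> expected type.
REQUIRED_QUOTE_FIELDS = {"customer_id": int, "total": float}

def validate(payload: dict, schema: dict) -> None:
    """Raise ValueError if a field is missing or has the wrong type."""
    for field, expected_type in schema.items():
        if field not in payload:
            raise ValueError(f"missing field: {field}")
        if not isinstance(payload[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")

async def call_with_limits(agent, task, schema,
                           timeout_s=30.0, retries=2, base_delay_s=1.0):
    """Call an agent with a hard timeout, bounded retries, and output validation."""
    last_error = ""
    for attempt in range(retries + 1):
        try:
            payload = await asyncio.wait_for(agent(task), timeout=timeout_s)
            validate(payload, schema)
            return AgentResult(ok=True, payload=payload)
        except (asyncio.TimeoutError, ValueError) as exc:
            last_error = f"attempt {attempt + 1}: {type(exc).__name__}: {exc}"
            if attempt < retries:
                # Exponential backoff: base, 2x base, 4x base, ...
                await asyncio.sleep(base_delay_s * 2 ** attempt)
    # The coordinator receives an error result with context and decides what to do.
    return AgentResult(ok=False, payload={}, error=last_error)

async def quote_agent(task: dict) -> dict:
    return {"customer_id": task["customer_id"], "total": 1250.0}

async def broken_agent(task: dict) -> dict:
    return {"customer_id": task["customer_id"]}  # violates the contract: no "total"

good = asyncio.run(call_with_limits(quote_agent, {"customer_id": 7},
                                    REQUIRED_QUOTE_FIELDS))
bad = asyncio.run(call_with_limits(broken_agent, {"customer_id": 7},
                                   REQUIRED_QUOTE_FIELDS, base_delay_s=0.01))
print(good.ok, bad.ok, bad.error)
```

Returning an error result instead of raising keeps the escalation decision in the coordinator, which matches the division of responsibility described above.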
Learn more about tool architecture and loops in the article on multi-step agents and the description of agents that perform work.
Risks and when multi-agent is overengineering
Multi-agent architecture has real costs. It’s worth knowing them before implementation:
Loops and deadlocks. Agent A calls agent B, which calls agent A with a different context. Without a hard dependency graph and depth counter, the system can loop in a way that’s hard to detect. Symptom: a sharp increase in inference costs without a proportional increase in value.
Higher monitoring costs. Each additional agent is a separate point for metrics, alerts, and logs. Monitoring a five-agent system costs proportionally more than monitoring one. Before implementation, assess TCO using the inference calculator and ROI calculator.
Harder debugging. An error at the end of the pipeline must be traced back through each agent. Without a good trace system (session ID propagated through the entire pipeline), diagnosis takes much longer than in a single-agent system.
Network latencies add up. Each call between agents (especially in cloud architecture) adds latency. In a five-agent sequential pipeline, the total response time may exceed the user’s acceptable threshold, even if each agent is fast individually.
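The loop risk in particular can be contained with a depth counter that travels with every delegated call. This is a minimal sketch: `MAX_DEPTH`, `call_agent`, the `delegate` callback, and the two agents that call each other are all hypothetical.

```python
class CallDepthExceeded(RuntimeError):
    """Raised when a chain of agent-to-agent calls gets too deep."""

MAX_DEPTH = 3  # hypothetical hard limit on nested agent calls

def call_agent(name, task, agents, depth=0, trail=()):
    """Run an agent, passing it a delegate callback that tracks call depth."""
    if depth >= MAX_DEPTH:
        raise CallDepthExceeded(" -> ".join(trail + (name,)))

    def delegate(next_name, next_task):
        # Every delegated call carries the depth and the chain so far.
        return call_agent(next_name, next_task, agents, depth + 1, trail + (name,))

    return agents[name](task, delegate)

def agent_a(task, delegate):
    return delegate("B", task)

def agent_b(task, delegate):
    return delegate("A", task)  # calls back into A, creating a loop

AGENTS = {"A": agent_a, "B": agent_b}

def safe_call(name, task, agents):
    try:
        return call_agent(name, task, agents)
    except CallDepthExceeded as exc:
        return f"escalated to human: {exc}"

outcome = safe_call("A", "quote request", AGENTS)
print(outcome)
```

Recording the call chain in the exception gives the operator the exact path that looped, rather than only a spike in the inference bill.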
Multi-agent is an investment, not a shortcut. If the problem can be solved with a better prompt, an expanded RAG database, or breaking it into two independent processes instead of a connected network, do that first. Also check the costs of maintaining an AI agent and the article on AI agent security, because each additional agent expands the attack surface.
FAQ
How many agents is a reasonable number to start with?
For most companies, 2-4 specialist agents plus one coordinator is a good starting point. This is enough to handle several domain-specific processes while keeping operational complexity manageable. Systems with more than 8-10 agents make sense only in very large organizations with extensive AI DevOps support.
Does every agent have to use the same model?
No. This is one of the main advantages of a multi-agent architecture. An intent classifier can run on a small, fast model (7B, local), while a report synthesis agent can use a larger cloud model. Choosing the model per task can reduce inference costs by 30-50% without losing quality on critical steps.
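Per-task model selection can be as simple as a lookup table. In the sketch below the model names, context sizes, and prices are placeholders chosen for illustration, not real products or real pricing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    name: str
    max_context_tokens: int
    cost_per_1k_tokens: float  # illustrative units, not real pricing

# All names and numbers here are placeholders for the sketch.
MODEL_BY_TASK = {
    "intent_classification": ModelConfig("small-local-model", 8_000, 0.1),
    "report_synthesis": ModelConfig("large-cloud-model", 128_000, 3.0),
}
# Unknown task types fall back to the larger model to protect quality.
DEFAULT_MODEL = MODEL_BY_TASK["report_synthesis"]

def select_model(task_type: str) -> ModelConfig:
    """Return the model configured for a task type."""
    return MODEL_BY_TASK.get(task_type, DEFAULT_MODEL)

def estimate_cost(task_type: str, tokens: int) -> float:
    """Estimate the cost of one call from the token count."""
    return select_model(task_type).cost_per_1k_tokens * tokens / 1000

print(select_model("intent_classification").name)
print(estimate_cost("report_synthesis", 20_000))
```

Keeping the mapping in one place also makes it easy to measure what each task type actually costs before and after a model change.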
How to debug an error that passed through several agents?
The key is propagating a single session ID through the entire pipeline. Each agent logs input, output, and processing time with the same ID. During an incident, you filter logs by ID and reconstruct the call sequence. Without this, even a simple pipeline becomes a black box.
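Session ID propagation can be sketched with Python's standard `logging` module. The in-memory `ListHandler`, the `run_agent` wrapper, and the three toy agents are assumptions made for this example; a real deployment would send the same fields to a shared log store.

```python
import logging
import time
import uuid

class ListHandler(logging.Handler):
    """Collects records in memory; a stand-in for a shared log store."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

logger = logging.getLogger("agents")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

def run_agent(name, fn, payload, session_id):
    """Run one agent and log input, output, and duration under the session ID."""
    call_id = uuid.uuid4().hex
    start = time.perf_counter()
    output = fn(payload)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("agent call", extra={
        "session_id": session_id, "call_id": call_id, "agent": name,
        "agent_input": payload, "agent_output": output, "elapsed_ms": elapsed_ms,
    })
    return output

def run_pipeline(text, session_id):
    # The same session_id is passed to every step of the pipeline.
    collected = run_agent("collector", str.strip, text, session_id)
    analyzed = run_agent("analyzer", str.upper, collected, session_id)
    return run_agent("formatter", lambda s: f"REPORT: {s}", analyzed, session_id)

def trace(session_id):
    """Reconstruct the call sequence for one session by filtering on its ID."""
    return [r.agent for r in handler.records if r.session_id == session_id]

run_pipeline("  order 1 ", "session-1")
run_pipeline("  order 2 ", "session-2")
print(trace("session-1"))
```

The per-call `call_id` distinguishes repeated calls to the same agent within one session, which matters once retries are involved.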
Does a multi-agent system require a separate AI Act compliance assessment?
It depends on the risk classification. If any agent in the network makes decisions classified as high-risk (e.g., credit scoring, job candidate selection), the entire system is subject to AI Act requirements as a high-risk system. Isolating agents doesn’t exempt you from documenting and auditing the entire process.
When is it worth starting with a single agent, even if you plan a multi-agent system eventually?
Always. A single agent means faster piloting, simpler debugging, and lower deployment costs. Division patterns and contracts between agents are best designed based on real data from a working system, not theory. After 4-8 weeks of piloting, you’ll see where a single agent actually fails and how to divide responsibilities.