Law 38 · Architecture & Operations

The Multi-Agent Tax

Every extra agent multiplies your token bill — make sure the task can pay it.

The principle

A multi-agent research system can burn roughly 15× the tokens of a single chat, and token usage alone can explain most of the performance variance. That means multi-agent only makes economic sense when the task's value is high and the work genuinely parallelizes. For most tightly-coupled work, the coordination overhead isn't worth it.

Why it happens

Anthropic reported that their multi-agent research system used about 15× the tokens of an ordinary chat interaction, and that token usage alone explained roughly 80% of the performance variance on the BrowseComp evaluation. Much of the quality gain is therefore bought with tokens, which makes the cost-versus-value tradeoff explicit. Multi-agent earns its keep only on tasks that are both high-value and genuinely parallelizable, where independent sub-agents can fan out on separable threads without waiting on each other. For tightly-coupled, sequential work, the agents mostly wait on each other's outputs while coordination overhead and duplicated context inflate the bill for no quality gain. The deeper risk, which Cognition documents, is that splitting work across agents fragments context: every action carries implicit decisions, and sub-agents that make conflicting decisions from partial views produce incoherent results. The tax is paid in both tokens and reliability.
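The cost side of that tradeoff can be sketched as a rough break-even check. Only the 15× multiplier comes from the figure cited above; the token price, baseline token count, and dollar values are hypothetical placeholders, and the names (TaskEstimate, multi_agent_pays_off) are illustrative rather than part of any real library.

```python
from dataclasses import dataclass

# Multiplier cited above for a multi-agent research system vs. an ordinary chat.
MULTI_AGENT_TOKEN_MULTIPLIER = 15


@dataclass
class TaskEstimate:
    baseline_tokens: int           # tokens for one chat-style pass (assumed)
    usd_per_million_tokens: float  # blended token price (assumed)
    value_single_usd: float        # value of the single-agent result (assumed)
    value_multi_usd: float         # value of the multi-agent result (assumed)


def token_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count into dollars at a flat per-million price."""
    return tokens / 1_000_000 * usd_per_million_tokens


def multi_agent_pays_off(est: TaskEstimate) -> bool:
    """True when the extra value of going multi-agent exceeds the extra token cost."""
    single_cost = token_cost_usd(est.baseline_tokens, est.usd_per_million_tokens)
    multi_cost = single_cost * MULTI_AGENT_TOKEN_MULTIPLIER
    extra_cost = multi_cost - single_cost
    extra_value = est.value_multi_usd - est.value_single_usd
    return extra_value > extra_cost


# Hypothetical tightly-coupled task: same output quality either way.
invoice = TaskEstimate(20_000, 10.0, value_single_usd=1.0, value_multi_usd=1.0)

# Hypothetical high-value research task where fanning out improves the result.
research = TaskEstimate(50_000, 10.0, value_single_usd=50.0, value_multi_usd=200.0)
```

With these made-up numbers, multi_agent_pays_off(invoice) is False because the extra tokens buy nothing, while multi_agent_pays_off(research) is True because the added value dwarfs the added cost. The model ignores latency and engineering effort, so treat it as a first filter rather than a verdict.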

Watch for

The work is sequential or tightly coupled, so sub-agents mostly wait on each other rather than running in parallel.
Token cost has jumped severalfold after splitting into multiple agents with no measurable quality improvement.
Sub-agents make conflicting decisions because each sees only a fragment of the shared context.
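The second warning sign can be checked mechanically if you log token counts and a quality score before and after the split. The following is a minimal sketch; the function name, the 3× cost threshold standing in for "severalfold", and the minimum quality gain are all assumptions to tune against your own evaluations.

```python
def is_paying_tax_for_nothing(
    tokens_before: int,
    tokens_after: int,
    quality_before: float,
    quality_after: float,
    cost_ratio_threshold: float = 3.0,  # assumed cutoff for "severalfold"
    min_quality_gain: float = 0.02,     # assumed smallest gain worth counting
) -> bool:
    """Flag a split whose token cost jumped without a measurable quality gain."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    cost_ratio = tokens_after / tokens_before
    quality_gain = quality_after - quality_before
    return cost_ratio >= cost_ratio_threshold and quality_gain < min_quality_gain
```

A run that goes from 10,000 to 150,000 tokens with an unchanged quality score is flagged; the same jump accompanied by a large quality gain is not.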

In practice

Impressed by a coordinator-and-subagents demo, you refactor your invoice-processing pipeline into five specialist agents that chat to reach consensus. The work is tightly sequential, so they mostly wait on each other while your token bill jumps roughly fifteen-fold for output no better than one well-prompted pass. Multi-agent only earns its keep when the task is high-value and genuinely parallelizes, like fanning out independent research threads. For tightly-coupled work, the coordination overhead is pure tax: keep it a single agent.

Apply it

Reserve multi-agent architectures for high-value tasks that genuinely parallelize into independent threads.
For tightly-coupled work, keep it a single well-prompted agent rather than paying the coordination tax.
If you do split, share full traces and constraints across sub-agents so they do not make conflicting decisions.
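One way to picture the last point is a fan-out in which every sub-agent receives the same complete brief instead of a fragment. This is a sketch under assumptions: SharedBrief, fan_out, and stub_subagent are hypothetical names, and stub_subagent stands in for a real model call.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SharedBrief:
    """Everything each sub-agent must see: goal, constraints, decisions so far."""
    goal: str
    constraints: tuple[str, ...]
    trace: tuple[str, ...]


def fan_out(
    brief: SharedBrief,
    subtasks: list[str],
    run_subagent: Callable[[SharedBrief, str], str],
) -> list[str]:
    """Run independent subtasks in parallel, passing the full brief to each one."""
    if not subtasks:
        return []
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        futures = [pool.submit(run_subagent, brief, task) for task in subtasks]
        # Collect in submission order so results line up with subtasks.
        return [future.result() for future in futures]


def stub_subagent(brief: SharedBrief, subtask: str) -> str:
    # Placeholder for a model call; it only echoes what it was given.
    return f"{subtask} | goal={brief.goal} | constraints={len(brief.constraints)}"


brief = SharedBrief(
    goal="compare vendors",
    constraints=("cite sources", "USD only"),
    trace=("chose 2024 data",),
)
results = fan_out(brief, ["vendor A", "vendor B"], stub_subagent)
```

The brief is frozen so no sub-agent can quietly change the constraints for the others. Passing the full brief to every worker duplicates context and therefore tokens, which is part of the tax described above; the design only makes sense when the subtasks are truly independent.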

The takeaway

Reserve multi-agent architectures for high-value, heavily parallelizable tasks. For everything else the token tax outweighs the gains.
