Law 07 · Reasoning & Planning

Don't Bet on One Chain

Sample many reasoning paths and let them vote.

The principle

A single greedy chain of thought is fragile. Sampling several independent reasoning paths and taking the majority answer yields large, consistent gains. Correct reasoning tends to converge; mistakes scatter. Agreement across independently generated plans is a real signal you can trust before acting on something consequential.

Why it happens

A single greedy decode follows one trajectory through a probabilistic space, so an early misstep is locked in with no chance of recovery. Sampling several independent paths exploits a structural asymmetry: correct reasoning tends to converge on the same answer, while errors scatter in different directions, so agreement carries real information. Large Language Monkeys (Brown et al., 2024) quantified the upside of drawing many samples: coverage, the fraction of problems solved by at least one sample, grew roughly log-linearly with the number of samples across four orders of magnitude. More independent tries do find more correct answers. The crucial caveat is that coverage converts to accuracy only when you can pick the right sample: by majority vote when answers are directly comparable, or by an external verifier when they are not. For consequential, hard-to-reverse outputs, sampling several plans and acting on the consensus turns a fragile one-shot guess into a measurable agreement signal.
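The converge-versus-scatter asymmetry can be illustrated with a toy simulation. This is a minimal sketch, not a model of any real system: it assumes each sampled path is independently correct with probability `p_correct` and otherwise lands uniformly on one of `n_wrong` distinct wrong answers. All names and numbers here are made up for illustration.

```python
import random
from collections import Counter


def sample_answer(rng, p_correct, correct="42", n_wrong=10):
    """One simulated reasoning path: correct with probability p_correct,
    otherwise one of n_wrong distinct wrong answers, chosen uniformly."""
    if rng.random() < p_correct:
        return correct
    return f"wrong-{rng.randrange(n_wrong)}"


def majority_vote(answers):
    """Return the most common answer (ties go to the answer seen first)."""
    return Counter(answers).most_common(1)[0][0]


def accuracy(k, p_correct, trials=5000, seed=0):
    """Fraction of trials in which a majority vote over k samples is correct."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        answers = [sample_answer(rng, p_correct) for _ in range(k)]
        hits += majority_vote(answers) == "42"
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 15):
        print(k, accuracy(k, p_correct=0.5))
```

Under these assumptions a single sample is right about half the time, while the vote over 15 samples is right far more often, because the wrong answers split their votes. If errors were correlated and piled onto the same wrong answer, the vote would help much less.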

Watch for

High-stakes outputs ride on a single greedy generation with no second opinion.
Re-running the same prompt yields meaningfully different answers, revealing the first one was luck.
Errors slip through because nothing checks whether independent attempts actually agree.

In practice

Your agent estimates a quote for a custom order in one greedy pass, lands on $1,400, and you send it to the customer, only to discover it dropped a line item that should have made it $2,100. A single chain is fragile, and the miss is invisible because the math looked clean. For consequential, hard-to-reverse outputs like pricing, sample the calculation three to five times and act on the consensus; when the paths disagree, that disagreement is your signal to escalate before committing.
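For the quote scenario, the consensus step might look like the sketch below. The function name `consensus_quote` and the 60% agreement threshold `min_share` are assumptions chosen for illustration; the samples would come from however you re-run your agent.

```python
from collections import Counter
from decimal import Decimal


def consensus_quote(samples, min_share=0.6):
    """Return (quote, None) when enough samples agree,
    or (None, reason) when the result should be escalated."""
    if not samples:
        return None, "no samples"
    # Normalize to cents so 2100 and 2100.0 count as the same answer.
    normalized = [Decimal(str(s)).quantize(Decimal("0.01")) for s in samples]
    value, votes = Counter(normalized).most_common(1)[0]
    if votes / len(normalized) < min_share:
        return None, f"only {votes}/{len(normalized)} samples agree"
    return value, None


if __name__ == "__main__":
    # Four of five runs include the line item; one run dropped it.
    print(consensus_quote([2100, 2100.0, 1400, 2100, 2100]))
    # No agreement at all: escalate instead of sending a number.
    print(consensus_quote([1400, 2100, 1750]))
```

Returning a reason instead of silently picking a number keeps the disagreement visible to whoever reviews the escalation.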

Apply it

For consequential decisions, generate the answer several independent times instead of trusting the first.
Take the majority answer when outputs are comparable, or use an external check to pick among them.
Treat disagreement across the samples as a signal to escalate rather than silently picking one.
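When outputs are not directly comparable, an external check can do the picking. The sketch below assumes a hypothetical itemized quote format (a dict with `items` and `total`) and a hypothetical list of required line items; a real verifier would encode whatever rules your domain actually has.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical business rule: every quote must cover these line items.
REQUIRED_ITEMS = {"materials", "labor", "shipping"}


def quote_is_consistent(quote: dict) -> bool:
    """External check: all required items present and the total adds up."""
    items = quote["items"]
    return REQUIRED_ITEMS.issubset(items) and sum(items.values()) == quote["total"]


def pick_verified(candidates: Sequence[T], verify: Callable[[T], bool]) -> Optional[T]:
    """Return the first candidate that passes verify, or None to escalate."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None


if __name__ == "__main__":
    dropped = {"items": {"materials": 900, "labor": 500}, "total": 1400}
    complete = {"items": {"materials": 900, "labor": 500, "shipping": 700}, "total": 2100}
    print(pick_verified([dropped, complete], quote_is_consistent))
    print(pick_verified([dropped], quote_is_consistent))  # None: escalate
```

A `None` result means no sample passed the check, which is the same escalation signal as a split vote.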

The takeaway

For high-stakes decisions, generate the plan or answer several times and act on the consensus, not on the first chain you happened to get.

Sources and further reading
