Law 09 · Reasoning & Planning

Stop Tuning, Start Scaling

General methods plus compute beat your clever scaffolding.

Diagram explaining Stop Tuning, Start Scaling

The principle

The Bitter Lesson, Rich Sutton's 2019 essay, distills 70 years of AI research: approaches that leverage general computation eventually crush approaches built on hand-encoded human cleverness, and by a large margin. Baked-in scaffolds (elaborate prompt chains, rigid decision trees, hardcoded heuristics) buy a short-term gain and then become a ceiling. Your intricate planning DSL will likely be obsoleted by the next, more capable model.

Why it happens

The Bitter Lesson holds because hand-encoded scaffolding bakes in assumptions about how the model reasons, and those assumptions become a ceiling the moment a more capable model could have reasoned past them on its own. Recent work shows that much of the leverage now sits in inference-time compute, which any model can use generically. Snell et al. (2024) found that allocating test-time compute well, for example by sampling multiple attempts and verifying them, can let a smaller model outperform a roughly 14x larger one in a compute-matched comparison, on problems where the smaller model already has a non-trivial success rate. On that evidence, general methods plus compute can beat brittle bespoke logic. An elaborate prompt-chain DSL or a 40-node decision tree buys a short-term win on today's weaker model and then actively gets in the way of the next one, which a plain 'here are the tools, decide' prompt would have handled. The discipline is to build the thinnest scaffold that works and that you would be happy to delete on the next model release.
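The "sample and verify multiple attempts" idea can be sketched in a few lines. This is a minimal best-of-N illustration, not the procedure from Snell et al.: `generate` and `score` are hypothetical callables standing in for a model call and a verifier, and `exact_sum_verifier` is a toy checker invented for the example.

```python
from typing import Callable


def best_of_n(
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    prompt: str,
    n: int = 8,
) -> str:
    """Draw n candidates from `generate` and return the one `score` rates highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    # max() keeps the first candidate on ties, so the result is deterministic
    # for a given list of candidates.
    return max(candidates, key=lambda answer: score(prompt, answer))


def exact_sum_verifier(prompt: str, answer: str) -> float:
    """Toy verifier (assumption): 1.0 if `answer` is the sum for a prompt like '2+2'."""
    left, right = (int(part) for part in prompt.split("+"))
    try:
        return 1.0 if int(answer) == left + right else 0.0
    except ValueError:
        # Non-numeric answers simply score zero.
        return 0.0
```

Nothing here depends on which model sits behind `generate`. Spending more compute means raising `n`, which is the sense in which the method is general.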

Watch for

In practice

You spend two weeks hand-building a 40-node decision tree and a brittle prompt-chain DSL to make a weaker model route tickets correctly. It works, until the next model release turns your scaffolding into the bottleneck and a plain 'here are the tools, decide' prompt beats it. Hand-encoded cleverness buys a short-term win and then becomes a ceiling. Build the thinnest scaffold that works and that you would happily delete when the model improves, because it will.
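For a sense of how thin the 'here are the tools, decide' scaffold can be, here is a rough sketch. The names `route_ticket`, `build_prompt`, and `decide` are hypothetical: `decide` stands in for whatever model call you use and is assumed to return a tool name as text.

```python
from typing import Callable, Dict

Tool = Callable[[str], str]


def build_prompt(ticket: str, tools: Dict[str, Tool]) -> str:
    """List the tools by name and docstring, then ask for one tool name."""
    lines = ["Here are the tools. Decide which one handles this ticket.", ""]
    for name, tool in tools.items():
        lines.append(f"- {name}: {tool.__doc__ or 'no description'}")
    lines += ["", f"Ticket: {ticket}", "Reply with the tool name only."]
    return "\n".join(lines)


def route_ticket(
    ticket: str,
    tools: Dict[str, Tool],
    decide: Callable[[str], str],  # hypothetical model call: prompt -> tool name
    fallback: str,
) -> str:
    """Let the model pick a tool; the scaffold only validates and dispatches."""
    if fallback not in tools:
        raise KeyError(f"fallback tool {fallback!r} is not registered")
    choice = decide(build_prompt(ticket, tools)).strip()
    if choice not in tools:
        # The one guard rail: an unknown name goes to the fallback tool.
        choice = fallback
    return tools[choice](ticket)
```

All routing judgment lives in the model, so the code you would delete or rewrite on the next model release is close to zero.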

Apply it

  1. Prefer general, model-driven reasoning over bespoke decision trees and hardcoded heuristics.
  2. Build the thinnest scaffold that works and that you would happily delete when the model improves.
  3. Periodically re-test a minimal-scaffold baseline against your tuned pipeline as models advance.
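Step 3 can be made routine with a small harness that is rerun on each model release. The sketch below assumes a labeled evaluation set of `(input, expected_label)` pairs and treats each pipeline as a plain callable. The function names and the `margin` threshold are illustrative choices, not values from the source.

```python
from typing import Callable, Dict, List, Tuple

Pipeline = Callable[[str], str]
Example = Tuple[str, str]  # (input text, expected label)


def accuracy(pipeline: Pipeline, examples: List[Example]) -> float:
    """Fraction of examples where the pipeline's output equals the label."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(1 for text, label in examples if pipeline(text) == label)
    return correct / len(examples)


def compare_pipelines(
    pipelines: Dict[str, Pipeline], examples: List[Example]
) -> Dict[str, float]:
    """Score every named pipeline on the same evaluation set."""
    return {name: accuracy(p, examples) for name, p in pipelines.items()}


def scaffold_earns_its_keep(
    tuned_acc: float, baseline_acc: float, margin: float = 0.02
) -> bool:
    """True only if the tuned pipeline beats the baseline by more than `margin`."""
    return tuned_acc - baseline_acc > margin
```

When `scaffold_earns_its_keep` returns False on a new model, the tuned pipeline is no longer paying for its complexity and is a candidate for deletion.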

The takeaway

Prefer general, model-driven reasoning over bespoke hand-tuned logic. Build scaffolding you'd be happy to delete when the model improves.
