Law 49 · Trust & Coordination
Don't Let the Author Be the Judge
The thing that made it shouldn't grade it.

The principle
Without an external signal, a model largely fails to correct its own reasoning, and it often makes correct answers worse by second-guessing them. The model that produced a flawed plan is the same one judging it, with the same blind spots. Real correction needs an outside signal: a tool result, a test that runs, a different model. 'Reflect and try again' on the same model with no new information is theater.
Why it happens
Self-correction fails because the model judging an answer is the same model that produced it. It carries identical blind spots, so it has no new information to correct against. Huang and colleagues found that without an external signal, models largely cannot self-correct their reasoning and often degrade correct answers by second-guessing them. Stechly, Valmeekam, and Kambhampati pushed further on reasoning and planning tasks, showing that a model's self-verification is unreliable and that gains attributed to reflection largely vanish or turn out to come from an external verifier, not from introspection. The shared lesson is that 'reflect and try again' on a fixed model with no fresh input is theater: the second pass samples from the same flawed distribution. Genuine correction requires an outside signal, such as a tool result, a test that actually runs, or a separate model with no memory of the original attempt.
Watch for
- Your correction step is just 'review your work and fix any bugs', with no new input introduced.
- The agent confidently rewrites a correct answer into a wrong one after being asked to reflect.
- A corrected output is trusted without any external check ever having run.
In practice
Your agent writes a SQL query. You prompt it to 'review your work and fix any bugs', and it cheerfully second-guesses a correct join into a broken one, because it is grading its own reasoning with the same blind spots that produced it. Reflection on the same model with no new information is theater: the author cannot see now what it could not see the first time. Real correction needs an outside signal. Run the query against a test database, lint it, or hand it to a fresh instance with no memory of the original attempt, and trust the fixed version only once an external check has actually passed.
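As a rough illustration of "run the query against a test database", here is a minimal sketch using Python's standard sqlite3 module. The schema, the fixture rows, the expected result, and the names build_test_db and external_check are all hypothetical, invented for this example. The point is only that the pass/fail signal comes from executing the query, not from the model's opinion of it.

```python
import sqlite3
from typing import List, Tuple


def build_test_db() -> sqlite3.Connection:
    """Create a tiny in-memory fixture database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 75.0), (12, 2, 40.0);
        """
    )
    return conn


def external_check(sql: str, expected: List[Tuple]) -> Tuple[bool, str]:
    """Run the query for real and compare it with known-good rows.

    Returns (passed, signal). The signal is new information produced by
    the database, which a correction step can then work from.
    """
    conn = build_test_db()
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return False, f"query failed to run: {exc}"
    finally:
        conn.close()
    if sorted(rows) != sorted(expected):
        return False, f"expected {sorted(expected)}, got {sorted(rows)}"
    return True, "ok"


# Known-good answer for the fixture data above (an assumption of this sketch).
EXPECTED = [("Ada", 100.0), ("Grace", 40.0)]

# The agent's first draft: a correct join on the foreign key.
original = """
    SELECT c.name, SUM(o.total) FROM customers c
    JOIN orders o ON o.customer_id = c.id GROUP BY c.name
"""

# A 'reflected' rewrite that second-guessed the join key.
rewritten = """
    SELECT c.name, SUM(o.total) FROM customers c
    JOIN orders o ON o.id = c.id GROUP BY c.name
"""

original_result = external_check(original, EXPECTED)
rewritten_result = external_check(rewritten, EXPECTED)
print(original_result)
print(rewritten_result)
```

With this fixture, the original query passes and the rewritten one fails because its join matches no rows. The model could not have learned that by rereading its own query.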
Apply it
- Separate generation from judgment: never let the producing instance be the sole grader.
- Feed an external signal into the correction loop, such as a test that runs, a tool result, or compiler output.
- When using a model to judge, give it a fresh instance with no memory of the original reasoning.
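One way to wire the points above together is a loop in which the generator never grades itself: every candidate goes through an external verifier, and only the verifier's output is fed back. This is a sketch under stated assumptions, not a definitive implementation. The names correct_with_external_signal, stub_generate, and run_tests are hypothetical, and stub_generate stands in for a real model call.

```python
from typing import Callable, Optional, Tuple

# (task, feedback) -> candidate; feedback is None on the first attempt.
Generator = Callable[[str, Optional[str]], str]
# candidate -> (passed, signal); must not be the generator grading itself.
Verifier = Callable[[str], Tuple[bool, str]]


def correct_with_external_signal(
    task: str,
    generate: Generator,
    verify: Verifier,
    max_rounds: int = 3,
) -> Optional[str]:
    """Return a candidate only if the external verifier passed it."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(task, feedback)
        passed, signal = verify(candidate)
        if passed:
            return candidate
        # The only thing carried into the next round is the outside signal.
        feedback = signal
    # Nothing verified: report failure rather than trust an unchecked answer.
    return None


def stub_generate(task: str, feedback: Optional[str]) -> str:
    """Stand-in for a model call: the draft changes only once feedback arrives."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def run_tests(candidate: str) -> Tuple[bool, str]:
    """External check: execute the candidate and run one real test on it."""
    namespace: dict = {}
    try:
        exec(candidate, namespace)
        result = namespace["add"](2, 3)
    except Exception as exc:
        return False, f"error: {exc!r}"
    if result != 5:
        return False, f"add(2, 3) returned {result}, expected 5"
    return True, "all tests passed"


accepted = correct_with_external_signal("write add(a, b)", stub_generate, run_tests)
print(accepted)
```

The verifier slot could equally hold a compiler, a linter, or a fresh model instance given only the candidate. The design choice illustrated here is that the loop returns None instead of an unverified 'corrected' answer.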
The takeaway
Separate generation from judgment. Before trusting a 'corrected' answer, use an independent instance (fresh context, no memory of the original reasoning) or an external check such as a passing test.