Law 22 · Instruction & Output
Show, Don't Tell
When prose fails, stop writing prose.

The principle
If an instruction has produced the wrong result twice, writing it a third time, however precisely, rarely helps, because prose is always open to interpretation. Two or three concrete input/output examples eliminate the ambiguity that no amount of careful description can. Examples demonstrate the rule; prose only describes it.
Why it happens
Large models perform in-context learning: they infer the intended mapping from a handful of input-output demonstrations rather than from a verbal description. The ability emerged prominently at GPT-3 scale, where few-shot prompts sharply outperformed zero-shot instructions on many tasks. Prose underdetermines the rule because natural language is inherently ambiguous, whereas concrete examples pin down the decision boundary, especially for edge cases and the 'leave it blank' cases that words struggle to convey. The lever is real but blunt: the model leans on demonstrations so heavily that example order alone can swing accuracy from near state-of-the-art to near chance. That same reliance explains why a third rewrite of the instruction rarely helps while two or three sharp examples usually do.
Watch for
- You have rewritten the same instruction two or three times and the output is still wrong in the same way.
- The model handles the typical case but mangles edge cases the prose tried to describe in the abstract.
- Reviewers keep disagreeing about what the instruction actually means, a sign that the model cannot resolve it either.
In practice
Your extraction agent keeps formatting phone numbers inconsistently, so you rewrite the instruction a third time: 'normalize to E.164, strip extensions, handle missing area codes gracefully.' It still botches the edge cases. Stop adding adjectives to the prose. Drop in four labeled examples instead: '(555) 123-4567' becomes '+15551234567', a number ending in 'ext. 12' loses the extension, 'unknown' becomes null, and an international number that already carries a country code keeps it. The examples pin down exactly what 'gracefully' meant, which no amount of careful description ever could.
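A minimal sketch of what that prompt might look like, written in Python. It assumes a plain "Input:/Output:" layout and JSON-encoded outputs, so the null case prints as null and the string cases stay quoted. The PHONE_EXAMPLES list, the build_prompt helper, the instruction wording, and the concrete numbers are all hypothetical illustrations, not a prescribed format.

```python
import json

# Hypothetical demonstration set mirroring the four cases in the text.
# None stands for the null output.
PHONE_EXAMPLES = [
    ("(555) 123-4567", "+15551234567"),
    ("555-123-4567 ext. 12", "+15551234567"),  # extension dropped
    ("unknown", None),                          # no usable number -> null
    ("+44 20 7946 0958", "+442079460958"),      # existing country code kept
]

def build_prompt(examples, query):
    """Return a few-shot prompt: a one-line instruction, labeled pairs, then the query."""
    lines = ["Normalize each phone number to E.164. Output JSON."]
    for raw, normalized in examples:
        lines.append(f"Input: {raw}")
        # json.dumps renders None as null and quotes strings, keeping one format.
        lines.append(f"Output: {json.dumps(normalized)}")
    # The real query goes last, with the output left open for the model.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

print(build_prompt(PHONE_EXAMPLES, "555.987.6543 x7"))
```

The instruction line stays short on purpose: the demonstrations carry the rule, and the prose only names the target format.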
Apply it
- Replace failed prose with two or three labeled input-output examples that demonstrate the exact rule.
- Include the hard cases explicitly: edge cases, the empty or null case, and a near-miss that should be rejected.
- Vary or shuffle example order when testing, since order alone can shift results, and keep the examples consistent in format.
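To act on the ordering advice, a small helper can produce several distinct orderings of the same demonstration set, so each variant can be scored on the same held-out inputs. This is a sketch under assumptions: the orderings name, the fixed seed, and the default of six variants are illustrative, and the scoring step (calling a model and comparing outputs) is left to your own harness.

```python
import math
import random

def orderings(examples, n_variants=6, seed=0):
    """Return up to n_variants distinct orderings of examples, original order first.

    Examples must be hashable (e.g. (input, output) tuples).
    A fixed seed keeps the variant set reproducible across test runs.
    """
    rng = random.Random(seed)
    original = tuple(examples)
    variants = [original]
    seen = {original}
    # Never ask for more orderings than actually exist (n! for n examples).
    target = min(n_variants, math.factorial(len(examples)))
    while len(variants) < target:
        candidate = list(examples)
        rng.shuffle(candidate)
        key = tuple(candidate)
        if key not in seen:  # skip orderings already collected
            seen.add(key)
            variants.append(key)
    return variants

# Usage: print the input order of each variant before scoring it.
demos = [
    ("(555) 123-4567", "+15551234567"),
    ("unknown", None),
    ("+44 20 7946 0958", "+442079460958"),
]
for order in orderings(demos, n_variants=4):
    print([raw for raw, _ in order])
```

If accuracy moves noticeably between variants, the demonstrations are likely underspecifying the rule, and adding a sharper edge case tends to help more than rewording the instruction.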
The takeaway
When results are inconsistent, switch from describing to demonstrating. Show worked examples — especially the edge cases and the 'leave it blank' cases — and let the model generalize from them.