Law 11 · Retrieval & Memory
Retrieval Is the Ceiling
Your answer can only be as good as what you retrieved.

The principle
A model's parametric memory is fixed and imprecise; the retriever supplies the facts it reasons over. If the right passage never makes it into context, no amount of model intelligence recovers it — the generator confidently fills the gap instead. Retrieval quality is the hard ceiling on answer quality, not a tunable nice-to-have.
Why it happens
Retrieval-augmented generation works because the model conditions its output on the passages placed in context. Any fact absent from those passages can only come from the model's frozen parametric memory, which is lossy and approximate. When the gold passage falls outside the top-k, the generator typically does not abstain; it interpolates from priors and produces a fluent, wrong answer. For questions whose answers live only in the corpus, retrieval recall therefore sets a ceiling that no generator upgrade can lift. This is why retrieval-specific metrics matter as first-class signals: context recall measures whether the evidence needed to answer was actually retrieved, and a low value caps end-to-end accuracy almost regardless of generator quality. The original RAG work framed retrieval and generation as jointly responsible for knowledge-intensive answers precisely because the non-parametric memory is where the answerable facts live.
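The ceiling can be made concrete with a short decomposition. This is a sketch under stated assumptions, not a result from the source: the symbols R (probability that the needed evidence is retrieved), a_hit (accuracy when it is), and a_miss (accuracy when it is not, i.e. what parametric memory alone gets right) are introduced here for illustration, and the numbers are made up.

```latex
\begin{align}
\mathrm{Acc} &= R \, a_{\mathrm{hit}} + (1 - R) \, a_{\mathrm{miss}} \\
             &\le R + (1 - R) \, a_{\mathrm{miss}}
             \qquad \text{since } a_{\mathrm{hit}} \le 1 \\[4pt]
\text{e.g. } R = 0.6,\ a_{\mathrm{miss}} = 0.1
             &\;\Rightarrow\; \mathrm{Acc} \le 0.6 + 0.4 \times 0.1 = 0.64
\end{align}
```

Under these illustrative numbers, even a generator that never errs when the evidence is present tops out at 64%. A stronger generator mainly raises a_hit, which is already bounded by 1, so raising R is usually the larger lever.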
Watch for
- Upgrading to a stronger generation model barely moves end-to-end accuracy on factual questions.
- You have never measured whether the answer-bearing passage appears in the retrieved set.
- Wrong answers are fluent and confident rather than hedged or empty, suggesting the model is filling a gap.
In practice
You swap one model for a smarter one to fix wrong answers in your support bot, and accuracy barely moves, because the chunk containing the refund policy was never in the top-k to begin with. The model was not dumb; it was guessing into a void and filling it confidently. Before you touch the prompt or the model, log recall@k on a labeled query set. If the right passage is missing from the top-k for a meaningful share of queries (recall below roughly 90% is a reasonable rule-of-thumb alarm), those queries stay wrong no matter which generator you use. Fix the retriever first, then optimize generation.
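Logging recall@k can be as small as the sketch below. It assumes you already have, for each labeled query, the ranked passage IDs your retriever returned and the set of passage IDs known to contain the answer. The function name `recall_at_k`, the dictionary layout, and the passage IDs are all hypothetical. With one gold passage per query, this "at least one gold passage in the top-k" definition coincides with hit rate.

```python
from typing import Dict, List, Set


def recall_at_k(
    retrieved: Dict[str, List[str]],
    gold: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of labeled queries with at least one gold passage in the top-k."""
    if not gold:
        raise ValueError("labeled set is empty")
    hits = 0
    for query_id, gold_ids in gold.items():
        # A query the retriever returned nothing for counts as a miss.
        top_k = retrieved.get(query_id, [])[:k]
        if gold_ids.intersection(top_k):
            hits += 1
    return hits / len(gold)


# Hypothetical labeled set: query ID -> IDs of passages that contain the answer.
gold = {
    "q1": {"refund-policy#2"},
    "q2": {"shipping#1"},
    "q3": {"warranty#4"},
    "q4": {"returns#3"},
}

# Hypothetical retriever output: query ID -> ranked passage IDs, best first.
retrieved = {
    "q1": ["pricing#1", "faq#7", "refund-policy#2"],
    "q2": ["shipping#1", "shipping#2", "faq#1"],
    "q3": ["faq#2", "faq#3", "pricing#2"],
    "q4": ["returns#3", "faq#9", "faq#4"],
}

for k in (1, 3):
    print(f"recall@{k} = {recall_at_k(retrieved, gold, k):.2f}")
# recall@1 = 0.50
# recall@3 = 0.75
```

In this toy data, q3 never retrieves its warranty passage at any k shown, so no generator change can fix that query; only a retrieval change can.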
Apply it
- Build a labeled set of queries with known answer passages and measure recall at k before touching prompts or models.
- Treat any answer whose supporting evidence was never retrieved as a retrieval failure, not a generation failure.
- Fix recall first by tuning chunking, query expansion, and k, then optimize the generator only once evidence reliably lands in context.
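One way to apply the attribution rule above is to tag every evaluated query by whether its evidence reached the context before looking at the answer. The sketch below is illustrative only: `EvalRecord`, `triage`, the label strings, and the sample records are hypothetical, and it assumes answer correctness has already been judged by some separate process.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Set


@dataclass
class EvalRecord:
    query_id: str
    retrieved_ids: List[str]  # ranked passage IDs, best first
    gold_ids: Set[str]        # passages known to contain the answer
    answer_correct: bool      # judged separately (human or automatic grader)


def triage(record: EvalRecord, k: int) -> str:
    """Label one evaluated query by where the failure (if any) originated."""
    evidence_in_context = bool(
        record.gold_ids.intersection(record.retrieved_ids[:k])
    )
    if record.answer_correct:
        # Correct without evidence means parametric memory happened to be right.
        return "ok" if evidence_in_context else "correct_without_evidence"
    return "generation_failure" if evidence_in_context else "retrieval_failure"


# Hypothetical evaluation records.
records = [
    EvalRecord("q1", ["faq#1", "faq#2", "refund#2"], {"refund#2"}, True),
    EvalRecord("q2", ["faq#3", "faq#4", "faq#5"], {"warranty#4"}, False),
    EvalRecord("q3", ["shipping#1", "faq#6", "faq#7"], {"shipping#1"}, False),
    EvalRecord("q4", ["faq#8", "faq#9", "pricing#1"], {"returns#3"}, False),
]

counts = Counter(triage(r, k=3) for r in records)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

If `retrieval_failure` dominates the wrong answers, as it does in this toy data, work on chunking, query expansion, or k comes before any prompt or model change.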
The takeaway
Measure and optimize retrieval (recall@k, hit rate) as a first-class metric before touching prompts or models. If recall is low, fix retrieval first — better generation cannot save you.