source: google research: thinking to recall: how reasoning unlocks parametric knowledge in llms
level: research
large language models often recall facts better when allowed to generate step-by-step reasoning, even for simple single-hop questions that do not require complex logic. researchers at google studied this effect using models like gemini-2.5 and qwen3-32b on closed-book qa datasets. they found that enabling reasoning unlocks correct answers that are otherwise unreachable, as measured by pass@k (the probability that at least one of k sampled answers is correct).
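to make the pass@k metric concrete, here is a minimal python sketch. it assumes the widely used unbiased estimator from the code-generation literature, pass@k = 1 - C(n-c, k) / C(n, k), where n answers are sampled and c of them are correct. the summary does not say which estimator the study used, so treat the formula choice and the function name `pass_at_k` as assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct.

    n: total answers sampled for a question
    c: how many of those n answers were correct
    k: sample budget being evaluated (1 <= k <= n)

    Assumption: this is the standard unbiased estimator,
    not necessarily the exact one used in the study.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k wrong answers: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 minus the fraction of size-k subsets made up only of wrong answers.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

computing this per question with reasoning disabled and again with reasoning enabled, then averaging over the dataset, gives one way to check whether reasoning makes previously unreachable answers reachable at a given k.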
two mechanisms explain the improvement. first, the computational buffer effect: generating extra tokens gives the model more forward passes to refine its internal state, even if the tokens are meaningless. experiments replaced reasoning traces with repeated dummy text and still saw gains. second, factual priming: models generate related facts that act as a semantic warm-up, making the target answer easier to retrieve. for example, listing previous kings helps recall the tenth king of nepal.
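the dummy-text control for the computational buffer effect can be sketched as follows. this is only an illustration of the idea: the filler phrase, the helper name `dummy_trace`, and the use of word count as a stand-in for token count are all assumptions, not details taken from the study.

```python
def dummy_trace(reasoning: str, filler: str = "let me think.") -> str:
    """Build a content-free trace roughly as long as the original reasoning.

    Assumptions (hypothetical, for illustration only):
    - length is matched by whitespace-separated words, a rough proxy for tokens
    - the filler phrase is arbitrary
    """
    filler_words = filler.split()
    if not filler_words:
        raise ValueError("filler must contain at least one word")
    n_words = len(reasoning.split())
    # Ceiling division: enough repeats to cover the original length.
    repeats = -(-n_words // len(filler_words))
    # Trim so the dummy trace has exactly as many words as the original.
    return " ".join((filler_words * repeats)[:n_words])
```

feeding the model the dummy trace in place of its real reasoning keeps the extra computation but removes any factual priming, which is what lets the two mechanisms be separated.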
however, factual priming introduces a risk: hallucinated intermediate facts reduce the chance of a correct final answer. an audit of reasoning traces showed that even one hallucinated fact significantly lowers accuracy. to improve reliability, researchers used a test-time selection strategy that keeps only trajectories with verifiable, hallucination-free facts, which boosted performance. this suggests training with process rewards that encourage factually supported steps could make models more reliable.
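a test-time selection step of this kind might look like the sketch below. it assumes each sampled trajectory already carries one verification flag per intermediate fact (how the facts are extracted and checked is outside the sketch). the `Trajectory` class, the majority vote over surviving trajectories, and the fallback to all trajectories when none survive are hypothetical choices, not the study's documented procedure.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Trajectory:
    answer: str
    # One flag per intermediate fact: True if verified, False if hallucinated.
    fact_checks: List[bool] = field(default_factory=list)


def select_answer(trajectories: List[Trajectory]) -> Optional[str]:
    """Pick an answer, preferring trajectories with only verified facts."""
    # Keep trajectories that state at least one fact and have no hallucinated ones.
    clean = [t for t in trajectories if t.fact_checks and all(t.fact_checks)]
    # Assumption: if nothing survives the filter, fall back to every trajectory.
    pool = clean or trajectories
    if not pool:
        return None
    # Assumption: majority vote among the remaining answers.
    return Counter(t.answer for t in pool).most_common(1)[0][0]
```

with this filter, a single clean trajectory can outvote several trajectories that contain hallucinated facts, which is the intended effect of discarding unreliable reasoning before choosing an answer.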
why it matters: understanding these mechanisms can help build language models that recall facts more accurately and reduce hallucinations, improving reliability for knowledge-intensive tasks.