source: arxiv artificial intelligence: why limit the residual stream to layers and not tokens? persistent memory for continuous latent reasoning

level: research

large language models can reason in latent space using chain of continuous thought, which lets them explore multiple paths at once. but this approach runs into a problem called the concept bottleneck: at each reasoning step, hidden states get overwritten, so the model loses important facts from earlier steps as reasoning depth grows. on hotpotqa, vanilla continuous thought scored 10.4% exact match, slightly below chain of thought at 11.0%. on gsm8k, performance dropped as curriculum depth increased.

the proposed fix is adaptive gated continuous latent reasoning, or agclr. it adds a gated concept stream: a persistent residual memory that runs alongside the normal hidden states. this stream uses learned gates to decide what information to keep and what to update, so previously computed facts are not erased. the model can then carry key concepts forward across many reasoning passes without forgetting them.
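
the gating idea can be illustrated with a minimal sketch. this is not the paper's implementation: the class name, the weight shapes, the sigmoid gate, and the tanh candidate are all assumptions chosen to show one common way a learned gate can blend an old memory with new information. here the weights are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GatedConceptStream:
    """hypothetical gated persistent memory (illustrative, not the paper's exact form)."""

    def __init__(self, dim, seed=0, gate_bias=0.0):
        rng = np.random.default_rng(seed)
        # in a real model these weights would be learned; here they are random
        self.W_gate = rng.normal(scale=0.1, size=(dim, 2 * dim))
        self.b_gate = np.full(dim, gate_bias)
        self.W_cand = rng.normal(scale=0.1, size=(dim, dim))

    def update(self, concept, hidden):
        # gate near 1 keeps the old concept; gate near 0 overwrites it
        gate = sigmoid(self.W_gate @ np.concatenate([concept, hidden]) + self.b_gate)
        candidate = np.tanh(self.W_cand @ hidden)
        return gate * concept + (1.0 - gate) * candidate

def run_passes(stream, concept, hiddens):
    # carry the concept stream across successive reasoning passes
    for hidden in hiddens:
        concept = stream.update(concept, hidden)
    return concept

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    hiddens = rng.normal(size=(5, 4))
    start = np.array([1.0, -1.0, 0.5, 0.0])
    keep = run_passes(GatedConceptStream(4, gate_bias=50.0), start, hiddens)
    overwrite = run_passes(GatedConceptStream(4, gate_bias=-50.0), start, hiddens)
    print("gate ~ 1 (keep):     ", keep)
    print("gate ~ 0 (overwrite):", overwrite)
```

with the gate pushed toward 1, the stream retains its starting content across all passes. with the gate pushed toward 0, it behaves like plain overwriting and only reflects the latest hidden state, which is the failure mode the concept bottleneck describes. a trained gate would sit between these extremes and vary per dimension.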

experiments show agclr improves multi-step reasoning. on hotpotqa, it reaches 14.2% exact match, beating both chain of thought (11.0%) and vanilla continuous thought (10.4%). on gsm8k, it maintains accuracy even with deeper reasoning curricula. the gated stream adds little overhead and works with existing continuous thought setups. the results suggest that persistent memory can relieve the concept bottleneck and make latent reasoning more reliable for complex tasks.

why it matters: this method helps ai models keep important information during long reasoning chains, making them more accurate on complex questions without extra training data.

