osctom uses rl to test nested belief conflicts in llms

source: arxiv artificial intelligence: osctom: rl-guided adversarial generation for high-order theory of mind

level: research

large language models often struggle with theory of mind tasks that involve layered beliefs and hidden information. existing benchmarks like exploretom do not always capture the recursive, multi-level reasoning needed in real social situations. this paper introduces osctom, which stands for observer-self conflict theory of mind. it focuses on cases where an observer's understanding of another agent's perspective conflicts with the observer's own belief state. such conflicts go beyond simple perspective-taking and require deeper, nested reasoning.
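
to make the idea of an observer-self conflict concrete, here is a minimal toy sketch in python. it is not the paper's domain-specific language or generation method. the names `nested_belief` and `has_observer_self_conflict`, the event format, and the two-character story are all hypothetical, and the sketch assumes a strong simplification: a move enters a nested belief only if every agent in the belief chain witnessed it.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical event format: (new location of the object, agents who saw the move).
Event = Tuple[str, Set[str]]


def nested_belief(chain: List[str], events: List[Event]) -> Optional[str]:
    """Return where chain[0] thinks chain[1] thinks ... chain[-1] believes the object is.

    Simplifying assumption (not from the paper): a move updates the nested
    belief only when every agent in the chain witnessed that move.
    """
    belief: Optional[str] = None
    for location, witnesses in events:
        if all(agent in witnesses for agent in chain):
            belief = location
    return belief


def has_observer_self_conflict(observer: str, other: str, events: List[Event]) -> bool:
    """True when the observer's own belief differs from the belief
    the observer attributes to the other agent."""
    own = nested_belief([observer], events)
    attributed = nested_belief([observer, other], events)
    return own is not None and attributed is not None and own != attributed


# Toy story: both characters see the ball placed in the basket,
# then only anne sees it moved to the box.
story: List[Event] = [
    ("basket", {"anne", "sally"}),
    ("box", {"anne"}),
]

own_view = nested_belief(["anne"], story)
attributed_view = nested_belief(["anne", "sally"], story)
conflict = has_observer_self_conflict("anne", "sally", story)

print(own_view, attributed_view, conflict)
```

in this story anne herself believes the ball is in the box, while she attributes to sally the belief that it is still in the basket, so the observer's own state and her model of the other agent disagree. longer chains such as `["anne", "sally", "anne"]` extend the same check to higher-order beliefs under the same assumption.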

osctom uses reinforcement learning, an extended domain-specific language, and compositional surrogate models to automatically generate these challenging scenarios. the system creates stories where characters hold conflicting nested beliefs, forcing models to track multiple layers of mental states. in tests, osctom-8b outperformed other systems on the fantom benchmark, showing improved handling of recursive theory of mind. the approach builds on prior work but adds a systematic way to produce observer-self conflicts that were previously hard to test at scale.

the results suggest that targeted adversarial generation can expose weaknesses in llm social reasoning. by focusing on specific conflict types, osctom provides a more precise diagnostic tool than broad benchmarks. this work may guide future training methods to improve llm performance in applications like dialogue systems, collaborative ai, and narrative understanding, where tracking complex mental states is essential.

why it matters: better theory of mind testing helps build ai that can handle nuanced social interactions, from customer service bots to multi-agent coordination.
