Researchers tested position bias in multiple-choice question answering across several reasoning models, including DeepSeek-R1 and its distilled variants. They measured how often models favored answers in particular positions, such as the first or last option, and related this to the length of the reasoning chain. Across thirteen model configurations on benchmarks such as MMLU and ARC-Challenge, twelve showed a positive correlation between longer reasoning and stronger position bias, even after controlling for accuracy; the partial correlations ranged from 0.11 to 0.41, all statistically significant.
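The "controlling for accuracy" step corresponds to a partial correlation: regress accuracy out of both reasoning length and bias rate, then correlate the residuals. A minimal sketch in plain Python (the data below are illustrative, not the paper's measurements):

```python
import math

def pearson(x, y):
    # standard Pearson correlation coefficient
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def residuals(y, z):
    # residuals of y after simple linear regression on z
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    beta = sum((a - mz) * (b - my) for a, b in zip(z, y)) / \
           sum((a - mz) ** 2 for a in z)
    return [b - (my + beta * (a - mz)) for a, b in zip(z, y)]

def partial_corr(x, y, z):
    # correlation of x and y with the control variable z regressed out of both
    return pearson(residuals(x, z), residuals(y, z))

# illustrative synthetic numbers: per-configuration reasoning length,
# position-bias rate, and accuracy (control)
length = [120, 340, 560, 800, 1020, 1500]
bias = [0.10, 0.14, 0.18, 0.22, 0.25, 0.31]
accuracy = [0.80, 0.78, 0.76, 0.74, 0.72, 0.70]
print(partial_corr(length, bias, accuracy))
```

In practice a library routine (e.g. from a statistics package) would also supply the significance test; this sketch only shows where the accuracy control enters.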

The study also found that position bias increased monotonically across reasoning-length quartiles in every open-weight reasoning model tested. A causal experiment truncated reasoning chains and resumed generation from later points; continuations seeded from later in the trajectory were more likely to exhibit position bias. This suggests that extended reasoning does not eliminate shallow heuristics but may amplify them, as models become more entrenched in initial patterns or overthink simple cues.
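The causal design can be sketched as: record a full reasoning chain once, cut it at some fraction of its length, and let the model continue from that prefix. This is a hypothetical harness, not the paper's code; `generate` stands in for whatever model call you have:

```python
def resume_from(prompt, chain, frac, generate):
    # keep the first `frac` of a recorded reasoning chain, then let the
    # model continue from there (`generate`: text -> continuation,
    # a hypothetical model interface)
    cut = int(len(chain) * frac)
    prefix = chain[:cut]
    return generate(prompt + prefix)

# comparing continuations seeded early vs. late in the trajectory:
#   early = resume_from(question, chain, 0.25, model)
#   late  = resume_from(question, chain, 0.75, model)
```

Measuring position bias separately on the early- and late-seeded continuations is what lets the study attribute the bias to trajectory depth rather than to question difficulty.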

For AI and data science, these findings matter because chain-of-thought reasoning is widely used to improve model reliability. If longer reasoning inadvertently increases position bias, it can skew results in applications such as automated grading, survey analysis, and decision support systems. Developers may need to monitor reasoning length and apply debiasing techniques, such as shuffling answer orders or calibrating confidence, to ensure fairer outcomes.
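The order-shuffling mitigation mentioned above can be sketched as follows: query the model several times with permuted option orders, map each choice back to its original index, and take a majority vote. `ask_model` is a hypothetical interface (question and options in, chosen index out), not an API from the paper:

```python
import random
from collections import Counter

def debias_by_shuffling(question, options, ask_model, n_shuffles=8, seed=0):
    # ask_model(question, options) -> index of the chosen option
    # (hypothetical model interface)
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_shuffles):
        order = list(range(len(options)))
        rng.shuffle(order)
        shuffled = [options[i] for i in order]
        pick = ask_model(question, shuffled)
        votes[order[pick]] += 1  # map the choice back to its original index
    # return the original index that won the vote
    return votes.most_common(1)[0][0]
```

A model with pure position bias spreads its votes roughly uniformly under shuffling, while a model that actually tracks content keeps voting for the same original option, so the vote distribution itself doubles as a bias diagnostic.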


Source: arXiv Artificial Intelligence: "More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models"