level: research
large language models struggle with causal discovery even after fine-tuning, and their performance drops as causal graphs become more complex. a new paper proves this failure is fundamental, not just a limitation of current models or datasets. the authors show that supervised fine-tuning, direct preference optimization, and in-context learning all produce predictors that cannot tell apart causal graphs that generate similar observational data. any attempt to do so would require the model's internal representations to grow without bound, which violates the conditions needed for these methods to work.
the researchers formalize this as a kernel obstruction theorem, which establishes that the limitation is intrinsic to the learning paradigm rather than to any particular model. because these methods rely on observational data, they hit a hard ceiling when causal structures are not uniquely identifiable from that data. this explains why even fine-tuned models plateau on simple graphs and fail as complexity increases.
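to make the identifiability point concrete, here is a minimal sketch of the textbook two-variable case. it is not taken from the paper. the linear gaussian setup, the coefficient 0.8, the noise variance 0.5, and all function names are illustrative assumptions. the sketch builds a model where x causes y, then constructs a model where y causes x that implies exactly the same covariance. for zero-mean gaussians the covariance fixes the whole joint distribution, so no amount of observational data separates the two graphs.

```python
# Toy illustration (assumed setup, not the paper's construction):
# two opposite causal graphs that imply the same observational distribution.

def cov_x_causes_y(a, noise_var_y, var_x=1.0):
    """(var_x, cov_xy, var_y) implied by X -> Y with Y = a*X + noise."""
    return (var_x, a * var_x, a * a * var_x + noise_var_y)


def matching_reverse_model(a, noise_var_y, var_x=1.0):
    """Parameters of a Y -> X model, X = b*Y + noise, with the same covariance."""
    var_y = a * a * var_x + noise_var_y
    b = a * var_x / var_y                 # regression coefficient of X on Y
    noise_var_x = var_x - b * b * var_y   # leftover variance of X given Y
    return var_y, b, noise_var_x


def cov_y_causes_x(var_y, b, noise_var_x):
    """(var_x, cov_xy, var_y) implied by Y -> X with X = b*Y + noise."""
    return (b * b * var_y + noise_var_x, b * var_y, var_y)


forward = cov_x_causes_y(0.8, 0.5)
reverse = cov_y_causes_x(*matching_reverse_model(0.8, 0.5))

# The two tuples agree up to floating-point rounding.
print("X -> Y implies:", forward)
print("Y -> X implies:", reverse)
```

a predictor trained only on samples from either model sees the same statistics in both cases, which is the kind of ambiguity the theorem is described as addressing.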
to overcome this, the authors propose agentic causal bayesian optimization, or a-cbo. in this framework, a frozen language model acts as an agent that can perform interventions, actively collecting new data to resolve ambiguities. by interacting with the environment, the agent escapes the observational data trap and can discover true causal relationships. experiments show that a-cbo significantly outperforms passive methods, especially on graphs with many variables and complex dependencies.
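the role of interventions can be sketched with the same kind of toy two-variable example. this is not the a-cbo algorithm, whose details are not given here. the environment, the constants, and the function names below are hypothetical. the sketch forces x to a fixed value and looks at the mean of y. if x causes y the mean shifts, and if y causes x it stays at zero, so a single well-chosen intervention separates graphs that observational data cannot.

```python
import random

# Assumed toy parameters (illustrative only).
A = 0.8                          # effect of X on Y in the X -> Y graph
NOISE_VAR_Y = 0.5                # noise variance of Y in the X -> Y graph
VAR_Y = A * A + NOISE_VAR_Y      # marginal variance of Y when var(X) = 1


def sample_y_under_do_x(true_graph, x_value, n, rng):
    """Draw n samples of Y after forcing X := x_value in a hypothetical environment."""
    if true_graph == "X->Y":
        # Y still responds to the forced value of X.
        return [A * x_value + rng.gauss(0.0, NOISE_VAR_Y ** 0.5) for _ in range(n)]
    # Y -> X: forcing X cuts the incoming edge, so Y keeps its marginal distribution.
    return [rng.gauss(0.0, VAR_Y ** 0.5) for _ in range(n)]


def identify_graph(ys, x_value):
    """Pick the graph whose predicted mean of Y under do(X = x_value) is closest."""
    mean_y = sum(ys) / len(ys)
    predicted = {"X->Y": A * x_value, "Y->X": 0.0}
    return min(predicted, key=lambda g: abs(mean_y - predicted[g]))


rng = random.Random(0)
guesses = {
    g: identify_graph(sample_y_under_do_x(g, 2.0, 2000, rng), 2.0)
    for g in ("X->Y", "Y->X")
}
print(guesses)
```

in this sketch the intervention target is fixed by hand. an agentic system would instead have to choose which variable to intervene on and at what value.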
why it matters: this work clarifies a hard limit for using llms in scientific reasoning and shows that agentic systems with active data collection are necessary for reliable causal discovery.