source: arxiv statistics ml: iterative causal discovery: per-edge impossibility certificates, tier-aware oracle queries, and the $1+k$ lower bound

level: research

causal discovery algorithms often return a directed graph without indicating which edge directions are actually identified by the data and which were assigned without any identifying assumption to support them. under the standard markov and faithfulness assumptions, observational data identify only a markov equivalence class: the graphs in that class share the same skeleton and the same v-structures. edges whose direction differs across members of the class are not determined by the joint distribution, so more samples alone cannot recover them. orienting such edges requires either a functional restriction on the causal mechanisms or an intervention.
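a minimal python sketch of the equivalence-class point above, assuming each dag is given as a set of (parent, child) pairs. by the verma-pearl criterion, two dags are markov equivalent exactly when they have the same skeleton and the same v-structures. the helper names (`skeleton`, `v_structures`, `markov_equivalent`, `ambiguous_adjacencies`) are illustrative and not taken from the paper.

```python
from itertools import combinations

# A DAG is represented as a set of directed edges (parent, child).
Edge = tuple[str, str]


def skeleton(dag: set[Edge]) -> set[frozenset]:
    """Undirected adjacencies of the DAG."""
    return {frozenset(e) for e in dag}


def v_structures(dag: set[Edge]) -> set[tuple[str, str, str]]:
    """Triples (a, c, b) with a -> c <- b and a, b non-adjacent (a < b for uniqueness)."""
    adj = skeleton(dag)
    parents: dict[str, set[str]] = {}
    for p, c in dag:
        parents.setdefault(c, set()).add(p)
    found = set()
    for c, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in adj:
                found.add((a, c, b))
    return found


def markov_equivalent(g1: set[Edge], g2: set[Edge]) -> bool:
    """Verma-Pearl criterion: same skeleton and same v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)


def ambiguous_adjacencies(dags: list[set[Edge]]) -> set[frozenset]:
    """Adjacencies that appear in both directions somewhere in the given DAGs."""
    seen = set().union(*dags)
    return {frozenset(e) for e in seen if (e[1], e[0]) in seen}


chain = {("X", "Y"), ("Y", "Z")}           # X -> Y -> Z
reversed_chain = {("Z", "Y"), ("Y", "X")}  # X <- Y <- Z
fork = {("Y", "X"), ("Y", "Z")}            # X <- Y -> Z
collider = {("X", "Y"), ("Z", "Y")}        # X -> Y <- Z

print(markov_equivalent(chain, reversed_chain))  # True
print(markov_equivalent(chain, fork))            # True
print(markov_equivalent(chain, collider))        # False
print(ambiguous_adjacencies([chain, reversed_chain, fork]))
```

the chain, its reversal, and the fork fall in one class, so neither adjacency has an identified direction from observational data alone; the collider sits in a different class because its v-structure leaves a visible trace in the distribution.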

this work introduces a protocol for observational causal discovery on continuous data that attaches a discrete impossibility certificate to every candidate edge. a resolved code records the identifiability theorem under which the edge's direction was committed; an impossible code records the failure mode and the specific question a domain expert must answer to resolve it. the protocol's bivariate cascade is also extended with five gated orientation rules that use tiered background knowledge, letting experts assign variables to ordered tiers and so supply a partial ordering of the variables.
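one way to picture a per-edge certificate is a small record holding either a committed direction with the reason it was committed, or a failure mode plus the question to put to an expert. the sketch below assumes that representation and implements only the simplest tier rule (given an adjacency, a variable in a later tier cannot cause one in an earlier tier). the class names and the codes `TIER_ORDER`, `SAME_TIER`, and `TIER_UNKNOWN` are hypothetical; they do not reproduce the paper's five gated rules or its actual certificate codes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    RESOLVED = "resolved"
    IMPOSSIBLE = "impossible"


@dataclass(frozen=True)
class Certificate:
    # Hypothetical per-edge certificate; field names are illustrative only.
    edge: frozenset                        # unordered pair {a, b}
    status: Status
    direction: Optional[tuple[str, str]]   # (cause, effect) when resolved
    reason: str                            # hypothetical code: rule used or failure mode
    question: Optional[str] = None         # what an expert must answer, if impossible


def certify_with_tiers(a: str, b: str, tiers: dict[str, int]) -> Certificate:
    """Try to orient an existing adjacency a - b from tier order alone."""
    pair = frozenset((a, b))
    ta, tb = tiers.get(a), tiers.get(b)
    if ta is not None and tb is not None and ta != tb:
        # Earlier tier -> later tier: a later variable cannot cause an earlier one.
        cause, effect = (a, b) if ta < tb else (b, a)
        return Certificate(pair, Status.RESOLVED, (cause, effect), "TIER_ORDER")
    # Tiers cannot decide: same tier, or at least one variable has no tier.
    failure = "SAME_TIER" if ta is not None and ta == tb else "TIER_UNKNOWN"
    return Certificate(
        pair, Status.IMPOSSIBLE, None, failure,
        question=f"Does {a} cause {b}, or does {b} cause {a}?",
    )


tiers = {"age": 0, "smoking": 1, "diet": 1, "cancer": 2}
print(certify_with_tiers("cancer", "smoking", tiers).direction)  # ('smoking', 'cancer')
print(certify_with_tiers("smoking", "diet", tiers).reason)       # SAME_TIER
print(certify_with_tiers("cancer", "income", tiers).reason)      # TIER_UNKNOWN
```

keeping the failure mode and the expert question on the same record as the edge means an unresolved orientation is reported as an open question rather than silently assigned a direction.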

the work also establishes a worst-case lower bound: any algorithm that orients k edges must make at least 1+k orientation queries, so expert queries cannot be avoided. by making uncertainty explicit for each edge, the method gives a principled way to combine data-driven discovery with human knowledge and avoids false confidence in unverified orientations.

why it matters: it gives data scientists a clear way to know which causal links are reliable and which need expert input, reducing the risk of wrong conclusions in fields like medicine or economics.

