source: arxiv statistics ml: epistemic uncertainty is not the reducible kind

level: research

the standard view in machine learning splits predictive uncertainty into two types: aleatoric, which is irreducible noise, and epistemic, which can be reduced by collecting more data. epistemic uncertainty is often measured using mutual information between model parameters and predictions. a new paper proves that this definition and this measure do not match. the authors construct a case where the mutual-information measure says all uncertainty is epistemic, but no amount of training data can reduce it.
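to make the mutual-information measure concrete, here is a minimal sketch of the usual entropy decomposition for a classifier: total predictive entropy minus the average per-member entropy. it is the textbook estimator, not code from the paper; the function names `entropy` and `decompose` and the two toy inputs are assumptions made for illustration.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def decompose(member_probs):
    """Split predictive entropy into an expected-entropy term and mutual information.

    member_probs: one class-probability list per posterior sample
    (or ensemble member), all evaluated at the same input.
    """
    n = len(member_probs)
    k = len(member_probs[0])
    # Average the members to get the marginal predictive distribution.
    mean = [sum(p[c] for p in member_probs) / n for c in range(k)]
    total = entropy(mean)                                  # H[ E_theta p(y | x, theta) ]
    expected = sum(entropy(p) for p in member_probs) / n   # E_theta H[ p(y | x, theta) ]
    mutual_info = total - expected                         # I(y; theta | x)
    return total, expected, mutual_info


# Hypothetical toy inputs: two members that confidently disagree,
# and two members that agree on a 50/50 prediction.
disagree = [[1.0, 0.0], [0.0, 1.0]]
agree = [[0.5, 0.5], [0.5, 0.5]]

total_d, expected_d, mi_d = decompose(disagree)
total_a, expected_a, mi_a = decompose(agree)
print("disagree:", total_d, expected_d, mi_d)
print("agree:   ", total_a, expected_a, mi_a)
```

both inputs have the same total entropy, but the measure attributes it entirely to the epistemic term in the first case and entirely to the aleatoric term in the second. the computation itself contains nothing about whether more data could actually shrink either term, which is the gap the paper targets.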

the problem is that reducibility depends on both the uncertainty and the type of data you can acquire. the paper breaks epistemic uncertainty into two subtypes: sample-reducible and mechanism-reducible. sample-reducible uncertainty goes down when you add more in-distribution data. mechanism-reducible uncertainty only goes down if you change the model or the measurement process. an exact formula shows that in-distribution data never reduces mechanism-reducible uncertainty and usually makes it larger.

the findings have direct consequences for how uncertainty estimates are used in practice. ensemble disagreement, a common proxy for epistemic uncertainty, tracks the training process rather than the true epistemic term. it can collapse to zero even when real uncertainty remains. this means many deployed systems may be overconfident without their operators realizing it. the paper calls for new measures that align with the actual reducibility of uncertainty given the data you can collect.
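a toy sketch of how disagreement can vanish while error remains. this is not the paper's construction. it assumes a hypothetical setup in which the true function is |x|, every ensemble member is a straight-line least-squares fit, and training inputs are drawn only from [0, 1]. all names (`fit_line`, `train_member`, `x_query`) are illustrative.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def train_member(seed, n_points=50):
    """Train one ensemble member on its own in-distribution sample from [0, 1]."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_points)]
    ys = [abs(x) for x in xs]  # assumed ground truth: y = |x|
    return fit_line(xs, ys)


ensemble = [train_member(seed) for seed in range(10)]

# Query a point outside the training range.
x_query = -1.0
preds = [slope * x_query + intercept for slope, intercept in ensemble]

disagreement = statistics.pstdev(preds)              # spread across members
error = abs(statistics.mean(preds) - abs(x_query))   # distance from the truth

print("ensemble disagreement:", disagreement)
print("absolute error:       ", error)
```

every member sees data on which |x| is exactly linear, so all of them learn the same line and the spread at `x_query` is numerically zero, while the prediction is off by about 2. raising `n_points` does not help, because more samples from [0, 1] carry no information about x < 0; in this toy, only a different model class or different measurements would.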

why it matters: practitioners relying on ensemble disagreement to detect out-of-distribution inputs may be misled, because it can vanish while true epistemic uncertainty persists.

