alpha-tcav fixes statistical flaws in concept-based explainability

source: arxiv statistics ml: $\alpha$-tcav: a unified framework for testing with concept activation vectors

level: research

concept activation vectors (cavs) help explain deep learning models by linking internal representations to human-understandable concepts. however, the standard testing with cavs (tcav) method suffers from statistical instability. researchers analyzed the stochastic nature of cavs and derived distributions for common variants like patterncav, fastcav, and ridge regression-based cavs. they found a key flaw: the tcav score uses a discontinuous indicator function that leads to non-decaying variance in critical situations, making results unreliable.
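the discontinuity can be seen in a minimal sketch of the standard tcav score, taken here as the fraction of inputs whose directional derivative along the cav is positive. the toy gradients, the two nearly identical cavs, and the function name `tcav_score` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def tcav_score(gradients, cav):
    """Fraction of inputs whose directional derivative along the CAV is positive."""
    sensitivities = gradients @ cav          # one directional derivative per input
    return float(np.mean(sensitivities > 0)) # hard indicator, then average

# Toy data (assumed): four inputs whose gradients are almost orthogonal to the CAV.
gradients = np.array([[1.0, 0.0]] * 4)

# Two CAV estimates that differ only by a tiny perturbation in the first coordinate.
cav_plus = np.array([0.001, 1.0])
cav_minus = np.array([-0.001, 1.0])

print(tcav_score(gradients, cav_plus))   # every sensitivity is slightly positive
print(tcav_score(gradients, cav_minus))  # every sensitivity is slightly negative
```

in this toy case a perturbation of 0.002 in one cav coordinate moves the score from one extreme to the other, which is the kind of instability the indicator introduces when sensitivities sit close to zero.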

to fix this, the team introduced alpha-tcav, a generalized framework that swaps the indicator for a parameterized smooth function. this change creates a unified probabilistic formulation that includes both tcav and multi-tcav as special cases. by characterizing the resulting sensitivity score distributions, they showed that popular state-of-the-art choices lack theoretical backing. alpha-tcav provides a principled way to reduce variance and improve the trustworthiness of concept-based explanations.
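a rough sketch of the idea, under stated assumptions: this summary does not give the paper's exact smooth function, so the example below uses a logistic sigmoid with a temperature named `alpha` as a stand-in. the function name `smooth_score`, the toy data, and the meaning of `alpha` here are all hypothetical and may differ from the paper's definition.

```python
import numpy as np

def smooth_score(gradients, cav, alpha):
    """Average of a sigmoid applied to the directional derivatives.

    Stand-in for a parameterized smooth function (assumption): alpha acts as a
    temperature, and the sigmoid approaches the hard indicator as alpha -> 0.
    """
    s = gradients @ cav
    # Numerically stable logistic sigmoid: sigmoid(x) = 0.5 * (1 + tanh(x / 2)).
    return float(np.mean(0.5 * (1.0 + np.tanh(s / (2.0 * alpha)))))

# Toy data (assumed): gradients almost orthogonal to two nearly identical CAVs.
gradients = np.array([[1.0, 0.0]] * 4)
cav_plus = np.array([0.001, 1.0])
cav_minus = np.array([-0.001, 1.0])

for alpha in (1e-6, 1.0):
    a = smooth_score(gradients, cav_plus, alpha)
    b = smooth_score(gradients, cav_minus, alpha)
    print(f"alpha={alpha:g}  score(cav_plus)={a:.4f}  score(cav_minus)={b:.4f}")
```

with a very small `alpha` the stand-in behaves like the hard indicator and the two scores sit at opposite extremes; with a larger `alpha` the same perturbation barely changes the score.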

the work offers a rigorous statistical foundation for concept-based explainability. it moves beyond ad-hoc methods by providing clear distributional properties and a flexible parameter that controls the smoothness of the function that replaces the indicator. this allows practitioners to tune the trade-off between sensitivity and stability. the framework can be applied to any cav variant, making it a drop-in improvement for existing explainability pipelines in computer vision and other domains.
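the sensitivity versus stability trade-off can be illustrated with a small simulation. this is a toy sketch, not the paper's experiment: the sigmoid stand-in, the temperature-style `alpha`, the gaussian gradients, and the zero-centred gaussian cav draws are all assumptions chosen to make the effect visible.

```python
import numpy as np

def smooth_score(gradients, cav, alpha):
    """Sigmoid-smoothed score; alpha is a hypothetical temperature parameter."""
    s = gradients @ cav
    return float(np.mean(0.5 * (1.0 + np.tanh(s / (2.0 * alpha)))))

rng = np.random.default_rng(0)
gradients = rng.normal(size=(20, 5))  # toy per-input gradients (assumed)
cavs = rng.normal(size=(500, 5))      # toy CAV draws centred on zero (assumed)

def score_std(alpha):
    """Standard deviation of the score across the random CAV draws."""
    return float(np.std([smooth_score(gradients, v, alpha) for v in cavs]))

spread = {alpha: score_std(alpha) for alpha in (1e-3, 0.1, 1.0, 10.0)}
for alpha, sd in spread.items():
    print(f"alpha={alpha:g}  std of score across cav draws={sd:.3f}")
```

in this toy setting a larger `alpha` shrinks the spread of the score across cav draws, but it also pulls every score toward 0.5, so stability is bought at the cost of sensitivity.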

why it matters: reliable concept-based explanations are crucial for debugging and trusting ai models in high-stakes applications like medical imaging or autonomous driving.
