source: arxiv statistics ml: a regret perspective on online multiple testing

level: research

online multiple testing traditionally evaluates false discovery rate and power separately, ignoring the real-world imbalance between false positive and false negative costs. a new metric called weighted regret captures this asymmetry by penalizing both error types according to their practical impact. under this metric, a duality the authors call regret conservation emerges: any purely deterministic method that strictly controls the false discovery rate must incur at least linear regret over time. the problem is worst during cold starts, when signals are sparse: the rejection threshold depletes, forcing a long run of false negatives.
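as a minimal sketch of the idea, a weighted regret tally might look like the following. the cost weights `c_fp` and `c_fn` and the function name are illustrative assumptions, not the paper's notation; the actual definition may normalize or discount over time.

```python
# hedged sketch of a cumulative weighted-regret tally for an online
# testing stream. decisions[t] is True if hypothesis t was rejected;
# truths[t] is True if the alternative actually held. c_fp and c_fn
# weight the two error types by their practical cost (illustrative
# defaults, not taken from the paper).
def weighted_regret(decisions, truths, c_fp=1.0, c_fn=5.0):
    regret = 0.0
    for reject, signal in zip(decisions, truths):
        if reject and not signal:      # false positive
            regret += c_fp
        elif not reject and signal:    # false negative
            regret += c_fn
    return regret
```

with `c_fn` larger than `c_fp`, an overly conservative procedure that misses sparse signals accumulates regret linearly, which is the asymmetry the metric is meant to surface.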

to address this, the authors propose decoupled-omt, a meta-wrapper that works with any baseline procedure. it adds a history-decoupled, strictly non-negative random perturbation to the test statistics. this simple change prevents the threshold from collapsing during signal-sparse periods, rescuing deterministic baselines from severe false negative accumulation. the method is designed for exogenous testing streams, where the order of hypotheses is independent of the data.
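one way such a wrapper could be sketched is below. the interface (`baseline_decide(stat, t) -> bool`), the exponential noise, and the `eps` scale are my own assumptions for illustration; the paper's actual mechanism and perturbation distribution may differ. the key properties from the summary are preserved: the noise is strictly non-negative and drawn from a source decoupled from the data history.

```python
import random

def decoupled_wrapper(baseline_decide, eps=0.1, seed=0):
    """sketch: wrap a baseline decision rule with a history-decoupled,
    non-negative random perturbation of the test statistic.
    `baseline_decide(stat, t) -> bool` is an assumed interface, not
    the paper's."""
    rng = random.Random(seed)  # own RNG stream, independent of the data

    def decide(stat, t):
        # exponential noise is >= 0, so the perturbation can only push
        # the statistic toward rejection, never away from it -- this is
        # what keeps the threshold from collapsing in sparse stretches.
        noise = rng.expovariate(1.0 / eps)
        return baseline_decide(stat + noise, t)

    return decide
```

usage with a toy fixed-threshold baseline: `decide = decoupled_wrapper(lambda s, t: s > 2.0)`; because the noise is non-negative, any statistic the baseline would already reject is still rejected, while borderline statistics in signal-sparse periods occasionally clear the bar.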

the key result is that decoupled-omt preserves exact asymptotic safety (it maintains valid error control in the limit) while dramatically reducing regret compared to standard deterministic approaches. experiments show it closes the gap between strict fdr control and practical decision-making needs. the framework provides a unified way to evaluate and improve sequential testing in automated pipelines such as a/b testing platforms, anomaly detection, and adaptive clinical trials.

why it matters: it gives data scientists a practical tool to balance false positives and false negatives in real-time decision systems, avoiding the hidden cost of overly conservative testing.
