source: arxiv statistics ml: active multiple-prediction-powered inference

level: research

monitoring healthcare ai after deployment requires cheap, statistically sound ways to check performance, but gold-standard labels from clinician chart review are expensive. prediction-powered inference (ppi) and active statistical inference cut labeling costs by combining a small gold-labeled sample with model predictions on many unlabeled instances. both methods, however, assume a single predictor, while modern clinical pipelines often have several predictors with different costs and accuracies available at inference time.
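the classical ppi idea behind both baselines can be sketched in a few lines: use predictions everywhere, then "rectify" their bias with the small labeled sample. everything below (the data, the predictor noise, the +0.3 bias) is synthetic and illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic population: true labels y and predictions f from one cheap model;
# the +0.3 shift is a deliberate systematic bias (all numbers illustrative)
N = 100_000
y = rng.normal(2.0, 1.0, size=N)
f = y + 0.3 + rng.normal(0.0, 0.5, size=N)

# small gold-labeled subsample, standing in for clinician chart review
n = 500
idx = rng.choice(N, size=n, replace=False)

# ppi point estimate of the mean: predictions everywhere, plus a "rectifier"
# estimated from the labeled subsample that cancels the predictor's bias
rectifier = np.mean(y[idx] - f[idx])
theta_ppi = f.mean() + rectifier

naive = f.mean()           # predictions only: keeps the +0.3 bias
classical = y[idx].mean()  # labels only: unbiased but higher variance
```

the rectifier's noise scales with the 500 labels, but because it only has to estimate the bias (not the mean itself), the combined estimate beats both the naive and the labels-only baselines here.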

we introduce active multiple-prediction-powered inference (am-ppi). it routes each instance to a cost-appropriate subset of predictors, samples gold-standard labels according to the residual uncertainty of that subset, and reweights predictions to minimize estimator variance, all under a single budget fixed at deployment time. am-ppi generalizes active statistical inference to multiple predictors, and extends multiple-ppi from a single global per-predictor allocation to per-instance routing and sampling decisions.
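one way to picture the routing step: pick, per instance, the predictor subset that buys the most precision per unit cost within the budget. this is a minimal greedy sketch under assumed costs and residual standard deviations; the paper's actual routing rule is not spelled out in this summary, so treat the heuristic and all numbers as hypothetical.

```python
import numpy as np

# hypothetical setup: three predictors with per-call costs and assumed-known
# residual standard deviations (cheaper predictors are noisier)
pred_costs = np.array([0.1, 0.5, 2.0])
pred_sigmas = np.array([1.0, 0.6, 0.3])

def route(budget):
    """greedily add predictors by precision gained per unit cost.

    a simple stand-in for am-ppi's routing rule, not the paper's algorithm.
    """
    score = (1.0 / pred_sigmas**2) / pred_costs  # precision per unit cost
    chosen, spent = [], 0.0
    for k in np.argsort(-score):
        if spent + pred_costs[k] <= budget:
            chosen.append(int(k))
            spent += pred_costs[k]
    return chosen, spent
```

with a tight per-instance budget of 1.0 this keeps the two cheap predictors and drops the expensive one; a looser budget admits all three.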

concretely, the method chooses which predictors to run for each data point, decides how many gold labels to collect from each chosen subset, and uses importance weighting to correct for the resulting nonuniform sampling. the importance weights keep the estimator unbiased, while the adaptive choices reduce variance relative to using a single predictor or a fixed allocation. experiments show am-ppi achieves lower mean squared error than existing methods at the same label budget.
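the importance-weighting correction is the standard horvitz-thompson device from active inference: divide each labeled residual by its sampling probability, and the biased sampling cancels in expectation. the sketch below uses a made-up uncertainty proxy and one combined prediction per instance; only the weighting identity itself is standard.

```python
import numpy as np

rng = np.random.default_rng(2)

# synthetic stream: gold labels y and one combined prediction f per instance
N = 50_000
y = rng.normal(1.0, 1.0, size=N)
f = y + 0.4 + rng.normal(0.0, 0.5, size=N)     # systematically biased predictor

# hypothetical active sampling: label more often where an uncertainty proxy is
# large (here, distance of the prediction from its mean), with an expected
# budget of roughly 1000 gold labels
u = np.abs(f - f.mean()) + 0.1
pi = np.clip(1000.0 * u / u.sum(), 0.0, 1.0)   # per-instance label probability
xi = rng.random(N) < pi                        # which instances get gold labels

# horvitz-thompson style importance weighting: dividing each labeled residual
# by its sampling probability keeps the mean estimate unbiased despite the
# deliberately nonuniform sampling
theta = np.mean(f + (xi / pi) * (y - f))
```

naively averaging f would inherit the +0.4 bias; the weighted correction removes it while spending only ~1000 labels.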

why it matters: healthcare ai monitors can combine several cheap predictors to get reliable performance estimates from fewer expensive clinician labels.
