calibrated uncertainty for treatment effects with small control groups

source: arxiv statistics ml: calibrated inference for the conditional average treatment effect in the few-placebo regime via gaussian processes

level: research

estimating how much a treatment helps people with a given set of characteristics, known as the conditional average treatment effect (cate), is central to medicine, economics, and policy. these estimates are most useful when paired with reliable uncertainty intervals. the few-placebo regime arises when the control (placebo) arm is much smaller than the treated arm, as is common in unequal-allocation trials and small-holdout a/b tests. the standard method in this setting is the x-learner, and a natural extension is to make its second stage bayesian so that it yields credible intervals.

the problem is that these bayesian x-learner intervals under-cover: they contain the true effect less often than their nominal level promises. the root cause is structural. the x-learner's regression target inherits bias from a nuisance model fitted on the small arm, and this shifts the posterior away from the true effect. the usual fix of regressing an orthogonal signal instead does not fully solve the problem, because the nuisance model still leaks bias into the target.
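
the leak can be seen directly in the x-learner's second-stage target for treated units. the sketch below is a minimal illustration in python with numpy, not the authors' code: the synthetic data, the cubic polynomial nuisance fit, and all names (`mu0`, `tau`, `mu0_hat`, `d_t`) are assumptions made for the example, and it takes the small arm to be the control arm.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical synthetic setup (not from the paper):
# control mean mu0(x) = sin(3x), true effect tau(x) = 1 + x
def mu0(x):
    return np.sin(3 * x)

def tau(x):
    return 1.0 + x

n_treated, n_control = 2000, 15  # few-placebo: the control arm is tiny
x_t = rng.uniform(-1, 1, n_treated)
x_c = rng.uniform(-1, 1, n_control)
y_t = mu0(x_t) + tau(x_t) + rng.normal(0, 0.5, n_treated)
y_c = mu0(x_c) + rng.normal(0, 0.5, n_control)

# stage 1: nuisance model for the control outcome, fitted on the small arm only
coef = np.polyfit(x_c, y_c, deg=3)

def mu0_hat(x):
    return np.polyval(coef, x)

# stage 2 target: imputed effects for treated units
d_t = y_t - mu0_hat(x_t)

# error of the target relative to the true effect, and its two parts
target_error = d_t - tau(x_t)
noise = y_t - mu0(x_t) - tau(x_t)          # outcome noise in the treated arm
nuisance_error = mu0(x_t) - mu0_hat(x_t)   # error inherited from the small-arm fit

print("mean target error:  ", target_error.mean())
print("mean nuisance error:", nuisance_error.mean())
```

the target error splits exactly into outcome noise plus the nuisance error `mu0 - mu0_hat`. the noise averages out over the large arm, but the nuisance error is a fixed function of the covariate once the small arm has been fitted, and it shrinks only as the small arm grows. a second-stage posterior that treats `d_t` as a noisy observation of the effect can therefore center on the wrong function.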

the authors propose using gaussian processes to model the conditional average treatment effect directly, while carefully handling the nuisance functions. this approach yields calibrated uncertainty intervals, meaning intervals whose coverage matches their nominal level. in the authors' experiments, the method gives reliable error bars even when the placebo group is tiny. the work provides a practical tool for decision-makers who need trustworthy individual-level predictions from imbalanced experiments.
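
whether intervals are calibrated can be checked on synthetic data, where the true effect is known, by counting how often the intervals contain it. the sketch below is a generic coverage check, not the authors' evaluation protocol; `empirical_coverage`, `simulated_coverage`, and the gaussian toy estimator are assumptions made for illustration.

```python
import numpy as np

def empirical_coverage(lower, upper, truth):
    """fraction of intervals [lower, upper] that contain the true value."""
    lower, upper, truth = map(np.asarray, (lower, upper, truth))
    return float(np.mean((lower <= truth) & (truth <= upper)))

rng = np.random.default_rng(0)
n, sigma, z = 20_000, 1.0, 1.96  # z: two-sided 95% normal quantile
truth = np.zeros(n)

def simulated_coverage(bias):
    # hypothetical estimator: centered at truth + bias, with known spread sigma
    center = truth + bias + rng.normal(0.0, sigma, n)
    return empirical_coverage(center - z * sigma, center + z * sigma, truth)

cov_unbiased = simulated_coverage(bias=0.0)  # interval centers are unbiased
cov_shifted = simulated_coverage(bias=1.0)   # centers shifted by one sigma

print("coverage, unbiased centers:", cov_unbiased)
print("coverage, shifted centers: ", cov_shifted)
```

with the same interval width, the unbiased run lands close to the nominal 95%, while the shifted run covers the truth noticeably less often. a shift in the center of the intervals, with no change in their width, is enough to produce under-coverage.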

why it matters: reliable uncertainty intervals for individual treatment effects are crucial when making high-stakes decisions from imbalanced data, such as in clinical trials or online experiments.
