source: arxiv statistics ml: support-aware offline policy selection for advertising marketplaces
level: research
advertising platforms often use logged auction data to evaluate new reserve-price policies offline. replay methods can show large apparent revenue lifts, but they may hide problems like weak data support, false positives from testing many policies, harm to specific advertiser groups, and uncertainty about how bidders would react. existing tools estimate or rank policy values but do not directly say whether the evidence is strong enough to move a policy to live validation.
this paper introduces a support-aware offline decision framework. instead of picking a single best policy, it turns logged evidence into a conservative decision object. the output includes certified policies that are statistically supported, alternatives that are clearly dominated, and unresolved candidates that need more testing. the approach accounts for multiple comparisons and weak support in the data, reducing the risk of choosing a policy that looks good in replay but fails in production.
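the three-way decision object described above can be sketched in a few lines. this is a minimal illustration, not the paper's procedure: it assumes a normal approximation for each policy's estimated lift, uses a bonferroni correction as a stand-in for whatever simultaneous-inference method the paper adopts, and treats "weak support" as a simple effective-sample-size floor. the names `PolicyEvidence`, `classify`, `min_eff_sample`, and all numbers in the example catalog are hypothetical.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class PolicyEvidence:
    name: str
    lift: float        # estimated revenue lift vs. status quo (hypothetical units)
    std_err: float     # standard error of the lift estimate
    eff_sample: float  # effective sample size of the replay estimate (assumed support proxy)


def classify(catalog, alpha=0.05, min_eff_sample=1000.0):
    """Split a finite policy catalog into certified / dominated / unresolved.

    Assumption: Bonferroni-adjusted two-sided normal intervals; the paper's
    actual construction may differ.
    """
    k = len(catalog)
    # Critical value for K simultaneous two-sided intervals at overall level alpha.
    z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * k))
    decision = {"certified": [], "dominated": [], "unresolved": []}
    for p in catalog:
        lower = p.lift - z * p.std_err
        upper = p.lift + z * p.std_err
        if upper < 0.0:
            # Whole interval below zero: worse than the status quo.
            decision["dominated"].append(p.name)
        elif lower > 0.0 and p.eff_sample >= min_eff_sample:
            # Whole interval above zero AND enough logged support.
            decision["certified"].append(p.name)
        else:
            # Interval straddles zero, or support is too weak to trust it.
            decision["unresolved"].append(p.name)
    return decision


# Hypothetical catalog of reserve-price policies.
catalog = [
    PolicyEvidence("reserve_a", lift=0.10, std_err=0.02, eff_sample=5000),
    PolicyEvidence("reserve_b", lift=-0.08, std_err=0.02, eff_sample=5000),
    PolicyEvidence("reserve_c", lift=0.10, std_err=0.02, eff_sample=200),
    PolicyEvidence("reserve_d", lift=0.03, std_err=0.02, eff_sample=5000),
]
decision = classify(catalog)
```

note that `reserve_c` has the same point estimate as `reserve_a` but falls into the unresolved set because its support is below the assumed floor; a plain ranking by estimated lift would not distinguish the two.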
the main theoretical result provides a unified finite-catalog guarantee. under simultaneous inference, the framework controls the error of certifying a policy that is actually worse than the status quo. this gives ad platforms a principled way to screen reserve-price policies offline, flagging only those with reliable evidence for costly online experiments. the method is designed for operational use, where false positives can waste engineering time and harm marketplace health.
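to see why simultaneous inference over a finite catalog can control this kind of error, a standard union-bound argument is a useful reference point. the sketch below is not the paper's theorem; the symbols $K$, $\Delta_k$, $L_k$, and $\alpha$ are introduced here for illustration, and bonferroni is assumed in place of the paper's actual construction.

```latex
% Assumed setup: K candidate policies, true lift over the status quo
% \Delta_k = V(\pi_k) - V(\pi_0), and lower confidence bounds L_k with
% per-policy miscoverage P(L_k > \Delta_k) \le \alpha / K.
% Rule: certify policy k when L_k > 0.
\begin{align*}
P\bigl(\exists\, k : L_k > 0 \ \text{and}\ \Delta_k \le 0\bigr)
  &\le P\bigl(\exists\, k : L_k > \Delta_k\bigr) \\
  &\le \sum_{k=1}^{K} P\bigl(L_k > \Delta_k\bigr) \\
  &\le K \cdot \frac{\alpha}{K} = \alpha .
\end{align*}
```

under these assumptions, the probability of certifying even one policy that is no better than the status quo is at most $\alpha$, regardless of how many policies in the catalog are actually harmful.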
why it matters: it helps data scientists avoid picking ad pricing policies that look good in offline logs but fail in live tests, saving time and reducing risk.