new framework for auditing machine unlearning

source: google research: new framework for auditing machine unlearning

level: technical

machine unlearning lets ai models forget specific training data without full retraining, which matters for complying with privacy laws and for safety. verifying that unlearning actually worked is hard because auditors often see only model outputs, not internals. standard two-sample tests compare output distributions, but they can miss local anomalies or raise false positives when two models differ for harmless reasons, such as different batch sizes. they also need many samples, which makes them expensive for large models.
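
to make the baseline concrete, here is a minimal sketch of a standard two-sample test on scalar model outputs. it is an assumption-laden illustration, not the baseline used in the source: the statistic is a plain difference in means, the function name `permutation_test` is made up for this example, and the data are synthetic gaussians in which a small mean offset stands in for benign retraining noise.

```python
import random

def permutation_test(x, y, n_perm=200, seed=0):
    """p-value for h0 'x and y come from the same distribution',
    using |mean(x) - mean(y)| as the test statistic."""
    rng = random.Random(seed)

    def stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        # under h0 the group labels are exchangeable, so shuffle and re-split
        rng.shuffle(pooled)
        if stat(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    # add-one correction keeps the p-value strictly positive
    return (1 + hits) / (1 + n_perm)

# synthetic stand-ins (assumption): two "safe" models whose outputs differ
# only by a small offset that we treat as harmless for the sake of the example
rng = random.Random(1)
retrained_a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
retrained_b = [rng.gauss(0.2, 1.0) for _ in range(2000)]

p_value = permutation_test(retrained_a, retrained_b)
print(p_value)  # a small p-value here would be a false alarm
```

with enough samples, a test like this rejects even when the difference is one an auditor would consider harmless, which is the false-positive problem described above.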

the new framework uses f-divergences such as chi-squared, kl, and hockey-stick to detect different types of distribution shifts. hockey-stick divergence lines up with the definition of differential privacy, so auditors can set an explicit threshold for acceptable differences. kernel regularization keeps these calculations efficient on high-dimensional data. the method automatically picks the best divergence and settings, which removes manual tuning. a relative three-sample test checks whether an unlearned model is closer to a safely retrained model or to the original compromised one, which avoids false positives caused by retraining noise.
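
the three divergences named above have simple textbook forms when both distributions are known and discrete. the sketch below computes them exactly. it is not the source's kernel-regularized estimator, which works from samples. the function names are illustrative, and `epsilon` and `delta` are the usual differential privacy parameters, which the source does not name: a mechanism is (epsilon, delta)-private when the hockey-stick divergence at gamma = e^epsilon stays at or below delta in both directions.

```python
import math

def chi_squared(p, q):
    """chi-squared divergence: sum of (p - q)^2 / q over outcomes."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

def kl(p, q):
    """kl divergence: sum of p * log(p / q), treating 0 * log(0) as 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hockey_stick(p, q, gamma):
    """hockey-stick divergence: sum of max(p - gamma * q, 0) over outcomes."""
    return sum(max(pi - gamma * qi, 0.0) for pi, qi in zip(p, q))

def violates_dp(p, q, epsilon, delta):
    """true if the pair (p, q) breaks the (epsilon, delta) bound in either direction."""
    gamma = math.exp(epsilon)
    return max(hockey_stick(p, q, gamma), hockey_stick(q, p, gamma)) > delta

# toy output distributions over three outcomes (made-up numbers)
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(chi_squared(p, q))
print(kl(p, q))
print(hockey_stick(p, q, 1.0))       # gamma = 1 gives total variation distance
print(violates_dp(p, q, 0.1, 0.05))  # tight delta: flagged
print(violates_dp(p, q, 0.1, 0.1))   # looser delta: accepted
```

the threshold is the point of the hockey-stick form: instead of asking whether two distributions differ at all, the auditor asks whether they differ by more than a stated privacy budget allows.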

experiments on synthetic data, physics outlier detection, privacy auditing, and unlearning evaluation showed that the framework matches or beats baselines with less tuning. for privacy, the hockey-stick test caught violations with far fewer samples than prior methods. in the unlearning tests, standard two-sample tests wrongly flagged safe retrained models, but the relative test correctly identified them as safe. only a random-label unlearning method passed, while fine-tuning, pruning, and selective synaptic dampening failed to truly forget the data.
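
the relative comparison behind these unlearning results can be sketched with three sets of scalar outputs. this is a simplified stand-in, not the source's procedure: it swaps in the empirical 1-d wasserstein-1 distance for the kernel-regularized f-divergence, uses a percentile bootstrap to decide significance, and runs on synthetic gaussian data. the names `w1` and `relative_test` are hypothetical.

```python
import random

def w1(x, y):
    """empirical 1-d wasserstein-1 distance for two equal-size samples."""
    assert len(x) == len(y)
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)

def relative_test(unlearned, retrained, original, n_boot=500, alpha=0.05, seed=0):
    """return (diff, upper, closer_to_retrained).

    diff  = d(unlearned, retrained) - d(unlearned, original); negative
            means the unlearned model sits nearer the retrained reference.
    upper = (1 - alpha) bootstrap percentile of diff.
    closer_to_retrained is true only when upper < 0.
    """
    rng = random.Random(seed)

    def resample(s):
        return [rng.choice(s) for _ in s]

    diff = w1(unlearned, retrained) - w1(unlearned, original)
    boots = []
    for _ in range(n_boot):
        u, r, o = resample(unlearned), resample(retrained), resample(original)
        boots.append(w1(u, r) - w1(u, o))
    boots.sort()
    upper = boots[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return diff, upper, upper < 0

# synthetic stand-ins (assumption): outputs of each model on the forget set
rng = random.Random(42)
n = 300
retrained = [rng.gauss(0.0, 1.0) for _ in range(n)]        # safe reference
original = [rng.gauss(1.0, 1.0) for _ in range(n)]         # compromised model
unlearned_good = [rng.gauss(0.05, 1.0) for _ in range(n)]  # near the reference
unlearned_bad = [rng.gauss(0.95, 1.0) for _ in range(n)]   # barely moved

good = relative_test(unlearned_good, retrained, original)
bad = relative_test(unlearned_bad, retrained, original)
print(good[2], bad[2])
```

the good model is not identical to the retrained reference, so a sensitive two-sample test could still flag it. the relative question only asks which reference it is closer to, so small retraining noise does not count against it.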

why it matters: this framework gives auditors a practical, sample-efficient way to verify machine unlearning and privacy, reducing false alarms and catching real violations that standard tests miss.
