source: arxiv statistics ml: recursively trained diffusion models: limiting collapse distribution and spectral characterization

level: research

recursive training of generative models on their own outputs can cause model collapse, where generated samples drift away from the real data. this paper studies the case of diffusion models that are repeatedly trained on their own outputs. the authors find that even with perfect score estimation and exact sampling, the drift does not vanish: the recursion converges to a unique limiting distribution. the drift comes from early stopping of the reverse diffusion process, which is needed for numerical stability.
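to make the early-stopping effect concrete, here is a minimal toy sketch, not the paper's construction. it assumes a one-dimensional zero-mean gaussian data distribution, a variance-preserving forward process x_t = e^{-t} x_0 + sqrt(1 - e^{-2t}) z, no fresh real data between generations, and an exact reverse process stopped at a hypothetical time t0. under those assumptions each generation outputs the law of x_{t0} rather than x_0, so only the variance needs tracking. the names next_variance, variance_trajectory, and t0 are illustrative.

```python
import math


def next_variance(var, t0):
    """Variance after one generation under the assumed toy setup.

    With an exact score and exact sampling, stopping the reverse process at
    time t0 returns the law of X_{t0} = e^{-t0} X_0 + sqrt(1 - e^{-2 t0}) Z,
    so a N(0, var) input becomes N(0, e^{-2 t0} var + 1 - e^{-2 t0}).
    """
    decay = math.exp(-2.0 * t0)
    return decay * var + (1.0 - decay)


def variance_trajectory(var0, t0, generations):
    """Variance of the generated distribution for each generation (toy model)."""
    out = [var0]
    for _ in range(generations):
        out.append(next_variance(out[-1], t0))
    return out


# Hypothetical numbers: data variance 4.0, early-stopping time 0.05.
traj = variance_trajectory(var0=4.0, t0=0.05, generations=50)
gaps = [abs(v - 1.0) for v in traj]
ratios = [gaps[i + 1] / gaps[i] for i in range(len(gaps) - 1)]
print("final variance:", traj[-1])
print("per-generation contraction of the gap to the fixed point:", ratios[0])
```

in this toy the gap to the fixed point shrinks by the same factor e^{-2 t0} every generation, which is the simplest instance of geometric convergence to a unique limit that is not the data distribution. the paper's setting is more general, and its limit is not reproduced here.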

the limiting distribution has a closed form: it is an infinite mixture of the true data distribution smoothed by gaussian noise of increasing variance. the recursion converges geometrically to this limit. a spectral analysis using hermite polynomials shows that recursive training acts as a low-pass filter, suppressing higher-order details of the data distribution. this means the model gradually loses fine-grained features and retains only coarse, smoothed patterns.
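the low-pass behaviour can be checked numerically in a simple setting. the sketch below is an illustration under assumptions, not the paper's derivation: it assumes the same variance-preserving noising x_t = e^{-t} x_0 + sqrt(1 - e^{-2t}) z and uses the standard identity that the probabilists' hermite polynomial he_k satisfies e[he_k(x_t) | x_0] = e^{-k t} he_k(x_0). the function names and the values of x0 and t0 are hypothetical.

```python
import numpy as np
from numpy.polynomial import hermite_e as H


def hermite_mean_after_noise(k, x0, t, n_nodes=40):
    """E[He_k(X_t) | X_0 = x0] for X_t = e^{-t} x0 + sqrt(1 - e^{-2t}) Z.

    Computed with Gauss-Hermite quadrature (weight exp(-z^2 / 2)), which is
    exact for the polynomial integrands used here.
    """
    nodes, weights = H.hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)  # normalise to an N(0, 1) expectation
    a, b = np.exp(-t), np.sqrt(1.0 - np.exp(-2.0 * t))
    basis = np.zeros(k + 1)
    basis[k] = 1.0  # coefficient vector selecting He_k
    return np.sum(weights * H.hermeval(a * x0 + b * nodes, basis))


def attenuation(k, t0, generations=1):
    """Factor applied to the order-k Hermite mode after repeated noising passes."""
    return np.exp(-k * t0 * generations)


x0, t0 = 1.3, 0.2  # hypothetical evaluation point and early-stopping time
for k in range(6):
    basis = np.zeros(k + 1)
    basis[k] = 1.0
    clean = H.hermeval(x0, basis)
    noised = hermite_mean_after_noise(k, x0, t0)
    print(k, noised / clean, attenuation(k, t0))
```

the ratio printed for each order k matches e^{-k t0}, so higher-order modes are damped more strongly on every pass and the damping compounds across generations. that is the sense in which the recursion suppresses fine detail first, at least in this assumed setup.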

the results provide a precise characterization of the collapse distribution and convergence rate. this helps explain why generated data degrades over generations and gives a theoretical baseline for the best possible outcome under ideal conditions. the findings are relevant for understanding the long-term behavior of generative models and for designing strategies to mitigate collapse in iterative training pipelines.

why it matters: it quantifies the unavoidable drift in recursively trained diffusion models, helping practitioners set expectations and design better training loops for generative ai.

