uniform-in-time weak propagation of chaos in shallow neural nets

source: arxiv statistics ml: uniform-in-time weak propagation-of-chaos in shallow neural networks

level: research

the paper studies one-hidden-layer neural networks trained with gradient descent in the feature-learning regime. it compares the output of a finite-width network with that of its infinite-width counterpart, which follows mean-field dynamics. standard arguments bound this gap only over a fixed time horizon, with constants that can grow as the horizon lengthens. the new bounds are uniform in time: they do not degrade as training continues.
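the comparison described above can be illustrated numerically. the following is a minimal numpy sketch, not the paper's experiment: the tanh activation, the toy data, the step size, and the use of a very wide network (m = 4096) as a stand-in for the infinite-width limit are all assumptions made for illustration. finite-width outputs are averaged over independent initializations so that the comparison is made in expectation.

```python
import numpy as np

def forward(a, W, X):
    # mean-field scaling: average over the m neurons (1/m, not 1/sqrt(m))
    return np.tanh(X @ W.T) @ a / a.shape[0]

def gd_step(a, W, X, y, lr):
    # one full-batch gradient step on the loss (1/2n) * sum_k (f(x_k) - y_k)^2
    m, n = a.shape[0], X.shape[0]
    H = np.tanh(X @ W.T)                      # (n, m) hidden activations
    resid = H @ a / m - y                     # (n,) prediction error
    grad_a = H.T @ resid / (n * m)
    grad_W = (resid[:, None] * (1.0 - H**2) * a[None, :]).T @ X / (n * m)
    # step size multiplied by m so that each neuron moves at an O(1) rate
    return a - lr * m * grad_a, W - lr * m * grad_W

def train(m, X, y, steps, lr, seed):
    # i.i.d. gaussian initialization of the m neurons (an assumed choice)
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(m)
    W = rng.standard_normal((m, X.shape[1]))
    outputs, losses = [], []
    for _ in range(steps):
        a, W = gd_step(a, W, X, y, lr)
        pred = forward(a, W, X)
        outputs.append(pred)
        losses.append(0.5 * np.mean((pred - y) ** 2))
    return np.array(outputs), np.array(losses)

# toy regression data (hypothetical, for illustration only)
data_rng = np.random.default_rng(0)
X = data_rng.standard_normal((32, 3))
y = np.sin(X[:, 0])

steps, lr = 200, 0.1
# a very wide network stands in for the infinite-width limit (an assumption)
ref_out, ref_loss = train(4096, X, y, steps, lr, seed=1)

gaps = {}
for m in (16, 64, 256):
    # average the outputs over 8 independent initializations
    mean_out = np.mean(
        [train(m, X, y, steps, lr, seed=100 + s)[0] for s in range(8)], axis=0
    )
    # largest output gap over the data points, at each training step
    gaps[m] = np.abs(mean_out - ref_out).max(axis=1)
    print(m, gaps[m][0], gaps[m][-1])
```

each entry of `gaps` is a curve over training steps, so one can inspect whether the gap to the wide proxy stays controlled late in training. the proxy has its own finite-width fluctuations and only 8 initializations are averaged, so this is a qualitative check rather than a test of any specific rate.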

the key insight is to use the convergence rate of the deterministic wasserstein gradient flow in the mean-field limit. this avoids relying on strong convexity or logarithmic sobolev inequalities, the usual tools for noisy gradient dynamics. the result is a non-asymptotic weak propagation-of-chaos statement, where "weak" means that closeness is measured through the network output rather than neuron by neuron. it shows that the finite-width network output stays close to the mean-field output for all time, with the gap controlled by the mean-field excess loss and the number of neurons.
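for reference, the objects in this paragraph can be written in the standard mean-field formulation of a one-hidden-layer network. the notation below (\varphi for a single neuron, \theta for its parameters, \mu_t for the distribution of neurons at time t, L for the loss as a functional of \mu) is assumed here for illustration and may differ from the paper's.

```latex
% finite-width network: m neurons, mean-field (1/m) scaling
f_m(x) = \frac{1}{m} \sum_{i=1}^{m} \varphi(x; \theta_i)

% infinite-width counterpart: integrate over a distribution \mu of neuron parameters
f_\mu(x) = \int \varphi(x; \theta) \, \mathrm{d}\mu(\theta)

% wasserstein gradient flow of the loss L over distributions of neurons
\partial_t \mu_t = \nabla_\theta \cdot \Big( \mu_t \, \nabla_\theta \frac{\delta L}{\delta \mu}(\mu_t) \Big)
```

the finite-width network corresponds to the empirical measure of its m neurons, so comparing f_m with f_{\mu_t} amounts to comparing that empirical measure with \mu_t through the single test function \varphi(x; \cdot).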

concretely, the bound depends on two quantities: the mean-field excess mse loss at time t and the number of neurons m. as the mean-field loss decays, the approximation guarantee holds uniformly in t instead of loosening as training goes on. this provides theoretical support for using mean-field analysis to understand practical finite-width networks over long training periods, and it sharpens the mathematical picture of how well large but finite networks approximate their infinite-width limits.

why it matters: it gives a theoretical guarantee that wide one-hidden-layer networks behave like their infinite-width limits throughout training, which helps justify using mean-field analysis to reason about finite-width networks in practice.
