target updates may stabilize linear q-learning

source: arxiv statistics ml: target updates may stabilize linear q-learning: periodic and soft dynamics

level: research

periodic target updates in q-learning and soft target updates in actor-critic methods are common stabilization tricks, but their theory is not fully understood. this paper analyzes these mechanisms for linear q-learning using switched linear system dynamics and the joint spectral radius of switching matrix families. linear q-learning can diverge in general, but the authors prove that periodic hard updates and soft updates ensure convergence to the exact projected q-bellman solution under certain conditions.
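as a rough illustration of the joint spectral radius mentioned above, the sketch below brackets it for a small family of matrices by brute force. it relies on the standard bounds: for every product of length k, its spectral radius raised to 1/k is a lower bound, and the largest norm among such products raised to 1/k is an upper bound. the function name `jsr_bounds`, the `max_len` parameter, and the example matrices are hypothetical placeholders, not the matrix families constructed in the paper.

```python
import itertools
from functools import reduce

import numpy as np


def jsr_bounds(matrices, max_len=6):
    """Bracket the joint spectral radius (JSR) of a finite matrix family.

    For every product M of k matrices from the family:
      rho(M) ** (1/k)                 <= JSR   (lower bound)
      JSR <= max over products of ||M||_2 ** (1/k)   (upper bound)
    The cost grows as len(matrices) ** max_len, so this only suits tiny cases.
    """
    lower, upper = 0.0, np.inf
    for k in range(1, max_len + 1):
        max_rho, max_norm = 0.0, 0.0
        for combo in itertools.product(matrices, repeat=k):
            prod = reduce(np.matmul, combo)
            max_rho = max(max_rho, float(np.max(np.abs(np.linalg.eigvals(prod)))))
            max_norm = max(max_norm, float(np.linalg.norm(prod, 2)))
        lower = max(lower, max_rho ** (1.0 / k))
        upper = min(upper, max_norm ** (1.0 / k))
    return lower, upper


# Placeholder family (an assumption for illustration only).
A1 = np.array([[0.5, 0.3], [0.0, 0.4]])
A2 = np.array([[0.4, 0.0], [0.3, 0.5]])
lower, upper = jsr_bounds([A1, A2])
print(f"JSR lies in [{lower:.4f}, {upper:.4f}]")
```

if the upper bound is below one, every switching sequence over the family contracts, which is the kind of stability certificate the paper's conditions aim at. exact computation of the joint spectral radius is hard in general, so bracketing of this kind is only practical for small families and short products.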

the analysis focuses on deterministic linear q-learning, where the target-update mechanism is clearest. by modeling the algorithm as a switched linear system, the authors derive spectral and step-size conditions that guarantee stability. a joint spectral radius below one for the switching matrix family guarantees convergence. periodic updates create a switching pattern that can reduce the effective spectral radius, while soft updates blend old and new targets to smooth the dynamics.
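the two target-update mechanisms can be sketched as follows. this is a minimal illustration, not the paper's exact formulation: the function names, the tiny random mdp, the uniform state-action weighting `d`, and all hyperparameters (`alpha`, `period`, `tau`) are assumptions. the demo uses one-hot (tabular) features as a sanity check, a special case where convergence is classical; the paper's conditions for general features are not reproduced here.

```python
import numpy as np


def greedy_backup(theta_bar, Phi, P, r, gamma):
    """r + gamma * P * max_a Q_bar(s', a), evaluated with the target weights."""
    n_states = P.shape[1]
    q_bar = (Phi @ theta_bar).reshape(n_states, -1)  # rows: states, cols: actions
    return r + gamma * P @ q_bar.max(axis=1)


def linear_q_learning(Phi, P, r, d, gamma, alpha, n_steps,
                      mode="periodic", period=20, tau=0.05):
    """Deterministic (expected-update) linear Q-learning with a target vector.

    mode="periodic": copy theta into theta_bar every `period` steps (hard update).
    mode="soft":     theta_bar <- (1 - tau) * theta_bar + tau * theta each step.
    """
    theta = np.zeros(Phi.shape[1])
    theta_bar = theta.copy()
    for k in range(n_steps):
        target = greedy_backup(theta_bar, Phi, P, r, gamma)
        # Gradient-like step toward the frozen target, weighted by d.
        theta = theta + alpha * Phi.T @ (d * (target - Phi @ theta))
        if mode == "periodic":
            if (k + 1) % period == 0:
                theta_bar = theta.copy()
        elif mode == "soft":
            theta_bar = (1.0 - tau) * theta_bar + tau * theta
        else:
            raise ValueError(f"unknown mode: {mode}")
    return theta


# Hypothetical 3-state, 2-action MDP; row index of P and r is s * n_actions + a.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9
P = rng.random((n_states * n_actions, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n_states * n_actions)
d = np.full(n_states * n_actions, 1.0 / (n_states * n_actions))
Phi = np.eye(n_states * n_actions)  # one-hot features (tabular special case)

theta_periodic = linear_q_learning(Phi, P, r, d, gamma, alpha=3.0,
                                   n_steps=5000, mode="periodic")
theta_soft = linear_q_learning(Phi, P, r, d, gamma, alpha=3.0,
                               n_steps=5000, mode="soft")

# Bellman residual |theta - backup(theta)|; near zero at a fixed point here.
residual_periodic = np.max(np.abs(theta_periodic - greedy_backup(theta_periodic, Phi, P, r, gamma)))
residual_soft = np.max(np.abs(theta_soft - greedy_backup(theta_soft, Phi, P, r, gamma)))
```

the greedy maximum inside `greedy_backup` is what makes the update matrix depend on the current target weights, which is the source of the switching behaviour. replacing `Phi` with a non-square feature matrix gives the genuinely approximate setting, where stability depends on conditions like those the paper derives and is not guaranteed by this sketch.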

the results provide a rigorous explanation for why these heuristics can work in the deterministic linear setting. the conditions are explicit and depend on the discount factor, feature representation, and update frequency. this bridges a gap between practice and theory in reinforcement learning with function approximation. the findings may guide the design of more reliable algorithms and help practitioners choose update schedules and step sizes with greater confidence.

why it matters: it gives clear theoretical conditions for stable linear q-learning, helping ai practitioners design more reliable reinforcement learning algorithms.
