source: arxiv artificial intelligence: uncertainty-aware and temporally regulated expert advice in reinforcement learning for autonomous driving

level: technical

reinforcement learning for autonomous driving is risky because agents must try new actions to learn, and that exploration can cause crashes or off-road driving. a new framework uses expert advice to guide exploration without creating long-term dependence. advice is given only when the agent's uncertainty, whether epistemic (from lack of knowledge) or aleatoric (from inherently noisy situations), rises above an adaptive threshold. these thresholds are computed from rolling buffers of recent uncertainty values, so they adjust as the agent learns.
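
the rolling-buffer thresholds described above can be sketched roughly as follows. this is a minimal illustration, not the paper's implementation: the class name `AdaptiveThreshold`, the helper `needs_advice`, the window size, the warm-up length, and especially the choice of an upper quantile as the threshold rule are all assumptions made for the example, since the summary does not say how the thresholds are derived from the buffers.

```python
from collections import deque

import numpy as np


class AdaptiveThreshold:
    """Rolling-buffer threshold over recent uncertainty values (hypothetical sketch)."""

    def __init__(self, window=500, quantile=0.9, warmup=20):
        self.buffer = deque(maxlen=window)  # oldest values drop out automatically
        self.quantile = quantile            # assumed rule: upper quantile of the window
        self.warmup = warmup                # minimum samples before advice can trigger

    def update(self, value):
        """Record the latest uncertainty estimate."""
        self.buffer.append(float(value))

    def threshold(self):
        """Current threshold, or +inf while the buffer is still warming up."""
        if len(self.buffer) < self.warmup:
            return float("inf")
        return float(np.quantile(list(self.buffer), self.quantile))

    def exceeded(self, value):
        """True if the value is above the current adaptive threshold."""
        return value > self.threshold()


def needs_advice(epistemic, aleatoric, epi_thr, ale_thr):
    """Request advice if either uncertainty signal exceeds its own threshold."""
    ask = epi_thr.exceeded(epistemic) or ale_thr.exceeded(aleatoric)
    # Update after the check so the current value is compared with past values only.
    epi_thr.update(epistemic)
    ale_thr.update(aleatoric)
    return ask
```

because the threshold is recomputed from the most recent window, a signal that was unusually high early in training may stop triggering advice later, and the reverse; that is the sense in which the thresholds adjust as the agent learns.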

the system controls how often and how long advice is used through a commitment-cooldown strategy with a stochastic early-stop rule. once advice starts, the agent follows a coherent expert maneuver for a limited time and then must act on its own, which prevents overreliance. expert and agent experiences are stored together in a shared replay buffer and used to train an off-policy implicit quantile network. because learning is off-policy, expert transitions can be reused efficiently, helping the agent learn from past guidance.
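
one way to picture the commitment-cooldown gate and the shared buffer is the sketch below. it is an illustration under stated assumptions rather than the authors' code: `AdviceScheduler`, `SharedReplayBuffer`, the step counts, the per-step stop probability, and the `from_expert` flag are hypothetical names and values chosen for the example.

```python
import random
from collections import deque


class AdviceScheduler:
    """Commitment-cooldown gate with a stochastic early stop (hypothetical sketch)."""

    def __init__(self, commit_steps=10, cooldown_steps=20, stop_prob=0.05, rng=None):
        self.commit_steps = commit_steps      # max consecutive expert-controlled steps
        self.cooldown_steps = cooldown_steps  # steps the agent must act alone afterwards
        self.stop_prob = stop_prob            # per-step chance of ending a commitment early
        self.rng = rng if rng is not None else random.Random()
        self.commit_left = 0
        self.cooldown_left = 0

    def step(self, wants_advice):
        """Return True if the expert should act on this step."""
        if self.commit_left > 0:
            # Inside a commitment window: keep following the expert.
            self.commit_left -= 1
            early_stop = self.rng.random() < self.stop_prob
            if self.commit_left == 0 or early_stop:
                self.commit_left = 0
                self.cooldown_left = self.cooldown_steps
            return True
        if self.cooldown_left > 0:
            # Cooldown: advice is blocked even if uncertainty is high.
            self.cooldown_left -= 1
            return False
        if wants_advice:
            # Start a new commitment; this step counts as its first step.
            self.commit_left = self.commit_steps - 1
            if self.commit_left == 0:
                self.cooldown_left = self.cooldown_steps
            return True
        return False


class SharedReplayBuffer:
    """Single buffer holding both expert and agent transitions."""

    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done, from_expert):
        # from_expert is kept only for bookkeeping; both kinds are sampled alike.
        self.storage.append((state, action, reward, next_state, done, from_expert))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)
```

in a training loop, `wants_advice` would come from the uncertainty check, the returned flag would decide whether the expert's or the agent's action is executed, and every transition would be pushed into the same buffer so that an off-policy learner can sample expert and agent data together.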

tests in the carla driving simulator showed the method outperformed a standard implicit quantile network baseline. the agent learned to drive more safely, with fewer collisions and off-road incidents, while still exploring effectively. the adaptive advice mechanism balanced safety and learning, reducing the need for constant expert intervention. this approach could make training autonomous vehicles faster and safer in simulation before real-world deployment.

why it matters: it helps train self-driving ai with less risk by using expert help only when needed, making simulation-based learning more practical.

