source: arxiv machine learning: learning when to act: communication-efficient reinforcement learning via run-time assurance

level: research

safe reinforcement learning often focuses on what action to take. this work asks when an agent needs to act, aiming to reduce communication while keeping the system safe. the approach uses a single policy that jointly learns control inputs and timing decisions. a run-time assurance layer acts as a safety shield: if a one-step-ahead prediction indicates the lyapunov function would fail to decrease, the learned action is overridden by a safe backup. this provides a pointwise safety guarantee, holding at every step, which is stronger than methods that only ensure safety on average.
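a minimal sketch of the shield logic, assuming a known one-step dynamics model f, a lyapunov certificate V, and a backup controller; these interfaces and the exact decrease condition are assumptions, not the paper's code:

```python
import numpy as np

def shielded_action(x, u_rl, f, V, backup, dt=0.01):
    # predict one euler step ahead under the learned action
    x_next = x + dt * f(x, u_rl)
    # keep the learned action only if the lyapunov value does not grow;
    # the paper's exact condition may use a strict decrease margin
    if V(x_next) <= V(x):
        return u_rl, False
    # otherwise override with the precomputed safe backup (e.g. lqr)
    return backup(x), True
```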

the method is tested on stabilization tasks around a known equilibrium, where analytical baselines like lyapunov-triggered control and lqr are well-defined. the policy learns to extend the time between control updates, measured by the mean inter-sample interval. on an inverted pendulum, cart-pole, and planar quadrotor, it achieves intervals 1.91x, 1.45x, and 3.51x longer than a lyapunov-triggered baseline, respectively, while maintaining stability through the backup controller.
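a sketch of how the mean inter-sample interval could be measured from a rollout; the env and policy interfaces here are hypothetical, with the policy returning both a control input and the number of steps to hold it:

```python
def rollout_misi(env, policy, horizon=2000):
    # roll out a joint control+timing policy and report the mean
    # time between control updates (mean inter-sample interval)
    x, updates, t = env.reset(), 0, 0
    while t < horizon:
        u, hold = policy(x)   # hypothetical: action plus steps to hold it
        updates += 1
        for _ in range(int(hold)):
            x = env.step(u)
            t += 1
            if t >= horizon:
                break
    return horizon * env.dt / updates
```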

the safety shield uses a precomputed lqr backup and lyapunov certificates, ensuring that the system never leaves a safe region. this is a stricter guarantee than constrained markov decision process methods, which only bound constraint violations in expectation. the learned policy decides when to sample and act, reducing communication overhead in networked control systems. the results show that jointly optimizing control and communication timing can significantly lower resource usage without sacrificing safety.
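the lqr backup and its quadratic lyapunov certificate can be precomputed offline from the linearized dynamics. a minimal sketch using the continuous-time riccati equation, with assumed inverted-pendulum parameters rather than the paper's actual models:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_backup(A, B, Q, R):
    # riccati solution P doubles as the lyapunov matrix: V(x) = x^T P x
    P = solve_continuous_are(A, B, Q, R)
    # u = -K x stabilizes the linearization
    K = np.linalg.solve(R, B.T @ P)
    V = lambda x: float(x @ P @ x)
    return K, V

# example: inverted pendulum linearized about the upright equilibrium
# (parameters assumed for illustration)
g, l, m = 9.81, 1.0, 1.0
A = np.array([[0.0, 1.0], [g / l, 0.0]])
B = np.array([[0.0], [1.0 / (m * l**2)]])
K, V = lqr_backup(A, B, np.eye(2), np.eye(1))
```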

why it matters: this can reduce communication and computation in real-world control systems like drones or robots, making safe reinforcement learning more practical for resource-limited settings.

