adversarial action removal breaks self-play rl agents

source: arxiv machine learning: when actions disappear: adversarial action removal in self-play reinforcement learning

level: research

researchers tested a new kind of attack on reinforcement learning agents trained by self-play. instead of perturbing what the agent observes or altering the action it has already chosen, the attacker removes some of the agent's legal actions before it acts. the attack was evaluated on poker games with up to 5,531 information states and on two other domains. a learned masking attack hurt performance much more than random removal or other known attacks.
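the difference between random removal and targeted removal can be illustrated with a minimal sketch. everything here is an assumption made for illustration: the function names, the toy q-values, and especially the greedy rule, which is a simple stand-in for a learned attacker and is not the method from the paper.

```python
import random
from typing import Dict, List, Sequence


def random_mask(legal: Sequence[str], k: int, rng: random.Random) -> List[str]:
    """Baseline: remove up to k legal actions uniformly at random.

    At least one action is always left so the agent can still move.
    """
    k = max(0, min(k, len(legal) - 1))
    removed = set(rng.sample(list(legal), k))
    return [a for a in legal if a not in removed]


def greedy_mask(legal: Sequence[str], q_values: Dict[str, float], k: int) -> List[str]:
    """Hypothetical targeted attacker: remove the k actions the victim values most.

    This greedy heuristic only stands in for a learned masking policy.
    """
    k = max(0, min(k, len(legal) - 1))
    ranked = sorted(legal, key=lambda a: q_values[a], reverse=True)
    removed = set(ranked[:k])
    return [a for a in legal if a not in removed]


def act(legal: Sequence[str], q_values: Dict[str, float]) -> str:
    """Victim picks its highest-valued action among those still available."""
    return max(legal, key=lambda a: q_values[a])


if __name__ == "__main__":
    # Toy poker-style decision point with made-up action values.
    legal = ["fold", "call", "raise"]
    q = {"fold": -1.0, "call": 0.2, "raise": 0.5}

    print(act(legal, q))                     # unmasked choice
    print(act(greedy_mask(legal, q, 1), q))  # choice after targeted removal
    print(random_mask(legal, 1, random.Random(0)))
```

in this toy case the targeted mask forces the agent off its preferred action every time, while a random mask does so only when it happens to remove that action. this is one intuition for why a learned mask could do more damage than random removal.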

the attack worked against five learning algorithms: q-learning, ppo, nfsp, neural nfsp, and dqn. it also transferred between agents, and its damage grew as the victim continued self-play training. even after long periods of training with the mask still active, the agents did not recover. the attacker learned to focus on high-value decision points, which the authors measured with two new metrics based on reach and value.

the findings show that action availability is a separate weak spot in self-play reinforcement learning. previous work looked at attacks on observations or on actions after they were chosen. this study shows that taking options off the table before a decision is a powerful and persistent threat. the metrics it introduces can help measure how vulnerable a decision point is to this kind of attack.

why it matters: this reveals a new security risk for ai systems that learn by playing against themselves, such as game-playing bots or simulators, where an attacker could cripple learning by quietly blocking certain moves.
