measuring trust between ai agents

source: arxiv artificial intelligence: trust between ai agents: measuring formation, breakage, and recovery, with implications for governing multi-agent systems

level: research

as language-model agents increasingly work in teams, each agent must decide how much to trust its teammates. yet there is no standard way to measure trust between ai agents. researchers propose a behavioral measure based on costly verification. in a cooperative survival game, checking a teammate's work consumes resources, while trusting a wrong answer can be fatal. relative to a memoryless version of the same model, reduced verification provides an observable measure of trust.

using this framework, the study examines trust formation, breakage, and recovery across six frontier model snapshots. when paired with a consistently reliable teammate, four snapshots—claude opus 4.6, claude sonnet 4.6, gpt-5.1, and gemini 3.1 pro—reduce verification by roughly 60 to 85 percent. two smaller snapshots show little or no such adjustment. failures reverse this trust discount, but models differ in how they respond. some concentrate renewed scrutiny on the culprit, while others spread suspicion across the team.

the findings highlight that trust dynamics in ai agents are not uniform. larger models can learn to trust reliable partners, but their recovery strategies after trust is broken vary. this behavioral measure offers a way to quantify trust without relying on self-report or opaque internal states. it could inform the design of multi-agent systems where trust calibration is critical for efficiency and safety.

why it matters: understanding how ai agents form and repair trust can improve coordination in multi-agent systems, reducing wasted resources and preventing cascading failures.

source: arxiv artificial intelligence: trust between ai agents: measuring formation, breakage, and recovery, with implications for governing multi-agent systems