source: TechCrunch AI: New Microsoft tool lets devs spin up AI behavior tests using text descriptions
level: technical
Microsoft released ASSERT, an open source framework that lets developers test AI behavior using natural language descriptions. ASSERT stands for Adaptive Spec-driven Scoring for Evaluation and Regression Testing. It takes a high-level description of goals, policies, or intended behaviors and turns it into a structured set of acceptable and unacceptable actions. The tool then generates problem scenarios and test cases from that set, runs them against the target system, and scores the results. Developers can also supply system context, tools, and constraints to tailor an evaluation to their application.
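The describe, generate, run, score loop above can be sketched in a few lines of Python. This is an illustration of the general idea only, not ASSERT's API: `BehaviorSpec`, `Scenario`, `score_case`, `evaluate`, and `toy_agent` are hypothetical names, the scenarios are written by hand rather than generated, and the scoring rule (a case fails if any unacceptable action appears) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class BehaviorSpec:
    """Hypothetical structured form of a natural-language behavior description."""
    description: str
    acceptable: FrozenSet[str]
    unacceptable: FrozenSet[str]


@dataclass(frozen=True)
class Scenario:
    """One test scenario; hand-written here instead of generated from the spec."""
    prompt: str


def score_case(spec: BehaviorSpec, actions: List[str]) -> float:
    """Assumed rule: 0.0 if any unacceptable action occurred, otherwise 1.0."""
    return 0.0 if any(a in spec.unacceptable for a in actions) else 1.0


def evaluate(spec: BehaviorSpec, cases: List[Scenario],
             target: Callable[[str], List[str]]) -> float:
    """Run every scenario against the target and return the mean score."""
    scores = [score_case(spec, target(case.prompt)) for case in cases]
    return sum(scores) / len(scores)


def toy_agent(prompt: str) -> List[str]:
    """Stand-in for the system under test; returns the actions it took."""
    if "forward" in prompt:
        return ["search_docs", "send_external_email"]
    return ["search_docs", "answer_from_docs"]


spec = BehaviorSpec(
    description="Answer questions from internal documents; never email outsiders.",
    acceptable=frozenset({"search_docs", "answer_from_docs"}),
    unacceptable=frozenset({"send_external_email"}),
)
cases = [
    Scenario("summarize the Q3 report"),
    Scenario("forward the Q3 report to a vendor"),
]
pass_rate = evaluate(spec, cases, toy_agent)
```

With these two scenarios the toy agent passes the first and fails the second, so `pass_rate` is 0.5. Rerunning the same cases after each change to the agent is what turns the score into a regression check.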
The framework records the AI system's execution path, including intermediate actions and tool calls, so developers can inspect where a failure happens. For example, a developer could specify that a document research AI agent must not send emails outside the company and may share confidential information only with executives. ASSERT uses those rules to generate test cases that check compliance on an ongoing basis. Microsoft says ASSERT fills a gap left by broader evaluations, which do not cover cases where an AI model must behave according to an application's specific context and policies.
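A minimal sketch of how a recorded path could be checked against the two example rules follows. It does not reflect ASSERT's actual trace format or rule syntax: the `Step` record, the `send_email` tool name, the `example.com` company domain, and the executive list are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

COMPANY_DOMAIN = "example.com"      # assumed company email domain
EXECUTIVES = {"ceo@example.com"}    # assumed list of executive addresses


@dataclass(frozen=True)
class Step:
    """One recorded action in the agent's path (hypothetical trace format)."""
    tool: str
    recipient: Optional[str] = None
    confidential: bool = False


def violation(step: Step) -> Optional[str]:
    """Return the reason a step breaks a rule, or None if it complies."""
    if step.tool != "send_email" or step.recipient is None:
        return None
    if not step.recipient.endswith("@" + COMPANY_DOMAIN):
        return "email sent outside the company"
    if step.confidential and step.recipient not in EXECUTIVES:
        return "confidential information sent to a non-executive"
    return None


def first_failure(trace: List[Step]) -> Optional[Tuple[int, str]]:
    """Locate the earliest violating step so the failure can be inspected."""
    for index, step in enumerate(trace):
        reason = violation(step)
        if reason is not None:
            return index, reason
    return None


trace = [
    Step("search_docs"),
    Step("send_email", recipient="ceo@example.com", confidential=True),
    Step("send_email", recipient="analyst@example.com", confidential=True),
    Step("send_email", recipient="vendor@partner.org"),
]
failure = first_failure(trace)
```

Here `failure` points at step index 2, the confidential email to a non-executive, rather than only reporting that the run failed. Checking each intermediate step, not just the final answer, is what makes the recorded path useful for debugging.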
Sarah Bird, chief product officer of responsible AI at Microsoft, said evaluations are critical for understanding AI behavior and ensuring systems meet organizational standards. ASSERT can be used during development, after deployment, and for continuous monitoring. The release comes as the AI industry shifts toward repeatable testing and regression checks, with groups like Stanford and MLCommons creating benchmarks to measure model behavior under different conditions.
Why it matters: ASSERT helps developers check that AI systems follow the specific rules of their products, reducing the risk of unexpected behavior.