Microsoft releases MAI models with low active parameters

Source: Simon Willison: Microsoft's new MAI models

Level: technical

Microsoft introduced two new text models: MAI-Thinking-1, a reasoning model with 1 trillion total parameters and 35 billion active, and MAI-Code-1-Flash, a code-focused model with 137 billion total and 5 billion active. MAI-Code-1-Flash is built for GitHub Copilot and VS Code and is rolling out to individual users. MAI-Thinking-1 is available to select early partners. Microsoft claims MAI-Thinking-1 is preferred over Anthropic's Sonnet 4.6 in blind evaluations, which is notable for a model with only 35 billion active parameters.
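To make the active-versus-total distinction concrete, here is a minimal Python sketch that computes the share of parameters active per token for both models, using only the sizes quoted above. The `ModelSize` class and its property names are hypothetical helpers for this illustration. The compute estimate uses the common rule of thumb of about 2 FLOPs per active parameter per generated token for a forward pass; that figure is an assumption of this sketch, not something Microsoft has published for these models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSize:
    """Hypothetical container for the parameter counts quoted in the summary."""
    name: str
    total_params: float   # all parameters stored in the model
    active_params: float  # parameters actually used for each token

    @property
    def active_fraction(self) -> float:
        # Share of the model's weights that participate in one token's forward pass.
        return self.active_params / self.total_params

    @property
    def approx_forward_flops_per_token(self) -> float:
        # Rule-of-thumb assumption: ~2 FLOPs per active parameter per token.
        return 2.0 * self.active_params


MAI_THINKING = ModelSize("mai-thinking-1", total_params=1e12, active_params=35e9)
MAI_CODE = ModelSize("mai-code-1-flash", total_params=137e9, active_params=5e9)

for model in (MAI_THINKING, MAI_CODE):
    print(
        f"{model.name}: {model.active_fraction:.1%} of parameters active, "
        f"~{model.approx_forward_flops_per_token:.1e} forward FLOPs per token"
    )
```

Under these assumptions both models activate roughly 3.5 percent of their weights per token, so per-token compute tracks the active count rather than the total. Memory needed to hold the weights still scales with the total parameter count, which this sketch does not model.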

The company emphasized that both models were trained from scratch on enterprise-grade, clean, and commercially licensed data, without distillation from other models. This sparked interest in whether the code model avoided training on unlicensed web data. However, a technical paper for MAI-Thinking-1 revealed that its training data includes a proprietary web crawl of 794 billion pages after filtering, plus 24.2 billion pages from Common Crawl. The crawl was filtered for adult content, piracy, and AI-generated content, but it still relies on publicly available web data.

The initial report misstated the model sizes, confusing active parameters with total parameters; it was corrected after a review of the model card and technical paper. The training data details show that, despite the claims of clean data, the models use web crawls similar to those behind other large language models, which raises questions about licensing. The low active parameter counts make these models cheaper to run, which could benefit developers using them in tools like Copilot.

Why it matters: low active parameter models can reduce inference costs for AI tools, but the training data licensing remains a concern for commercial use.
