source: hugging face blog: holo3.1: fast & local computer use agents

level: technical

holo3.1 is a new family of computer-use models that improves on holo3 by working across more environments and agent frameworks. the family now supports mobile automation, with the 35b-a3b model reaching 79.3% on androidworld, up from 67%. the smaller 4b and 9b variants also improved, from 58% to 72%. the models handle web, desktop, and mobile tasks. they also integrate with third-party agent stacks through function-calling protocols, reaching near-parity with native execution on benchmarks such as osworld.
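to make the function-calling integration concrete, here is a minimal sketch of how an agent stack might expose one ui action as a tool and route a model-emitted call to it. the `click` tool, its parameters, and the `dispatch` helper are hypothetical illustrations in the common json-schema tool style, not holo3.1's actual action space or api.

```python
import json

# hypothetical tool definition in the common JSON-schema function-calling style;
# the name and parameters are illustrative, not Holo3.1's real action space
CLICK_TOOL = {
    "type": "function",
    "function": {
        "name": "click",
        "description": "click at a screen coordinate",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "integer"},
                "y": {"type": "integer"},
            },
            "required": ["x", "y"],
        },
    },
}


def click(x: int, y: int) -> str:
    # stand-in for a real input driver (desktop, browser, or mobile)
    return f"clicked at ({x}, {y})"


# map tool names to local handlers
HANDLERS = {"click": click}


def dispatch(tool_call: dict) -> str:
    """Route one model-emitted tool call to its local handler."""
    name = tool_call["name"]
    args = json.loads(tool_call["arguments"])  # arguments arrive as a JSON string
    if name not in HANDLERS:
        raise ValueError(f"unknown tool: {name}")
    return HANDLERS[name](**args)


# a tool call shaped the way many function-calling APIs return it (assumed shape)
example_call = {"name": "click", "arguments": json.dumps({"x": 120, "y": 48})}
print(dispatch(example_call))  # clicked at (120, 48)
```

in a real stack the schema would be sent to the model with each request, and the handler would drive an actual browser, desktop, or device instead of returning a string.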

the release includes quantized checkpoints for local inference, such as fp8, q4 gguf, and nvfp4. these formats let the models run on consumer hardware or dgx spark with little performance loss. for example, nvfp4 on dgx spark delivers 1.74 times the throughput of the unquantized bf16 checkpoint, and agent step times drop from 6.8 to 3.3 seconds with optimizations. the q4 gguf checkpoints target local deployment on windows or mac, keeping all data private and on-device.
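the step-time figures above can be turned into a speedup and a percentage reduction with simple arithmetic. this sketch uses only the 6.8 s and 3.3 s values quoted in the summary; the helper names are made up for illustration.

```python
def speedup(before_s: float, after_s: float) -> float:
    """How many times faster one agent step became."""
    return before_s / after_s


def reduction_pct(before_s: float, after_s: float) -> float:
    """Percentage of step time removed."""
    return 100.0 * (before_s - after_s) / before_s


step_before, step_after = 6.8, 3.3  # seconds per agent step, from the summary

print(f"speedup: {speedup(step_before, step_after):.2f}x")          # speedup: 2.06x
print(f"reduction: {reduction_pct(step_before, step_after):.1f}%")  # reduction: 51.5%
```

the step-time speedup (about 2.06x) is larger than the 1.74x throughput gain quoted for nvfp4, which is consistent with the summary attributing the drop to optimizations in general and not to quantization alone.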

holo3.1 comes in four sizes: 0.8b for ultra-lightweight agents, 4b for cost-efficient use, 9b for balanced performance, and 35b-a3b for top results. the models are available on hugging face and through an api. this release aims to make computer-use agents practical for real-world workflows where speed, privacy, and cross-platform support matter.
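a rough way to see why the quantized formats matter for local hardware is to estimate weight memory as parameters times bits per weight. this is a back-of-the-envelope sketch, not a figure from the release: it assumes the size labels are total parameter counts and uses nominal bit widths (16, 8, 4). real checkpoints carry extra scale and metadata overhead, and inference also needs memory for the kv cache and activations.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_weight / 8


# size labels from the summary, treated as total parameter counts (assumption)
SIZES_B = {"0.8b": 0.8, "4b": 4, "9b": 9, "35b-a3b": 35}

# nominal bit widths; actual files use somewhat more per weight (assumption)
FORMATS = {"bf16": 16, "fp8": 8, "4-bit": 4}

for name, params in SIZES_B.items():
    row = ", ".join(
        f"{fmt}: {weight_gb(params, bits):.1f} GB" for fmt, bits in FORMATS.items()
    )
    print(f"{name:>8}  {row}")
```

under these assumptions the 35b-a3b weights shrink from roughly 70 gb in bf16 to roughly 17.5 gb at 4 bits, which is the difference between needing server-class memory and fitting on a well-equipped workstation.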

why it matters: local, quantized models let developers run private computer-use agents on everyday hardware, reducing latency and cloud costs.

