source: hugging face blog: glm-5.2: built for long-horizon tasks

level: technical

glm-5.2 is a new open-source model built for long-horizon tasks, offering a reliable 1m-token context window. it improves on its predecessor, glm-5.1, with stronger coding abilities and a new architecture called indexshare, which shares a single indexer across each group of four sparse attention layers, cutting per-token flops by 2.9x at 1m context. the model also upgrades its multi-token prediction layer for speculative decoding, boosting acceptance length by up to 20%. it is released under an mit license with no regional restrictions.
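
the indexer sharing described above can be pictured with a small sketch. this is a toy illustration, not the model's implementation: it assumes the indexer's job is to pick the top-k token positions a sparse attention layer attends to, and that the first layer in each group of four computes that selection while the other three reuse it. the names `indexer_owner`, `top_k_indices`, and `run_layers` are hypothetical, and the sketch does not attempt to reproduce the reported 2.9x figure.

```python
from typing import Callable, Dict, List, Tuple

GROUP_SIZE = 4  # from the summary: one indexer per four sparse attention layers


def indexer_owner(layer: int, group_size: int = GROUP_SIZE) -> int:
    """Return the first layer of the group, whose indexer output this layer reuses."""
    return (layer // group_size) * group_size


def top_k_indices(scores: List[float], k: int) -> List[int]:
    """Pick the k highest-scoring token positions, returned in position order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def run_layers(
    num_layers: int,
    score_fn: Callable[[int], List[float]],  # hypothetical: layer -> per-token scores
    k: int,
) -> Tuple[List[List[int]], int]:
    """Compute each layer's token selection, running the indexer once per group."""
    cache: Dict[int, List[int]] = {}
    indexer_calls = 0
    selections: List[List[int]] = []
    for layer in range(num_layers):
        owner = indexer_owner(layer)
        if owner not in cache:
            # Only the first layer of a group pays for scoring the whole context.
            cache[owner] = top_k_indices(score_fn(owner), k)
            indexer_calls += 1
        selections.append(cache[owner])
    return selections, indexer_calls


if __name__ == "__main__":
    toy_scores = lambda layer: [0.1, 0.9, 0.5, 0.7]
    selections, calls = run_layers(num_layers=8, score_fn=toy_scores, k=2)
    print(selections[0], calls)
```

with eight layers the indexer runs twice instead of eight times. under these assumptions the saving grows with context length, because the indexer is the part that scans every earlier token.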

on long-horizon coding benchmarks, glm-5.2 is the top open-source model. in frontierswe, which tests multi-hour technical projects, it trails claude opus 4.8 by only 1% and beats gpt-5.5 by 1%. on posttrainbench, where agents improve small models using an h100 gpu, it outperforms gpt-5.5 and opus 4.7, ranking second to opus 4.8. for standard coding, it scores 81.0 on terminal-bench 2.1 and 62.1 on swe-bench pro, closing the gap with closed-source leaders. users can adjust effort levels to balance performance and speed.

to handle 1m context efficiently, the inference engine uses finer memory management, optimized long-context kernels, and better cpu-side scheduling. training relied on the slime framework for agentic reinforcement learning, merging over ten expert models in about two days. an anti-hack module blocks reward hacking during coding rl by detecting and neutralizing attempts to cheat on evaluations, keeping training stable.
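
the summary does not say how the anti-hack module works. as one possible shape of "detect and neutralize", the sketch below zeroes the reward for a coding rollout that edited protected evaluation files. everything here is an assumption made for illustration: the `Rollout` fields, the `PROTECTED_PREFIXES` list, and the pass-rate reward are hypothetical and are not taken from slime or from glm-5.2's training code.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical: paths the agent must not modify during an evaluation episode.
PROTECTED_PREFIXES = ("tests/",)


@dataclass(frozen=True)
class Rollout:
    """Hypothetical record of one coding episode."""
    tests_passed: int
    tests_total: int
    modified_paths: FrozenSet[str]


def touched_protected(rollout: Rollout) -> bool:
    """Detect: did the agent edit any file under a protected prefix?"""
    return any(path.startswith(PROTECTED_PREFIXES) for path in rollout.modified_paths)


def guarded_reward(rollout: Rollout) -> float:
    """Neutralize: a flagged rollout earns nothing, so cheating is never reinforced."""
    if rollout.tests_total == 0 or touched_protected(rollout):
        return 0.0
    return rollout.tests_passed / rollout.tests_total


if __name__ == "__main__":
    honest = Rollout(8, 10, frozenset({"src/api.py"}))
    hacked = Rollout(10, 10, frozenset({"src/api.py", "tests/test_api.py"}))
    print(guarded_reward(honest), guarded_reward(hacked))
```

the design point is that the guard sits between the evaluator and the optimizer: a rollout that games the tests still looks perfect to the evaluator, but contributes no reward signal.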

why it matters: glm-5.2 is an open-source model that can reliably handle very long coding tasks, giving developers a practical alternative to closed-source systems for complex, multi-step engineering work.

