source: hugging face blog: five labs, five minds: building a multi-model finance drama on small models

level: technical

the second version of thousand token wood turns a passive simulation into a game in which you play a shadow financier. you lend at interest, spread true or false tips, short the market, and broker alliances while a magistrate hunts you for insider trading. each of the five woodland creatures now runs on a different small model: models from labs such as openai, openbmb, and nvidia, plus a custom fine-tuned qwen 0.5b. this heterogeneity makes the market more realistic because the agents genuinely behave differently, for example an owl that hoards and a fox that speculates.

serving four different models on one platform revealed that most friction comes from the serving layer, not from the models themselves. every model failed at first because the vllm version in use needed a cuda toolkit that the base image did not include; switching to a cuda devel image fixed it. each model also had minor quirks, such as gpt-oss-20b wrapping its answers in an analysis preamble or minicpm3 needing trust_remote_code, but these were one-line config fixes. a tolerant json parse-and-repair layer absorbs the output malformations that come with different tokenizers, so adding a model is a config entry rather than a refactor.
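
a tolerant parse-and-repair layer of the kind described above could look roughly like the sketch below. this is not the blog's code: the function name `parse_action`, the action fields, and the specific repairs (dropping code fences, skipping a preamble, closing unbalanced braces, removing trailing commas) are assumptions chosen for illustration, using only the python standard library.

```python
import json
import re
from typing import Optional


def parse_action(raw: str) -> Optional[dict]:
    """Best-effort extraction of one JSON object from messy model output.

    Hypothetical helper: returns a dict, or None if nothing usable is found.
    """
    # Drop Markdown code fences that some models wrap around JSON.
    text = re.sub(r"```(?:json)?", "", raw)

    # Skip any preamble (for example analysis text) before the first '{'.
    start = text.find("{")
    if start == -1:
        return None
    text = text[start:]

    decoder = json.JSONDecoder()

    # First attempt: strict parse; raw_decode tolerates trailing text.
    try:
        obj, _ = decoder.raw_decode(text)
        return obj if isinstance(obj, dict) else None
    except json.JSONDecodeError:
        pass

    # Repair pass (naive: ignores braces inside string values).
    # 1) close braces left open by a truncated generation,
    # 2) remove trailing commas before a closing brace or bracket.
    repaired = text + "}" * max(0, text.count("{") - text.count("}"))
    repaired = re.sub(r",\s*([}\]])", r"\1", repaired)
    try:
        obj, _ = decoder.raw_decode(repaired)
        return obj if isinstance(obj, dict) else None
    except json.JSONDecodeError:
        return None
```

because every model's output goes through the same function, a caller can treat a `None` result as "skip this turn" or "retry" without knowing which model produced it, which is what keeps a new model down to a config entry.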

the game's core mechanic is insider tips, which only work if the creatures cannot see whether a tip is true. the truth flag of each tip is stored off-prompt on the player's ledger and stripped from public events, and a test scans every creature's prompt each turn to confirm that nothing leaks. persistent relationships are carried as bounded summaries rather than raw history, so the small models are not overwhelmed. a representative run showed zero self-buys and no tip leaks, with mechanics such as heat-triggered investigations and banishment working end-to-end.
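
the firewall and the bounded summaries can be sketched as follows. every name here (`Tip`, `PlayerLedger`, `publish_tip`, `assert_no_leak`, `Relationship`, the trust range, the 80-character cap) is hypothetical and chosen for illustration; the blog's actual data model may differ.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Tip:
    tip_id: int
    target: str    # creature that receives the tip
    text: str      # what the creature is told
    is_true: bool  # secret: must never reach any prompt


@dataclass
class PlayerLedger:
    # tip_id -> truth flag, kept off-prompt on the player's side
    tip_truth: dict = field(default_factory=dict)


def publish_tip(tip: Tip, ledger: PlayerLedger) -> dict:
    """Record the truth flag privately and return the stripped public event."""
    ledger.tip_truth[tip.tip_id] = tip.is_true
    return {"type": "tip", "tip_id": tip.tip_id,
            "target": tip.target, "text": tip.text}


@dataclass
class Relationship:
    trust: int = 0        # clamped to [-5, 5] (assumed range)
    last_event: str = ""  # only the most recent event is kept

    def update(self, delta: int, event: str) -> None:
        self.trust = max(-5, min(5, self.trust + delta))
        self.last_event = event[:80]  # cap length so prompts stay small

    def summary(self, other: str) -> str:
        return f"{other}: trust {self.trust:+d}, last: {self.last_event or 'none'}"


def build_prompt(creature: str, events: list, summaries: list) -> str:
    """Assemble a creature's prompt from public events and bounded summaries."""
    lines = [f"you are the {creature}.", "relationships:"]
    lines += summaries
    lines.append("events:")
    lines += [json.dumps(e) for e in events]
    return "\n".join(lines)


FORBIDDEN_MARKERS = ("is_true", "tip_truth")


def assert_no_leak(prompts: dict) -> None:
    """Per-turn check: no prompt may contain a truth-flag field name."""
    for creature, prompt in prompts.items():
        for marker in FORBIDDEN_MARKERS:
            assert marker not in prompt, f"truth flag leaked to {creature}"
```

the relationship summary stays the same size however long the game runs, since each update overwrites state instead of appending to a history. the leak check is a plain string scan, which is crude but catches a field that slips into any prompt through a code path nobody thought about.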

why it matters: shows how to build complex multi-agent systems with small, heterogeneous models by focusing on serving infrastructure, output parsing, and data firewalls instead of model scale.

