source: kdnuggets: 5 small language models for agentic tool calling
level: technical
agentic ai systems need models that can reliably call tools: pick the right function, format its arguments correctly, and integrate the result into a multi-step workflow. large frontier models handle this well but bring cost, latency, and hardware tradeoffs. small language models now close that gap, offering first-class tool-calling support without needing a data center. here are five such models, all open-weight and compact.
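to make "tool calling" concrete, here is a minimal sketch of the two pieces involved: a function schema the model is shown (the openai-style json format most of these models are trained against), and a parser for the json call the model emits. the `get_weather` tool and the simulated model output are assumptions for illustration, not from the article; real output formats depend on each model's chat template.

```python
import json

# hypothetical tool schema in the common openai-style json format;
# small tool-calling models are trained to emit calls against schemas like this
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Parse a model-emitted call like
    {"name": "get_weather", "arguments": {"city": "Paris"}}."""
    call = json.loads(raw)
    return call["name"], call["arguments"]

# simulated model output (assumed for illustration)
raw = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
name, args = parse_tool_call(raw)
```

the hard part for a small model is the middle step: choosing the right schema and filling `arguments` with valid json, which is exactly what the models below are tuned for.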
smollm3-3b from hugging face is a 3b parameter model with dual-mode reasoning, six languages, and 64k native context. it supports json/xml and python tool-calling interfaces. qwen3-4b-instruct-2507 from alibaba has 4b parameters, 262k context, and over 100 languages, optimized for fast non-thinking responses with native tool calling via qwen-agent. phi-3-mini-4k-instruct from microsoft is a 3.8b model with strong reasoning, a 4k context window, and an mit license, making it popular for fine-tuning.
gemma-3n-e2b-it from google deepmind uses 2.3b effective parameters with per-layer embeddings, enabling under 1.5 gb memory usage and multimodal inputs. it supports native function calling and runs on edge devices. mistral-7b-instruct-v0.3 from mistral ai is the largest at 7.25b parameters, with 32k context and dedicated tokens for tool calls, widely available across inference platforms. these models show that capable agentic tool calling no longer requires massive infrastructure.
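the "integrate results into multi-step workflows" part of the loop can be sketched as follows: execute the parsed call against a local function and append the result back into the conversation as a tool message for the model's next turn. the `get_weather` implementation and message shapes here are illustrative assumptions in the common openai-style format, not any one model's exact template.

```python
import json

# hypothetical local tool implementation; name and return value are illustrative
def get_weather(city: str) -> str:
    return f"22C and sunny in {city}"

TOOLS = {"get_weather": get_weather}

def run_tool_turn(messages: list[dict], raw_call: str) -> list[dict]:
    """Execute one model-emitted tool call and append the result as a
    'tool' role message, ready for the model's next turn."""
    call = json.loads(raw_call)
    result = TOOLS[call["name"]](**call["arguments"])
    messages.append({"role": "assistant", "tool_calls": [call]})
    messages.append({"role": "tool", "name": call["name"], "content": result})
    return messages

history = [{"role": "user", "content": "What's the weather in Paris?"}]
history = run_tool_turn(
    history, '{"name": "get_weather", "arguments": {"city": "Paris"}}'
)
```

in a real agent loop this history would be fed back through the model's chat template so it can compose a final answer or chain further calls.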
why it matters: compact tool-calling models let developers build agentic ai systems that run locally, reducing cost, latency, and privacy risks while maintaining reliable function execution.