skim speeds up web agents with speculative execution

source: arxiv artificial intelligence: skim: speculative execution for fast and efficient web agents

level: technical

skim is a speculative execution framework for web agents that exploits the predictable structure of purpose-built websites. current web agents run expensive frontier-model inference, browser rendering, and react-style planning on every step, even for simple tasks. skim observes that many websites keep stable url patterns, answer formats, and task-to-trajectory mappings across similar queries, so most queries can skip these heavyweight components entirely.

the system works in two phases. an offline profiler captures these patterns once per site. at runtime, skim matches each query to a template, synthesizes the destination url, and extracts the answer using a small model. a lightweight verifier checks each fast-path output against the query and schema. if the output fails verification, the system cascades to the full agent, which is warm-started from the fast path's final url to save time.
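
the runtime flow described above can be sketched roughly as follows. this is a minimal illustration, not the paper's implementation: the `Template` structure, the function names, the regex-based matching, and the `shop.example` site are all assumptions made for the example. the small-model extractor and the full agent are replaced by stubs, and the verifier here checks only the answer schema, whereas the paper's verifier also checks the output against the query.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import quote_plus


@dataclass
class Template:
    """One profiled task pattern for a site (hypothetical structure)."""
    query_pattern: re.Pattern   # matches queries of this task type
    url_format: str             # destination url with named slots
    answer_schema: re.Pattern   # expected shape of a valid answer


def synthesize_url(template: Template, query: str) -> Optional[str]:
    """Match the query against the template and fill in the url slots."""
    m = template.query_pattern.fullmatch(query.strip().lower())
    if m is None:
        return None
    slots = {k: quote_plus(v) for k, v in m.groupdict().items()}
    return template.url_format.format(**slots)


def verify(answer: Optional[str], template: Template) -> bool:
    """Lightweight check: the answer must exist and fit the schema."""
    return answer is not None and template.answer_schema.fullmatch(answer) is not None


def run(
    query: str,
    templates: list[Template],
    fetch: Callable[[str], str],
    extract: Callable[[str, str], Optional[str]],
    full_agent: Callable[[str, Optional[str]], str],
) -> tuple[str, str]:
    """Try the fast path first; cascade to the full agent on failure.

    Returns (answer, path), where path is "fast" or "full".
    """
    start_url = None
    for t in templates:
        url = synthesize_url(t, query)
        if url is None:
            continue  # this template does not cover the query
        start_url = url
        answer = extract(fetch(url), query)
        if verify(answer, t):
            return answer, "fast"
        break  # matched a template but failed verification
    # Warm start: hand the fast path's final url (if any) to the full agent.
    return full_agent(query, start_url), "full"


# --- stubs standing in for the profiler output, network, and models ---
price_template = Template(
    query_pattern=re.compile(r"price of (?P<item>.+)"),
    url_format="https://shop.example/search?q={item}",
    answer_schema=re.compile(r"\$\d+\.\d{2}"),
)
pages = {"https://shop.example/search?q=blue+mug": "blue mug $12.50"}


def fetch(url: str) -> str:
    return pages.get(url, "")


def extract(page: str, query: str) -> Optional[str]:
    # Stand-in for the small extraction model.
    m = re.search(r"\$\d+\.\d{2}", page)
    return m.group(0) if m else None


def full_agent(query: str, start_url: Optional[str]) -> str:
    # Stand-in for the expensive frontier-model agent.
    return f"full-agent answer (started at {start_url})"


print(run("price of blue mug", [price_template], fetch, extract, full_agent))
print(run("price of red hat", [price_template], fetch, extract, full_agent))
```

in this sketch the first query is answered on the fast path, while the second matches the template but fails verification, so it cascades to the stub agent with the synthesized url as its starting point. a query that matches no template would also cascade, with no starting url.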

this approach reduces the cost and latency of web agents by handling routine tasks with minimal resources. the fast path uses a small model instead of a frontier model and skips browser rendering and multi-step planning. the verifier preserves reliability: when a fast-path output fails its check, the query falls back to the full agent instead of returning an unverified answer. the method targets purpose-built websites with consistent structure, which makes it practical for many real-world web automation tasks.

why it matters: it cuts the cost and latency of web agents by using cheap, template-based execution for routine tasks, making ai-driven web automation more practical.
