source: arxiv machine learning: breaking the solver bottleneck: training task generators at the learnable frontier

level: research

reinforcement learning agents need a steady supply of tasks that are hard enough to push their limits but still solvable. as models improve, fixed task sets become too easy, and naive generation often produces tasks that are trivial, impossible, or ill-formed. directly training a task generator with reinforcement learning to optimize validity and learnability is appealing, but it requires running the solver many times per candidate task. for software engineering tasks, a single solver run can take tens of minutes, making solver-in-the-loop training infeasible.
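to make the bottleneck concrete, here is a back-of-the-envelope sketch of what solver-in-the-loop training costs. the function names and every number in the usage lines (steps, batch size, rollouts per task, minutes per rollout) are illustrative assumptions, not figures from the paper; only the "tens of minutes per solver run" scale comes from the text above.

```python
def empirical_pass_rate(outcomes):
    """Estimate a task's solve rate as the fraction of solver rollouts that passed.

    `outcomes` is a list of 0/1 results, one per independent solver run.
    """
    return sum(outcomes) / len(outcomes)


def solver_in_the_loop_hours(steps, tasks_per_step, rollouts_per_task, minutes_per_rollout):
    """Total solver time (in hours) if every candidate task is scored by real rollouts."""
    total_minutes = steps * tasks_per_step * rollouts_per_task * minutes_per_rollout
    return total_minutes / 60.0


# Hypothetical budget: 100 generator updates, 64 candidate tasks per update,
# 8 solver rollouts per task to estimate a pass rate, 20 minutes per rollout.
hours = solver_in_the_loop_hours(
    steps=100, tasks_per_step=64, rollouts_per_task=8, minutes_per_rollout=20
)
print(f"solver time: {hours:,.0f} hours")
```

under these assumed numbers the solver alone would need roughly 17,000 hours of compute, which is why the per-task pass rate has to come from something cheaper than fresh rollouts.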

the propel framework avoids this by training a lightweight activation probe on a one-time labeled corpus of generated tasks and solver outcomes. the probe learns to predict the solver's pass rate from the frozen generator's internal representations. during generator training, the probe provides a fast, amortized signal of task learnability, eliminating the need for repeated solver rollouts. this allows the generator to be optimized for producing tasks at a targeted solve rate without the computational bottleneck.
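the probe idea can be sketched in a few lines. this is a minimal illustration under stated assumptions, not the paper's implementation: the probe here is a ridge regression on logit-transformed pass rates, the activations are assumed to be one fixed-size vector per generated task, and the triangular reward around a target solve rate is a hypothetical shaping choice. all function names are made up for the example.

```python
import numpy as np


def _with_bias(acts):
    # Append a constant column so the probe can learn an intercept.
    return np.hstack([acts, np.ones((acts.shape[0], 1))])


def fit_probe(acts, pass_rates, l2=1.0, eps=1e-3):
    """Fit a linear probe mapping activations to solver pass rates.

    acts:       (n_tasks, d) array of frozen-generator activations (assumed shape).
    pass_rates: (n_tasks,) array of measured solve rates in [0, 1].
    Regression is done in logit space so predictions stay inside (0, 1).
    """
    p = np.clip(pass_rates, eps, 1.0 - eps)
    y = np.log(p / (1.0 - p))
    X = _with_bias(acts)
    reg = l2 * np.eye(X.shape[1])
    reg[-1, -1] = 0.0  # leave the intercept unregularized
    return np.linalg.solve(X.T @ X + reg, X.T @ y)


def predict_pass_rate(weights, acts):
    """Predicted solve rate for each task, with no solver rollouts."""
    logits = _with_bias(acts) @ weights
    return 1.0 / (1.0 + np.exp(-logits))


def learnability_reward(pred, target=0.5, width=0.25):
    """Hypothetical reward: 1 at the target solve rate, 0 beyond `width` away."""
    return np.clip(1.0 - np.abs(pred - target) / width, 0.0, 1.0)
```

in use, `fit_probe` would run once on the labeled corpus; during generator training, each new task's activations would go through `predict_pass_rate` and `learnability_reward` to produce a reward without calling the solver. how the paper combines this signal with a validity term is not specified here.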

experiments show that propel can train task generators to produce software engineering tasks at desired difficulty levels, matching the quality of solver-in-the-loop methods while being orders of magnitude faster. the approach opens the door to continuously generating frontier tasks for increasingly capable agents, ensuring that training data remains challenging and relevant as models evolve.

why it matters: it enables scalable creation of training tasks for ai agents, keeping them challenged without the prohibitive cost of constant solver evaluation.

