source: arxiv machine learning: staged factorial screening for budget-constrained micro-pretraining

level: research

micro-pretraining under limited compute often means testing many recipe ideas quickly before committing more resources. this work examines whether a staged fractional-factorial workflow can reliably identify, early on, which design choices matter. the setup uses a fixed single-gpu training loop from an autoresearch system. the team ran 613 experiments across pilot and follow-up screens at 2, 5, and 10 minutes, plus full 16-condition reruns, anchor checks, and baselines. they also tested longer runs of up to 24 hours on different hardware.
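to make the screening idea concrete, here is a minimal python sketch of a two-level fractional-factorial design and a main-effect estimate. it is an illustration under stated assumptions, not the paper's actual design: the half-fraction layout (five factors a to e in 16 runs), the generator e = a*b*c*d, the function names, and the synthetic loss values are all hypothetical.

```python
from itertools import product


def half_fraction_design(base_factors=("a", "b", "c", "d"), generated="e"):
    """Build a two-level 2^(5-1) design: 16 runs over five factors.

    Levels are coded -1/+1. The fifth factor is set by the generator
    e = a*b*c*d, an illustrative choice that is NOT taken from the paper.
    """
    runs = []
    for levels in product((-1, 1), repeat=len(base_factors)):
        run = dict(zip(base_factors, levels))
        sign = 1
        for level in levels:
            sign *= level
        run[generated] = sign  # generated column: product of the base columns
        runs.append(run)
    return runs


def main_effect(runs, responses, factor):
    """Mean response at the +1 level minus mean response at the -1 level."""
    high = [y for r, y in zip(runs, responses) if r[factor] == 1]
    low = [y for r, y in zip(runs, responses) if r[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


design = half_fraction_design()

# Hypothetical responses: a synthetic loss that depends only on factors a and d.
losses = [3.0 + 0.2 * r["a"] - 0.1 * r["d"] for r in design]

effects = {f: main_effect(design, losses, f) for f in "abcde"}
for factor, effect in effects.items():
    print(f"{factor}: {effect:+.3f}")
```

because every column in this layout is balanced against every other, each main effect can be estimated from the same 16 runs, which is what makes such designs cheaper than running all 32 combinations of five two-level factors.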

the main findings show that penalties from total batch size, depth, and width are largest at short budgets and become less severe as training time increases. within the predeclared seeded full-screen families, the effect estimates for factors d, a, b, and c remained distinguishable from zero at 5 and 10 minutes after a within-budget benjamini-hochberg correction. factor e did not pass this threshold. the study also compared greedy and matched-cost random baselines, and included a 60-minute bridge package and anchor continuations on windows a100 and linux l40s gpus.
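for readers unfamiliar with the correction mentioned above, the following is a minimal python sketch of the standard benjamini-hochberg step-up procedure applied within a single budget. the p-values, the alpha level of 0.05, and the function name are hypothetical placeholders chosen for illustration; they are not results or settings from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Standard step-up procedure: sort the p-values, find the largest rank k
    with p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank  # keep the largest rank that satisfies the bound
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for one budget; made up for illustration only.
p_by_factor = {"a": 0.004, "b": 0.012, "c": 0.020, "d": 0.001, "e": 0.300}

factors = list(p_by_factor)
flags = benjamini_hochberg([p_by_factor[f] for f in factors], alpha=0.05)
passed = dict(zip(factors, flags))
print(passed)
```

applying the correction separately per budget, as the phrase "within-budget" suggests, means the set of p-values passed to the function would contain only the tests from one training duration at a time.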

the results suggest that staged fractional-factorial designs can surface stable early signals in micro-pretraining, helping practitioners triage recipes without wasting compute. the approach may guide resource allocation when many hyperparameter combinations compete for limited accelerator time. the study provides a practical template for running efficient screening experiments in similar budget-constrained settings.

why it matters: it offers a method to quickly identify important training factors under tight compute budgets, saving time and resources in ai model development.

