source: arxiv machine learning: systematic exploration of 4-expert heterogeneous mixture-of-experts via automated pipeline search

level: technical

researchers built an automated pipeline to search for heterogeneous 4-expert mixture-of-experts (moe4) architectures using the lemur neural network dataset. instead of designing models by hand, the system used a code generator that combined base architecture families into moe4 ensembles. each model used a convolutional gating network with temperature scaling, and was trained with mixup augmentation and a cosine-annealed learning rate schedule. the pipeline ran for 28 days on an nvidia rtx 4090 and generated 4,463 candidate models across 197 batches, of which 1,021 (about 23%) were evaluated successfully.
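to make the role of temperature in the gate concrete, here is a minimal sketch of temperature-scaled softmax gating over four experts. it is an illustration under stated assumptions, not the paper's implementation: the function names `gate_weights` and `mix_experts`, the logits, and the temperatures are all hypothetical, and the convolutional network that would produce the logits is omitted.

```python
import math


def gate_weights(logits, temperature=1.0):
    """softmax over gating logits after dividing them by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(expert_outputs, weights):
    """weighted sum of per-expert score vectors (one vector per expert)."""
    n_scores = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(n_scores)
    ]


# hypothetical gating logits for one input and four experts
logits = [2.0, 1.0, 0.5, 0.1]
sharp = gate_weights(logits, temperature=0.5)  # low temperature: peaked gate
soft = gate_weights(logits, temperature=5.0)   # high temperature: flatter gate

# hypothetical two-class scores from each of the four experts
expert_outputs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.5, 0.5]]
mixed = mix_experts(expert_outputs, sharp)
```

a lower temperature concentrates weight on the expert with the largest logit, while a higher one spreads weight more evenly across all four.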

a key finding was that the search space was unintentionally constrained. the pipeline used itertools.combinations for family selection, which yields combinations in lexicographic order of its input (here, alphabetical by family name). because the run consumed only the start of that sequence, the entire explored space (just 4.8% of the 23,751 theoretically possible 4-family combinations, or roughly 1,140 of them) was anchored to a single family, airnet: every explored combination included it. as a result, the search never sampled combinations without airnet and may have missed better architectures.
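the anchoring effect can be reproduced in a few lines. this sketch rests on assumptions: the family count of 29 is inferred from the fact that comb(29, 4) equals 23,751, every family name except `airnet` is a placeholder, and the explored set is modelled as a strict prefix of the enumeration.

```python
from itertools import combinations, islice
from math import comb

# assumption: 29 families, inferred from comb(29, 4) == 23751.
# only "airnet" comes from the summary; the other names are placeholders.
families = sorted(["airnet"] + [f"family_{i:02d}" for i in range(1, 29)])

total = comb(len(families), 4)           # all 4-family combinations
with_first = comb(len(families) - 1, 3)  # combinations containing families[0]
explored = round(0.048 * total)          # about 4.8% of the space

# take the first `explored` combinations in enumeration order
prefix = list(islice(combinations(families, 4), explored))
all_anchored = all(combo[0] == "airnet" for combo in prefix)

print(total, with_first, explored, all_anchored)
```

under these assumptions the first 3,276 combinations all begin with the alphabetically first family, so a run that stops after about 1,140 of them never leaves it.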

the work highlights how implementation details like enumeration order can introduce hidden biases in automated architecture search. the lemur ecosystem and the generated models provide a foundation for studying heterogeneous mixture-of-experts, but because every explored combination includes airnet, the results may not generalize to the rest of the design space. future searches should use randomized or stratified sampling to avoid such biases and cover more of that space.
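one way to apply the randomized-sampling recommendation is to draw distinct combinations uniformly at random instead of consuming them in enumeration order. the sketch below is an assumption-laden illustration, not the authors' method: `sample_combinations` is a hypothetical helper, the family list uses placeholder names apart from `airnet`, and the budget of 1,140 mirrors roughly 4.8% of 23,751.

```python
import math
import random


def sample_combinations(families, k, n_samples, seed=0):
    """draw n_samples distinct k-family combinations uniformly at random."""
    if n_samples > math.comb(len(families), k):
        raise ValueError("requested more combinations than exist")
    rng = random.Random(seed)  # seeded so the search is reproducible
    seen = set()
    drawn = []
    while len(drawn) < n_samples:
        combo = tuple(sorted(rng.sample(families, k)))
        if combo not in seen:  # reject duplicates and redraw
            seen.add(combo)
            drawn.append(combo)
    return drawn


# placeholder family names; only "airnet" comes from the summary
families = sorted(["airnet"] + [f"family_{i:02d}" for i in range(1, 29)])
budget = 1140  # roughly 4.8% of the 23,751 possible combinations

sampled = sample_combinations(families, k=4, n_samples=budget)
airnet_share = sum("airnet" in combo for combo in sampled) / len(sampled)
families_seen = {name for combo in sampled for name in combo}
```

with uniform sampling each family is expected to appear in about 4/29 of the drawn combinations, so no single family dominates. rejection of duplicates stays cheap here because the budget is small relative to the full space; a stratified scheme would need extra bookkeeping that this sketch does not attempt.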

why it matters: it shows how a simple coding choice can skew automated model search, reminding data scientists to check for unintended biases in experiment design.

