cart forests as sequential allocation over random opportunity sets

source: arxiv statistics ml: cart random forests as sequential allocation over random opportunity sets: a stochastic-control theory of ensemble risk

level: research

cart random forests are widely used but often treated as black boxes. a new paper reinterprets them through stochastic control. at each tree node, the random subset of features is seen as a random feasible action set. the cart split rule becomes a masked-action allocation policy. this policy drives a controlled stochastic process over informative split-count states. the terminal distribution of this process determines both single-tree error and cross-tree interactions in the forest's mean squared error.
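the controlled-process picture can be made concrete with a toy simulation. this is a minimal sketch, not the paper's model: it assumes features 0..s-1 are the only informative ones, that the split rule takes an informative feature whenever one is offered, and that nodes along a path are independent. the names `simulate_informative_splits`, `terminal_distribution`, `p`, `s`, `mtry`, and `depth` are hypothetical.

```python
import random


def simulate_informative_splits(p, s, mtry, depth, rng):
    """Count informative splits along one root-to-leaf path (toy model)."""
    # Assumption: features 0..s-1 carry signal, the rest are noise.
    informative = set(range(s))
    count = 0
    for _ in range(depth):
        # Random opportunity set: the features offered at this node.
        offered = rng.sample(range(p), mtry)
        # Toy stand-in for the CART rule as a masked-action policy:
        # split on an informative feature whenever one is offered.
        if any(f in informative for f in offered):
            count += 1
    return count


def terminal_distribution(p, s, mtry, depth, n_paths=10000, seed=0):
    """Empirical distribution of the informative split count at the given depth."""
    rng = random.Random(seed)
    counts = [0] * (depth + 1)
    for _ in range(n_paths):
        counts[simulate_informative_splits(p, s, mtry, depth, rng)] += 1
    return [c / n_paths for c in counts]
```

calling `terminal_distribution(p=20, s=3, mtry=4, depth=6)` returns a list of length 7 whose k-th entry estimates the probability of exactly k informative splits on a path. under these assumptions the count is binomial, so the simulation mainly serves as a template for less idealized split rules.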

the framework, called cart random opportunity-set allocation (cart-rosa), separates two design levers: the informative-opportunity rate, set by feature subsampling, and the contraction strength, set by the split rule. analyzing the two separately shows how randomization and greedy splitting combine to produce ensemble risk, which links algorithmic choices directly to prediction error.
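the two levers can be illustrated with a small calculation. this is a hedged sketch under assumptions that are not taken from the paper: the opportunity rate is read as the probability that a uniformly drawn subset of `mtry` out of `p` features contains at least one of `s` informative features, and contraction is modeled as a fixed factor `rho` applied to a residual at each informative split. all function and parameter names are hypothetical.

```python
from math import comb


def informative_opportunity_rate(p, s, mtry):
    """P(a random subset of mtry features out of p includes an informative one)."""
    # Hypergeometric complement: 1 - P(all mtry draws land on the p - s noise features).
    return 1.0 - comb(p - s, mtry) / comb(p, mtry)


def expected_residual(q, rho, depth):
    """Expected residual factor after `depth` nodes (toy contraction model).

    Assumption: each node is informative independently with probability q,
    and each informative split multiplies the residual by rho (0 <= rho <= 1).
    With K ~ Binomial(depth, q), E[rho**K] = (1 - q + q * rho) ** depth.
    """
    return (1.0 - q + q * rho) ** depth


# Example: raising mtry raises the opportunity rate, which lowers the expected residual.
for mtry in (1, 3, 10):
    q = informative_opportunity_rate(p=10, s=2, mtry=mtry)
    print(mtry, round(q, 3), round(expected_residual(q, rho=0.5, depth=5), 4))
```

in this toy model the two levers enter only through `q` and `rho`, which is one way to picture how they could be studied separately; it says nothing about cross-tree interactions, which the paper treats through the terminal distribution of the process.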

the theory offers a mechanistic account of random forest behavior: feature subsampling governs how often a node is offered an informative split, which in turn shapes how trees in the ensemble differ from one another. because split decisions are tied to a controlled process, the same view suggests how forests might be tuned, and it could inform new forest variants or sharper theoretical guarantees.

why it matters: it gives a clear, mathematical explanation of random forest mechanics, helping practitioners tune models and researchers design better ensembles.
