pathboost: gradient boosting learns graph features from paths

source: arxiv machine learning: path-based gradient boosting for graph-level prediction

level: technical

pathboost is a new method for graph-level classification and regression. it uses gradient tree boosting to learn features from paths in the input graph. the approach builds on earlier work that was limited to a specific chemistry task. pathboost adds three main improvements. first, it handles binary classification by using gradient boosting with a logistic loss. second, it includes multiple node and edge attributes in the path features through a prefix-based decomposition. third, it automatically picks anchor nodes based on categorical attribute diversity, so users do not need to specify starting points for paths.
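the summary does not say how attribute diversity is measured or which nodes end up as anchors. the sketch below shows one way that step could work, under rules assumed here. each graph is a dict of categorical node attributes. the attribute with the highest entropy across the dataset is chosen, and nodes that carry that attribute's most frequent value become anchors. the names `select_anchor_attribute` and `anchor_nodes`, the entropy criterion and the `top_k` cutoff are all hypothetical. they are not pathboost's actual selection rule.

```python
import math
from collections import Counter

# Hypothetical representation: a graph is a dict mapping node id -> dict of
# categorical node attributes. Edges are omitted because this assumed
# anchor-selection rule only looks at node attributes.

def entropy(counts):
    """Shannon entropy, in bits, of a Counter of attribute values."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def select_anchor_attribute(graphs):
    """Return the categorical attribute whose values are most diverse across
    the dataset, scored by entropy (an assumed diversity measure)."""
    value_counts = {}
    for nodes in graphs:
        for attrs in nodes.values():
            for name, value in attrs.items():
                value_counts.setdefault(name, Counter())[value] += 1
    return max(value_counts, key=lambda name: entropy(value_counts[name]))

def anchor_nodes(graphs, attribute, top_k=1):
    """Per graph, list the nodes whose value of `attribute` is among the
    top_k most frequent values in the dataset (hypothetical rule)."""
    counts = Counter(attrs[attribute] for nodes in graphs for attrs in nodes.values())
    keep = {value for value, _ in counts.most_common(top_k)}
    return [sorted(n for n, attrs in nodes.items() if attrs[attribute] in keep)
            for nodes in graphs]

graphs = [
    {0: {"element": "C", "charge": "0"}, 1: {"element": "O", "charge": "0"},
     2: {"element": "N", "charge": "+1"}},
    {0: {"element": "C", "charge": "0"}, 1: {"element": "C", "charge": "0"},
     2: {"element": "S", "charge": "0"}},
]
attribute = select_anchor_attribute(graphs)
print(attribute, anchor_nodes(graphs, attribute))  # prints: element [[0], [0, 1]]
```

on this toy data, `element` has more varied values than `charge`, so it is selected, and the carbon atoms become the anchors. the user never has to name a starting point.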

the method was tested against graph neural networks and graph kernel methods on several benchmark datasets. pathboost achieved better results on half of the datasets and comparable results on the rest. it performed especially well on datasets with larger graphs. its automatic anchor selection and multi-attribute handling make it more flexible than the prior work, and the prefix-based decomposition lets the model use rich attribute information without manual feature engineering.
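the summary does not define the prefix-based decomposition either. the sketch below follows one possible reading. it counts simple paths that start at anchor nodes and labels each step with a tuple of several node or edge attributes. every path prefix is recorded as its own feature, so no attribute combinations have to be hand-engineered. `path_features`, the dict-based graph format and the `max_len` limit are illustrative assumptions, not the paper's code.

```python
from collections import Counter

def path_features(nodes, edges, anchors, node_attrs, edge_attrs, max_len=2):
    """Count attribute-labeled simple paths that start at anchor nodes.

    nodes: node id -> dict of node attributes
    edges: (u, v) -> dict of edge attributes, treated as undirected
    A path's key alternates node labels and edge labels; each label is a
    tuple holding several attributes. The search records a key at every
    step, so every prefix of a longer path is also a feature.
    """
    adjacency = {n: [] for n in nodes}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    def node_label(n):
        return tuple(nodes[n][a] for a in node_attrs)

    def edge_label(u, v):
        attrs = edges[(u, v)] if (u, v) in edges else edges[(v, u)]
        return tuple(attrs[a] for a in edge_attrs)

    counts = Counter()

    def extend(path, key):
        counts[key] += 1
        if len(path) - 1 == max_len:  # path already has max_len edges
            return
        last = path[-1]
        for nxt in adjacency[last]:
            if nxt not in path:  # keep paths simple (no repeated nodes)
                extend(path + [nxt], key + (edge_label(last, nxt), node_label(nxt)))

    for a in anchors:
        extend([a], (node_label(a),))
    return counts

nodes = {0: {"element": "C", "charge": "0"},
         1: {"element": "O", "charge": "0"},
         2: {"element": "N", "charge": "+1"}}
edges = {(0, 1): {"bond": "single"}, (0, 2): {"bond": "double"}}
features = path_features(nodes, edges, [0], ["element", "charge"], ["bond"])
for key, count in sorted(features.items()):
    print(count, key)
```

keeping each label as a tuple lets one feature carry several attributes at once. the cost is that the number of distinct keys grows quickly with `max_len` and with the number of attribute values.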

pathboost offers an alternative to deep learning for graph tasks. it needs neither extensive hyperparameter tuning nor large amounts of training data. the tree-based model is interpretable, because the paths it uses as features can be inspected directly. this makes it useful in domains such as drug discovery or social network analysis, where understanding the model matters. the code is available for others to use and extend.
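to make the boosting and interpretability points concrete, here is a small pure-python sketch of gradient boosting with logistic loss over path-count features. it fits depth-1 regression trees (stumps) to the negative gradient y - p and sets leaf values with a newton step. this is a simplified stand-in, not pathboost's algorithm: the path-count columns are fixed in advance and each tree is a single split. the feature names, data and helper functions are hypothetical.

```python
import math
from collections import Counter

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals, hessians):
    """Fit a depth-1 regression tree to the residuals by least squares.

    Leaf values take a Newton step for the logistic loss:
    sum(residual) / sum(p * (1 - p)) over the samples in the leaf.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            sse = 0.0
            for side in (left, right):
                mean = sum(residuals[i] for i in side) / len(side)
                sse += sum((residuals[i] - mean) ** 2 for i in side)
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:
        return None  # every feature is constant; nothing to split on
    _, j, t, left, right = best

    def leaf_value(side):
        return sum(residuals[i] for i in side) / max(sum(hessians[i] for i in side), 1e-12)

    return j, t, leaf_value(left), leaf_value(right)

def boost(X, y, n_rounds=10, learning_rate=0.5):
    """Gradient boosting with logistic loss; y holds 0/1 labels (both classes present)."""
    p = sum(y) / len(y)
    base = math.log(p / (1 - p))  # initial log-odds
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        probs = [sigmoid(s) for s in scores]
        residuals = [yi - pi for yi, pi in zip(y, probs)]  # negative gradient of log loss
        hessians = [pi * (1 - pi) for pi in probs]          # second derivative of log loss
        stump = fit_stump(X, residuals, hessians)
        if stump is None:
            break
        stumps.append(stump)
        j, t, left_val, right_val = stump
        scores = [s + learning_rate * (left_val if row[j] <= t else right_val)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps

def predict_proba(model, row):
    base, learning_rate, stumps = model
    score = base
    for j, t, left_val, right_val in stumps:
        score += learning_rate * (left_val if row[j] <= t else right_val)
    return sigmoid(score)

def used_paths(model, feature_names):
    """Count how often each path feature was chosen for a split."""
    _, _, stumps = model
    return Counter(feature_names[j] for j, _, _, _ in stumps)

# Hypothetical path-count features: each column counts one labeled path.
feature_names = ["C-O", "C=N", "C-C"]
X = [[2, 1, 0], [1, 1, 3], [0, 0, 2], [3, 0, 1], [1, 2, 0], [0, 0, 0]]
y = [1, 1, 0, 0, 1, 0]
model = boost(X, y)
print(used_paths(model, feature_names))  # prints Counter({'C=N': 10})
```

because every split names a path feature, counting splits per feature shows directly which paths drove the predictions. here the toy labels depend only on the `C=N` count, and every stump splits on it.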

why it matters: it gives data scientists a simpler, interpretable alternative to graph neural networks that works well on small to medium graph datasets.
