level: technical
unsupervised anomaly detection matters for transaction fraud detection because labels are scarce. isolation forest is a popular choice because it scales well and is easy to deploy. a new method called silif augments isolation forest with a silhouette-based scoring layer. it reuses the forest's tree structure as a representation space: for each transaction, it records the path length in each tree and collects these into a vector. these vectors act as fingerprints that capture the structure of the data.
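a minimal sketch of the fingerprint idea, using only the python standard library. it is not silif's implementation: the toy trees, the names `build_tree`, `path_length`, `build_forest`, and `fingerprint`, the sample size, the depth cap, and the synthetic data are all assumptions made for illustration. it also omits the leaf-size correction that standard isolation forest adds to path lengths.

```python
import random

def build_tree(rows, rng, depth=0, max_depth=7):
    """Grow one toy isolation tree: random feature, random threshold."""
    if len(rows) <= 1 or depth >= max_depth:
        return None  # leaf: point isolated or depth cap reached
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return None  # simplification: stop if this feature cannot split
    threshold = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < threshold]
    right = [r for r in rows if r[feature] >= threshold]
    return (feature, threshold,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))

def path_length(tree, x):
    """Count the splits x passes through before reaching a leaf."""
    depth = 0
    while tree is not None:
        feature, threshold, left, right = tree
        tree = left if x[feature] < threshold else right
        depth += 1
    return depth

def build_forest(rows, n_trees=50, sample_size=128, seed=0):
    """Each tree is grown on its own random subsample of the rows."""
    rng = random.Random(seed)
    k = min(sample_size, len(rows))
    return [build_tree(rng.sample(rows, k), rng) for _ in range(n_trees)]

def fingerprint(forest, x):
    """One path length per tree: the vector used as a fingerprint."""
    return [path_length(tree, x) for tree in forest]

# Synthetic stand-in for transaction features (not real data).
data_rng = random.Random(42)
inliers = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)

forest = build_forest(inliers + [outlier])
outlier_fp = fingerprint(forest, outlier)
inlier_fps = [fingerprint(forest, p) for p in inliers]
```

in this toy setup the far-away point is isolated after fewer splits, so its fingerprint holds shorter path lengths on average than those of the inliers.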
silif clusters these fingerprints into groups and computes a silhouette score for each point. the silhouette score compares a point's mean distance to the other members of its assigned group with its mean distance to the members of the nearest other group; it ranges from -1 (poor fit) to 1 (good fit). this score is combined with the base isolation forest score using a single hyperparameter, alpha. the method was tested on the ieee-cis fraud detection benchmark, which has about 590,000 transactions and a 3.5% fraud rate.
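a sketch of the silhouette step, again in plain python. the silhouette formula is the standard one. everything else is an assumption: the text does not say which clustering algorithm or distance silif uses, nor how alpha enters the combination, so the cluster labels are taken as given, the distance is euclidean, and the additive rule in `combine` (a hypothetical helper) is only one plausible reading. the vectors and base scores are made-up placeholders.

```python
import math

def silhouette_scores(vectors, labels):
    """Per-point silhouette s = (b - a) / max(a, b); needs >= 2 clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(vectors[i], vectors[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(vectors[i], vectors[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def combine(base_scores, sil_scores, alpha):
    """ASSUMED rule: a poor cluster fit (low silhouette) raises the score."""
    return [b + alpha * (1.0 - s) / 2.0 for b, s in zip(base_scores, sil_scores)]

# Placeholder fingerprints and labels; index 5 sits between the two groups.
vectors = [[5, 6], [5, 5], [6, 5], [1, 1], [1, 2], [3, 3]]
labels = [0, 0, 0, 1, 1, 1]
base = [0.5] * len(vectors)  # hypothetical base anomaly scores

sil = silhouette_scores(vectors, labels)
combined = combine(base, sil, alpha=1.0)
```

with equal base scores, the point that fits its group worst (index 5) ends up with the highest combined score under this assumed rule.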
with alpha set to 1.0, silif improved precision-recall auc over plain isolation forest by 0.0080, averaged across five random seeds. silif outperformed the baseline on all five seeds, and a paired t-test found the improvement statistically significant. the approach adds little complexity and can be integrated into existing isolation forest pipelines.
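a sketch of how a paired t-test over per-seed scores can be computed with the standard library. the per-seed values below are placeholders chosen for easy arithmetic, not the reported results, and `paired_t_statistic` is a hypothetical helper. with five seeds there are four degrees of freedom, for which the two-sided 5% critical value of the t distribution is about 2.776.

```python
import math
import statistics

def paired_t_statistic(baseline, variant):
    """Return (t, degrees of freedom) for paired per-seed scores."""
    diffs = [v - b for b, v in zip(baseline, variant)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation).
    # Raises ZeroDivisionError if every difference is identical.
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1

# PLACEHOLDER per-seed scores for illustration only (not from the study).
baseline_scores = [0.10, 0.20, 0.30, 0.40, 0.50]
variant_scores = [0.12, 0.21, 0.33, 0.42, 0.52]

t_stat, dof = paired_t_statistic(baseline_scores, variant_scores)

T_CRIT_DF4 = 2.776  # two-sided 5% critical value for 4 degrees of freedom
significant = abs(t_stat) > T_CRIT_DF4
```

pairing by seed removes seed-to-seed variation that affects both methods alike, which is why a small but consistent gain can still be significant.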
why it matters: improving unsupervised fraud detection helps catch more fraud without needing labeled data, which is often unavailable or expensive to obtain.