source: arxiv statistics ml: from persistence to survival: hypothesis testing, effect sizes and vectorisation for topological features

level: research

persistence diagrams are common in topological data analysis but lack a natural vector space structure. statistical tools for comparing them have developed separately from those used in prediction tasks. the new method, called strand, treats collections of persistence diagrams as survival data. each topological feature with birth b and death d has persistence p = d - b, which is treated as a fully observed (uncensored) time-to-event. the persistence survival function S(t) = P(p > t) becomes the central object for comparison.
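to make the survival view concrete, here is a minimal sketch of an empirical persistence survival function for a single diagram. it is an illustration under assumptions, not the paper's implementation: the function name `persistence_survival`, the (birth, death) array layout, and the evaluation grid are all hypothetical, and features with infinite death are assumed to have been removed beforehand.

```python
import numpy as np

def persistence_survival(diagram, grid):
    """Empirical S(t): fraction of features whose persistence exceeds t.

    diagram: array of shape (n_features, 2) with columns (birth, death).
    grid:    1-d array of thresholds t at which S is evaluated.
    Assumes every feature has a finite death value.
    """
    diagram = np.asarray(diagram, dtype=float)
    grid = np.asarray(grid, dtype=float)
    persistence = diagram[:, 1] - diagram[:, 0]  # p = d - b
    # Compare every threshold against every persistence value, then average
    # over features to get the proportion still "alive" at each threshold.
    return (persistence[None, :] > grid[:, None]).mean(axis=1)

# Toy diagram with three features (made-up numbers for illustration only).
diagram = np.array([[0.0, 1.0], [0.2, 0.5], [0.1, 2.1]])
grid = np.array([0.0, 0.5, 1.5, 2.5])
curve = persistence_survival(diagram, grid)
print(curve)
```

because every feature is fully observed, no censoring correction is needed in this sketch: the empirical survival function is just one minus the empirical distribution function of the persistence values.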

from this single representation, strand derives three tools. first, a non-parametric two-sample test with calibrated type I error and high power, even from a small number of diagrams. second, interpretable effect sizes that quantify differences between groups. third, a 1-wasserstein-stable feature vector that can be used directly in downstream machine learning models. the approach unifies statistical testing and feature extraction under one framework.
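the sketch below shows how one survival-curve representation could serve all three purposes. it is a generic stand-in, not strand's actual procedure: the summary does not state the paper's test statistic, effect size, or vectorisation, so this example assumes a label-permutation test over diagrams, uses the area between group-mean survival curves as both test statistic and effect size, and uses each diagram's curve sampled on a fixed grid as its feature vector. all function names and the synthetic data are hypothetical.

```python
import numpy as np

def survival_curve(diagram, grid):
    """Per-diagram empirical S(t) on a grid; doubles as a feature vector."""
    persistence = diagram[:, 1] - diagram[:, 0]
    return (persistence[None, :] > grid[:, None]).mean(axis=1)

def area_between(curves_a, curves_b, grid):
    """Area between the two group-mean survival curves (trapezoid rule)."""
    diff = np.abs(curves_a.mean(axis=0) - curves_b.mean(axis=0))
    return float(np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(grid)))

def permutation_test(diagrams_a, diagrams_b, grid, n_perm=999, seed=0):
    """Two-sample permutation test that shuffles group labels of diagrams."""
    rng = np.random.default_rng(seed)
    all_diagrams = list(diagrams_a) + list(diagrams_b)
    curves = np.array([survival_curve(d, grid) for d in all_diagrams])
    n_a = len(diagrams_a)
    observed = area_between(curves[:n_a], curves[n_a:], grid)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(curves))
        stat = area_between(curves[idx[:n_a]], curves[idx[n_a:]], grid)
        if stat >= observed:
            exceed += 1
    # The +1 correction keeps the p-value strictly positive and valid.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value, curves

# Synthetic stand-in data: random (birth, death) pairs, not real diagrams.
data_rng = np.random.default_rng(1)

def fake_diagram(scale, n_features=30):
    births = data_rng.uniform(0.0, 1.0, n_features)
    deaths = births + data_rng.exponential(scale, n_features)
    return np.column_stack([births, deaths])

group_a = [fake_diagram(0.2) for _ in range(8)]
group_b = [fake_diagram(0.6) for _ in range(8)]
grid = np.linspace(0.0, 3.0, 61)

effect, p_value, features = permutation_test(group_a, group_b, grid)
print(f"area between curves: {effect:.3f}, p-value: {p_value:.3f}")
print("feature matrix shape:", features.shape)
```

permuting whole diagrams rather than individual features keeps the diagram as the exchangeable unit, which matches the setting of testing from a small number of diagrams. the rows of `features` could be passed to any standard classifier or regressor.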

the method was validated on synthetic manifolds with controlled topology, showing good calibration and power. by framing persistence diagrams as survival data, strand bridges the gap between topological data analysis and standard statistical and machine learning workflows. this makes it easier to apply topological features in practical data science problems where hypothesis testing and prediction are both needed.

why it matters: it gives data scientists a unified way to test for topological differences and create stable features for machine learning from persistence diagrams.

