source: arxiv machine learning: a spectral phase diagram for binary few-shot classification: intrinsic dimensionality, geometric saturation, and representational diagnosis

level: research

a new paper introduces a spectral phase diagram for binary few-shot classification, aimed at the question of when to stop collecting labeled examples. the authors propose a saturation index s(k) that compares the effective rank of the pooled within-class sample covariance to the shot count k. they prove that when s(k) drops below a threshold, the covariance estimator is well-concentrated around the population covariance and the linear discriminant has stabilized.
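
the summary does not give the exact formula for s(k), so the python sketch below is only a guess at the shape of the computation. it assumes k shots per class, uses the participation ratio (tr C)^2 / tr(C^2) as a stand-in for effective rank, and takes s(k) to be that rank divided by k. all function names are hypothetical, and the paper's definition may differ.

```python
def pooled_within_class_covariance(class_a, class_b):
    """Pooled within-class sample covariance of two lists of feature vectors."""
    d = len(class_a[0])
    scatter = [[0.0] * d for _ in range(d)]
    for samples in (class_a, class_b):
        n = len(samples)
        mean = [sum(x[j] for x in samples) / n for j in range(d)]
        for x in samples:
            centered = [x[j] - mean[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    scatter[i][j] += centered[i] * centered[j]
    # Two class means were estimated, so divide by (n_a + n_b - 2).
    dof = len(class_a) + len(class_b) - 2
    return [[v / dof for v in row] for row in scatter]


def effective_rank(cov):
    """ASSUMPTION: participation ratio (tr C)^2 / tr(C^2), between 1 and d."""
    trace = sum(cov[i][i] for i in range(len(cov)))
    # For a symmetric matrix, tr(C^2) equals the sum of squared entries.
    trace_of_square = sum(v * v for row in cov for v in row)
    return trace * trace / trace_of_square


def saturation_index(class_a, class_b):
    """ASSUMED form s(k) = effective_rank / k, with k shots per class (k >= 2)."""
    k = len(class_a)
    cov = pooled_within_class_covariance(class_a, class_b)
    return effective_rank(cov) / k
```

only the support features enter the computation, which matches the claim that no test labels or trained classifier are needed. the sketch fails with a division by zero if every support point coincides with its class mean, so a real implementation would need to guard that case.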

the index is computed in O(d^3) time, where d is the feature dimension, using only support features; it needs neither test labels nor a trained classifier. experiments used 246 doubling-pair observations from seventeen binary tasks across six datasets. sixteen of the seventeen tasks showed a positive within-task spearman correlation between s(k) and marginal accuracy gain, with a median correlation of 0.811. the pooled spearman correlation across all observations was 0.548, with a p-value of 1.1e-20.
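
to make the evaluation concrete, here is a small python sketch of a within-task spearman correlation between s(k) and marginal accuracy gain. it assumes a "doubling pair" means s(k) paired with the accuracy change from k to 2k shots, which the summary does not spell out. the helper names are hypothetical, and the numbers in the usage lines are made-up toy values, not results from the paper.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def doubling_pairs(ks, s_values, accuracies):
    """ASSUMPTION: pair s(k) with acc(2k) - acc(k) wherever both were measured."""
    acc_at = dict(zip(ks, accuracies))
    return [(s, acc_at[2 * k] - acc_at[k])
            for k, s in zip(ks, s_values) if 2 * k in acc_at]


# Toy values for one task, invented for illustration only.
ks = [1, 2, 4, 8, 16]
s_values = [3.0, 2.1, 1.2, 0.6, 0.3]
accuracies = [0.60, 0.70, 0.78, 0.82, 0.83]
pairs = doubling_pairs(ks, s_values, accuracies)
rho = spearman([s for s, _ in pairs], [gain for _, gain in pairs])
```

a positive rho here means that larger s(k) goes with larger gains from doubling the shot count, which is the direction the reported correlations point in.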

the method identifies three phases: an initial phase where more labels help, a saturation phase where gains diminish, and a final phase where additional labels may hurt due to overfitting. this provides a practical diagnostic tool for active learning and data collection. by monitoring the saturation index, practitioners can stop labeling when the model's representation is geometrically saturated, saving time and resources.
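
the stopping rule described above can be sketched in python as a scan over a labeling schedule. this is a minimal illustration under stated assumptions: the threshold `tau` is a hypothetical parameter whose value the summary does not give, `s_of_k` stands for whatever routine computes the saturation index at k shots, and the rule only separates "keep labeling" from "saturated" without trying to detect the third phase.

```python
def first_saturated_k(schedule, s_of_k, tau):
    """Return the first shot count in `schedule` with s(k) < tau, or None.

    schedule: increasing shot counts to try, e.g. a doubling schedule.
    s_of_k:   callable returning the saturation index at k shots.
    tau:      HYPOTHETICAL threshold; the source does not state its value.
    """
    for k in schedule:
        if s_of_k(k) < tau:
            return k  # representation looks saturated; stop labeling here
    return None  # never dropped below tau within the schedule
```

in practice `s_of_k` would label k examples per class and recompute the index from the support features; returning `None` signals that the budget ran out before the index fell below the threshold.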

why it matters: it gives a computable signal to stop labeling data, reducing waste in active learning pipelines.

