source: arxiv statistics ml: scalable derivative gaussian processes via exact gradient reduction

level: technical

gaussian processes with gradient observations improve predictions in high dimensions, but exact inference scales cubically in the total number of function values and gradient components. for n points in d dimensions there are n(d+1) observations, so the cost is O(n^3 d^3), which is impractical for large problems. the new method, tera, avoids this bottleneck by exploiting a property of stationary kernels: gradient components orthogonal to the direction between a target point and its conditioning points are conditionally independent of the target value. only the directional derivatives along those directions matter, which reduces the effective gradient dimension.
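the orthogonality property can be checked numerically in a simple special case. the sketch below is an illustration, not the paper's code: it assumes an isotropic rbf kernel with unit variance and a single conditioning point, and the helper names (`rbf`, `cov_value_grad`, `basis_with_first_column`) are hypothetical. it only verifies that the cross-covariance between the target value and the gradient at the conditioning point lies along the line joining the two points, which is the marginal version of the statement above.

```python
import numpy as np


def rbf(a, b, ell=1.0):
    """Isotropic RBF kernel k(a, b) = exp(-|a - b|^2 / (2 ell^2))."""
    diff = a - b
    return np.exp(-0.5 * (diff @ diff) / ell**2)


def cov_value_grad(x_star, x, ell=1.0):
    """Cov(f(x_star), grad f(x)) = d k(x_star, x) / d x for the RBF kernel."""
    return rbf(x_star, x, ell) * (x_star - x) / ell**2


def basis_with_first_column(u, seed=0):
    """Orthonormal basis of R^d whose first column is +/- u / |u| (via QR)."""
    d = u.size
    rng = np.random.default_rng(seed)
    A = np.column_stack([u, rng.standard_normal((d, d - 1))])
    Q, _ = np.linalg.qr(A)
    return Q


rng = np.random.default_rng(1)
d = 5
x_star = rng.standard_normal(d)  # target point
x = rng.standard_normal(d)       # single conditioning point

cross_cov = cov_value_grad(x_star, x)
Q = basis_with_first_column(x - x_star)

# Express the cross-covariance in the rotated basis: the first entry is the
# directional derivative along x - x_star, the rest are orthogonal components.
rotated = Q.T @ cross_cov
print("along the joining direction:", rotated[0])
print("largest orthogonal component:", np.max(np.abs(rotated[1:])))
```

in this setting the d - 1 orthogonal components carry no covariance with the target value, so a single directional derivative per conditioning point is all that remains.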

tera uses this exact gradient reduction to build a vecchia approximation in which each local conditional depends on at most m^2 directional derivatives for a conditioning set of size m, a bound that does not grow with d. the result is a scalable derivative gp that keeps the information in the gradients while cutting computational cost. the reduction is target-specific, meaning it adapts to each prediction point, and it works with any stationary kernel. experiments show tera matches full gp accuracy on high-dimensional test functions while scaling linearly in n and d.
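for readers unfamiliar with vecchia approximations, the sketch below shows the generic idea on function values only: the joint density is replaced by a product of conditionals, each conditioned on at most m previously ordered nearest neighbours. this is a plain vecchia likelihood, not tera itself (it uses no gradients and no gradient reduction); the rbf kernel, the noise level, the given point ordering, and the function names are all assumptions made for illustration.

```python
import numpy as np


def rbf_matrix(A, B, ell=1.0):
    """Unit-variance RBF kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / ell**2)


def vecchia_loglik(X, y, m, ell=1.0, noise=1e-2):
    """Sum of log p(y_i | y_c(i)), c(i) = up to m nearest earlier points."""
    total = 0.0
    for i in range(X.shape[0]):
        if i == 0:
            mean, var = 0.0, 1.0 + noise  # first point: prior marginal
        else:
            dists = np.linalg.norm(X[:i] - X[i], axis=1)
            nbrs = np.argsort(dists)[:m]
            K_cc = rbf_matrix(X[nbrs], X[nbrs], ell) + noise * np.eye(len(nbrs))
            k_ic = rbf_matrix(X[i:i + 1], X[nbrs], ell)[0]
            w = np.linalg.solve(K_cc, k_ic)
            mean = w @ y[nbrs]
            var = 1.0 + noise - w @ k_ic
        total += -0.5 * (np.log(2.0 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total


def exact_loglik(X, y, ell=1.0, noise=1e-2):
    """Exact zero-mean GP log-likelihood via a Cholesky factorisation."""
    n = X.shape[0]
    L = np.linalg.cholesky(rbf_matrix(X, X, ell) + noise * np.eye(n))
    alpha = np.linalg.solve(L, y)
    return (-0.5 * alpha @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))


rng = np.random.default_rng(0)
n, d = 20, 3
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

ll_exact = exact_loglik(X, y)
ll_full = vecchia_loglik(X, y, m=n - 1)  # conditioning on all earlier points
ll_small = vecchia_loglik(X, y, m=5)     # cheaper approximation
print(ll_exact, ll_full, ll_small)
```

with m = n - 1 the product of conditionals is just the chain rule, so it reproduces the exact log-likelihood; smaller m trades accuracy for cost, since each conditional only requires solving an m-by-m system.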

the approach is implemented in a software package and tested on problems with up to 100 dimensions. it outperforms existing scalable derivative gp methods that either ignore gradient structure or use ad-hoc approximations. tera's linear scaling makes it feasible to use gradient-enhanced gps for large-scale surrogate modeling, bayesian optimization, and uncertainty quantification where function evaluations are expensive but gradients are available.

why it matters: tera enables practical use of gradient-enhanced gaussian processes for high-dimensional surrogate modeling and optimization, reducing computational cost from cubic to linear in n and d while matching full gp accuracy in the reported experiments.

