Volume 29, Issue 4
Cholesky-Based Experimental Design for Gaussian Process and Kernel-Based Emulation and Calibration

Helmut Harbrecht, John D. Jakeman & Peter Zaspel

Commun. Comput. Phys., 29 (2021), pp. 1152-1185.

Published online: 2021-02

  • Abstract

Gaussian processes and other kernel-based methods are used extensively to construct approximations of multivariate data sets. The accuracy of these approximations is dependent on the data used. This paper presents a computationally efficient algorithm to greedily select training samples that minimize the weighted $L^p$ error of kernel-based approximations for a given number of data points. The method successively generates nested samples, with the goal of minimizing the error in high-probability regions of densities specified by users. The algorithm presented is extremely simple and can be implemented using existing pivoted Cholesky factorization methods. Training samples are generated in batches, which allows training data to be evaluated (labeled) in parallel. For smooth kernels, the algorithm performs comparably with the greedy integrated variance design but has significantly lower complexity. Numerical experiments demonstrate the efficacy of the approach for bounded, unbounded, multi-modal and non-tensor product densities. We also show how to use the proposed algorithm to efficiently generate surrogates for inferring unknown model parameters from data using Bayesian inference.
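The abstract describes selecting training points via a pivoted Cholesky factorization of the kernel matrix built on candidate samples drawn from a user-specified density. The following is a minimal Python sketch of that general idea only, not the authors' implementation: the squared-exponential kernel, the lengthscale, the candidate density, and all function names are assumptions made for illustration, and the paper's density-weighted $L^p$ criterion is not reproduced here.

import numpy as np

def gaussian_kernel(X, Y, lengthscale=1.0):
    # Squared-exponential kernel matrix between two point sets (assumed kernel).
    d2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def pivoted_cholesky_design(candidates, num_train, lengthscale=1.0):
    # Greedy pivoted Cholesky: at each step pick the candidate with the largest
    # remaining conditional variance (largest Schur-complement diagonal entry),
    # then downdate the diagonal. Illustrative sketch, not the paper's code.
    K = gaussian_kernel(candidates, candidates, lengthscale)
    n = K.shape[0]
    diag = np.diag(K).copy()           # current conditional variances
    L = np.zeros((n, num_train))       # partial Cholesky factor
    pivots = []
    for k in range(num_train):
        j = int(np.argmax(diag))       # greedy pivot
        pivots.append(j)
        L[:, k] = (K[:, j] - L[:, :k] @ L[j, :k]) / np.sqrt(diag[j])
        diag = np.maximum(diag - L[:, k]**2, 0.0)  # downdate, guard round-off
    return candidates[pivots], pivots

# Usage sketch: candidates drawn from an assumed target density (standard
# normal in two dimensions); select a design of 20 training points.
rng = np.random.default_rng(0)
candidates = rng.standard_normal((2000, 2))
X_train, idx = pivoted_cholesky_design(candidates, num_train=20)

Because each step only appends one more pivot, the first m selected points are always contained in the first n > m, which is consistent with the nested, batch-wise designs described in the abstract.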

  • AMS Subject Headings

62F15, 62K20, 65D05

  • Copyright

COPYRIGHT: © Global Science Press

  • BibTex

@Article{CiCP-29-1152,
  author  = {Harbrecht, Helmut and Jakeman, John D. and Zaspel, Peter},
  title   = {Cholesky-Based Experimental Design for Gaussian Process and Kernel-Based Emulation and Calibration},
  journal = {Communications in Computational Physics},
  year    = {2021},
  volume  = {29},
  number  = {4},
  pages   = {1152--1185},
  issn    = {1991-7120},
  doi     = {https://doi.org/10.4208/cicp.OA-2020-0060},
  url     = {http://global-sci.org/intro/article_detail/cicp/18644.html}
}
  • RIS

TY  - JOUR
T1  - Cholesky-Based Experimental Design for Gaussian Process and Kernel-Based Emulation and Calibration
AU  - Harbrecht, Helmut
AU  - Jakeman, John D.
AU  - Zaspel, Peter
JO  - Communications in Computational Physics
VL  - 29
IS  - 4
SP  - 1152
EP  - 1185
PY  - 2021
DA  - 2021/02
SN  - 1991-7120
DO  - http://doi.org/10.4208/cicp.OA-2020-0060
UR  - https://global-sci.org/intro/article_detail/cicp/18644.html
KW  - Experimental design
KW  - active learning
KW  - Gaussian process
KW  - radial basis function
KW  - uncertainty quantification
KW  - Bayesian inference
ER  -

  • TXT

Harbrecht, Helmut, Jakeman, John D. and Zaspel, Peter. (2021). Cholesky-Based Experimental Design for Gaussian Process and Kernel-Based Emulation and Calibration. Communications in Computational Physics. 29 (4). 1152-1185. doi:10.4208/cicp.OA-2020-0060