Volume 2, Issue 1
Batch Normalization Preconditioning for Stochastic Gradient Langevin Dynamics

Susanna Lange, Wei Deng, Qiang Ye & Guang Lin

J. Mach. Learn., 2 (2023), pp. 65-82.

Published online: 2023-03

Category: Algorithm

[An open-access article; the PDF is free to any online user.]

  • Abstract

Stochastic gradient Langevin dynamics (SGLD) is a standard sampling technique for uncertainty estimation in Bayesian neural networks. Past methods have shown improved convergence by preconditioning SGLD based on RMSprop; this preconditioning adapts to the local geometry of the parameter space and improves the performance of deep neural networks. In this paper, we develop another preconditioning technique to accelerate training and improve convergence by incorporating a recently developed batch normalization preconditioning (BNP) into our methods. BNP uses mini-batch statistics to improve the conditioning of the Hessian of the loss function in traditional neural networks and thus improve convergence. We show that applying BNP to SGLD improves the conditioning of the Fisher information matrix, which in turn improves convergence. We present results on three experiments, including a simulation example, a contextual bandit example, and a residual network, which show the improved initial convergence provided by BNP as well as an improved condition number.
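To make the setting concrete, the following is a minimal sketch of one step of generic preconditioned SGLD (plain SGLD corresponds to the identity preconditioner). It is not the paper's BNP construction: the function name, the explicit dense preconditioner P, and the toy usage at the bottom are illustrative assumptions, and the constant-preconditioner form shown here omits the correction term that a parameter-dependent preconditioner would require.

import numpy as np

def preconditioned_sgld_step(theta, grad_log_post, P, step_size, rng):
    """One generic preconditioned SGLD step with a constant preconditioner P.

    theta         -- current parameter vector, shape (d,)
    grad_log_post -- stochastic estimate of the gradient of the log-posterior
                     (rescaled mini-batch log-likelihood plus log-prior)
    P             -- symmetric positive-definite preconditioner, shape (d, d)
    step_size     -- discretization step epsilon
    rng           -- numpy.random.Generator
    """
    # Drift: preconditioned gradient ascent on the log-posterior.
    drift = 0.5 * step_size * P @ grad_log_post
    # Injected Gaussian noise with covariance step_size * P, which is what
    # makes the iterates sample from the posterior rather than converge to
    # a point estimate.
    noise = rng.multivariate_normal(np.zeros(theta.size), step_size * P)
    return theta + drift + noise

# Illustrative usage on a toy 2-parameter Gaussian posterior whose precision
# matrix is ill-conditioned; the preconditioner rescales the two directions.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag([100.0, 1.0])          # toy posterior precision matrix
    P = np.linalg.inv(A)               # idealized preconditioner for this toy
    theta = np.array([1.0, 1.0])
    for _ in range(1000):
        grad = -A @ theta              # gradient of log N(0, A^{-1}) at theta
        theta = preconditioned_sgld_step(theta, grad, P, 1e-2, rng)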

  • General Summary

Stochastic gradient Langevin dynamics (SGLD) is a standard sampling technique for uncertainty estimation in Bayesian neural networks. Past methods have shown improved convergence by preconditioning SGLD based on RMSprop; this preconditioning adapts to the local geometry of the parameter space and improves the performance of deep neural networks. In this paper, we develop another preconditioning technique to accelerate training and improve convergence by incorporating a recently developed Batch Normalization Preconditioning (BNP) into our methods. BNP uses mini-batch statistics to improve the conditioning of the Hessian of the loss function in traditional neural networks and thus improve convergence.

  • Copyright

COPYRIGHT: © Global Science Press

  • BibTex
@Article{JML-2-65,
  author  = {Lange, Susanna and Deng, Wei and Ye, Qiang and Lin, Guang},
  title   = {Batch Normalization Preconditioning for Stochastic Gradient Langevin Dynamics},
  journal = {Journal of Machine Learning},
  year    = {2023},
  volume  = {2},
  number  = {1},
  pages   = {65--82},
  issn    = {2790-2048},
  doi     = {https://doi.org/10.4208/jml.220726a},
  url     = {http://global-sci.org/intro/article_detail/jml/21513.html}
}

  • RIS

TY - JOUR
T1 - Batch Normalization Preconditioning for Stochastic Gradient Langevin Dynamics
AU - Lange, Susanna
AU - Deng, Wei
AU - Ye, Qiang
AU - Lin, Guang
JO - Journal of Machine Learning
VL - 2
IS - 1
SP - 65
EP - 82
PY - 2023
DA - 2023/03
SN - 2790-2048
DO - http://doi.org/10.4208/jml.220726a
UR - https://global-sci.org/intro/article_detail/jml/21513.html
KW - Bayesian neural networks
KW - Preconditioning
KW - Batch normalization
KW - Stochastic gradient Langevin dynamics

  • TXT

Lange, Susanna, Deng, Wei, Ye, Qiang and Lin, Guang. (2023). Batch Normalization Preconditioning for Stochastic Gradient Langevin Dynamics. Journal of Machine Learning. 2 (1). 65-82. doi:10.4208/jml.220726a