J. Mach. Learn. , 2 (2023), pp. 138-160.
Published online: 2023-06
Category: Theory
Loss functions with non-isolated minima have emerged in several machine-learning problems, creating a gap between theoretical predictions and observations in practice. In this paper, we formulate a new type of local convexity condition that is suitable for describing the behavior of loss functions near non-isolated minima. We show that this condition is general enough to encompass many existing conditions. In addition, we study the local convergence of the stochastic gradient descent (SGD) method under this mild condition by adopting the notion of stochastic stability. In the convergence analysis, we establish concentration inequalities for the SGD iterates, which can be used to interpret empirical observations from practical training results.
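To make the setting concrete, the following is a minimal sketch (not taken from the paper) of SGD run on a toy loss whose minimizers form a non-isolated set: $f(w) = (w_1 w_2 - 1)^2$ is minimized on the entire hyperbola $\{w_1 w_2 = 1\}$ rather than at a single point. The specific loss, step size, and additive noise model are illustrative assumptions standing in for a stochastic mini-batch gradient; the sketch only shows that the noisy iterates hover near the minimum set, the kind of behavior the paper's concentration inequalities quantify.

```python
import numpy as np

def stochastic_grad(w, rng, noise_scale=0.1):
    """Gradient of f(w) = (w1*w2 - 1)^2 plus additive Gaussian noise.

    The noise is a stand-in for mini-batch gradient noise; it is an
    assumption for illustration, not the noise model used in the paper.
    """
    residual = w[0] * w[1] - 1.0
    g = 2.0 * residual * np.array([w[1], w[0]])
    return g + noise_scale * rng.standard_normal(2)

def sgd(w0, lr=0.05, steps=2000, seed=0):
    """Plain SGD with a constant step size on the toy loss."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= lr * stochastic_grad(w, rng)
    return w

if __name__ == "__main__":
    w = sgd([2.0, 2.0])
    # The product w1*w2 stays close to 1: the iterates concentrate near
    # the (non-isolated) set of minimizers instead of a single point.
    print("final iterate:", w, "  w1*w2 =", w[0] * w[1])
```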