J. Mach. Learn., 4 (2025), pp. 89-107.
Published online: 2025-06
[An open-access article; the PDF is free to any online user.]
We study the convergence of stochastic gradient descent (SGD) for non-convex objective functions. We establish local convergence with positive probability under the local Łojasiewicz condition introduced by Chatterjee [arXiv:2203.16462, 2022] and an additional local structural assumption on the loss landscape. A key component of our proof is to show that, with positive probability, the entire SGD trajectory stays inside the local region. We also provide examples of finite-width neural networks for which our assumptions hold.
ISSN: 2790-2048
DOI: https://doi.org/10.4208/jml.240724
URL: http://global-sci.org/intro/article_detail/jml/24143.html
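For orientation only, the following is a minimal sketch of the SGD iteration and of a local Łojasiewicz inequality of the Polyak–Łojasiewicz type referenced in the abstract; the notation (loss $L$, iterates $w_k$, step size $\eta$, stochastic gradient $g(w_k,\xi_k)$, constant $\mu$, ball $B(w_0,r)$ around the initialization) is ours and is not taken from the paper.

% Sketch only; all symbols below are our own notation, not the paper's.
\[
  w_{k+1} \;=\; w_k - \eta\, g(w_k,\xi_k),
  \qquad
  \mathbb{E}\big[\, g(w_k,\xi_k) \mid w_k \,\big] \;=\; \nabla L(w_k)  % unbiased stochastic gradient
\]
% A local Lojasiewicz (PL-type) inequality on a ball around the initialization w_0:
\[
  \|\nabla L(w)\|^{2} \;\ge\; \mu\, L(w)
  \quad \text{for all } w \in B(w_0, r), \qquad \mu > 0.
\]

Roughly, under an inequality of this type together with the paper's additional structural assumption, the result stated in the abstract is that with positive probability the whole SGD trajectory remains inside the local region $B(w_0,r)$ and the loss converges there.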