Learning dynamics of nonconvex models in high dimension

Speaker
Inbar Seroussi (Tel-Aviv University)
Date
26/11/2023, 12:00 - 13:00
Place
Hybrid mode: Math building (216), room 201, and Zoom: https://biu-ac-il.zoom.us/j/751076379
Abstract

Stochastic gradient descent (SGD) stands as a cornerstone of optimization and modern machine learning. However, a complete understanding of why SGD performs so well remains a major challenge.

In this talk, I will present a mathematical theory for SGD in high dimensions when both the number of samples and problem dimensions are large. We show that the dynamics of SGD, applied to generalized linear models and multi-index problems, with data possessing a general covariance matrix, become deterministic in the large sample and dimensional limit. In particular, the limiting dynamics are governed by a set of low-dimensional ordinary differential equations (ODEs).
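As a purely schematic illustration of the setting (the notation is indicative only and not the precise formulation of the paper): for a generalized linear model with loss \ell and i.i.d. data pairs (a_k, y_k) in dimension d, one step of streaming SGD with a suitably scaled step size \gamma reads

\[
x_{k+1} \;=\; x_k \;-\; \gamma\, \nabla_x\, \ell\big(\langle a_k, x_k\rangle;\, y_k\big),
\]

and the deterministic-equivalent statement is that, as d and the number of samples grow proportionally, low-dimensional summary statistics of the iterates (for instance the risk along the trajectory, viewed in the rescaled time t = k/d) concentrate around the solution of an explicit system of ODEs.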

Our setup encompasses a wide spectrum of optimization problems, including linear regression, logistic regression, and two-layer neural networks. In addition, it unveils the implicit bias inherent in SGD. For each of these problems, the deterministic equivalent of SGD enables us to derive a close approximation of the statistical risk (with explicit and vanishing error bounds). Furthermore, we leverage our theorem to establish explicit conditions on the step size, ensuring the convergence and stability of SGD within high-dimensional settings.
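To fix ideas on what such a step-size condition looks like, here is the classical full-gradient analogue, given only as an informal reference point and not as the result of the talk: for least-squares regression with data covariance K, gradient descent with constant step size \gamma is stable exactly when

\[
0 \;<\; \gamma \;<\; \frac{2}{\lambda_{\max}(K)}.
\]

The high-dimensional theory above provides explicit SGD counterparts of such thresholds in the regime where samples and dimension grow together.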

This is joint work with E. Collins-Woodfin, C. Paquette, and E. Paquette; for more details, see https://arxiv.org/abs/2308.08977.

Last updated: 21/11/2023