State-of-the-art deep neural networks generalize well, despite being exceedingly overparameterized and trained without explicit regularization. Understanding the principles behind this phenomenon, termed benign overfitting or double descent, poses a new challenge to modern learning theory, as it contradicts classical statistical wisdom. Key questions include: What are the fundamental mechanisms behind double descent? How do its features, such as the transition threshold and the location of global minima, depend on the training data and on the algorithms used for training?
Increasing overparameterization can improve classification accuracy, but it comes at the cost of larger, and therefore slower and computationally more expensive, architectures that can be prohibitive in resource-constrained applications. Is overparameterization, then, relevant only for training large networks? Or can it also be useful for training smaller models when combined with appropriate model-pruning techniques? What are the generalization dynamics of pruning overparameterized models?
Overparameterization leads to lower misclassification error, but what is its effect on fairness metrics such as balanced error and equal opportunity? Can we design loss functions that provably improve the fairness performance of large models on label-imbalanced and/or group-sensitive datasets, compared with standard losses such as cross entropy?
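For reference, here are the standard definitions of these two metrics (the notation is ours, not quoted from the talk). With $K$ classes, the balanced error averages the per-class misclassification rates,
$$\mathrm{BER} = \frac{1}{K} \sum_{k=1}^{K} \mathbb{P}\big(\hat{y} \neq y \mid y = k\big),$$
while equal opportunity asks that true-positive rates match across a binary sensitive attribute $a$,
$$\mathbb{P}\big(\hat{y} = 1 \mid y = 1,\, a = 0\big) = \mathbb{P}\big(\hat{y} = 1 \mid y = 1,\, a = 1\big).$$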
This talk will shed light on the questions raised above. At the heart of the results presented lies a powerful analysis framework for precise high-dimensional statistical analysis. This so-called convex Gaussian min-max theorem framework builds on Gordon’s Gaussian comparison inequality, and is rooted in the study of sharp phase transitions in compressed sensing.
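As a rough sketch of the theorem (stated in the formulation common in the literature, not quoted from the talk): let $G \in \mathbb{R}^{m \times n}$ have i.i.d. standard Gaussian entries, and let $g \sim \mathcal{N}(0, I_m)$ and $h \sim \mathcal{N}(0, I_n)$ be independent Gaussian vectors. The CGMT compares the primary optimization
$$\Phi(G) = \min_{w \in \mathcal{S}_w} \max_{u \in \mathcal{S}_u} \; u^\top G w + \psi(w, u)$$
with the auxiliary optimization
$$\phi(g, h) = \min_{w \in \mathcal{S}_w} \max_{u \in \mathcal{S}_u} \; \|w\|_2 \, g^\top u + \|u\|_2 \, h^\top w + \psi(w, u),$$
where $\mathcal{S}_w$ and $\mathcal{S}_u$ are compact sets and $\psi$ is continuous. Gordon's inequality yields $\mathbb{P}(\Phi(G) < c) \le 2\,\mathbb{P}(\phi(g, h) \le c)$ for every $c$; when, in addition, $\mathcal{S}_w$ and $\mathcal{S}_u$ are convex and $\psi$ is convex in $w$ and concave in $u$, the matching bound $\mathbb{P}(\Phi(G) > c) \le 2\,\mathbb{P}(\phi(g, h) \ge c)$ also holds. The far simpler auxiliary problem then pins down the behavior of the primary one, which is what enables the precise high-dimensional analyses described above.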
RSVP here. (No need to RSVP if you already receive IAM seminar announcements by email.)