Deep learning systems have achieved remarkable performance across a wide range of tasks, from image classification and language translation to game playing. These successes are often attributed to the large number of parameters in modern networks, which typically places them in the overparameterized regime, where trainable parameters outnumber training examples. Given this overparameterization, together with the diversity of training data and model architectures in use, one might expect each network to behave as a bespoke object, with little empirical regularity across tasks or architectures.
In this talk, I will present evidence to the contrary by describing a range of universal phenomena that arise in overparameterized neural networks trained to convergence. The discussion will focus on neural collapse, a particularly striking form of geometric simplicity that emerges in learned representations: the last-layer features of each class concentrate around their class mean, and the class means arrange themselves into a maximally symmetric configuration. I will also discuss recent mathematical models that aim to explain why these universal patterns arise as a consequence of the way we train modern neural networks.
Refreshments will be served before the talk, starting at 2:45.
Presented with support from the Pacific Institute for the Mathematical Sciences.
