The availability of powerful models pre-trained on a vast corpus of data has spurred research on alternative training methods, and the overall goal of this talk is to give theoretical insights through the lens of high-dimensional regression. The first part considers knowledge distillation where one uses the output of a surrogate model as labels to […]