Learning in the Age of LLMs: Theoretical Insights into Knowledge Distillation and Test-Time-Training
The availability of powerful models pre-trained on vast corpora of data has spurred research on alternative training methods, and the overall goal of this talk is to give theoretical insights through the lens of high-dimensional regression. The first part considers knowledge distillation, where one uses the output of a surrogate model as labels to […]
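The distillation setup mentioned above can be sketched in a few lines. This is a minimal illustration, not the talk's actual analysis: the ridge estimator, the regularization strength, and the data-generating model are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic high-dimensional regression data (illustrative assumption).
n, d = 200, 100
w_true = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.5 * rng.normal(size=n)


def ridge(X, y, lam):
    """Ridge estimator: solves (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


# Teacher (surrogate) model, fit on the original noisy labels.
w_teacher = ridge(X, y, lam=1.0)

# Knowledge distillation: the student is trained on the teacher's
# predictions (soft labels) instead of the original labels y.
soft_labels = X @ w_teacher
w_student = ridge(X, soft_labels, lam=1.0)
```

The point of the sketch is only the label substitution in the last two lines: the student never sees `y`, only the surrogate model's outputs.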