STA 290 Seminar: Pragya Sur

Event Date

Location
Mathematical Sciences Building 1147

Speaker: Pragya Sur, Assistant Professor, Department of Statistics, Harvard University

Title: Data Integration: Challenges and Opportunities for Interpolation Learning under Distribution Shifts

Abstract: Min-norm interpolators naturally arise as implicitly regularized limits of modern neural networks and other widely used algorithms. Recently, their out-of-distribution risk was studied when test samples are unavailable during training. However, in many applications, a limited amount of test data is typically accessible during training. The properties of min-norm interpolation in this setting remain poorly understood. In this talk, I will present a characterization of the risk associated with pooled min-L2-norm interpolation under both covariate and concept shifts. I will show that the pooled interpolator encompasses both early fusion and an intermediate form of fusion. Our results yield several important insights. For instance, in the presence of concept shift, incorporating additional data can actually harm prediction performance when the signal-to-noise ratio is low. Conversely, for higher signal-to-noise ratios, transfer learning is beneficial—provided the shift-to-signal ratio remains below a precise threshold, which I will define. Furthermore, under covariate shift, we find that heterogeneity between domains can improve prediction accuracy when the model is sufficiently overparameterized. To reach these conclusions, we develop novel anisotropic local laws, representing a significant advance in random matrix theory for heterogeneous data problems. Time permitting, I will also discuss applications of our results to the challenge of combining real data with synthetic data generated by AI models.

This talk is based on joint work with Anvit Garg, Kenny Gu, Yanke Song, and Sohom Bhattacharya.
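
For readers unfamiliar with the central object of the abstract, the following is a minimal illustrative sketch of a pooled min-L2-norm interpolator in an overparameterized linear model. It is not taken from the speaker's work: the function name, dimensions, noise levels, and the simulated concept shift are all assumptions chosen for illustration. The sketch stacks source and target samples and takes the minimum-norm solution that fits every pooled observation exactly.

```python
import numpy as np


def pooled_min_norm_interpolator(X_source, y_source, X_target, y_target):
    """Hypothetical helper: min-L2-norm solution fitting the pooled data exactly."""
    X = np.vstack([X_source, X_target])
    y = np.concatenate([y_source, y_target])
    # For a consistent system X b = y, the pseudoinverse returns the
    # solution with the smallest Euclidean norm.
    return np.linalg.pinv(X) @ y


# Simulated setting (all values are illustrative assumptions).
rng = np.random.default_rng(0)
p = 200                      # number of features
n_source, n_target = 60, 20  # pooled sample size 80 < p, so interpolation is possible

beta_source = rng.normal(size=p) / np.sqrt(p)
# Concept shift: the target coefficients differ slightly from the source ones.
beta_target = beta_source + 0.1 * rng.normal(size=p) / np.sqrt(p)

X_source = rng.normal(size=(n_source, p))
X_target = rng.normal(size=(n_target, p))
y_source = X_source @ beta_source + 0.1 * rng.normal(size=n_source)
y_target = X_target @ beta_target + 0.1 * rng.normal(size=n_target)

beta_hat = pooled_min_norm_interpolator(X_source, y_source, X_target, y_target)

# Out-of-sample error on fresh target-domain data.
X_test = rng.normal(size=(1000, p))
target_risk = np.mean((X_test @ (beta_hat - beta_target)) ** 2)
print(f"estimated target prediction risk: {target_risk:.4f}")
```

Because the pooled design has more columns than rows, the fitted coefficients reproduce both the source and target training responses exactly; the quantity of interest in the talk is how the resulting target risk behaves as the shift, signal, and noise levels vary.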
 

Faculty website (links to Harvard): https://sites.harvard.edu/prs499/ 
