Linguistics/Statistics Seminar: Naomi Saphra

Event Date

Fri, Feb 28, 2025 @ 10:00am - 11:30am

Mathematical Sciences Building 1147

Speaker: Naomi Saphra (Kempner Institute, Harvard University)

Title: Understanding Language Models by Understanding Training

Abstract: LMs work better than anyone could have predicted just five years ago. But when do they work—and when don’t they? How do they work—and how do they fail? Why do they work—and why do they misbehave? This last question—why?—cannot be answered only by inspecting trained LMs. We must understand the underlying factors that produce LM behavior, an understanding grounded in the training process. For a given architecture, training is a recipe with three ingredients: time, data, and luck. I will discuss these factors through controlled experiments inspecting and manipulating training. These experiments answer fundamental questions about why language models learn. How do training breakthroughs produce language competence? How can training data composition determine model capabilities? And when does output behavior depend on random initialization? Answering these questions, we can expose fundamental truths about why modern deep learning works so well, and even uncover the nature of reasoning itself.
