Event Date
Location
Mathematical Sciences Building 1147
Speaker: Naomi Saphra (Kempner Institute, Harvard University)
Title: Understanding Language Models by Understanding Training
Abstract: Language models (LMs) work better than anyone could have predicted just five years ago. But when do they work, and when don't they? How do they work, and how do they fail? Why do they work, and why do they misbehave? The last question, why, cannot be answered by inspecting trained LMs alone. We must understand the underlying factors that produce LM behavior, an understanding grounded in the training process. For a given architecture, training is a recipe with three ingredients: time, data, and luck. I will discuss these factors through controlled experiments that inspect and manipulate training. These experiments answer fundamental questions about why language models learn. How do training breakthroughs produce language competence? How can training data composition determine model capabilities? And when does output behavior depend on random initialization? By answering these questions, we can expose fundamental truths about why modern deep learning works so well, and even uncover the nature of reasoning itself.