## Statistics Seminar Series

Thursday, October 23, 4:10pm, MSB 1147 (Colloquium Room)

Refreshments at 3:30pm in MSB 4110 (Statistics Lounge)

Speaker: **Noureddine El Karoui** (UC Berkeley)

Title: “**On high-dimensional robust regression and inefficiency of maximum likelihood methods**”

Abstract: I will discuss the behavior of widely used statistical methods in the high-dimensional setting where the number of observations, n, and the number of predictors, p, are both large. I will present limit theorems about the behavior of the corresponding estimators, their asymptotic risks, etc. The results apply not only to robust regression estimators, but also to Lasso-type estimators and to many considerably more complicated problems. Some of the results answer a question raised by Huber in his seminal 1973 paper on robust regression. Many surprising statistical phenomena occur: for instance, maximum likelihood methods are shown to be (grossly) inefficient, and the loss functions that should be used in regression are shown to depend on the ratio p/n. This means that dimensionality should be explicitly taken into account when performing simple tasks such as regression. More generally, we'll see that intuition based on results obtained in the small p, large n setting leads to misconceptions and to the use of suboptimal procedures. It also turns out that inference is possible in this setting. We'll also see that the geometry of the design matrix plays a key role in these problems, and we will use this fact to disprove claims of universality of some of the results. Mathematically, the tools needed mainly come from random matrix theory, measure concentration and convex analysis. Based on several papers, including some which are joint work with Derek Bean, Peter Bickel, Chinghway Lim and Bin Yu.