STA / BST 290 Seminar Series
Monday, January 26, 3:30pm, MSB 3106 (Math Department, 3rd Floor)
Refreshments at 3:00pm in MSB 4110 (Statistics Lounge)
Speaker: Guang Cheng (Purdue University)
Title: Semi-Nonparametric Inference for Massive Data
Abstract: In this talk, we consider a partially linear framework for modelling (possibly heterogeneous) massive data. The major goal is to extract features common to all sub-populations while exploring the heterogeneity of each sub-population. In particular, we propose an aggregation-type estimator that attains the (non-asymptotic) minimax optimal bound and has the same asymptotic distribution as it would if there were no heterogeneity. Such an oracle result holds when the number of sub-populations does not grow too fast. A plug-in estimator for the heterogeneity parameter is further constructed and shown to have the asymptotic distribution it would have if the commonality information were available. We also test for heterogeneity among a large number of sub-populations. All of the above results require each sub-estimation to be regularized as though it had the entire sample size. Our general theory applies to the divide-and-conquer approach that is often used to handle massive homogeneous data in a parallel computing environment. A technical by-product of this talk is statistical inference for general kernel ridge regression. Extensive numerical results are also provided to support our theory.
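For readers unfamiliar with the divide-and-conquer idea mentioned in the abstract, here is a minimal sketch of generic divide-and-conquer kernel ridge regression on homogeneous data. It is not the speaker's estimator: it omits the partially linear component and any heterogeneity, and the Gaussian kernel, the bandwidth of 0.2, the penalty value, and all function names are hypothetical choices for illustration. It does show the regularization point from the abstract: every subset is fitted with the same penalty `lam`, chosen with the full sample size N in mind rather than the subset size, and the local fits are then averaged.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.2):
    # Pairwise Gaussian (RBF) kernel between 1-D arrays a and b (hypothetical bandwidth).
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2 * bandwidth**2))

def krr_fit(x, y, lam):
    # Kernel ridge regression minimizing (1/n) * sum (y_i - f(x_i))^2 + lam * ||f||^2,
    # whose solution is f = K alpha with (K + n * lam * I) alpha = y.
    n = len(x)
    K = gaussian_kernel(x, x)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return x, alpha

def krr_predict(model, x_new):
    x_train, alpha = model
    return gaussian_kernel(x_new, x_train) @ alpha

def divide_and_conquer_krr(x, y, num_subsets, lam):
    # Split the data into disjoint subsets, fit each with the SAME lam
    # (tuned as if for the full sample, not the smaller subset), and average.
    parts = np.array_split(np.arange(len(x)), num_subsets)
    models = [krr_fit(x[idx], y[idx], lam) for idx in parts]

    def predict(x_new):
        return np.mean([krr_predict(m, x_new) for m in models], axis=0)

    return predict

# Usage on simulated data (hypothetical setup).
rng = np.random.default_rng(0)
N = 2000
x = rng.uniform(0, 1, N)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(N)
lam = N ** (-2 / 3)  # hypothetical penalty scaled with the full sample size N
predict = divide_and_conquer_krr(x, y, num_subsets=10, lam=lam)
grid = np.linspace(0.1, 0.9, 5)
pred = predict(grid)
print(np.round(pred, 2))
```

Each subset fit only needs an n-by-n linear solve with n = N / 10, which is what makes the approach attractive for parallel computing; the abstract's point is that the penalty should not be re-tuned to that smaller n.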