Statistics Seminar: STA 290
Tuesday, February 5th, 2013 at 4:10pm, MSB 1147 (Colloquium Room)
Refreshments at 3:30pm in MSB 4110 (Statistics Lounge)
Speaker: Michael Kane, Yale University
Title: "Subspace Ensembles for Scalable Statistical Analyses"
Abstract: Researchers have recently seen explosive growth in their ability to collect data. From sets of whole-genome sequences made up of tens of billions of base pairs, to entire corpora of academic literature containing tens of millions of publications, to companies keeping data for hundreds of millions of users, researchers now have access to data on a scale never seen in human history. However, the tools for analyzing these data have not scaled with the growth in volume. Although these data may contain the key to some of the most compelling puzzles in scientific systems, researchers are often frustrated that they lack the means to extract meaningful information from these data because of their sheer size. Approaches to these challenges have only recently begun to emerge through the confluence of the statistical and computational sciences. One of these approaches, which comprises techniques such as Chunking and Averaging, Bag of Little Bootstraps, and Weighted Random Subspace Methods, creates ensembles by aggregating estimators, each of which is trained on a row or column subspace of the data matrix. This talk introduces subspace ensembles, a framework encompassing these current divide-conquer-aggregate techniques. This approach can be applied using a broad class of existing estimators while dramatically reducing computational complexity. Key theoretical results will be presented along with real data examples and future directions for research.
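
As a rough illustration of the divide-conquer-aggregate idea described in the abstract, the sketch below splits the rows of a data matrix into chunks, fits ordinary least squares on each chunk, and averages the resulting coefficient vectors. It is not taken from the talk: the function name chunk_and_average_ols, the equal-weight average, the choice of least squares as the base estimator, and the simulated data are all assumptions made for illustration only.

```python
import numpy as np


def chunk_and_average_ols(X, y, n_chunks):
    """Fit least squares on each row chunk and average the coefficient vectors.

    Hypothetical helper: equal-weight averaging is an assumption of this sketch.
    """
    # Divide: split the row indices into n_chunks nearly equal blocks.
    row_blocks = np.array_split(np.arange(X.shape[0]), n_chunks)

    estimates = []
    for rows in row_blocks:
        # Conquer: fit the base estimator on this row subset only.
        beta, *_ = np.linalg.lstsq(X[rows], y[rows], rcond=None)
        estimates.append(beta)

    # Aggregate: average the per-chunk coefficient estimates.
    return np.mean(estimates, axis=0)


# Simulated data (an assumption, used only to exercise the function).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=10_000)

beta_hat = chunk_and_average_ols(X, y, n_chunks=20)
print(beta_hat)
```

Each chunk is fitted independently of the others, so the per-chunk fits could run in parallel and no single fit needs the full data matrix in memory.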