Statistics Seminar: STA 290
Thursday, January 17th, 2013 at 4:10pm, MSB 1147 (Colloquium Room)
Refreshments at 3:30pm in MSB 4110 (Statistics Lounge), prior to the seminar
Speaker: Kevin Murphy, Google Inc.
Title: "Probabilistic Models for Learning from Big, Complex Data"
Abstract: There is currently much discussion in the media about 'big data'. Progress has been made in devising methods that can predict a single response variable given millions of possible input variables. However, in some settings (such as image tagging, text analysis, or biosequence analysis), we want to predict multiple output variables at the same time. This requires the use of multivariate statistical models that capture the correlations between the multiple outputs. In this talk, I will give an overview of some methods I have developed to tackle these kinds of problems. The first approach is based on graphical models; the main challenge is how to efficiently learn the graph structure (which explicitly captures the correlations between the outputs) from data. The second approach is based on latent variable models; the main challenge is how to efficiently infer the latent factors (which implicitly capture the correlations between the outputs) given the observed data. In the last part of the talk, I will briefly present some recent work I have just started at Google, where the variables we are trying to predict represent edges in a giant 'Knowledge Graph' (a probabilistic extension of the semantic web). For this, we are investigating prediction techniques based on graphical models and latent variable models.