Statistics Seminar: STA 290
Tuesday, November 27th, 2012 at 4:10pm, MSB 1147 (Colloquium Room)
Refreshments at 3:30pm in MSB 4110 (Statistics Lounge), prior to the seminar
Speaker: Katherine Pollard (Gladstone Institutes & University of California, San Francisco)
Title: Quantifying taxonomic and functional diversity of metagenomes from next generation sequencing data
Abstract: Analysis of shotgun-sequenced environmental DNA, known as metagenomics, promises insight into the taxonomic and functional composition of microbial communities. To overcome challenges associated with the fragmentary, non-overlapping nature of metagenomic sequence data, we developed novel statistical phylogenetic methods for de novo identification of operational taxonomic units (OTUs) and operational protein families (OPFs). Two key features of our approach are the use of probabilistic models of gene family evolution (e.g., profile hidden Markov models and stochastic context-free grammars) and the generation of phylogenetic trees in which each leaf is a metagenomic sequencing read from a particular gene family. We also used a type of regression called niche modeling to estimate microbial community diversity (e.g., the total number or relative abundances of OTUs) on a global scale from very sparse sampling data. To test the performance of our methods, we developed a simulation pipeline and read-based error detection methods. With these tools, we identified novel bacteria and quantified the diversity of microbial communities from the world’s oceans and the human gut.