Statistics Seminar: STA 290
Tuesday, November 13th, 2012 at 4:10pm, MSB 1147 (Colloquium Room)
Refreshments at 3:30pm in MSB 4110 (Statistics Lounge), prior to the seminar
Speaker: Richard Olshen, Stanford University
Title: Successive normalization/standardization of rectangular arrays
Abstract: When each subject in a study provides a vector of numbers/features for analysis, and one wants to standardize, then for each coordinate of the resulting rectangular array one may subtract the across-subjects mean and divide by the across-subjects standard deviation. Each feature then has mean 0 and standard deviation 1. Data from gene expression arrays and protein arrays often come as such rectangular arrays, where one coordinate (typically the column) denotes “subject” and the other some measure of “gene.” When analyzing these data one may ask that subjects and features “be on the same footing.” Thus, there may be a need to standardize across both rows and columns of the matrix. We propose and investigate the convergence of one approach to successive standardization, which we learned from our colleague Bradley Efron. Limit matrices exist (Lebesgue) almost surely and have row and column means 0 and row and column standard deviations 1. We study implementation on simulated data and on data like those that arose in cardiology. Exact rates of convergence can be computed. The procedure can be shown not to work with simultaneous standardization. Results make contact with previous work on successive, alternating conditional expectations and with large deviations for Lipschitz functions of Gaussian vectors. New insights regarding inference are enabled.
All efforts are joint with colleague Bala Rajaratnam.
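
For readers who want a concrete picture of successive standardization before the talk, here is a minimal Python sketch. It is not the speaker's implementation. The function name successive_standardize, the parameters tol and max_iter, the stopping rule, the choice to standardize rows before columns, and the use of the divisor-n standard deviation are all assumptions made for illustration. The sketch alternates row and column standardization until the array stops changing.

```python
import numpy as np


def successive_standardize(x, tol=1e-10, max_iter=10_000):
    """Alternate row and column standardization until the array settles.

    Hypothetical helper for illustration only. Assumes every row and every
    column keeps a nonzero standard deviation throughout the iteration.
    Returns the standardized array and the number of sweeps performed.
    """
    x = np.asarray(x, dtype=float).copy()
    for sweep in range(1, max_iter + 1):
        previous = x.copy()
        # Row step: each row gets mean 0 and standard deviation 1 (divisor n).
        x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
        # Column step: each column gets mean 0 and standard deviation 1.
        x = (x - x.mean(axis=0, keepdims=True)) / x.std(axis=0, keepdims=True)
        # Stop once a full sweep changes no entry by more than tol.
        if np.max(np.abs(x - previous)) < tol:
            return x, sweep
    return x, max_iter
```

Applied to a simulated Gaussian array, for example successive_standardize(np.random.default_rng(0).normal(size=(6, 8))), the returned matrix should have row and column means near 0 and row and column standard deviations near 1, in line with the limit described in the abstract.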