### PhD Dissertation Abstracts: 2006

## Statistics PhD Alumni 2006:

### Jimin Ding (2006)

ADVISER: Jane-Ling Wang

TITLE: **Joint Modelling of Survival and Longitudinal Data**

ABSTRACT: In clinical studies, longitudinal covariates are often used to monitor the progression of the disease as well as survival time. The relationship between a failure time process and longitudinal covariates is of key interest, and so is understanding the pattern of the longitudinal process itself, which reveals more about patients' health status and gives insight into the progression of the disease. Joint modelling of longitudinal and survival data has certain advantages and has emerged as an effective way for each process to borrow information from the other. Typically, a parametric longitudinal model is assumed to facilitate the likelihood approach. However, choosing a proper parametric model turns out to be more elusive than in standard longitudinal studies, where no survival end point occurs. Furthermore, the computational burden of both Monte Carlo numerical integration and the EM (Expectation-Maximization) algorithm is an important concern in the joint modelling setting.

To address these challenges, in the first part of the talk I will propose several nonparametric longitudinal models in the joint modelling setting. The longitudinal process is represented through a set of basis functions, and a proportional hazards model then links it to the event time.

Unknown model parameters are estimated by maximizing the observed joint likelihood, which is done iteratively with the Monte Carlo Expectation-Maximization (MCEM) algorithm. A simple model structure is crucial for good numerical stability, so the parsimonious nonparametric models have computational advantages and compare well with competing parametric longitudinal approaches. In the second part of the talk, I will introduce the method of sieves for joint modelling to illustrate the high-dimensionality problem currently encountered in the joint modelling literature. The asymptotic properties of the proposed sieve estimator will be discussed.
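
The Monte Carlo step in MCEM needs each subject's joint likelihood contribution, which integrates over the subject's random basis coefficients. The sketch below approximates that integral by plain Monte Carlo under simplifying assumptions that are ours, not the dissertation's: a hypothetical quadratic basis `basis(t)`, Gaussian random coefficients `b ~ N(0, D)`, normal measurement error with standard deviation `sigma`, a constant baseline hazard `lam0`, and a trapezoid rule for the cumulative hazard.

```python
import numpy as np


def basis(t):
    """Hypothetical basis (1, t, t^2); the dissertation's basis functions may differ."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.stack([np.ones_like(t), t, t**2], axis=-1)


def mc_joint_likelihood(y, t_obs, T, delta, mu_coef, D, sigma, lam0, beta,
                        n_mc=2000, n_grid=200, rng=None):
    """Monte Carlo estimate of one subject's joint likelihood contribution.

    Longitudinal: y_j = X(t_j) + e_j, X(t) = basis(t) @ (mu_coef + b), e_j ~ N(0, sigma^2).
    Survival: hazard lam0 * exp(beta * X(t)); delta = 1 if T is an observed event time.
    """
    rng = np.random.default_rng(rng)
    p = len(mu_coef)
    # Factor D through its eigendecomposition so a singular (even zero) D is allowed.
    w, V = np.linalg.eigh(D)
    L = V * np.sqrt(np.clip(w, 0.0, None))
    b = rng.standard_normal((n_mc, p)) @ L.T          # draws of b ~ N(0, D)
    coef = np.asarray(mu_coef, dtype=float) + b       # (n_mc, p) trajectory coefficients

    # Log-density of the longitudinal measurements given each draw.
    resid = np.asarray(y, dtype=float) - coef @ basis(t_obs).T   # (n_mc, n_obs)
    log_long = (-0.5 * np.sum(resid**2, axis=1) / sigma**2
                - len(y) * np.log(np.sqrt(2.0 * np.pi) * sigma))

    # Cumulative hazard on [0, T] by the trapezoid rule.
    s = np.linspace(0.0, T, n_grid)
    haz = lam0 * np.exp(beta * (coef @ basis(s).T))   # (n_mc, n_grid)
    cum_haz = np.sum(0.5 * (haz[:, 1:] + haz[:, :-1]) * np.diff(s), axis=1)

    # Event (delta = 1) or censoring (delta = 0) contribution at time T.
    x_T = (coef @ basis(T).T)[:, 0]
    log_surv = delta * (np.log(lam0) + beta * x_T) - cum_haz

    # Average the conditional likelihood over the Monte Carlo draws.
    return float(np.mean(np.exp(log_long + log_surv)))
```

Setting `D` to zeros collapses the integral to a single deterministic term, which is a convenient sanity check; in a full MCEM run the same draws would also be reused to form the conditional expectations needed for the M-step.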

### Joshua Kerr (2006)

ADVISERS: Robert Shumway / Wolfgang Polonik

TITLE: **Signal Extraction for Seismic Array Data Via Partially Linear Least-Squares**

ABSTRACT: Signal extraction has long been an important field, and for good reason. Given a seismic recording at a site, the goal is to extract the signal of interest from the noisy reading. This is a difficult task in itself, and it becomes harder still when multiple signals of interest are embedded in the noisy reading.

After some background, techniques will be presented to estimate how many signals are present and the velocity and azimuth of each. Asymptotic theory is developed to establish consistency and distributional results for the parameters of interest. Finally, a data example will be shown, followed by some summary remarks.
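
For a single plane-wave signal, the partially linear structure can be illustrated as follows: once the slowness (velocity and azimuth) is fixed, the common waveform enters linearly, and its least-squares estimate at each frequency is the average of the time-aligned sensor spectra. The residual sum of squares at that frequency is then sum_j |X_j|^2 - |sum_j aligned_j|^2 / N, so minimizing the residuals summed over frequencies amounts to maximizing the beam power over a grid of velocities and azimuths. This is a generic delay-and-sum sketch, not the dissertation's estimator; it handles only one signal, and the azimuth convention (direction of propagation, clockwise from north), the use of sensor coordinates in km, and the function names are our assumptions.

```python
import numpy as np


def slowness(v, az_deg):
    """Horizontal slowness vector (east, north) in s/km for velocity v (km/s) and
    propagation azimuth az_deg (degrees clockwise from north)."""
    az = np.deg2rad(az_deg)
    return np.array([np.sin(az), np.cos(az)]) / v


def beam_power(X, freqs, coords, v, az_deg):
    """Sum over frequencies of |sum_j X_j(f) exp(2*pi*i*f*tau_j)|^2."""
    tau = coords @ slowness(v, az_deg)                 # arrival delay per sensor (s)
    aligned = X * np.exp(2j * np.pi * np.outer(tau, freqs))
    return float(np.sum(np.abs(aligned.sum(axis=0)) ** 2))


def fit_plane_wave(x, dt, coords, v_grid, az_grid):
    """Grid search for one plane-wave signal on a (sensors x samples) array x.

    Returns (v_hat, az_hat, s_hat), where s_hat is the least-squares waveform
    (average of the aligned traces) at the selected slowness.
    """
    X = np.fft.rfft(x, axis=1)
    freqs = np.fft.rfftfreq(x.shape[1], d=dt)
    # Profile out the linear part: maximizing beam power minimizes the residual SS.
    _, v_hat, az_hat = max((beam_power(X, freqs, coords, v, a), v, a)
                           for v in v_grid for a in az_grid)
    tau = coords @ slowness(v_hat, az_hat)
    S_hat = np.mean(X * np.exp(2j * np.pi * np.outer(tau, freqs)), axis=0)
    return v_hat, az_hat, np.fft.irfft(S_hat, n=x.shape[1])
```

Estimating how many signals are present, and the asymptotic theory, are beyond this sketch; a multi-signal version would need a joint fit of several delayed waveforms rather than a single beam.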

### Shanmei Liao (2006)

ADVISER: Rudy Beran

TITLE: **Application of Bootstrap Confidence Region for multivariate analysis**

ABSTRACT: Bootstrap confidence regions are applied to two multivariate problems in this work. The first concerns population covariance matrices, where two problems are considered: 1) a set of bootstrap confidence regions generated for each component of a covariance matrix may not induce a confidence region of positive definite covariance matrices; 2) besides controlling the overall coverage probability of the confidence region, it is desirable to keep equal the coverage probabilities of the individual confidence intervals that define the simultaneous region. Unconstrained parameterizations of covariance matrices are used to ensure that the covariance matrix estimators are positive definite. Bootstrap simultaneous confidence regions are generated so that the coverage probabilities of the components of the matrix are balanced. As an application, these confidence regions are used to test assumptions about the structure of a covariance matrix.
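
The balancing idea can be sketched in the spirit of Beran's balanced simultaneous confidence sets: every component is calibrated to the same marginal bootstrap level c, and c is chosen so that the bootstrap distribution of the largest marginal level gives overall coverage of about 1 - alpha. The sketch below assumes nonparametric resampling of rows and absolute-deviation roots; it is an illustration, not the dissertation's procedure, and it does not impose the unconstrained (positive definite) covariance parameterization.

```python
import numpy as np


def balanced_bootstrap_intervals(data, stat, n_boot=2000, alpha=0.05, rng=None):
    """Balanced simultaneous bootstrap intervals for a vector-valued statistic.

    Every component gets the same marginal bootstrap level c, and c is chosen so
    that all intervals hold jointly with bootstrap probability about 1 - alpha.
    Returns (lower, upper, c).
    """
    rng = np.random.default_rng(rng)
    data = np.asarray(data)
    n = data.shape[0]
    theta_hat = np.asarray(stat(data), dtype=float)    # shape (k,)

    # Nonparametric bootstrap: resample rows with replacement.
    boot = np.empty((n_boot, theta_hat.size))
    for i in range(n_boot):
        boot[i] = stat(data[rng.integers(0, n, size=n)])
    roots = np.abs(boot - theta_hat)                   # (n_boot, k)

    # Marginal empirical CDF level of each root within its own component.
    ranks = roots.argsort(axis=0).argsort(axis=0)
    levels = (ranks + 1) / n_boot

    # Common level c: (1 - alpha) quantile of the largest marginal level.
    c = float(np.quantile(levels.max(axis=1), 1 - alpha))

    # Each component's half-width is the c-quantile of its own roots.
    half = np.quantile(roots, c, axis=0)
    return theta_hat - half, theta_hat + half, c
```

Applied to the upper-triangular entries of a sample covariance matrix, this yields one interval per entry with equal marginal coverage; the resulting box can still contain matrices that are not positive definite, which is the first problem raised above.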

The second part applies these methods to camera calibration models. Zhang (1998) proposed a simple and flexible technique for estimating intrinsic and extrinsic camera parameters, but the accuracy of these estimators was not investigated in that paper. The numerical algorithms used in Zhang's procedure are refined, and both parametric and nonparametric bootstrap methods are applied to obtain simultaneous bootstrap confidence regions for the parameters, which can then be used to carry out tests on these parameters.

### Nan Zhang (2006)

ADVISER: Hans-Georg Müller

TITLE: **Functional Data Analysis for Non-Gaussian Longitudinal Data**

ABSTRACT: We propose a nonparametric method to perform functional principal components analysis for non-Gaussian longitudinal data, assuming the underlying process is hidden or unobservable.

In this framework, we deal with a sample of curves that give rise to noisy non-Gaussian repeated measurements, such as Poisson counts or binomial data. The measurements for each subject are assumed to be determined by a subject-specific smooth random trajectory plus measurement errors. A link function relates the subject-specific trajectories to an underlying latent Gaussian process, which is modelled by an eigenfunction expansion with random coefficients. The basic elements of our approach are estimation of the mean function and covariance structure of the latent Gaussian process, of the overdispersion parameter, and of the variance of the measurement errors. The eigenfunction basis is estimated from the data, and functional principal component scores are estimated by maximizing the quasi-likelihood. A key step is the derivation of asymptotic consistency and distribution results under mild conditions, using tools from functional analysis. We develop a model selection technique, a functional Akaike information criterion, to choose the number of principal components in the eigenfunction expansion.
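
Given estimates of the mean function and eigenfunctions of the latent process, one subject's principal component scores can be obtained by maximizing a quasi-likelihood. The sketch below assumes Poisson-type counts with a log link and no separate measurement-error term, and uses Newton's method started from a least-squares fit to log(y + 0.5); an overdispersion parameter would rescale the quasi-likelihood without moving its maximizer. The callables `mu` and `phis` stand in for estimated quantities and are hypothetical.

```python
import numpy as np


def estimate_scores(y, t, mu, phis, n_iter=50, tol=1e-10):
    """Maximize the Poisson quasi-likelihood sum(y * eta - exp(eta)) over scores xi,
    where eta(t) = mu(t) + sum_k xi_k * phi_k(t) (log link)."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    Phi = np.column_stack([phi(t) for phi in phis])    # (n_obs, K) eigenfunction values
    offset = mu(t)

    # Starting value: least squares on the empirical log scale (0.5 guards against y = 0).
    xi = np.linalg.lstsq(Phi, np.log(y + 0.5) - offset, rcond=None)[0]

    for _ in range(n_iter):
        m = np.exp(offset + Phi @ xi)                  # fitted means
        grad = Phi.T @ (y - m)                         # quasi-score
        info = Phi.T @ (m[:, None] * Phi)              # negative Hessian
        step = np.linalg.solve(info, grad)
        xi = xi + step
        if np.max(np.abs(step)) < tol:
            break
    return xi
```

Because the objective is concave in the scores, Newton's method from a nearby start converges quickly; choosing the number of components `K` (for example by the functional AIC mentioned above) is left outside the sketch.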

In simulation studies, the proposed framework is compared with other approaches, including Character Process Models, Cubic B-spline Models, and functional principal components analysis through the conditional expectation approach. Finally, the proposed approach is illustrated with French-Canadian fertility data, Medfly egg-laying data, and rat learning-behavior data.