ADVISER: Fushing Hsieh
TITLE: Non-parametric Algorithmic Computational Methods for Longitudinal Network and Cryo-EM Images
ABSTRACT: The advent of high-throughput technologies and concurrent advances in the information sciences have led to an explosion in the size and complexity of data sets, while the same technological improvements allow us to approach a data set from different perspectives. We are at the threshold of an era in which hypothesis-driven science is being complemented by data-driven study. This alternative mode of study usually requires developing nonparametric computational methods, such as random walks and clustering, to uncover the fundamental structure of data sets.
In the first part of my talk, I will introduce a study of a cryo-electron microscopy (Cryo-EM) image data set. A series of non-parametric computing algorithms is proposed to extract virus structure from highly noisy data. The noise consists of the icy-glass background and electron vs. tissue temporal interactions in a low-temperature environment. Because of this extremely low signal-to-noise ratio, many classic Fourier- and smoothing-based image processing methodologies turn out to be unsuccessful. We illustrate our algorithmic developments through a real image example from Dr. Holland Cheng's lab at UC Davis. The development is divided into a series of steps. In each step, one computational challenge is addressed and its resolution is proposed, including a new SVD format for extracting reliable image signals, a hierarchical segmentation approach for further separating signal from noise, and a regulated random walk algorithm for drawing virus outlines. Several virus image data sets are also used to confirm validity and efficiency.
In the second part of my talk, I will introduce a study on analyzing the geometry of data sets and networks. Given a wine-making data set with four different stages, we introduce a new method for detecting patterns within and between data sets collected in a longitudinal study. Our goals are to characterize the relationships between wines and each of the feature spaces (chemical measurements) and to measure the coupling between the different phases of wine-making. The method for extracting this information proceeds in two steps. First, we build a statistical model to capture the coupled geometry between nodes (wines) and all feature (chemical measurement) spaces. Second, we estimate the degree of association between the different phases of wine-making. This study extends the application of Data Cloud Geometry (DCG), a methodology that aims to capture the underlying geometry of a set of data points. It is also an integral part of the Data Mechanics approach, which captures the coupling geometry in bipartite networks.
ADVISER: Thomas Lee
TITLE: Generalized Fiducial Inference and its Applications
ABSTRACT: Fiducial inference was introduced by Ronald Fisher in the 1920s-1930s. In particular, he proposed using the fiducial distribution, in place of the Bayesian posterior distribution, for interval estimation of parameters. Fisher's proposal led to major disputes and discussions among the prominent statisticians of the 1930s, 40s, and 50s.
In this talk, we present a modern revival of this old fiducial idea, termed Generalized Fiducial Inference (GFI). We provide strong theoretical justifications for using GFI to construct interval estimates. We also illustrate its applicability by quantifying estimation uncertainties in the so-called ultra-high-dimensional regression problem and in a massive data computation problem.
ADVISER: Fushing Hsieh
TITLE: Ranking Methods: Goodness-of-fit for the Bradley-Terry Model and Cluster Models for Collaborative Filtering
ABSTRACT: In this talk, we investigate how agents can be ranked in two situations. In the first, agents such as chess players or animals pair off in decisive conflicts, and the Bradley-Terry model is a popular method for producing a ranking of the competing agents. This model, however, makes strict assumptions about the dominance structure of the agents. I propose a novel way to test whether these assumptions are likely to hold. In the second setting, the agents are taken to be items, and the data are user reviews of these items. Collaborative filtering is the standard approach for ranking items in this setting, and I investigate the performance of a clustering model based on the Data Mechanics method recently developed by Fushing & Chen (2014) for this purpose.
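For readers unfamiliar with the Bradley-Terry model mentioned above, the following is a minimal sketch of how it can turn pairwise win counts into a ranking. It uses the standard minorization-maximization (MM) update for the strength parameters, not the goodness-of-fit test proposed in the talk. The function names, the `wins` matrix layout, and the toy data are assumptions made purely for illustration, and the sketch assumes every agent has at least one win and one recorded contest.

```python
def fit_bradley_terry(wins, n_iter=200):
    """Estimate Bradley-Terry strengths with the standard MM update.

    wins[i][j] is the number of times agent i beat agent j
    (a hypothetical input layout chosen for this sketch).
    """
    n = len(wins)
    strength = [1.0] * n  # start from equal strengths
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            # Total number of wins recorded for agent i.
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Contests involving i, weighted by the current strengths.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        # Strengths are only identified up to scale, so normalize to sum to 1.
        scale = sum(updated)
        strength = [s / scale for s in updated]
    return strength


def rank_agents(strength):
    """Return agent indices ordered from strongest to weakest."""
    return sorted(range(len(strength)), key=lambda i: strength[i], reverse=True)


# Toy data (made up for illustration): three agents, ten contests per pair.
wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
strength = fit_bradley_terry(wins)
ranking = rank_agents(strength)
print(ranking, [round(s, 3) for s in strength])
```

Under the model, the probability that agent i beats agent j is strength[i] / (strength[i] + strength[j]), which is the dominance assumption whose adequacy the talk sets out to test.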