The Department of Statistics Research Experience for Undergraduates (REU) Program gives undergraduate students an opportunity to gain exposure to research in statistics by learning techniques for approaching scientific problems from a statistical point of view.
The REU is a two-quarter program (Winter and Spring) in which students work in small groups on a research project. The projects are supervised by faculty and graduate student mentors. At the end of the year, participants are required to present their projects to their peers. All participants are eligible to receive academic credit for STA 199.
Application Process
Step 1: Submit the online application form by October 19th.
- On the form, you will select the project you would like to join. Be sure to review the project prerequisites in the project descriptions below.
Step 2: Complete an interview.
- After an initial review of the applications, those selected to move forward in the application process will be asked to meet with the project's faculty mentor for an informal interview sometime in November.
If you are selected, you will need to complete a Variable Unit Contract with the faculty mentor in order to register for STA 199 in the Winter and Spring quarters.
2025-2026 Project List
- Hypothesis Testing for High-Dimensional and Non-Euclidean Data (Professor Hao Chen)
Brief description: Two-sample hypothesis testing is a fundamental problem in statistics, but it faces major challenges when the data are high-dimensional or non-Euclidean (e.g., network data). This project will investigate state-of-the-art methods and assess their performance on a variety of datasets, with the goal of developing deeper insights into complex data structures.
Prerequisites: STA 135, STA 131, STA 108, STA 104
Time commitment: About 10 hours per week
Number of undergraduate participants: 2-3
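For applicants curious about what a two-sample test for high-dimensional data can look like, here is a minimal Python sketch of one classic graph-based idea: an edge-count test in the spirit of Friedman and Rafsky, built here on a k-nearest-neighbor graph of the pooled sample. It is only an illustration; the function names, the choice of k, and the permutation scheme are our own assumptions, not necessarily the methods the project will study. The intuition is that when both samples come from the same distribution, neighbors mix freely across the two samples, so unusually few edges joining X to Y suggest that the distributions differ.

```python
import numpy as np

def knn_edges(Z, k):
    """Undirected edge set of the k-nearest-neighbor graph on the rows of Z (hypothetical helper)."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                     # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]
    edges = {(min(i, j), max(i, j)) for i in range(len(Z)) for j in nbrs[i].tolist()}
    return np.array(sorted(edges))

def edge_count_test(X, Y, k=5, n_perm=1000, seed=0):
    """Permutation edge-count test (hypothetical helper).

    Returns the observed number of between-sample edges and a one-sided
    p-value: small counts are evidence that X and Y differ.
    """
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    labels = np.r_[np.zeros(len(X), dtype=int), np.ones(len(Y), dtype=int)]
    E = knn_edges(Z, k)

    def cross_edges(lab):
        # Count edges whose endpoints carry different sample labels.
        return int(np.sum(lab[E[:, 0]] != lab[E[:, 1]]))

    observed = cross_edges(labels)
    perm = np.array([cross_edges(rng.permutation(labels)) for _ in range(n_perm)])
    p_value = (1 + np.sum(perm <= observed)) / (1 + n_perm)
    return observed, p_value

# Usage: two Gaussian samples in 50 dimensions that differ by a mean shift.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 50))
Y = rng.normal(loc=0.5, size=(100, 50))
print(edge_count_test(X, Y, k=5, n_perm=500))
```

The classic edge-count statistic is known to lose power against some alternatives in high dimensions, such as differences in scale rather than location, which is part of what motivates the newer graph-based tests studied in this area.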
- Analysis and prediction of mobile app usage using Hawkes processes (Professor Shizhe Chen)
Brief description: The Hawkes process is a point-process model for event times in which each event can temporarily increase or decrease the chance of future events. This makes it a natural tool for modeling recurrent event data such as mobile app usage. In this project we will analyze a dataset containing 599,636 mobile app usage records from 292 users. In particular, we will use Hawkes processes to (1) uncover relationships within and across apps (e.g., do some apps trigger the use of others?) and (2) predict future usage (e.g., the next app to open). Students will explore parametric Hawkes models with exponential or power-law kernels, estimated by maximum likelihood, to quantify how long and how strongly past events influence future ones. They will also explore neural Hawkes processes, which learn more flexible, non-linear patterns directly from data. Along the way, we will build clear visualizations of intensities over time and of triggering effects between app categories, and we will benchmark against simpler baselines to check whether the added complexity actually improves prediction. The goal is to provide a practical, end-to-end understanding of how to model, visualize, and forecast complex event streams in an interpretable way.
Prerequisites: STA 141A, STA 135, STA 137
Time commitment: 9-12 hours per week
Number of undergraduate participants: 1-4
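To make the parametric model above concrete, here is a minimal Python sketch of a univariate Hawkes process with an exponential kernel, with conditional intensity λ(t) = μ + Σ_{t_i < t} α·exp(−β(t − t_i)). The parameter names mu, alpha, and beta and the helper functions are our own notation and assumptions. This sketch covers only a single event stream with self-excitation (α ≥ 0); the project's multivariate models across app categories, power-law kernels, inhibitory effects, and neural Hawkes processes are not shown. The log-likelihood uses a standard O(n) recursion, and a crude grid search stands in for a proper numerical optimizer.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity: baseline mu plus exponentially decaying boosts from past events."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)

def hawkes_loglik(events, T, mu, alpha, beta):
    """Log-likelihood of sorted, distinct event times on [0, T].

    loglik = sum_i log lambda(t_i) - integral_0^T lambda(s) ds, where
    A_i = exp(-beta * (t_i - t_{i-1})) * (1 + A_{i-1}) accumulates past excitation.
    Assumes mu > 0, alpha >= 0, beta > 0.
    """
    loglik, A, prev = 0.0, 0.0, None
    for ti in events:
        if prev is not None:
            A = math.exp(-beta * (ti - prev)) * (1.0 + A)
        loglik += math.log(mu + alpha * A)
        prev = ti
    # Compensator: closed-form integral of the intensity over [0, T].
    compensator = mu * T + (alpha / beta) * sum(1.0 - math.exp(-beta * (T - ti)) for ti in events)
    return loglik - compensator

def fit_grid(events, T, mus, alphas, betas):
    """Crude maximum-likelihood fit by exhaustive search over a parameter grid (illustration only)."""
    return max(
        ((m, a, b) for m in mus for a in alphas for b in betas),
        key=lambda p: hawkes_loglik(events, T, *p),
    )

# Usage with a small, made-up event stream (times in arbitrary units).
events = [0.5, 0.9, 1.0, 3.2, 3.3, 3.35, 7.0]
print(hawkes_intensity(3.4, events, mu=0.2, alpha=0.8, beta=2.0))
print(fit_grid(events, 10.0, mus=[0.1, 0.3, 0.5], alphas=[0.2, 0.8, 1.5], betas=[1.0, 2.0, 4.0]))
```

For this kernel, α/β is the expected number of events directly triggered by each event (the branching ratio); values below 1 keep the process from exploding, which gives a direct, interpretable reading of "how strongly" past events matter, while 1/β sets "how long" their influence lasts.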
- Bayesian Applications to Social Sciences (Professor Jairo Fuquene)
Brief description: In this project, students will apply techniques learned in STA 108 and STA 145 to real-world problems, with a focus on applications in the social sciences.
Prerequisites and other expectations: STA 108 (completed) and STA 145 (completed or currently enrolled).
Time commitment: 10 hours per week.
Number of undergraduate participants: 5
- Compare BRFSS databases from two different years (Professor Fushing Hsieh)
Brief description: This project aims to trace how chronic disease has evolved in U.S. society by comparing the two survey years.
Prerequisites: Scientific curiosity and R programming.
Number of undergraduate participants: 1 or 2.
Time commitment: 1 unit per quarter
- Compare two MLB pitchers' pitching dynamics (Professor Fushing Hsieh)
Brief description: This project looks for fine-scale differences in pitching between the two pitchers.
Prerequisites: Scientific curiosity and R programming.
Time commitment: 1 unit per quarter
Number of undergraduate participants: 1 or 2.
- Statistical analysis of post-training quantization (Professor Can Le)
Project description: Post-training quantization is a widely used technique for reducing storage requirements and speeding up inference in machine learning, including large language models (LLMs). Despite its practical success, the statistical effects of quantization on trained models are still not well understood. This project aims to address this gap by analyzing the statistical and computational trade-offs introduced by post-training quantization for shallow neural networks. In particular, we will investigate how uniform quantization, Hessian-weighted optimal quantization, and other related approaches affect implicit bias and generalization error. The results of this study may deepen our understanding of quantization and help to develop better quantization strategies.
Prerequisites: Highly motivated students with strong programming skills and a solid background in linear algebra and statistics
Time commitment: About 10 hours per week
Number of undergraduate participants: 2
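As a rough illustration of the simplest scheme named above, here is a minimal Python sketch of symmetric, per-tensor uniform post-training quantization applied to a randomly initialized one-hidden-layer ReLU network. The network, its sizes, and the helper names are hypothetical choices made for illustration; the sketch does not implement Hessian-weighted quantization or study implicit bias, which are the subject of the project itself. It only shows how quantization perturbs a network's outputs as the bit width shrinks.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric per-tensor uniform quantization (hypothetical helper).

    Maps each weight to the nearest of 2**bits - 1 evenly spaced levels in
    [-max|w|, max|w|] and returns the dequantized values.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return np.zeros_like(w)
    scale = max_abs / qmax                          # spacing between adjacent levels
    q = np.clip(np.round(w / scale), -qmax, qmax)   # integer codes
    return q * scale

def shallow_net(x, W1, w2):
    """One-hidden-layer ReLU network: f(x) = w2 . relu(W1 x), applied row-wise to x."""
    return np.maximum(x @ W1.T, 0.0) @ w2

# Hypothetical "trained" weights: random here, purely for illustration.
rng = np.random.default_rng(0)
d, m, n = 10, 64, 500
W1 = rng.normal(size=(m, d)) / np.sqrt(d)
w2 = rng.normal(size=m) / np.sqrt(m)
X = rng.normal(size=(n, d))

for bits in (2, 4, 8):
    W1_q, w2_q = quantize_uniform(W1, bits), quantize_uniform(w2, bits)
    mse = np.mean((shallow_net(X, W1, w2) - shallow_net(X, W1_q, w2_q)) ** 2)
    print(f"{bits}-bit: mean squared change in output = {mse:.2e}")
```

Uniform quantization treats every weight as equally important; Hessian-weighted approaches instead account for how sensitive the loss is to each weight, which is one of the trade-offs the project will examine.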
- Geometry and topology in statistics - the Euler characteristic (Professor Wolfgang Polonik)
Project description: Understanding geometry and topology is of utmost importance when developing methodology for the analysis of complex data, whether high-dimensional or non-Euclidean. One important notion in this context is the so-called Euler characteristic. The Euler characteristic can be defined based on observed data, in which case it becomes a random quantity describing geometric and topological features of the data cloud. It has been used successfully in machine learning and data analysis for feature extraction and other tasks. In this project we will develop a deeper understanding of the Euler characteristic, simulate its distribution, and investigate a statistical goodness-of-fit test based on the Euler characteristic transform. Some basic readings include Paper 1 and Paper 2.
Prerequisites: Highly motivated students with good programming skills (Python), a good background in linear algebra, and a strong interest in geometry.
Time commitment: About 10 hours per week.
Number of undergraduate participants: 1-3
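To show how the Euler characteristic of a data cloud becomes a random quantity, here is a minimal Python sketch that builds a Vietoris-Rips complex at scale r (a set of points spans a simplex when all pairwise distances are at most r) and computes χ = Σ (−1)^dim over all simplices, i.e. vertices minus edges plus triangles minus tetrahedra, and so on. The choice of the Vietoris-Rips construction, the uniform-on-the-unit-square sampling model, and the helper names are our own assumptions; the project may use other complexes, and the Euler characteristic transform and goodness-of-fit test are not shown.

```python
import math
import random

def rips_euler_characteristic(points, r):
    """Euler characteristic of the Vietoris-Rips complex at scale r (hypothetical helper).

    Every clique of points with pairwise distances <= r is a simplex; a clique
    with s vertices has dimension s - 1 and contributes (-1) ** (s - 1).
    """
    n = len(points)
    # Neighbors with a larger index, so each clique is enumerated exactly once.
    higher = [
        {j for j in range(i + 1, n) if math.dist(points[i], points[j]) <= r}
        for i in range(n)
    ]
    chi = 0

    def extend(candidates, size):
        # The current clique has `size` vertices; `candidates` can extend it.
        nonlocal chi
        chi += (-1) ** (size - 1)
        for v in candidates:
            extend(candidates & higher[v], size + 1)

    for i in range(n):
        extend(higher[i], 1)
    return chi

def simulate_chi(n_points, r, n_reps, seed=0):
    """Monte Carlo draws of chi for n_points uniform points in the unit square."""
    rng = random.Random(seed)
    return [
        rips_euler_characteristic([(rng.random(), rng.random()) for _ in range(n_points)], r)
        for _ in range(n_reps)
    ]

# Usage: approximate the mean of the random Euler characteristic.
samples = simulate_chi(n_points=50, r=0.15, n_reps=200)
print(sum(samples) / len(samples))
```

As a sanity check, four points at the corners of a unit square give χ = 0 at r = 1.2 (a loop of four edges, like a circle) and χ = 1 at r = 1.5 (a filled tetrahedron, which is contractible).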
Project Archive
2025-2026 Projects
- Comparison on Efficacy between Machine Learning Approximation and Statistical Methods
- Yutong Bao
- Faculty Mentor: Professor Jairo Fuquene Patino
- Change Point Detection Methods
- Kaitlyn Glenn
- Faculty Mentor: Professor Hao Chen
- BRFSS Covid Stringency Analysis
- Eric Goldman
- Faculty Mentor: Professor Fushing Hsieh
- Post-Training Quantization among Large Language Models
- Tianyang Liu, Venice Huyen Hodac, Hannah Wen
- Faculty Mentor: Professor Can Le
- A Taxonomic Approach to Categorical Data Analysis
- Avidane Ceana Caballero
- Faculty Mentor: Professor Fushing Hsieh
- Modeling Premolt Carapace Size in Female Dungeness Crabs using Robust Bayesian Regression Models
- Tina Yu
- Faculty Mentor: Professor Jairo Fuquene Patino
- Using Geometry and Topology for Statistical Shape Classification
- Sergio Ramirez, Joseph Matatyaou
- Faculty Mentor: Professor Wolfgang Polonik