STA 35B Statistical Data Science II

Goals:

1. Deepen knowledge of the R programming language.
2. Get familiar with more advanced plotting tools.
3. Develop statistical understanding and intuition on topics like regression, ANOVA, and nonparametrics.
4. Combine knowledge of R and statistics to analyze real data and learn how to interpret outcomes (in particular, develop awareness of the aspects that may limit the validity of outcomes).

Summary of course content:
1. Advanced data structures in R [emphasis on selecting the right data structure].
2. Writing simple functions in R [emphasis on modularizing code].
3. Tools for handling and manipulating different types of data: tabular and spreadsheet data with mixed variable types; subsetting and vectorized operations [concept of tidy data].
4. Introduction to advanced plotting tools (such as ggplot2), grammar of graphics, and principles of graphical integrity.
5. Concepts of correlation and regression [focus on lm() function in R].
6. Concepts of analysis of variance: one-factor and two-factor models with fixed effects [focus on use of lm() and aov() functions in R].
7. Basics of nonparametric procedures: permutation tests, rank-based and sign-based procedures.

For topics 5-7, emphasis will be placed on analyzing real-world data from natural and social sciences and on simulation experiments.

Illustrative Reading:
1. Bruce, P. and Bruce, A. (2017). Practical Statistics for Data Scientists: 50 Essential Concepts. O'Reilly Media.
2. Matloff, N. (2012). The Art of R Programming. No Starch Press.
3. Wickham, H. and Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media.

Potential Overlap:
There is some overlap with the material of STA 141A, STA 106, and STA 108, but the emphasis here is more on learning basic tools of data manipulation, visualization of statistical summaries, and the concepts behind the statistical methodologies through computation and data analysis. There is also some overlap with the material in PSC/SOC/POL 12Y.

History:
None