STA 141A Fundamentals of Statistical Data Science


Goals:
Students become proficient in data manipulation and exploratory data analysis, and in finding and conveying features of interest. They learn to map mathematical descriptions of statistical procedures to code, to decompose a problem into sub-tasks, and to create reusable functions. They develop the ability to transform complex data stored as text into data structures amenable to analysis. They learn how and why to simulate random processes, and are introduced to statistical methods they do not see in other courses.

Summary of course contents:
This course provides an introduction to statistical computing and data manipulation. It enables students, often with little or no background in computer programming, to work with raw data, and it introduces them to computational reasoning and problem solving for data analysis and statistics. The high-level themes and topics include doing exploratory data analysis, visualizing data graphically, reading and transforming data in complex formats, and performing simulations, all of which are essential skills for students working with data. This course provides the foundations and practical skills for other statistical methods courses that make use of computing, and for subsequent statistical computing courses. Additionally, some statistical methods not taught in other courses are introduced. The course teaches students to map an overall statistical task into computer code and to conduct basic data analyses.

Illustrative reading:

  • R in a Nutshell, Adler.
  • The Art of R Programming, Matloff.
  • R Graphics, Murrell.
  • R Graphics Cookbook, Chang.
  • ggplot2: Elegant Graphics for Data Analysis, Wickham.

Potential Overlap:
This course overlaps significantly with the existing course 141, which this course will replace. Course 242 is a more advanced statistical computing course that covers more material. ECS 145 involves R programming, but its focus is very different: it concentrates on more fundamental computer science tasks and on comparing high-level scripting languages. R is used in many courses across campus. This course teaches the fundamentals of R in greater depth, which is intentionally not done in those other courses. Furthermore, the combination of topics covered here (computational fundamentals, exploratory data analysis and visualization, and simulation) is unique to this course.

History:
First offered Fall 2016. Replacement for course STA 141.