Subject: STA 220
Title: Data & Web Technologies for Data Analysis
Units: 4.0
School: College of Letters and Science LS
Department: Statistics STA
Effective Term: 2020 Winter Quarter
Learning Activities
- Lecture - 3.0 hours
- Discussion - 1.0 hours
Description
Essentials of using relational databases and SQL. Processing data in blocks. Scraping Web pages and using Web services/APIs. Basics of text mining. Interactive data visualization with Web technologies. Computational data workflow and best practices. Statistical Methods.
Expanded Course Description
Summary of Course Content:
This course focuses on fundamental concepts of data technologies that are widely used in data
analysis. Students will work with the key technologies used to manage and access data from both
traditional databases and Web APIs. They will learn to publish results as rich, interactive
visualizations built with modern Web technologies. Students are introduced to core ideas in text
mining and natural language processing and learn how to analyze textual data. The course also
explores the development of statistical software and tools for project management and
reproducibility. We emphasize the ideas underlying these topics, along with hands-on practice and
reasoning about when to use each technology. Students will develop an understanding of the major
data technologies, how to use them for data analysis, and how to learn new technologies as they
emerge. Understanding the key concepts behind today's established technologies gives students the
foundation for lifelong, self-directed learning of emerging ones.
Illustrative Reading:
● Introduction to Data Technologies, Murrell
● XML and Web Technologies for Data Sciences with R, Nolan and Temple Lang
● Interactive Data Visualization for the Web, Murray
Potential Course Overlap:
The course covers the same general topics as STA 141B but is tailored to graduate students. In
particular, basic STA 141B topics such as data preprocessing and missing data are de-emphasized
so that more time can be devoted to advanced topics such as machine learning in distributed and
interactive systems.
Because computing plays an essential role, the course also partially overlaps with the more
specialized courses ECS 163 and ECS 165A/B. The key differences are that (1) STA 220 focuses on
the practical use of these technologies rather than their internal details, and (2) STA 220
covers the overlapping material in 2 weeks rather than 10. The relational database component
shares concepts with ECS 165A/B but covers only one-fifth of the material. The visualization
component is a much briefer and differently focused treatment of material in ECS 163.
Final Exam:
Yes