Lecture: 3 hours
Discussion: 1 hour
Essentials of using relational databases and SQL. Processing data in blocks. Scraping Web pages and using Web services/APIs. Basics of text mining. Interactive data visualization with Web technologies. Computational data workflow and best practices. Statistical methods.
Prerequisite: Course 141A
Students learn the concepts and gain experience in using fundamental technologies for data science. They learn how to access data from new sources and how to convey results using rich, interactive technologies. They also learn to analyze text data in qualitatively different ways and are introduced to statistical methods not taught in other courses.
Summary of course contents:
This course focuses on fundamental concepts of data technologies that are widely used in data analysis. Students will work with the key technologies used to manage and access data from both traditional databases and Web APIs. They will learn to publish results as rich, interactive visualizations using modern Web technologies. Students are introduced to the central ideas of text mining and natural language processing and to ways of analyzing textual data. The course also explores how to develop statistical software and tools for project management and reproducibility. We emphasize the ideas underlying these topics, along with gaining practical experience and reasoning about when to use each technology. Students will develop an understanding of the important data technologies, how to use them for data analysis, and how to learn new technologies as they emerge. Understanding the key concepts of existing technologies gives students the foundation for lifelong self-learning of emerging ones.
Readings:
- Introduction to Data Technologies, Murrell
- XML and Web Technologies for Data Sciences with R, Nolan and Temple Lang
- Interactive Data Visualization for the Web, Murray
The topics in this course overlap with the more specialized courses ECS 163 and ECS 165A/B. The key differences are that (a) we focus on the practical use of these technologies rather than on a deep understanding or implementation of them, and, accordingly, (b) we cover the material in 2 weeks rather than 10. The relational database component shares concepts with ECS 165A/B but covers only about one-fifth of that material. The visualization component is a much briefer and differently focused treatment of material in ECS 163.
First offered Winter 2017.