Introduction to Data Science

Thank you for signing up for the course! We look forward to working with you and hearing your feedback in our forums.


Need help getting started?



Course Resources

Additional Reading

Downloadable Materials

  1. You can download Supplemental Materials, Lesson Videos, and Transcripts from Downloadables (in the bottom-right corner of the Classroom) or from the Dashboard (the first option on the navigation bar on the left-hand side).

  2. You can download the following data sets to work on the programming assignments on your own computer:

    • Original data set: the same data that you've been working with throughout the class.
    • Improved data set: a cleaned-up subset of the original data set, with additional variables that you can use to improve your linear regression model and visualizations. The additional variables are listed in this document.

  3. If you're just looking for the subway and weather data file used in the project, you can find it here.

Course Syllabus

Lesson 1: Introduction

In this lesson you'll learn what data science is and what it means to be a data scientist. You'll then get hands-on with Python exercises that introduce the pandas library, and you'll start making and testing your own hypotheses using the Titanic survivors data set!
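To give a flavor of the Lesson 1 exercises, here is a minimal sketch of exploring passenger records with pandas and checking a simple hypothesis. The six rows and the column names (`sex`, `age`, `survived`) are made up for illustration; they are not the course's Titanic data set, and the rule tested here is only an assumed example of a hypothesis.

```python
import pandas as pd

# Hypothetical toy records, NOT the real Titanic data set.
passengers = pd.DataFrame({
    "sex": ["female", "male", "female", "male", "female", "male"],
    "age": [29, 40, 35, 22, 58, 4],
    "survived": [1, 0, 1, 0, 0, 1],
})

# Survival rate within each group.
survival_by_sex = passengers.groupby("sex")["survived"].mean()

# A simple hypothesis written as a prediction rule:
# "female passengers survived, male passengers did not".
predicted = (passengers["sex"] == "female").astype(int)

# Fraction of passengers for whom the rule matches the recorded outcome.
accuracy = (predicted == passengers["survived"]).mean()

print(survival_by_sex)
print(f"Rule accuracy: {accuracy:.2f}")
```

On real data, the same pattern (group, summarize, then score a rule against the outcome column) is one way to compare competing hypotheses.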

Lesson 2: Data Wrangling

In this lesson you'll learn how to clean up data that you obtain from various sources in order to make it usable for analysis.

Lesson 3: Data Analysis

In this lesson you'll learn various ways to analyze data, including parametric tests (such as Welch's t-test), nonparametric tests (such as the Mann-Whitney U test), and linear regression.
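As a rough illustration of the two tests named above, the sketch below computes the Welch's t statistic (with its Welch-Satterthwaite degrees of freedom) and the Mann-Whitney U statistics using only the standard library. The function names and sample values are hypothetical, and the sketch stops short of p-values; in practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.mannwhitneyu(a, b)` return both the statistic and the p-value.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for Welch's t-test."""
    # Per-sample variance of the mean; the samples may have unequal variances.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def mann_whitney_u(a, b):
    """Return (U_a, U_b): pairwise wins for each sample, ties counting half."""
    u_a = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )
    return u_a, len(a) * len(b) - u_a


# Hypothetical measurements for two groups.
group_a = [1, 2, 3]
group_b = [4, 5, 6]

t_stat, dof = welch_t(group_a, group_b)
u_a, u_b = mann_whitney_u(group_a, group_b)
print(f"Welch t = {t_stat:.3f}, df = {dof:.1f}")
print(f"U_a = {u_a}, U_b = {u_b}")
```

The t-test compares means and assumes roughly normal data, while the U statistic depends only on how the values rank against each other, which is why it is the usual fallback when normality is doubtful.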

Lesson 4: Data Visualization

In this lesson you'll learn the importance of representing data in visual form. You'll learn about some of the common pitfalls in data visualization and study different methods of representation. You'll also look at one of the historical classics of data visualization, Minard's map of Napoleon's Russian campaign.

Lesson 5: MapReduce

In this lesson you'll be introduced to the MapReduce paradigm for data processing: how it works, and what sorts of problems it is useful for. (For a more detailed study of the MapReduce paradigm, take a look at our Intro to Hadoop and MapReduce course!)
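The paradigm can be sketched in a single process as three phases: map each input record to key-value pairs, group the pairs by key (the shuffle), and reduce each group to one result. The word-count task and the function names below are assumptions chosen for illustration; a real framework such as Hadoop distributes these phases across many machines.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: collapse one key's values into a single count."""
    return key, sum(values)


def map_reduce(lines):
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


# Hypothetical input records.
counts = map_reduce(["the quick brown fox", "the lazy dog", "The fox"])
print(counts)
```

Because each mapper call sees only one line and each reducer call sees only one key, both phases can run in parallel, which is what makes the approach suit large data sets.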

Acknowledgements

Thanks to Dave Holtz and Cheng-Han Lee for their work in making this course!