Approx. 2 months

Join thousands of students

You get
Instructor videos
Learn by doing exercises and view project instructions

Course Summary

The Introduction to Data Science class will survey the foundational topics in data science, namely:

  • Data Manipulation
  • Data Analysis with Statistics and Machine Learning
  • Data Communication with Information Visualization
  • Data at Scale -- Working with Big Data

The class will focus on breadth and present the topics briefly instead of focusing on a single topic in depth. This will give you the opportunity to sample and apply the basic techniques of data science.

This course is also a part of our Data Analyst Nanodegree.

Why Take This Course?

You will have an opportunity to work through a data science project end to end, from analyzing a dataset to visualizing and communicating your data analysis.

By working on the class project, you will be exposed to, and come to understand, the skills needed to become a data scientist yourself.

Prerequisites and Requirements

The ideal students for this class are prepared individuals who have:

  • Strong interest in data science
  • Background in intro level statistics
  • Python programming experience
  • Or an understanding of programming concepts such as variables, functions, and loops, and of basic Python data structures such as lists and dictionaries

If you need to brush up on your programming, we highly recommend Introduction to Computer Science: Building a Search Engine. If you need a refresher on statistics, enroll in Intro to Descriptive Statistics and Intro to Inferential Statistics. All three are on Udacity!

See the Technology Requirements for using Udacity.

What Will I Learn?


Final Project

In this project you'll take time to reflect upon some interesting analysis and visualization of the NYC subway and weather data. You'll answer questions on why you picked certain tests and why you got a result.

P1: Predicting Boston Housing Prices

The Boston housing market is highly competitive, and you want to be the best real estate agent in the area. To compete with your peers, you decide to leverage a few basic machine learning concepts to assist you and a client with finding the best selling price for their home. Luckily, you’ve come across the Boston Housing dataset, which contains aggregated data on various features for houses in Greater Boston communities, including the median value of homes for each of those areas. Your task is to build an optimal model based on a statistical analysis with the tools available. This model will then be used to estimate the best selling price for your client’s home.


Lesson 1: Introduction to Data Science

  • Introduction to Data Science
  • What is a Data Scientist
  • Pi-Chuan (Data Scientist @ Google): What is Data Science?
  • Gabor (Data Scientist @ Twitter): What is Data Science?
  • Problems Solved by Data Science
  • Pandas
  • Dataframes
  • Create a New Dataframe

Lesson 2: Data Wrangling

  • What is Data Wrangling?
  • Acquiring Data
  • Common Data Formats
  • What are Relational Databases?
  • Aadhaar Data
  • Aadhaar Data and Relational Databases
  • Introduction to Database Schemas
  • APIs
  • Data in JSON Format
  • How to Access an API Efficiently
  • Missing Values
  • Easy Imputation
  • Impute using Linear Regression
  • Tip of the Imputation Iceberg
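
As a rough illustration of the missing-value topics listed above, here is a minimal sketch of mean imputation in plain Python. It assumes that "easy imputation" refers to filling each gap with the mean of the observed values; the function name `impute_with_mean` and the sample column are hypothetical and are not taken from the course materials.

```python
from statistics import mean


def impute_with_mean(values):
    """Return a copy of values with None entries replaced by the observed mean."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries (None marks a missing value).
weights = [150.0, None, 170.0, 160.0, None]
print(impute_with_mean(weights))
```

Mean imputation keeps the column mean unchanged but shrinks its variance, which is one reason a regression-based approach may be preferred.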

Lesson 3: Data Analysis

  • Statistical Rigor
  • Kurt (Data Scientist @ Twitter): Why is Stats Useful?
  • Introduction to Normal Distribution
  • T Test
  • Welch T Test
  • Non-Parametric Tests
  • Non-Normal Data
  • Stats vs. Machine Learning
  • Different Types of Machine Learning
  • Prediction with Regression
  • Cost Function
  • How to Minimize Cost Function
  • Coefficient of Determination
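
The regression topics above (cost function, minimizing it, and the coefficient of determination) can be sketched together in a few lines of plain Python. This is only an illustrative sketch, assuming a one-variable linear model, a squared-error cost, and batch gradient descent; the learning rate, step count, function names, and toy data are all assumptions rather than the course's own code.

```python
def cost(xs, ys, slope, intercept):
    """Half the mean squared error of the line's predictions."""
    m = len(xs)
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha=0.05, steps=5000):
    """Fit slope and intercept by repeatedly stepping against the cost gradient."""
    slope, intercept = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = sum(e * x for e, x in zip(errors, xs)) / m
        grad_intercept = sum(errors) / m
        slope -= alpha * grad_slope
        intercept -= alpha * grad_intercept
    return slope, intercept


def r_squared(ys, predictions):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Toy data generated from y = 2x + 1, so the fit should land near those values.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = gradient_descent(xs, ys)
predictions = [slope * x + intercept for x in xs]
print(slope, intercept, r_squared(ys, predictions))
```

On this noiseless data the fitted line should approach a slope of 2 and an intercept of 1, with a coefficient of determination close to 1.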

Lesson 4: Data Visualization

  • Effective Information Visualization
  • Napoleon's March on Russia
  • Don (Principal Data Scientist @ AT&T): Communicating Findings
  • Rishiraj (Principal Data Scientist @ AT&T): Communicating Findings Well
  • Visual Encodings
  • Perception of Visual Cues
  • Plotting in Python
  • Data Scales
  • Visualizing Time Series Data

Lesson 5: MapReduce

  • Big Data and MapReduce
  • Basics of MapReduce
  • Mapper
  • Reducer
  • MapReduce with Aadhaar Data
  • MapReduce with Subway Data
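
To make the mapper and reducer roles listed above concrete, here is a small in-memory sketch of the MapReduce pattern in plain Python. It is not tied to Hadoop or to the course's exercises; the station IDs, entry counts, and the `shuffle` helper are hypothetical stand-ins for what a real framework would provide.

```python
from collections import defaultdict


def mapper(records):
    """Emit a (station, entries) key-value pair for each input record."""
    for station, entries in records:
        yield station, entries


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Combine all values for one key into a single total."""
    return key, sum(values)


# Hypothetical subway records: (station ID, entries counted in one interval).
records = [("R001", 120), ("R002", 80), ("R001", 30), ("R002", 20)]
totals = dict(reducer(k, v) for k, v in shuffle(mapper(records)).items())
print(totals)
```

Because each reducer call sees only the values for its own key, the reduce step could in principle run on many machines in parallel.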

Instructors & Partners

Dave Holtz


Dave Holtz is currently a data scientist at Airbnb. Before Airbnb, he was a data science engineer at Yub, the world's first online-to-offline affiliate network, and he also worked as a product manager and data scientist at TrialPay. Dave holds an M.A. in physics and astronomy from the Johns Hopkins University and a B.A. in physics from Princeton University. In addition to data science, Dave is passionate about cosmology, smart cities, music, theater, and improv comedy.

Cheng-Han Lee


Before joining Udacity, Cheng-Han worked as a program manager at Microsoft. He earned his degrees in computer science at the University of Texas at Austin and the University of California at San Diego.

Outside of work, Cheng-Han is a world traveler. He has lived in Taiwan, Shanghai, Charleston (SC), Dallas, Austin, San Diego, Seattle, and now the Bay Area. In addition to traveling, he likes to find new parks to explore, new venues to visit, and new restaurants to try.
