$150/month

Intermediate
Join 7,559 Students

Approx. 2 months

Assumes 6hr/wk

(work at your own pace)

This Course is a Part of the

Course Summary

Exploratory Data Analysis (EDA) is an approach to data analysis for summarizing and visualizing the important characteristics of a data set. Promoted by John Tukey, EDA focuses on exploring data to understand the data’s underlying structure and variables, to develop intuition about the data set, consider how that data set came into existence, and decide how it can be investigated with more formal statistical methods.

If you're interested in supplemental reading material for the course, check out the Exploratory Data Analysis book (not required).

Why Take This Course?

You will...

  • Understand EDA as a journey and a way to explore data
  • Explore data at multiple levels using appropriate visualizations
  • Acquire statistical knowledge for summarizing data
  • Demonstrate curiosity and skepticism when performing EDA
  • Develop intuition around a data set and understand how the data was generated

Pre-Requisites and Requirements

A background in statistics is helpful but not required. Consider taking Statistics: The Science of Decisions prior to taking this course. Relevant topics include:

  • Mean, median, mode
  • Normal, uniform, and skewed distributions
  • Histograms and box plots


Familiarity with the following CS and Math topics will help students:

  • Variable assignment
  • Comparison and logical operators ( <, >, <=, >=, ==, &, | )
  • If-else statements
  • Square roots, logarithms, and exponentials
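As a rough self-check on the topics listed above, here is a minimal sketch. The course itself works in R; Python is used here only as a stand-in, and the variable names and values (`friend_count`, `age`) are made up for illustration.

```python
import math

# Variable assignment (hypothetical values, for illustration only)
friend_count = 150
age = 24

# Comparison and logical operators
is_adult = age >= 18
is_active = friend_count > 100
adult_and_active = is_adult & is_active  # logical AND on two booleans
adult_or_active = is_adult | is_active   # logical OR on two booleans

# If-else statement
if adult_and_active:
    label = "active adult"
else:
    label = "other"

# Square roots, logarithms, and exponentials
root = math.sqrt(16)
log10_value = math.log10(1000)
exp_value = math.exp(0)

print(label, root, log10_value, exp_value)
```

If each line reads naturally to you, you likely have the programming and math background the course assumes.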

See the Technology Requirements for using Udacity

What Will I Learn

Projects

Jump into a new dataset. Explore, create plots, and summarize the fascinating things you find.

Syllabus

Lesson 1: What is EDA? (1 hour)

We'll start by learning what exploratory data analysis (EDA) is and why it is important. You'll meet the amazing instructors for the course and find out about the course structure and final project.

Lesson 2: R Basics (3 hours)

EDA, which comes before formal hypothesis testing and modeling, makes use of visual methods to analyze and summarize data sets. R will be our tool for generating those visuals and conducting analyses. In this lesson, we will install RStudio and packages, learn the layout and basic commands of R, practice writing basic R scripts, and inspect data sets.

Lesson 3: Explore One Variable (4 hours)

We perform EDA to understand the distribution of a variable and to check for anomalies and outliers. Learn how to quantify and visualize individual variables within a data set as we begin to make sense of a pseudo-data set of Facebook users. While the data set does not contain real user data, it does contain a wealth of information. Through the lesson, we will create histograms and boxplots, transform variables, and examine tradeoffs in visualizations.
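To give a flavor of what exploring one variable can involve, the sketch below summarizes a small, made-up list of friend counts, applies a log transform, and bins the result into a crude histogram. The lesson itself uses R; this Python version is only an illustration, and the data and the choice of a `log10(x + 1)` transform are assumptions, not taken from the course materials.

```python
import math
from collections import Counter
from statistics import mean, median

# Made-up, right-skewed friend counts (not course data)
friend_counts = [1, 3, 5, 8, 10, 12, 50, 100, 1000]

# The mean is pulled upward by the large values; the median is not
avg = mean(friend_counts)
mid = median(friend_counts)

# Transform the variable: log10(x + 1) compresses the long right tail
log_counts = [math.log10(x + 1) for x in friend_counts]

# Crude histogram: count values falling in each unit-wide log10 bin
bins = Counter(int(v) for v in log_counts)

print(f"mean={avg:.1f}, median={mid}")
for b in sorted(bins):
    print(f"10^{b} to 10^{b + 1}: {'#' * bins[b]}")
```

Comparing the mean with the median is a quick way to spot skew before drawing any plot.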

Problem Set 3 (2 hours)

Lesson 4: Explore Two Variables (4 hours)

EDA allows us to identify the most important variables and relationships within a data set before building predictive models. In this lesson, we will learn techniques for exploring the relationship between any two variables in a data set. We'll create scatter plots, calculate correlations, and investigate conditional means.
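As an informal preview of two of these ideas, the sketch below computes a Pearson correlation and a set of conditional means (the average of one variable at each value of another). It is written in Python purely for illustration, since the lesson itself uses R; the `records` data and the helper names `pearson` and `conditional_means` are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical (age, friend_count) pairs, made up for illustration
records = [(13, 100), (13, 140), (20, 300), (20, 340), (30, 200), (30, 180)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def conditional_means(pairs):
    """Mean of y for each distinct value of x."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in sorted(groups.items())}


ages = [a for a, _ in records]
counts = [c for _, c in records]

r = pearson(ages, counts)
means_by_age = conditional_means(records)

print(f"correlation: {r:.2f}")
print("mean friend count by age:", means_by_age)
```

A single correlation number can hide a non-linear pattern that the conditional means reveal, which is one reason to look at both.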

Problem Set 4 (2 hours)

Lesson 5: Explore Many Variables (4 hours)

Data sets can be complex. In this lesson, we will learn powerful methods and visualizations for examining relationships among multiple variables. We'll learn how to reshape data frames and how to use aesthetics like color and shape to uncover more information. Extending our knowledge of previous plots, we'll continue to build intuition around the Facebook data set and explore some new data sets as well.

Problem Set 5 (2 hours)

Lesson 6: Diamonds and Price Predictions (2 hours)

Investigate the diamonds data set alongside Facebook Data Scientist, Solomon Messing. He'll recap many of the strategies covered in the course and show how predictive modeling can allow us to determine a good price for a diamond. As a final project, you will create your own exploratory data analysis on a data set of your choice.

Final Project (10+ hours)

You've explored simulated Facebook user data and the diamonds data set. Now, it's your turn to conduct your own exploratory data analysis. Choose one data set to explore (one provided by Udacity or your own) and create an RMD file that uncovers the patterns, anomalies, and relationships of the data set.

Instructors & Partners

Dean Eckles

Instructor

Dean Eckles is a social scientist, statistician, and member of the Data Science team at Facebook. His primary focus is how interactive technologies affect human behavior by mediating, amplifying, and directing social influence, along with the statistical methods to study these processes. Dean holds degrees from Stanford University in philosophy (BA), cognitive science (BS, MS), statistics (MS), and communication (PhD).

Moira Burke

Instructor

Moira Burke is a Data Scientist at Facebook, where she combines her social psychology and data munging chops to understand how people perceive their audience online and how various uses of the site improve psychological well-being. She received her Ph.D. in Human-Computer Interaction from Carnegie Mellon University, and a B.A. in Computer Science from the University of Oregon. When not coaxing data into pretty plots, she sings a cappella. Badly.

Chris Saden

Instructor

After graduating from Emory in 2008, Chris dabbled in college admissions for a year, which led him to teach high school mathematics in Oakland. He radiates a love for learning and believes everyone deserves a great education. In 2012, Chris joined Udacity to reach thousands of students and share his joy of problem solving with the world.

Solomon Messing

Instructor

Solomon Messing is a political scientist with Facebook's Data Science Team. His research and teaching focus on political advertising and campaigns, social influence, and design and analysis of experiments. His work has appeared in the American Political Science Review, Public Opinion Quarterly, and Communication Research. Solomon completed his Ph.D. in political communication and M.S. in Statistics at Stanford.

Ways to Take This Course

Full Course

  • Courseware
  • Projects with ongoing feedback and code-review
  • Guidance from Coaches
  • Verified Certificates

Courseware

  • View the course "Textbook" by watching lectures and taking auto-graded quizzes. Learn at your own pace. 100% free.
14-day money-back guarantee. Love it or get a full refund.
View more courses in the Data Science Track