Important: This course is for students enrolled in a nanodegree. If you are not currently enrolled in a nanodegree, please access the full version of the Data Analysis with R course.

Data Analysis with R

Course Resources

Course Files

Take notes, write down your answers, and code alongside the instructors in RStudio using these files!

Data Sets

The data sets are listed in the order that they appear in the course.

R and Plotting Resources

General Plot Resources

General R Resources

These external resources can help you get started working with R.

R Help and Examples

R Basics

R Style Guide

ggplot2 Resources

Intro to Stats Using R

Additional Materials

Supplemental Materials

Additional Reading

The following texts are optional. They can enhance your learning, but they are not required to succeed in this course.

Continue Learning

Course Syllabus

Lesson 1: What is EDA?

We'll start by learning what exploratory data analysis (EDA) is and why it is important. You'll meet the instructors for the course and find out about the course structure and final project.

Lesson 2: R Basics

EDA, which comes before formal hypothesis testing and modeling, makes use of visual methods to analyze and summarize data sets. R will be our tool for generating those visuals and conducting analyses. In this lesson, we will install RStudio and packages, learn the layout and basic commands of R, practice writing basic R scripts, and inspect data sets.

Lesson 3: Explore One Variable

We perform EDA to understand the distribution of a variable and to check for anomalies and outliers. Learn how to quantify and visualize individual variables within a data set as we begin to make sense of a pseudo-data set of Facebook users. While the data set does not contain real user data, it does contain a wealth of information. Throughout the lesson, we will create histograms and boxplots, transform variables, and examine tradeoffs in visualizations.

Lesson 4: Explore Two Variables

EDA allows us to identify the most important variables and relationships within a data set before building predictive models. In this lesson, we will learn techniques for exploring the relationship between any two variables in a data set. We'll create scatter plots, calculate correlations, and investigate conditional means.

Lesson 5: Explore Many Variables

Data sets can be complex. In this lesson, we will learn powerful methods and visualizations for examining relationships among multiple variables. We'll learn how to reshape data frames and how to use aesthetics like color and shape to uncover more information. Extending our knowledge of previous plots, we'll continue to build intuition around the Facebook data set and explore some new data sets as well.

Lesson 6: Diamonds and Price Predictions

Investigate the diamonds data set alongside Facebook data scientist Solomon Messing. He'll recap many of the strategies covered in the course and show how predictive modeling can help us determine a good price for a diamond. As a final project, you will conduct your own exploratory data analysis of a data set of your choice.