**Important**: This course is for students enrolled in a nanodegree. If you are not currently enrolled in a nanodegree, please access the full version of the Data Analysis with R course.

**Need help getting started?**

- Find an answer to your question in the Udacity FAQ
- Learn about different parts of the Classroom in Udacity Introduction

Contents

- 1 Course Resources
  - 1.1 Course Files
  - 1.2 Data Sets
  - 1.3 R and Plotting Resources
    - 1.3.1 General Plot Resources
    - 1.3.2 General R Resources
    - 1.3.3 R Help and Examples
    - 1.3.4 R Basics
    - 1.3.5 R Style Guide
    - 1.3.6 ggplot2 Resources
    - 1.3.7 Intro to Stats Using R
  - 1.4 Additional Materials
    - 1.4.1 Supplemental Materials
    - 1.4.2 Additional Reading
    - 1.4.3 Continue Learning
- 2 Course Syllabus

Take notes, write down your answers, and code alongside the instructors in RStudio using these files!

- Zip File of All Course Materials
- Lesson 3 RMD File
- Lesson 4 RMD File
- Lesson 5 RMD File
- Lesson 6 RMD File

The data sets are listed in the order that they appear in the course.

- State Data (Lesson 2)
- Reddit Survey Data (Lesson 2)
- Pseudo-Facebook Data (Lesson 3)
- Yogurt Data (Lesson 5)
- Micro-array/Gene Expression Data (Lesson 5)

- How do I read histograms and use them in R? (blog post; note that it uses R's base graphics package to create histograms, whereas we'll be using the ggplot2 graphics package)
- How do I read and use a Boxplot? (blog post)
- How do I format plots for publications? (blog post)

These external resources can help you get started working with R.

- Titanic: Getting Started with R by Trevor Stephens (four-part blog series)
- R Video Tutorials by Google Developers (short videos on YouTube)
- R Two Minute Tutorials by Anthony Damico (website)
- Debugging with RStudio (website)
- Import Data into R (blog post)
- Sample random rows from a data frame (Stack Overflow post)

- R Style Guide by Hadley Wickham (pdf)
- Google's R Style Guide (website)

- A Simple Intro to the Graphing Philosophy of ggplot2 by Tom Hopper (blog post)
- Grammar of Graphics by Hadley Wickham (pdf)
- Grammar of Graphics: Past, Present, and Future by Hadley Wickham (pdf)
- ggplot2 official documentation (website)
- ggplot2 tutorial by Ramon Saccilotto (pdf)
- ggplot2: Cheatsheet for Visualizations (website)
- ggplot2: Scales and Themes (website)
- ggplot2: geom quick reference (website)
- My Commonly Done ggplot2 Graphs (R Bloggers)

- Open Intro to Statistics Labs (website)

- Create a Heatmap using the Base Graphics (blog post)
- Converting Between Long and Wide Format (website)
- Melt Data Frames (blog post)
- Predict Movie Ratings using IMDB and R (blog post)
- A Visual Guide to Correlation (jpg)

The following texts are **optional** for the course. These texts can enhance your learning, but they are not required to succeed in this course.

- Exploratory Data Analysis by John Tukey
- Visualizing Data by William S. Cleveland
- ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham

- The R Meta Book, which is a collection of links and resources by Joseph Rickham (**GREAT** blog post)
- Fitting and Interpreting Linear Models in R (blog post)
- Analyze your social network on Facebook using R (blog post)
- Multivariate Display about Movies: Genres (blog post)
- Multivariate Display about Movies: Comparing Movie Sequels to their Originals (blog post)
- Predict Movie Ratings using your IMDB Data and R (blog post)
- Read Large Data Sets: The Iterator Package by Flavio Barros (blog post)
- Where can I find large data sets that are open to the public? (Quora post)
- Write an R Package from Scratch (blog post tutorial)

We'll start by learning what exploratory data analysis (EDA) is and why it is important. You'll meet the amazing instructors for the course and find out about the course structure and final project.

EDA, which comes before formal hypothesis testing and modeling, makes use of visual methods to analyze and summarize data sets. R will be our tool for generating those visuals and conducting analyses. In this lesson, we will install RStudio and packages, learn the layout and basic commands of R, practice writing basic R scripts, and inspect data sets.
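As a rough illustration of what "inspecting a data set" can look like, here is a minimal sketch in base R. It uses the built-in `mtcars` data as a stand-in (an assumption for illustration; it is not one of the course data sets), and the `install.packages()` line is commented out because it only needs to run once.

```r
# Install once per machine, then load with library() in each session.
# install.packages("ggplot2")

# mtcars ships with R; here it stands in for a course data set.
data(mtcars)

n_rows <- nrow(mtcars)   # number of observations
n_cols <- ncol(mtcars)   # number of variables

names(mtcars)            # column names
str(mtcars)              # type and first few values of each column
head(mtcars, 3)          # first three rows
summary(mtcars$mpg)      # min, quartiles, median, mean, max of one variable
```

`str()` and `summary()` are usually the quickest way to spot unexpected types or suspicious values before plotting anything.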

We perform EDA to understand the distribution of a variable and to check for anomalies and outliers. Learn how to quantify and visualize individual variables within a data set as we begin to make sense of a pseudo-data set of Facebook users. While the data set does not contain real user data, it does contain a wealth of information. Through the lesson, we will create histograms and boxplots, transform variables, and examine tradeoffs in visualizations.
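The plots named in this lesson can be sketched roughly as follows. This assumes ggplot2 is installed and uses its bundled `diamonds` data in place of the pseudo-Facebook data, which is not reproduced here; the bin settings are arbitrary choices for illustration.

```r
library(ggplot2)

# Histogram of a single variable; binwidth is in the units of the x axis.
p_hist <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500)

# The same variable on a log10 axis, a common transform for long-tailed data.
p_log <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(bins = 50) +
  scale_x_log10()

# Boxplot of a continuous variable split by a categorical one.
p_box <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot()

print(p_hist)
print(p_log)
print(p_box)

# Numeric summary to read alongside the plots.
price_summary <- summary(diamonds$price)
print(price_summary)
```

Comparing the raw and log-scaled histograms side by side is one way to see the tradeoff the lesson mentions: the transform reveals structure in the bulk of the data at the cost of a less intuitive axis.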

EDA allows us to identify the most important variables and relationships within a data set before building predictive models. In this lesson, we will learn techniques for exploring the relationship between any two variables in a data set. We'll create scatter plots, calculate correlations, and investigate conditional means.
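A small sketch of the three techniques mentioned here, again using the built-in `mtcars` data as a stand-in for the course data (an assumption for illustration). The conditional means are computed with base R's `aggregate()`; other approaches, such as grouping with a data-manipulation package, would work equally well.

```r
library(ggplot2)

# Scatter plot of two continuous variables; alpha softens overplotting.
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = 0.6)
print(p_scatter)

# Pearson correlation between weight and fuel efficiency.
r <- cor(mtcars$wt, mtcars$mpg)
print(r)

# Conditional means: average mpg within each cylinder count.
cond_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(cond_means)
```

The correlation condenses the scatter plot into one number, while the conditional means show how the typical value of one variable shifts across groups of another.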

Data sets can be complex. In this lesson, we will learn powerful methods and visualizations for examining relationships among multiple variables. We'll learn how to reshape data frames and how to use aesthetics like color and shape to uncover more information. Extending our knowledge of previous plots, we'll continue to build intuition around the Facebook data set and explore some new data sets as well.
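To make "reshaping" and "aesthetics" concrete, here is a minimal sketch using only base R for the reshaping and ggplot2 for the plot. The `mtcars` data is a stand-in, and base R's `tapply()` and `as.table()` are just one possible route; dedicated reshaping packages offer others.

```r
library(ggplot2)

# Wide layout: one row per cylinder count, one column per gear count,
# each cell holding the mean mpg for that combination (NA if no cars).
wide <- tapply(mtcars$mpg,
               list(cyl = mtcars$cyl, gear = mtcars$gear),
               mean)
print(wide)

# Long layout: one row per (cyl, gear) combination.
long <- as.data.frame(as.table(wide), responseName = "mean_mpg")
print(long)

# Map a third variable to color to show it on a two-variable scatter plot.
p_color <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()
print(p_color)
```

The wide layout is easier to read as a table, while the long layout is what ggplot2 generally expects as input.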

Investigate the diamonds data set alongside Facebook Data Scientist, Solomon Messing. He'll recap many of the strategies covered in the course and show how predictive modeling can allow us to determine a good price for a diamond. As a final project, you will create your own exploratory data analysis on a data set of your choice.
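As a hedged illustration of how a predictive model might price a diamond, the sketch below fits a simple linear model on log-transformed price and carat. This is an assumed, deliberately minimal model chosen for illustration, not necessarily the one built in the lesson; `new_stone` is a hypothetical input.

```r
library(ggplot2)  # provides the diamonds data set

# Linear model on the log scale: price tends to grow multiplicatively with size.
fit <- lm(log(price) ~ log(carat), data = diamonds)
print(coef(fit))
print(summary(fit)$r.squared)

# Predict a price for a hypothetical 1-carat stone, then undo the log.
new_stone <- data.frame(carat = 1)
predicted_price <- exp(predict(fit, newdata = new_stone))
print(predicted_price)
```

A real pricing model would also account for cut, color, and clarity; a single predictor is used here only to keep the example short.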