Students should have basic programming experience in Python: for example, being comfortable writing loops and conditional statements, and knowing when to use an array instead of a list. Students lacking this prerequisite knowledge can take the first four lessons of Intro to Computer Science to gain the necessary Python experience.
Standard deviations, confidence intervals, z-scores, and t-tests
Project: Test a Perceptual Phenomenon
Design and implement your own hypothesis test for a version of the Stroop test
- Identify several statistical study methods and describe the positives and negatives of each
- Describe the variability in a sample or population using the range and standard deviation
- Convert normally distributed data to the standard normal distribution using z-scores
- Apply the concepts of probability and normalization to sample data sets
- Use confidence intervals to determine how accurately a sample of data represents a broader population
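The measures in the list above can be computed with Python's standard `statistics` module. The following is a minimal sketch on a made-up sample of reaction times. The data are invented, and the hard-coded critical value (2.365, the two-sided 95% t value for 7 degrees of freedom from a standard t table) applies only to this sample size.

```python
import math
import statistics

# Made-up sample of reaction times in seconds (illustrative only)
sample = [12.1, 14.3, 11.8, 15.0, 13.2, 12.9, 14.8, 13.5]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
spread = max(sample) - min(sample)   # range

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Standardize every value: the result has mean 0 and standard deviation 1
standardized = [z_score(x, mean, sd) for x in sample]

# 95% confidence interval for the population mean.
# With only 8 observations, use a t critical value (df = 7) rather than 1.96.
T_CRIT_DF7 = 2.365
margin = T_CRIT_DF7 * sd / math.sqrt(n)
ci_low, ci_high = mean - margin, mean + margin

print(f"mean={mean:.2f} sd={sd:.2f} range={spread:.1f}")
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```

Standardizing always yields mean 0 and standard deviation 1. The standardized values follow the standard normal distribution only if the original data are approximately normal.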
You’ll design and implement your own hypothesis test for your own version of the Stroop test. You’ll use statistical inference to draw a conclusion based on the results, summarizing your findings to give readers a good intuition for the data.
The Stroop test is one of the most popular tests in experimental psychology and has been replicated over 700 times since its initial publication in 1929. At a high level, the test measures how interfering stimuli affect human reaction time.
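In a Stroop-style experiment, each participant completes both a congruent and an incongruent condition. For that design, one reasonable analysis is a paired (dependent-samples) t-test on the per-participant differences. The sketch below computes the t statistic by hand. The timing data are invented. Compare the result against a critical value from a t table, or use a library routine such as SciPy's `scipy.stats.ttest_rel`, which performs this test.

```python
import math
import statistics

# Invented completion times (seconds) for the same six participants
congruent = [12.0, 16.8, 9.6, 8.6, 14.7, 12.1]
incongruent = [19.3, 18.7, 21.2, 15.7, 22.8, 20.9]

def paired_t_statistic(condition_a, condition_b):
    """Return (t, degrees of freedom) for H0: mean(b - a) == 0."""
    if len(condition_a) != len(condition_b):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    # Mean difference divided by its standard error
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

t_stat, df = paired_t_statistic(congruent, incongruent)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

A large positive t, compared against the critical value for `df` degrees of freedom at your chosen significance level, would suggest that the incongruent condition slows participants down.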
Intro to Data Analysis
NumPy arrays, pandas DataFrames, and vectorized operations
Project: Investigating a Dataset
Pose your own question about a dataset, investigate its contents and communicate your findings
- Use NumPy arrays, pandas Series, and vectorized operations to ease the data analysis process
- Use two-dimensional NumPy arrays and pandas DataFrames
- Understand how to group data and combine data from multiple files
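The skills in the list above can be sketched in a few lines of pandas. So that the example runs on its own, the DataFrames are built inline from invented, Titanic-style values. In the project, you would more likely load real files with `pd.read_csv`. The column names `PassengerId`, `Pclass`, `Fare`, and `Survived` are assumptions chosen for illustration.

```python
import pandas as pd

# Invented passenger details (imagine these came from one CSV file)
passengers = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Pclass": [1, 3, 3, 1, 2, 3],
    "Fare": [71.28, 7.25, 7.92, 53.10, 13.00, 8.05],
})

# Invented outcomes keyed by the same id (imagine a second CSV file)
outcomes = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [1, 0, 1, 1, 0, 0],
})

# Combine the two sources on their shared key
df = passengers.merge(outcomes, on="PassengerId")

# Vectorized operation: standardize the whole Fare column without a Python loop
df["FareZ"] = (df["Fare"] - df["Fare"].mean()) / df["Fare"].std()

# Group and aggregate: survival rate per passenger class
survival_by_class = df.groupby("Pclass")["Survived"].mean()
print(survival_by_class)
```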
In this project, you’ll choose between two datasets: passenger and crew information from the Titanic, or baseball statistics from 1871 to 2014.
You’ll then pose your own question about the dataset and apply each step of the data analysis process to investigate its contents and communicate your findings for others to learn from.
Data Extraction and Wrangling
SQL, MongoDB, and data quality assessment
Project: OpenStreetMap Improvements
Clean some OpenStreetMap data for a part of the world that you care about
- Properly audit the validity, accuracy, completeness, consistency, and uniformity of a dataset
- Understand how data is modeled in relational (SQL) and document (MongoDB) databases
- Scrape websites for data you need and store it in a database
- Write your own queries to retrieve and summarize data from databases
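To illustrate auditing and querying a relational database, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The two-table schema is a simplified, hypothetical stand-in for OpenStreetMap nodes and their tags, and all rows are invented.

```python
import sqlite3

# In-memory database with a simplified, hypothetical OSM-style schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, username TEXT, lat REAL, lon REAL)")
conn.execute("CREATE TABLE nodes_tags (id INTEGER, key TEXT, value TEXT)")

conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", [
    (1, "alice", 37.77, -122.42),
    (2, "bob", 37.78, -122.41),
    (3, "alice", 37.76, -122.43),
    (4, "carol", 91.00, -122.40),  # deliberately invalid latitude
])
conn.executemany("INSERT INTO nodes_tags VALUES (?, ?, ?)", [
    (1, "amenity", "cafe"),
    (2, "amenity", "cafe"),
    (3, "amenity", "library"),
])

# Validity audit: latitudes must fall within [-90, 90]
invalid_ids = [row[0] for row in conn.execute(
    "SELECT id FROM nodes WHERE lat NOT BETWEEN -90 AND 90")]

# Summary query: number of nodes contributed by each user
top_users = conn.execute(
    "SELECT username, COUNT(*) AS n FROM nodes "
    "GROUP BY username ORDER BY n DESC, username").fetchall()

# Summary query: most common amenity types
amenities = conn.execute(
    "SELECT value, COUNT(*) AS n FROM nodes_tags WHERE key = 'amenity' "
    "GROUP BY value ORDER BY n DESC, value").fetchall()

print(invalid_ids, top_users, amenities)
conn.close()
```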
OpenStreetMap is an open-source, collaborative project to create a free, editable map of the world. It is used in different ways by Craigslist, Foursquare, the World Bank, the Red Cross, and many NGOs.
In this project, you’ll put your new data skills to work by cleaning OpenStreetMap data for a part of the world that you care about.
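A common first cleaning step for OpenStreetMap address data is to audit inconsistent street-type abbreviations and map them to a standard form. The sketch below shows one way to do this with a regular expression. The expected street types, the abbreviation mapping, and the sample names are hypothetical. In practice, you would build them from an audit of your own region's data.

```python
import re
from collections import defaultdict

# The last whitespace-free token of a street name, e.g. "St." in "Main St."
STREET_TYPE_RE = re.compile(r"\S+$")

# Hypothetical set of accepted street types
EXPECTED = {"Street", "Avenue", "Boulevard", "Drive", "Road", "Lane"}

# Hypothetical abbreviation -> full-name mapping, built from an audit
MAPPING = {"St": "Street", "St.": "Street", "Ave": "Avenue", "Ave.": "Avenue", "Rd.": "Road"}

def audit_street_types(names):
    """Group street names by any street type that is not in EXPECTED."""
    unexpected = defaultdict(set)
    for name in names:
        match = STREET_TYPE_RE.search(name.strip())
        if match and match.group() not in EXPECTED:
            unexpected[match.group()].add(name)
    return unexpected

def update_name(name, mapping=MAPPING):
    """Replace an abbreviated street type with its full form, if known."""
    name = name.strip()
    match = STREET_TYPE_RE.search(name)
    if match and match.group() in mapping:
        return name[:match.start()] + mapping[match.group()]
    return name

names = ["Main St.", "Oak Avenue", "Elm Rd.", "Pine Ave"]
print(dict(audit_street_types(names)))
print([update_name(n) for n in names])
```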
Exploratory Data Analysis
R, investigate datasets, reshape data frames
Project: Explore and Summarize Data
Demonstrate your mastery of EDA by exploring the variables, patterns, and oddities within a dataset
- Use R to perform exploratory data analysis (EDA)
- Understand the distribution of a variable and check for anomalies and outliers
- Quantify and visualize individual variables within a dataset
- Examine and identify tradeoffs in different types of data visualizations
- Properly apply relevant techniques for exploring the relationship between any two variables in a dataset
- Reshape data frames and use aesthetics like color and shape to examine relationships among multiple variables
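This part of the course uses R. To keep all examples in one language, here is a Python sketch of one common check for anomalies: flagging values more than 1.5 times the interquartile range (IQR) beyond the quartiles. This cutoff is a convention, not the only valid one, and the values are invented. In R, `quantile()` and `IQR()` serve the same purpose. R's default quantile method can produce slightly different cutoffs from Python's `statistics.quantiles`.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Invented measurements with one suspicious value
values = [5.1, 5.4, 5.6, 5.8, 6.0, 6.1, 6.3, 6.5, 14.0]
print(iqr_outliers(values))
```

A flagged value is a prompt to investigate, not an automatic reason to delete it: it may be a data-entry error or a genuine extreme observation.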
In this project, you’ll choose from a variety of datasets ranging from wine quality to presidential campaign contributions and conduct your own exploratory data analysis (EDA).
You’ll demonstrate your mastery of EDA by exploring the variables, structure, patterns, oddities, and underlying relationships within the dataset. You’ll present your findings in an R Markdown file, and chronicle the process you took to explore the dataset so others can audit your conclusions.
Naive Bayes, Support Vector Machines, F1 scores
Project: Identify Fraud from Enron Email
Build an algorithm to identify Enron employees who may have committed fraud
- Use Naive Bayes, regression, and k-means clustering algorithms
- Implement Support Vector Machines (SVMs), using kernels to separate data in higher-dimensional feature spaces without computing those features explicitly
- Implement decision trees
- Quantify machine learning results using precision, recall, and F1 score
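When the positive class is rare, as when only a few people in a large dataset are suspects, accuracy can look high even for a model that flags no one. Precision, recall, and F1 give a more useful picture. A minimal from-scratch sketch follows. The labels are invented, with 1 marking a hypothetical person of interest. For real projects, scikit-learn's `sklearn.metrics` module provides `precision_score`, `recall_score`, and `f1_score`.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented labels: 1 = person of interest, 0 = not
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 1]
print(precision_recall_f1(y_true, y_pred))
```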
Enron’s collapse was one of the biggest corporate scandals of the early 2000s. After revelations of a sustained history of accounting fraud, the company filed for bankruptcy and many of its executives were indicted.
In this project, you’ll play detective by building an algorithm for the public Enron financial and email dataset to identify Enron employees who may have committed fraud.
HTML, CSS, D3.js, dimple.js
Project: Storytelling with Data
Choose a dataset and use popular visualization libraries to create your own interactive visualizations
- Select appropriate visualization types for different insights
- Incorporate different narrative structures into your visualizations
- Use D3.js to add animation and interaction that give your audience deeper insight into your visualizations
Great data analysis tells a story through data, and the most impactful communication is often visual.
In this project, you’ll choose a dataset and use one of two popular visualization libraries, dimple.js or D3.js, to create your own interactive visualizations.
Design an A/B Test (Optional)
Defining experimental groups and validating metrics
Project: Create an A/B Test
Analyze the results of an A/B test and recommend whether or not to launch the change
- Understand the key concepts and considerations when designing and conducting an A/B test
- Identify characteristics to consider when validating metrics
- Identify which users to bucket into the control and experimental groups
- Calculate the number of events and running time required to reach a statistically significant result
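One common way to estimate how many units each group needs is the normal-approximation formula for comparing two proportions. Dividing the total by expected daily traffic then gives a rough running time. The sketch below is an illustration under stated assumptions, not necessarily the method taught in the course. The baseline rate, minimum detectable effect, and daily traffic are made up, and dedicated calculators may use slightly different formulas.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_baseline, min_detectable_effect, alpha=0.05, power=0.80):
    """Units needed per group to detect an absolute lift in a proportion
    (two-sided test, normal approximation)."""
    p1 = p_baseline
    p2 = p_baseline + min_detectable_effect
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / min_detectable_effect ** 2)

# Made-up inputs: 10% baseline conversion, detect an absolute lift of 2 points
n_per_group = sample_size_per_group(0.10, 0.02)

# Rough running time, assuming (hypothetically) 1,000 eligible units per day
# split evenly between control and experiment
daily_units = 1000
days_needed = math.ceil(2 * n_per_group / daily_units)
print(n_per_group, days_needed)
```

Smaller effects, stricter significance levels, or higher power all increase the required sample size and therefore the running time.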
A/B experiments are frequently used by technology companies to test multiple versions of a website or mobile app and determine the best version to launch.
In this project, you’ll design an A/B test, including which metrics to measure and how long the test should be run. You’ll analyze the results of an A/B test that was run by Udacity and recommend whether or not to launch the change.
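After the test has run, one simple way to turn the results into a recommendation is to compute a confidence interval for the difference in conversion rates. You then compare that interval with a practical-significance threshold `d_min` chosen before the experiment. The decision rule below is a simplified assumption for illustration, and the counts are invented.

```python
import math
from statistics import NormalDist

def diff_confidence_interval(conv_control, n_control, conv_exp, n_exp, confidence=0.95):
    """CI for p_exp - p_control using a pooled standard error
    (as in a two-proportion z-test; an unpooled SE is another common choice)."""
    p_pool = (conv_control + conv_exp) / (n_control + n_exp)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_exp))
    diff = conv_exp / n_exp - conv_control / n_control
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff - z * se, diff + z * se

def recommend(ci, d_min):
    """Simplified launch rule based on practical significance d_min."""
    low, high = ci
    if low > d_min:
        return "launch"          # effect is confidently larger than d_min
    if high < d_min:
        return "do not launch"   # effect is confidently smaller than d_min
    return "inconclusive"        # interval straddles d_min; more data may help

# Invented results: 1,000 of 10,000 control users vs 1,150 of 10,000 experiment users convert
ci = diff_confidence_interval(1000, 10000, 1150, 10000)
print(ci, recommend(ci, d_min=0.01))
```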