Introduction to Machine Learning

Thanks for joining us in this course! We're excited to teach you about machine learning, and we hope you're excited to learn.

Course Resources

Downloadable Materials

You can download Supplemental Materials, Lesson Videos and Transcripts from Downloadables (bottom right corner of the Classroom) or from the Dashboard (first option on the navigation bar on the left-hand side).

Course Syllabus

Lesson 1: Naive Bayes Classifier

We start at the beginning: what is machine learning, and why should you care? Using the self-driving car as your classroom (literally), you'll be coding your own machine learning algorithms in just a few minutes, solving the problem of automagically knowing where it's safe to drive.  In the mini-project, you'll use a simple but powerful machine learning algorithm, the Naive Bayes classifier, to identify the author of an email based on the text of the message.
Lesson 1 Mini-Project

Lesson 2: Support Vector Machines

Support vector machines are one of the workhorse algorithms in machine learning, and in this lesson you'll learn why.  You'll also start thinking about the all-important algorithm parameters in machine learning, which can take a mediocre algorithm and make it great (or vice versa). For the mini-project, the problem is once again to identify the author of an email, but now you'll try a more complex, sophisticated algorithm.  Along the way, you'll learn how to compare the advantages and disadvantages of certain types of machine learning algorithms.
Lesson 2 Mini-Project

Lesson 3: Decision Trees

The third musketeer in our trio of supervised classifiers, decision trees are a great jumping-off point for more advanced learning on your own.  You'll get more practice with parameters, and also get comfortable with the ideas of bias and variance, two of the most important concepts in machine learning. You'll apply them to our author identification problem.
Lesson 3 Mini-Project

Lesson 4: Choose Your Own Algorithm

Now it's your turn to try out your chops as a new machine learner, and teach yourself one of three awesome algorithms.  You'll use it to classify desert terrain as drivable by a self-driving car--can you beat the benchmarks that we've set in the first three lessons?
Lesson 4 Mini-Project

Lesson 5: Datasets and Questions

Every machine learning project starts with two things: a great question, and a great dataset.  For the rest of the course, we'll be using data from the fraud-riddled Enron implosion to answer the question "Can patterns in email or financial data help us identify Enron executives who were persons of interest in the fraud investigation?" In this lesson and mini-project you'll learn more than you ever imagined about the greatest corporate fraud case in American history.
Lesson 5 Mini-Project

Lesson 6: Regressions

Another great classic of supervised learning, regressions are the go-to algorithm when you need to predict a continuous quantity rather than a discrete class label.  In the mini-project, you'll use a linear regression to continue exploring the Enron dataset.
Lesson 6 Mini-Project
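
To make "predicting a continuous quantity" concrete, here is a minimal sketch of a one-variable least-squares fit in plain Python. The function names fit_line and predict and the toy numbers are made up for illustration; they are not taken from the course materials or the Enron dataset.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict a continuous value for a new input x."""
    return slope * x + intercept


# Hypothetical toy data: points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
```

Unlike a classifier, predict returns a number on a continuous scale rather than a class label.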

Lesson 7: Outliers

Outliers can cause weird problems in machine learning algorithms, so this project will guide you through an algorithm to automatically find and remove them, and have you take a close look at the Enron data for outliers. (They're in there, we promise!)
Lesson 7 Mini-Project

Lesson 8: Clustering

In this project, you'll get your hands dirty with unsupervised learning as you deploy the classic k-means clustering algorithm on the Enron data.
Lesson 8 Mini-Project
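
For a feel of what k-means does, the sketch below runs the usual assign-then-update loop on one-dimensional points in plain Python. The function name kmeans_1d, the starting centroids, and the toy points are assumptions made for this example, not part of the course code.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Run a fixed number of k-means iterations on 1-D points."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical toy data with two obvious groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
```

No labels are supplied anywhere, which is what makes this unsupervised: the groups come only from the distances between points.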

Lesson 9: Feature Scaling

Many machine learning algorithms assume that the input features you use are scaled.  What does that mean, and how do you do it? Find out more in this lesson and mini-project, and practice applying scaling to the clusters that you made in Lesson 8.
Lesson 9 Mini-Project
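
As one possible answer to "how do you do it?", the sketch below applies min-max rescaling, which maps a feature onto the range 0 to 1. This is only one common scaling method and the text above does not say which one the lesson uses; the function name min_max_scale and the sample values are made up for illustration.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature on a large scale (not real Enron figures).
salaries = [50000.0, 100000.0, 200000.0]
scaled = min_max_scale(salaries)
```

After rescaling, a feature measured in hundreds of thousands and a feature measured in single digits contribute on a comparable scale to distance-based methods such as clustering.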

Lesson 10: Text Learning

With so much of the world's information stored as text online or in printed material, text learning is a critical machine learning skill.  At the beginning of this class, you identified the author of an email by its message text; in this mini-project, you'll start with raw email data and go through the preprocessing required to do that project.
Lesson 10 Mini-Project

Lesson 11: Feature Selection

Your machine learning results will only be as good as the features you feed into your algorithm--the importance of smart feature selection cannot be overstated.  Luckily, there are some standard methods to help you engineer quality features.  In the mini-project, we'll lead you through a real-world example of identifying and eliminating features that messed up the email author classifier when we first wrote the decision tree mini-project.
Lesson 11 Mini-Project

Lesson 12: Principal Components Analysis

Principal Components Analysis is feature selection on steroids--it allows you to take the best parts of many features, and combine them into super-features that capture the patterns in a more streamlined way.  In the mini-project, we'll take a break from the Enron data and play with the famous eigenfaces example, where PCA is applied to automated face recognition.
Lesson 12 Mini-Project

Lesson 13: Validation

How do you know what kind of results you can expect from your machine learning analysis? You need validation to answer that question, and you'll tackle both the concepts and code that allow you to produce trustworthy work.  In the mini-project, you'll get started building a validation framework for your final project, an Enron fraud person-of-interest identification analysis.
Lesson 13 Mini-Project
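
The simplest form of validation is to hold back part of the data for testing. The sketch below shows a shuffled hold-out split in plain Python; the function name holdout_split, the 25% test fraction, and the seed are assumptions for this example rather than the course's own framework.

```python
import random


def holdout_split(items, test_fraction=0.25, seed=0):
    """Shuffle items reproducibly and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(items)  # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 20 record ids.
data = list(range(20))
train, test = holdout_split(data, test_fraction=0.25, seed=42)
```

A model would be fit on train only and scored on test, so the score reflects data the model has not seen.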

Lesson 14: Evaluation Metrics

Last but not least: let's put some numbers onto our statements when we talk about "results" and "performance"! The mini-project has you practice deploying evaluation metrics on your Enron final project, getting yourself all set up to finish strong.
Lesson 14 Mini-Project
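
As an example of putting numbers on "performance", the sketch below computes precision and recall from true and predicted labels. These are two widely used metrics, but the text above does not list which metrics the lesson covers; the function name precision_recall and the label lists are invented for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 marks a person of interest, 0 everyone else.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Precision asks how many of the flagged cases were correct, while recall asks how many of the true cases were found. When positives are rare, both can be more informative than plain accuracy.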

Lesson 15: Tying It All Together

Whew, that was a lot that you just learned! This short lesson ties it all together, with a high-level view of the field of machine learning as a whole.

Final Project: Detecting Persons of Interest in the Enron Fraud Case

In the final project, you will get a chance to work with the machine learning tools you now know, applying them to a treasure trove of real email and financial data from Enron, looking for signs of fraud.  After the content of the lessons and the mini-projects, you'll be well-prepared to tackle this real-world (and tough!) problem.

Follow this link to access the final project. Find the final project evaluation rubric here. You should include answers to these questions, too.