*This is the first course in the 3-course Machine Learning Series and is offered at Georgia Tech as CS7641.*

*Please note that this first course is different in structure from most Udacity CS courses. There is a final project at the end of the course, and there are no programming quizzes throughout this course.*

This course covers Supervised Learning, a machine learning task that makes it possible for your phone to recognize your voice, for your email to filter spam, and for computers to learn a bunch of other cool stuff.

Supervised Learning is an important component of all kinds of technologies, from stopping credit card fraud, to finding faces in camera images, to recognizing spoken language. Our goal is to give you the skills that you need to understand these technologies and interpret their output, which is important for solving a range of data science problems. And for surviving a robot uprising.

**Series Information**: Machine Learning is a graduate-level series of 3 courses, covering the area of Artificial Intelligence concerned with computer programs that modify and improve their performance through experience.

- Machine Learning 1: Supervised Learning (this course)
- Machine Learning 2: Unsupervised Learning
- Machine Learning 3: Reinforcement Learning

If you are new to Machine Learning, we suggest you take these 3 courses in order.

The entire series is taught as a lively and rigorous dialogue between two eminent Machine Learning professors and friends: Professor Charles Isbell (Georgia Tech) and Professor Michael Littman (Brown University).

In this course, you will gain an understanding of a variety of topics and methods in Supervised Learning. Like function approximation in general, Supervised Learning prompts you to make generalizations based on fundamental assumptions about the world.

**Michael**: So why wouldn't you call it "function induction?"

**Charles**: Because someone said "supervised learning" first.

Topics covered in this course include decision trees, neural networks, instance-based learning, ensemble learning, computational learning theory, Bayesian learning, and many other fascinating machine learning concepts.

In your final project, you will explore important techniques in Supervised Learning, and apply your knowledge to analyze how algorithms behave under a variety of circumstances.

A strong familiarity with Probability Theory, Linear Algebra and Statistics is required. An understanding of Intro to Statistics, especially Lessons 8, 9 and 10, would be helpful.

Students should also have some programming experience (perhaps through Introduction to CS) and familiarity with Neural Networks (as covered in Introduction to Artificial Intelligence).

See the Technology Requirements for using Udacity.

- Definition of Machine Learning
- Supervised learning
- Induction and deduction
- Unsupervised learning
- Reinforcement learning

- Classification and Regression overview
- Classification learning
- Example: Dating
- Representation
- Decision trees learning
- Decision tree expressiveness
- ID3 algorithm
- ID3 bias
- Decision trees and continuous attributes

- Regression and function approximation
- Linear regression and best fit
- Order of polynomial
- Polynomial regression
- Cross validation

- Artificial neural networks
- Perceptron units
- XOR as perceptron network
- Perceptron training
- Gradient descent
- Comparison of learning rules
- Sigmoid function
- Optimizing weights
- Restriction bias
- Preference bias

- Instance based learning before
- Instance based learning now
- K-NN algorithm
- Won’t you compute my neighbors?
- Domain K-NNowledge
- K-NN bias
- Curse of dimensionality

- Ensemble learning: Boosting
- Ensemble learning algorithm
- Ensemble learning outputs
- Weak learning
- Boosting in code
- When D agrees

- Support Vector Machines
- Optimal separator
- SVMs: Linearly married
- Kernel methods

- Computational Learning Theory
- Learning theory
- Resources in Machine Learning
- Defining inductive learning
- Teacher with constrained queries
- Learner with constrained queries
- Learner with mistake bounds
- Version spaces
- PAC learning
- Epsilon exhausted
- Haussler theorem

- Infinite hypothesis spaces
- Power of a hypothesis space
- What does VC stand for?
- Interval training
- Linear separators
- The ring
- Polygons
- Sampling complexity
- VC of finite H

- Bayes Rule
- Bayesian learning
- Bayesian learning in action!
- Noisy data
- Best hypothesis
- Minimum description length
- Bayesian classification

- Joint distribution
- Adding attributes
- Conditional independence
- Belief networks
- Sampling from the joint distribution
- Recovering the joint distribution
- Inferencing rules
- Naïve Bayes
- Why Naïve Bayes is cool