Intermediate

Assumes 6hr/wk (work at your own pace)

Course Summary

This is the second course in the 3-course Machine Learning Series and is offered at Georgia Tech as CS7641. Taking this class here does not earn Georgia Tech credit.

Ever wonder how Netflix can predict what movies you'll like? Or how Amazon knows what you want to buy before you do? The answer can be found in Unsupervised Learning!

Closely related to pattern recognition, Unsupervised Learning is about analyzing data and looking for patterns. It is an extremely powerful tool for identifying structure in data. This course focuses on how you can use Unsupervised Learning approaches -- including randomized optimization, clustering, and feature selection and transformation -- to find structure in unlabeled data.

Series Information: Machine Learning is a graduate-level series of 3 courses, covering the area of Artificial Intelligence concerned with computer programs that modify and improve their performance through experience.

If you are new to Machine Learning, we suggest you take these 3 courses in order.

The entire series is taught as an engaging dialogue between two eminent Machine Learning professors and friends: Professor Charles Isbell (Georgia Tech) and Professor Michael Littman (Brown University).

Why Take This Course?

You will learn about and practice a variety of Unsupervised Learning approaches, including: randomized optimization, clustering, feature selection and transformation, and information theory.

You will learn important Machine Learning methods, techniques, and best practices, and will gain hands-on experience with them through a final project in which you design a movie recommendation system (just like Netflix!).

Prerequisites and Requirements

We recommend you take Machine Learning 1: Supervised Learning prior to taking this course.

This class assumes that you have programming experience, as you will be expected to work with Python libraries such as NumPy and scikit-learn. A good grasp of probability and statistics is also required. Udacity's Intro to Statistics, especially Lessons 8, 9, and 10, may be a useful refresher.

An introductory course like Udacity's Introduction to Artificial Intelligence also provides a helpful background for this course.

See the Technology Requirements for using Udacity.

What Will I Learn?

Projects

You will create a recommendation system, like Netflix's, that suggests movies to users.
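
To give a rough taste of the idea behind the project, here is a minimal user-based collaborative-filtering sketch in plain Python. The data, names, and weighting scheme are invented for illustration; the actual project has its own dataset and requirements.

```python
def cosine(a, b):
    """Similarity of two users over the movies both have rated."""
    shared = [m for m in a if m in b]
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    na = sum(a[m] ** 2 for m in shared) ** 0.5
    nb = sum(b[m] ** 2 for m in shared) ** 0.5
    return dot / (na * nb)

def predict(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        s = cosine(ratings[user], theirs)
        num += s * theirs[movie]
        den += abs(s)
    return num / den if den else None

# Toy ratings on a 1-5 scale (invented data)
ratings = {
    "ann": {"Alien": 5, "Up": 1, "Heat": 4},
    "bob": {"Alien": 4, "Up": 2, "Heat": 5},
    "cat": {"Alien": 1, "Up": 5},
}
guess = predict(ratings, "cat", "Heat")
```

The prediction leans on users whose rating patterns resemble cat's; real systems refine this with mean-centering and the unsupervised techniques covered in the lessons below.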

Syllabus

Lesson 1: Randomized Optimization

  • Optimization, randomized
  • Hill climbing
  • Random restart hill climbing
  • Simulated annealing
  • Annealing algorithm
  • Properties of simulated annealing
  • Genetic algorithms
  • GA skeleton
  • Crossover example
  • What have we learned
  • MIMIC
  • MIMIC: A probability model
  • MIMIC: Pseudo code
  • MIMIC: Estimating distributions
  • Finding dependency trees
  • Probability distribution
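
As a small preview of one technique from this lesson, here is a simulated-annealing sketch in plain Python. The objective function, step size, and cooling schedule are invented for illustration; the lesson develops the algorithm and its properties in full.

```python
import math
import random

def simulated_annealing(f, x0, temp=10.0, cooling=0.95, steps=500, seed=0):
    """Maximize f: always accept uphill moves, and accept downhill
    moves with probability e^(delta/T), which shrinks as T cools."""
    rng = random.Random(seed)
    x, best = x0, x0
    for _ in range(steps):
        neighbor = x + rng.uniform(-1.0, 1.0)   # random nearby point
        delta = f(neighbor) - f(x)
        if delta > 0 or rng.random() < math.exp(delta / temp):
            x = neighbor
        if f(x) > f(best):
            best = x
        temp *= cooling                          # cool the temperature
    return best

# A one-dimensional objective with a single maximum at x = 3
peak = simulated_annealing(lambda x: -(x - 3) ** 2, x0=-10.0)
```

Early on, the high temperature lets the search escape local optima; as it cools, the search settles into hill climbing near the best region found.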

Lesson 2: Clustering

  • Clustering and expectation maximization
  • Basic clustering problem
  • Single linkage clustering (SLC)
  • Running time of SLC
  • Issues with SLC
  • K-means clustering
  • K-means in Euclidean space
  • K-means as optimization
  • Soft clustering
  • Maximum likelihood Gaussian
  • Expectation Maximization (EM)
  • Impossibility theorem
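
To preview k-means clustering from this lesson, here is a minimal Lloyd's-algorithm sketch in plain Python (the toy data and initialization are invented for illustration; the lesson also treats k-means as optimization and its soft, EM-based cousins).

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Alternate assigning points to the nearest center and moving
    each center to the mean of its cluster."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                     # assignment step
            i = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[i].append(p)
        for i, c in enumerate(clusters):     # update step
            if c:
                centers[i] = tuple(sum(x) / len(c) for x in zip(*c))
    return centers

# Two well-separated blobs; the centers should land near each blob's mean
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = kmeans(data, k=2)
```

Each iteration can only lower the total squared distance to the centers, which is why the procedure converges (though possibly to a local optimum).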

Lesson 3: Feature Selection

  • Algorithms
  • Filtering and Wrapping
  • Speed
  • Searching
  • Relevance
  • Relevance vs. Usefulness
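
As a sketch of the filtering approach from this lesson, the snippet below scores each feature by its absolute correlation with the label and keeps the top m, with no learner in the loop. The scoring criterion and toy data are invented for illustration; the lesson contrasts this with wrapping, where a learner does sit in the loop.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def filter_select(X, y, m):
    """Filter method: rank feature columns by |correlation with y|
    and return the indices of the top m."""
    cols = list(zip(*X))
    scores = [abs(pearson(col, y)) for col in cols]
    return sorted(range(len(cols)), key=lambda i: -scores[i])[:m]

# Feature 0 tracks the label, feature 1 is noise, feature 2 is anti-correlated
X = [(1, 7, -1), (2, 3, -2), (3, 9, -3), (4, 1, -4)]
y = [1, 2, 3, 4]
kept = filter_select(X, y, m=2)
```

Filtering like this is fast because each feature is scored once, but as the lesson discusses, a relevant feature is not always a useful one for a given learner.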

Lesson 4: Feature Transformation

  • Feature Transformation
  • Words like Tesla
  • Principal Components Analysis
  • Independent Components Analysis
  • Cocktail Party Problem
  • Matrix
  • Alternatives
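
To preview Principal Components Analysis from this lesson, here is a power-iteration sketch in plain Python that recovers the direction of maximum variance (the first principal component). The toy data and iteration count are invented for illustration; in practice you would use NumPy's eigendecomposition routines.

```python
def first_principal_component(X, iters=100):
    """Power iteration on the covariance matrix: repeatedly multiply a
    vector by C and renormalize, converging to the top eigenvector."""
    n, d = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    centered = [[x - m for x, m in zip(row, means)] for row in X]
    # Covariance matrix C of the mean-centered data
    C = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
         for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Points spread along the line y = x, so the top component is ~(0.71, 0.71)
pc = first_principal_component([(1, 1), (2, 2.1), (3, 2.9), (4, 4)])
```

Projecting data onto the top few such directions compresses it while preserving as much variance as possible, which is the heart of PCA.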

Lesson 5: Information Theory

  • History: Sending a Message
  • Expected size of the message
  • Information between two variables
  • Mutual information
  • Two Independent Coins
  • Two Dependent Coins
  • Kullback-Leibler Divergence
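
The quantities in this lesson are short enough to compute directly. Below is a plain-Python sketch of entropy, KL divergence, and mutual information, using the classic two-coins example (the helper names and example distributions are chosen for illustration).

```python
import math

def entropy(dist):
    """H(p) = -sum p log2 p: the expected message length in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def kl_divergence(p, q):
    """D(p || q): extra bits paid for coding p with a code built for q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint, px, py):
    """I(X;Y) = D(joint || px*py): zero iff X and Y are independent."""
    product = [px[i] * py[j] for i in range(len(px)) for j in range(len(py))]
    flat = [joint[i][j] for i in range(len(px)) for j in range(len(py))]
    return kl_divergence(flat, product)

fair = [0.5, 0.5]
# Two independent fair coins: the joint is the product of the marginals
indep = [[0.25, 0.25], [0.25, 0.25]]
# Two fully dependent coins: the second always copies the first
copy = [[0.5, 0.0], [0.0, 0.5]]

i_indep = mutual_information(indep, fair, fair)   # 0 bits
i_copy = mutual_information(copy, fair, fair)     # 1 bit
```

Knowing one of two dependent coins tells you the full bit of the other, while knowing one of two independent coins tells you nothing — exactly what the mutual-information values say.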

Unsupervised Learning Project

Instructors & Partners

Charles Isbell is a Professor and Senior Associate Dean at the School of Interactive Computing at Georgia Tech. His research passion is artificial intelligence, particularly on building autonomous agents that must live and interact with large numbers of other intelligent agents, some of whom may be human. Lately, he has turned his energies toward adaptive modeling, especially activity discovery (as distinct from activity recognition), scalable coordination, and development environments that support the rapid prototyping of adaptive agents. He is developing adaptive programming languages, and trying to understand what it means to bring machine learning tools to non-expert authors, designers and developers. He sometimes interacts with the physical world through racquetball, weight-lifting and Ultimate Frisbee.

Michael Littman is a Professor of Computer Science at Brown University. He also teaches Udacity's Algorithms course (CS215) on crunching social networks. Prior to joining Brown in 2012, he led the Rutgers Laboratory for Real-Life Reinforcement Learning (RL3) and served as Rutgers' Computer Science Department Chair from 2009 to 2012. He is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), served as program chair for AAAI's 2013 conference and the International Conference on Machine Learning in 2009, and received university-level teaching awards at both Duke and Rutgers. Charles Isbell taught him about racquetball, weight-lifting and Ultimate Frisbee, but he's not that great at any of them. He's pretty good at singing and juggling, though.

Pushkar Kolhe

Course Developer

Pushkar Kolhe is currently pursuing his PhD in Computer Science at Georgia Tech. He believes that Machine Learning is going to help him create AI that will reach the singularity. When he is not working on that problem, he is busy climbing, jumping or skiing on things.
