Model Building and Validation

Free Course

Advanced Techniques for Analyzing Data

Related Nanodegree Program

Introduction to Programming

In collaboration with
  • AT&T

About this course

This course teaches you how to answer questions about the real world with data, starting from scratch. Machine learning is only a small part of this process. Model building involves setting up ways to collect data, understanding which parts of the data matter for the questions you are asking, and finding a statistical, mathematical, or simulation model that helps you gain understanding and make predictions.

All of these steps are equally important, and model building is a crucial skill in every field of science. The process stays true to the scientific method, so what you learn from your models is useful both for understanding whatever you are investigating and for making predictions that hold up when tested.

We will take you on a journey through building various models. This process involves asking questions, gathering and manipulating data, building models, and ultimately testing and evaluating them.

What you will learn

  1. Introduction to the QMV Process
    • Learn about the Question, Modeling, and Validation (QMV) process of data analysis.
    • Understand the basics behind each step.
    • Apply the QMV process to analyze how Udacity employees choose candies!
  2. Question Phase
    • Learn how to turn a vague question into a well-posed one that can be analyzed with statistics and machine learning.
    • Analyze a Twitter dataset and try to predict when a person will tweet next!
  3. Modeling Phase
    • Build rigorous mathematical, statistical, and machine learning models to make accurate predictions.
    • Look through the recently released U.S. Medicare dataset for anomalous transactions.
  4. Validation Phase
    • Learn fundamental metrics to grade the performance of your models.
    • Analyze the AT&T connected cars dataset.
    • See if you can tell the drivers apart by analyzing their driving patterns.
  5. Identify Hacking Attempts from Network Flow Logs
    • Create a program that examines log data and scores the likelihood that a brute force attack is taking place on a server.

Prerequisites and requirements

This is an advanced course. The ideal student is well prepared and has:

  1. Python programming knowledge, including familiarity with Python tools such as IPython Notebook and data analysis libraries such as scikit-learn, SciPy, and pandas
  2. Knowledge of descriptive, inferential, and predictive statistics
  3. Knowledge of calculus, especially derivatives and integrals
  4. Knowledge of basic matrix algebra: matrices, vectors, determinants, the identity matrix, multiplication, and inverses
  5. Completion of Intro to Machine Learning and an understanding of common supervised and unsupervised learning algorithms, such as SVM and k-means clustering

See the Technology Requirements for using Udacity.

Why take this course?

Many of you may have already taken a course in machine learning or data science or are familiar with machine learning models.

In this course we will take a more general approach, walking through the questioning, modeling and validation steps of the model building process.

The goal is to get you to practice thinking in depth about a problem and coming up with your own solutions. Many of the examples we attempt may not have a single correct answer, but they will require you to work through the problems by applying the methods illustrated throughout this class.

Learn with the best.

  • Don Dini
  • Rishi Pravahan