Assumes 6hr/wk (work at your own pace)

Course Summary

Data science plays an important role in many industries. Faced with massive amounts of heterogeneous data, data scientists depend on scalable machine learning and data mining algorithms and systems. The growth in the volume, complexity and speed of data drives the need for scalable data analytic algorithms and systems. In this course, we study such algorithms and systems in the context of healthcare applications.

In healthcare, large amounts of heterogeneous medical data have become available in various healthcare organizations (payers, providers, pharmaceutical companies). This data could be an enabling resource for deriving insights that improve care delivery and reduce waste. The enormity and complexity of these datasets present great challenges for analysis and for subsequent application in a practical clinical environment.

Why Take This Course?

In this course, we introduce the characteristics of medical data and the associated data mining challenges in dealing with such data. We cover various algorithms and systems for big data analytics. We focus on studying those big data techniques in the context of concrete healthcare analytic applications such as predictive modeling, computational phenotyping and patient similarity. We also study big data analytic technologies:

Scalable machine learning algorithms such as online learning and fast similarity search;

Big data analytic systems such as the Hadoop family (Hive, Pig, HBase), Spark and graph databases.

Prerequisites and Requirements

Basic machine learning and data mining concepts such as classification and clustering;

Proficiency in programming and systems, using Python, Java and Scala;

Knowledge of and experience in dealing with data (recommended skills include SQL and NoSQL databases such as MongoDB).

See the Technology Requirements for using Udacity.

What Will I Learn?


Introduction to Big Data

Big Data Course Overview

Predictive Modeling

Classification Methods: Metrics

Ensemble Methods



Computational Phenotyping

Dimensionality Reduction/Tensor Factorization


Patient Similarity

Medical Ontology

Graph Analysis

Instructors & Partners

David Joyner

Course Developer

David Joyner completed his Ph.D. in Human-Centered Computing at Georgia Tech, specializing in delivering automated feedback and assessment to students in exploratory learning environments. He joined Udacity to develop exercises, projects, and (one day!) entire courses that adapt to the learner's ability and progress.

View more courses in Georgia Tech Masters in CS