Intro to Hadoop and MapReduce

Free Course

How to Process Big Data

Related Nanodegree Program

Introduction to Programming

In collaboration with
  • Cloudera

About this course

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. Learn the fundamental principles behind it, and how you can use its power to make sense of your Big Data.

What you will learn

  1. Big Data
    • What is Big Data?
    • The problems Big Data creates.
    • How Apache Hadoop addresses these problems.
  2. HDFS and MapReduce
    • Discover how HDFS distributes data over multiple computers.
    • Learn how MapReduce enables analyzing datasets in parallel across multiple machines.
  3. MapReduce code
    • Write your own MapReduce code.
  4. MapReduce Design Patterns
    • Use common patterns for MapReduce programs to analyze Udacity forum data.
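
To make the MapReduce idea in items 2 and 3 concrete, here is a minimal sketch in Python that simulates the three phases (map, shuffle/sort, reduce) inside a single process. It is an illustration only, not the course's own code: the word-count task, the sample lines, and the names `mapper`, `reducer`, and `run_job` are all assumptions made for this example. On a real Hadoop cluster the map and reduce steps would run on many machines and the framework would handle the shuffle.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce step: collapse all counts for one word into a single total."""
    return word, sum(counts)


def run_job(lines):
    """Simulate map, shuffle/sort, and reduce in a single process."""
    # Map: each line is processed independently, so on a cluster this
    # step could be spread across many machines.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle and sort: bring pairs with the same key next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce: one call per distinct key, over all values for that key.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    sample = ["big data is big", "hadoop processes big data"]
    print(run_job(sample))
```

The key design point is that `mapper` looks at one line at a time and `reducer` looks at one key at a time, which is what lets the work be split across machines.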

Prerequisites and requirements

Lesson 1 does not have technical prerequisites and is a good overview of Hadoop and MapReduce for managers.

To get the most out of the class, however, you need basic Python programming skills at the level taught in introductory courses such as our Introduction to Computer Science course.

To learn more about Hadoop, you can also check out the book Hadoop: The Definitive Guide.

See the Technology Requirements for using Udacity.

Why take this course?

  • Understand how Hadoop fits into the world (recognize the problems it solves)
  • Understand the concepts of HDFS and MapReduce (find out how Hadoop solves those problems)
  • Write MapReduce programs (see how we solve the problems)
  • Practice solving problems on your own

Learn with the best.

  • Sarah Sproehnle

    Instructor

  • Ian Wrigley

    Instructor

  • Gundega Dekena

    Instructor