Real-Time Analytics with Apache Storm

Thank you for signing up for the course! We look forward to working with you and hearing your feedback in our forums.

Course Resources

Additional Reading

  1. The Apache Software Foundation Announces Apache™ Storm™ as a Top-Level Project

  2. The Happy Demise of the 10X Engineer

  3. History of Apache Storm and lessons learned

  4. MIT Tech Review: Twitter Firehose Reveals How New York City Sleeps 3 Hours Later on Sundays

Additional Documentation

  1. Apache Storm Official Site

  2. Vagrant Official Site

  3. Oracle VirtualBox Official Site

  4. Twitter4J API

  5. Flask Web Microframework

  6. Redis Key-Value Database

  7. Data Driven Documents (D3) Selections by Michael Bostock

  8. D3 Blocks: Open Source Examples

Downloadable Materials

You can download Supplemental Materials, Lesson Videos and Transcripts from Downloadables (bottom right corner of the Classroom) or from the Dashboard (first option on the navigation bar on the left-hand side).

Course Syllabus

Lesson 1: Theory, Setup and Basic Storm

Twitter's Karthik Ramasamy presents the theory and motivation behind Storm, including Stream vs. Batch processing and the real-time Big Data lessons learned at Twitter. You then link Storm Concepts to Storm Syntax, complete a basic setup using Vagrant and VirtualBox, and explore Storm Topologies that drive real-time visualizations using d3 (Data Driven Documents).

Lesson 2: Storm with Twitter Streams

Program the basics of Storm, including Bolts, Spouts, and basic Topologies.  Obtain Twitter OAuth credentials to link to the real-time Twitter Sample Stream to drive Word Cloud visualizations using d3.
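
As a rough illustration of how Spouts and Bolts fit together, the sketch below imitates a word-count pipeline in plain Python. It does not use Storm's API: the function names (sentence_spout, split_bolt, count_bolt) and the sample sentences are hypothetical stand-ins, and a real Topology would read from the Twitter Sample Stream and run distributed across workers.

```python
from collections import Counter
from typing import Iterable, Iterator


def sentence_spout() -> Iterator[str]:
    """Stand-in for a Spout: emits a fixed, made-up sample of tweet-like text."""
    yield from [
        "storm makes streams simple",
        "streams of tweets feed storm",
    ]


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Stand-in for a Bolt: splits each incoming sentence into lowercase words."""
    for sentence in sentences:
        yield from sentence.lower().split()


def count_bolt(words: Iterable[str]) -> Counter:
    """Stand-in for a Bolt: keeps a running count for each word it receives."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
    return counts


# Wiring the pieces together plays the role of a (very small) Topology.
word_counts = count_bolt(split_bolt(sentence_spout()))
```

The resulting counts are the kind of data a Word Cloud visualization would consume, with more frequent words drawn larger.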

Lesson 3: Beyond Storm Basics

Move beyond Storm basics with intermediate concepts: multi-language capabilities using Python, open-source bolts that calculate Top-N hashtags, and streaming joins that dynamically process tuples from different sources.
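
To make the Top-N idea concrete, here is a minimal sketch of the counting step in plain Python. It is not one of the open source bolts used in the course: the helper top_n_hashtags, the hashtag pattern, and the sample tweets are assumptions made for illustration only.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed hashtag pattern: '#' followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def top_n_hashtags(tweets: Iterable[str], n: int) -> List[Tuple[str, int]]:
    """Count hashtags case-insensitively and return the n most frequent."""
    counts: Counter = Counter()
    for tweet in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(tweet))
    return counts.most_common(n)


# Made-up sample input standing in for tuples from a tweet stream.
sample_tweets = [
    "Loving #Storm and #d3",
    "#storm topologies at scale",
    "#redis with #Storm",
]
top = top_n_hashtags(sample_tweets, 2)
```

In a streaming setting the counts would be updated continuously, and often over a sliding time window, rather than computed once over a fixed list.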

Lesson 4: Storm Final Project and Hackathon

Design a Storm Topology and implement a new bolt that uses streaming joins to dynamically calculate Top-N Hashtags and display real-time tweets that contain trending Top Hashtags. Work alongside our Udacity-Twitter Hackathon participants as their final project questions are fielded by Karthik. Post your visualization to the forum and tweet it to your Twitter followers. Extend your project to use additional features of the real-time Twitter sample stream or use any data source to drive your real-time d3 visualization.


Special thanks to Karthik Ramasamy, Chris Kellogg, and Vikas Rameshkedigehalli of Twitter, along with Jorge Herrera and Hyung-Suk Kim of Stanford University for help and support throughout this course.