About this Course

The world is trending in real time! Learn from Twitter how to scalably process tweets, or any big data stream, in real time to drive d3 visualizations using Apache Storm, the “Hadoop of Real Time.” Storm is free, open source, and fun to use! Learn from Karthik Ramasamy, Technical Lead of Storm@Twitter, about the distributed, fault-tolerant, and flexible technology used to power Twitter’s real-time data flow pipeline. Twitter open sourced Storm in 2011, and it graduated to a top-level Apache project in September 2014.

Starting from basic distributed concepts presented during our first Udacity-Twitter Storm Hackathon, link Storm concepts to Storm syntax to scalably drive Word Cloud visualizations with Vagrant, Ubuntu, Maven, Flask, Redis, and d3. Link to the public Twitter gardenhose stream to process live tweets, parse embedded URLs, and calculate Top worldwide hashtags. Extend beyond Storm basics by exploring multi-language capabilities in Python, integrating open source components, and implementing real-time streaming joins.

In your final project, follow real-time trending topics by implementing the data pipeline to visualize only tweets that contain Top worldwide hashtags. Extend your project by exploring the Twitter API, or any data source, alongside Hackathon participants as they design their own ideas, receive feedback from Karthik, and open source a final project calculating real-time tweet sentiment and geolocation to drive a U.S. Map.

Course Cost: Free
Timeline: Approx. 2 weeks
Skill Level: Intermediate
Included in Course
  • Rich Learning Content
  • Interactive Quizzes
  • Taught by Industry Pros
  • Self-Paced Learning
  • Student Support Community

Join the Path to Greatness

This free course is your first step towards a new career with the Machine Learning Engineer Nanodegree Program.

Free Course

Real-Time Analytics with Apache Storm

by Twitter

Enhance your skill set and boost your hirability through innovative, independent learning.

Course Leads

  • Karthik Ramasamy, Instructor
  • Lewis Kaneshiro, Instructor

What You Will Learn

Lesson 1

Join instructor Karthik Ramasamy and the first Udacity-Twitter Storm Hackathon to cover the motivation and practice of real-time, distributed, fault-tolerant data processing. Dive into basic Storm Topologies by linking to a real-time d3 Word Cloud Visualization using Redis, Flask, and d3.

Lesson 2

Explore Storm basics by programming Bolts, linking Spouts, and finally connecting to the live Twitter API to process real-time tweets. Explore open source components by connecting a Rolling Count Bolt to your topology to visualize Rolling Top Tweeted Words.
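
To make the rolling count idea in this lesson concrete, here is a minimal sketch in plain Java, with no Storm dependency. It is not the course's Rolling Count Bolt or any Storm API: the class name `RollingWordCount` and its methods are hypothetical, and the sketch assumes the window is split into a fixed number of time slots that the caller advances on a timer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a sliding-window word counter.
// Each word keeps one counter per time slot; the window is the sum of all slots.
class RollingWordCount {
    private final Map<String, long[]> slotsByWord = new HashMap<>();
    private final int numSlots;
    private int head = 0; // index of the slot currently receiving counts

    RollingWordCount(int numSlots) {
        if (numSlots < 1) {
            throw new IllegalArgumentException("numSlots must be >= 1");
        }
        this.numSlots = numSlots;
    }

    // Record one occurrence of a word in the current slot.
    void add(String word) {
        slotsByWord.computeIfAbsent(word, w -> new long[numSlots])[head]++;
    }

    // Total occurrences of a word across the whole window.
    long total(String word) {
        long[] slots = slotsByWord.get(word);
        return slots == null ? 0 : Arrays.stream(slots).sum();
    }

    // Move the window forward by one slot: the oldest slot is reused and
    // cleared, and words whose counts have all expired are dropped.
    void advance() {
        head = (head + 1) % numSlots;
        for (long[] slots : slotsByWord.values()) {
            slots[head] = 0;
        }
        slotsByWord.values().removeIf(
                slots -> Arrays.stream(slots).allMatch(c -> c == 0));
    }
}
```

In a real topology, a bolt would call something like `add` for each incoming word and `advance` on a periodic tick, then emit the totals downstream to the visualization.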

Lesson 3

Go beyond Storm basics by exploring multi-language capabilities to download and parse real-time Tweeted URLs in Python using Beautiful Soup. Integrate complex open source bolts to calculate Top-N words to visualize real-time Top-N Hashtags. Finally, use stream grouping concepts to easily create a streaming join to connect and dynamically process multiple streams.
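
As a rough illustration of the Top-N calculation mentioned in this lesson, the following plain-Java sketch ranks hashtags by count. It is an assumption-laden sketch, not the open source bolts used in the course: `TopNSketch` and `topN` are hypothetical names, and ties are broken alphabetically only so that the result is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a Top-N ranking over word or hashtag counts.
class TopNSketch {
    // Returns up to n keys, highest count first; ties are ordered alphabetically.
    static List<String> topN(Map<String, Long> counts, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

Sorting the full map is fine for a sketch; a production ranking over a large stream would more likely keep a bounded structure and merge partial rankings from parallel tasks.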

Lesson 4

Work on your final project while we cover additional questions and topics brought up by Hackathon participants. Explore Vagrant, VirtualBox, Redis, Flask, and d3 further if you are interested!

Final Project: Construct a Storm Topology

Design a Storm Topology and a new bolt that uses streaming joins to dynamically calculate Top-N Hashtags and display real-time tweets that contain trending Top Hashtags. Post your visualization to the forum and tweet it to your Twitter followers.
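
One way to picture the join at the heart of this project is a component that receives two streams: the latest Top-N hashtags, and the raw tweets. The plain-Java sketch below is only an assumption about how such a join could behave, not the project solution or a Storm API. The class `TrendingTweetJoin`, its method names, and the `#\w+` hashtag pattern are all hypothetical.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of joining a tweet stream with a Top-N hashtag stream.
class TrendingTweetJoin {
    // Simplified hashtag pattern; real tweets may need a richer rule.
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    // Most recent set of trending hashtags, stored in lower case.
    private Set<String> topHashtags = new HashSet<>();

    // Called for each update arriving on the Top-N hashtag stream.
    void onTopHashtags(Collection<String> hashtags) {
        Set<String> next = new HashSet<>();
        for (String hashtag : hashtags) {
            next.add(hashtag.toLowerCase(Locale.ROOT));
        }
        topHashtags = next;
    }

    // Called for each tweet; returns true if the tweet should be displayed,
    // that is, if it contains at least one currently trending hashtag.
    boolean onTweet(String tweet) {
        Matcher matcher = HASHTAG.matcher(tweet);
        while (matcher.find()) {
            if (topHashtags.contains(matcher.group().toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```

In a topology, both streams would need to reach the same bolt instance for this to work, which is where the stream grouping choices from Lesson 3 come in.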

Project Extensions

Use additional features of the real-time Twitter sample stream or use any data source to drive your real-time d3 visualization.

Prerequisites and Requirements

Programming language required: Java

To be successful, you'll need intermediate knowledge of Java. Specifically, this means experience and comfort with Java syntax, diagnosing and debugging compile-time and run-time errors, using Javadocs as needed, and intermediate data structures including Arrays, HashMaps, and LinkedLists. If you need to build these skills, a good starting point is Udacity’s Introduction to Java, followed by additional practice identifying and debugging compile-time and run-time errors.

No prior experience is assumed in Ubuntu, git, Maven, Redis, Flask (Python), or d3 (JavaScript). Python is useful, but optional. A basic course such as CS101 or OO in Python would be helpful.

See the Technology Requirements for using Udacity.

Why Take This Course

Learn by doing! The world is going real time. Batch processing, popularized by Hadoop, has latency too high for the real-time demands of modern mobile, connected, always-on users. Stream processing with response times measured in seconds is necessary to meet this demand. Twitter is a world leader in real-time processing at scale. Learn the future from the company defining it.

What do I get?
  • Instructor videos
  • Learn-by-doing exercises
  • Taught by industry professionals
