Lesson 1
Data Streaming Nanodegree Program Introduction
Welcome to the Data Streaming Nanodegree Program
Nanodegree Program
Learn the latest skills to process data in real time by building fluency in modern data engineering tools, such as Apache Spark, Kafka, Spark Streaming, and Kafka Streaming.
Advanced
2 months
Real-world Projects
Completion Certificate
Last Updated December 29, 2023
Course 1 • 1 hour
Lesson 1
Welcome to the Data Streaming Nanodegree Program
Lesson 2
You are starting a challenging but rewarding journey! Take 5 minutes to read how to get help with projects and content.
Course 2 • 4 weeks
Learn to use REST Proxy, Kafka Connect, KSQL, and Faust, a Python stream processing library. You will apply them with Kafka and the Kafka ecosystem to build a stream processing application that shows the status of public transit trains in real time.
Lesson 1
In this lesson, students will learn what data streaming is, what its pros and cons are, and how it compares to traditional data strategies.
Lesson 2
In this lesson we’ll review the architecture and configuration of Apache Kafka.
Lesson 3
This lesson covers data schemas and data schema management, with a focus on Apache Avro.
Lesson 4
This lesson covers producing data to and consuming data from Kafka with Kafka Connect and REST Proxy.
Lesson 5
Learn to build real-time applications that process events as they arrive, and learn the concepts of stream processing state storage, windowed processing, and stateful and stateless stream processing.
Lesson 6
Students will learn how to use the Python stream processing library Faust to rapidly create powerful stream processing applications.
Lesson 7
Learn how to write simple SQL queries to turn Kafka topics into KSQL streams and tables, and then write those tables back out to Kafka.
Lesson 8 • Project
For your first project, you’ll stream public transit status data using Kafka and the Kafka ecosystem, building a stream processing application that shows the status of trains in real time.
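Lesson 5 of this course names windowed and stateful processing as core concepts. As a rough illustration only, and not the course's own code, the sketch below counts events in fixed, non-overlapping (tumbling) windows using plain Python. The function name `tumbling_window_counts`, the event layout, and the sample line names are assumptions made for this example; libraries such as Faust or KSQL provide their own windowing constructs.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) in fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_in_seconds, key) pairs.
    The `counts` dict is the state carried from one event to the next,
    which is what makes this a stateful computation.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical train-arrival events: (seconds since start, line name).
events = [(0, "blue"), (12, "blue"), (59, "red"), (61, "blue"), (119, "red")]
result = tumbling_window_counts(events, 60)
```

With a 60-second window, the first three events fall into the window starting at 0 and the last two into the window starting at 60. A stateless operation, by contrast, would look at each event in isolation and keep no dictionary between events.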
Course 3 • 4 weeks
In this course, you will grow your expertise in the components of streaming data systems and build a real-time analytics application. Specifically, you will be able to identify components of Spark Streaming (architecture and API), build a continuous application with Structured Streaming, consume and process data from Apache Kafka with Spark Structured Streaming (including setting up and running a Spark cluster), create a DataFrame as an aggregation of source DataFrames, sink a composite DataFrame to Kafka, and visually inspect a data sink for accuracy.
Lesson 1
Introduction to Data Streaming with Spark
Lesson 2
In this lesson, you'll learn about working with Spark DataFrames and views.
Lesson 3
In this lesson, you'll learn how to work with JSON and perform joins on streaming data.
Lesson 4
This lesson will focus on working with Redis, Base64, and JSON in data streaming.
Lesson 5 • Project
As your final project for this course, you will demonstrate the skills you have learned by evaluating human balance with Spark Streaming.
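Lesson 4 of this course pairs Base64 with JSON. One common reason to combine them in streaming pipelines is that a JSON document is sometimes carried as a Base64-encoded string inside another message. The sketch below shows that round trip with the Python standard library only. The `decode_payload` helper and the sample record are hypothetical, and the course itself may do the equivalent step with Spark functions instead.

```python
import base64
import json


def decode_payload(encoded: str) -> dict:
    """Decode a Base64 string and parse the resulting bytes as UTF-8 JSON."""
    raw_bytes = base64.b64decode(encoded)
    return json.loads(raw_bytes.decode("utf-8"))


# Hypothetical record, encoded the way an upstream system might send it.
record = {"customer": "example@test.com", "score": 7.5}
encoded = base64.b64encode(json.dumps(record).encode("utf-8")).decode("ascii")

# Reverse the encoding to recover the original fields.
decoded = decode_payload(encoded)
```

The encoded value is ordinary ASCII text, so it can travel safely inside a JSON string field or a key-value store entry before being decoded downstream.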
(Optional) Course 4 • 2 days
Lesson 1 • Project
Find your next job or connect with industry peers on LinkedIn. Ensure your profile attracts relevant leads that will grow your professional network.
Lesson 2 • Project
Other professionals are collaborating on GitHub and growing their networks. Submit your profile to ensure it is on par with leaders in your field.
Professor at Brigham Young University Idaho
Sean currently teaches cybersecurity and DevOps courses at Brigham Young University Idaho. He has been a software engineer for over 16 years. Some of the most exciting projects he has worked on involved data pipelines for DNA processing and vehicle telematics.
Senior Data Engineer at Netflix
Judit is a Senior Data Engineer at Netflix. Formerly a Data Engineer at Split, where she worked on the statistical engine of their full-stack experimentation platform, she has also been an instructor at Insight Data Science, helping software engineers and academic coders transition to DE roles.
VP of Engineering at Insight
David is VP of Engineering at Insight where he enjoys breaking down difficult concepts and helping others learn data engineering. David has a PhD in Physics from UC Riverside.
Staff Engineer at SpotHero
In his career as an engineer, Ben Goldberg has worked in fields ranging from computer vision to natural language processing. At SpotHero, he founded and built out their data engineering team, using Airflow as one of the key technologies.
Average Rating: 4.4 Stars
127 Reviews
Combine technology training for employees with industry experts, mentors, and projects, for critical thinking that pushes innovation. Our proven upskilling system goes after success—relentlessly.
Demonstrate proficiency with practical projects
Projects are based on real-world scenarios and challenges, allowing you to apply the skills you learn to practical situations, while giving you real hands-on experience.
Gain proven experience
Retain knowledge longer
Apply new skills immediately
Top-tier services to ensure learner success
Reviewers provide timely and constructive feedback on your project submissions, highlighting areas of improvement and offering practical tips to enhance your work.
Get help from subject matter experts
Learn industry best practices
Gain valuable insights and improve your skills
Unlimited access to our top-rated courses
Real-world projects
Personalized project reviews
Program certificates
Proven career outcomes
Full Catalog Access
One subscription opens up this course and our entire catalog of projects and skills.