Learn Data Engineering with AWS

According to TechJury, people produced 2.5 quintillion bytes of data per day in 2021. Data has now become the lifeblood of digital transformation, and companies are scrambling to reinvent themselves as data-driven organizations. That’s why, according to Indeed and Glassdoor, the ratio of data engineer to data scientist job openings is roughly four-to-one. Companies can’t find enough data engineers to store, organize, and manage their ever-increasing amount of data.

Data engineers are responsible for making data accessible to all the people who use it across an organization. That could mean creating a data warehouse for the analytics team, building a data pipeline for a front-end application, or summarizing massive datasets to be more user-friendly.

Today, we are excited to announce a refresh of the Data Engineering with AWS Nanodegree program. Companies all over the world are looking for data engineers and our goal is to help anyone who wishes to land a job in the field do so.

Data Engineering with AWS Nanodegree program details.

The Data Engineering with AWS Nanodegree program will prepare you to be a data engineer, with special training in AWS data tools. Specifically, students will learn to:

Create user-friendly relational and NoSQL data models
Create scalable and efficient data warehouses
Work efficiently with massive datasets
Build and interact with a cloud-based data lake
Automate and monitor data pipelines
Develop proficiency in Spark, Airflow, and AWS tools

To get the most out of this program, it’s important to know how to program, specifically with intermediate knowledge of Python. Additionally, students should understand how to use the command line and have a solid foundation in SQL.

In as little as 4 months (at 5-10 hours a week), students who enroll in the Data Engineering with AWS Nanodegree program will learn how to model data with Apache Cassandra, interact with data warehouses (extracting and transforming data), use data lakes (exporting, transforming, and importing data), and create custom data pipelines.

Changes from the previous data engineering program.

Since we launched the Data Engineering Nanodegree program in 2019, the AWS tools used for the course projects have changed a great deal. While we love that tools are constantly updated and improved, it meant that some of our lessons started to feel a little out of date.

In order to best meet the needs of our Nanodegree program students, we’ve updated our courses and projects to more closely reflect the current tools and state of data engineering. Below, you will find a list of the courses in the Data Engineering with AWS Nanodegree program with information on what has been changed.

Course 1: Data Modeling
All of the great information on data modeling is still available, but we removed a project called Data Modeling with Postgres. It gave this course two projects, which we felt was inconsistent with the other courses, each of which has only one.

Course 2: Cloud Data Warehouses
We updated the entirety of this course to reflect new AWS concepts and tools. This includes new visuals, videos, transcripts, and AWS instructions. We also updated the project guidance (see Project 2, below) to address gaps based on student feedback.

Course 3: Spark and Data Lakes
We made many updates to Course 3 to more closely match industry standards in data engineering. Specific changes include updated content on the big data ecosystem, a new version of Spark (with accompanying Spark scripts), new concepts around lakehouse design patterns, and the introduction of AWS Glue and Amazon Athena. For this course, we also revamped the entire final project to focus on Internet of Things (IoT) and app data, which is curated for use by data scientists.

Course 4: Automate Data Pipelines
Most of the updates to Course 4 concern tooling, to help students become familiar with tools that are current industry standards. For instance, Course 4 now uses Airflow 2 instead of Airflow 1. We’ve also introduced Airflow’s Python decorators, moved from a Redshift cluster to Redshift Serverless, and added a student workspace that uses VS Code.

Data Engineering with AWS project info.

Project 1: Data Modeling with Apache Cassandra
Students will model event data to create a non-relational database and Extract, Load, Transform (ELT) pipelines for a fictitious music streaming app. The project will include defining queries and tables for a database built with Apache Cassandra.
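
To give a feel for what "defining queries and tables" can look like, here is a minimal sketch of the query-first approach that is common in Cassandra modeling: start from the question you want to answer, then design one table whose primary key serves that question. The `TableSpec` helper, the table name, and the column names are all hypothetical and are not taken from the course materials. The sketch only renders a CQL statement as a string and does not connect to a cluster.

```python
from dataclasses import dataclass


@dataclass
class TableSpec:
    """Hypothetical description of one query-specific Cassandra table."""
    name: str
    columns: dict            # column name -> CQL type
    partition_key: tuple     # columns that decide which partition holds a row
    clustering: tuple = ()   # columns that order rows inside a partition


def create_table_cql(spec: TableSpec) -> str:
    """Render a CQL CREATE TABLE statement for the given spec."""
    cols = ", ".join(f"{name} {cql_type}" for name, cql_type in spec.columns.items())
    key = "(" + ", ".join(spec.partition_key) + ")"
    if spec.clustering:
        key += ", " + ", ".join(spec.clustering)
    return f"CREATE TABLE IF NOT EXISTS {spec.name} ({cols}, PRIMARY KEY ({key}))"


# Query to serve (assumed for illustration):
# "Which song was played at a given position in a given session?"
songs_by_session = TableSpec(
    name="songs_by_session",
    columns={
        "session_id": "int",
        "item_in_session": "int",
        "artist": "text",
        "song": "text",
    },
    partition_key=("session_id",),
    clustering=("item_in_session",),
)

ddl = create_table_cql(songs_by_session)
print(ddl)
```

The design choice worth noticing is that the table is named after the query it answers, and the partition key matches the column the query filters on.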

Project 2: Data Warehouse
Students will act as data engineers for a music streaming service. They will build an ELT pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables for an analytics team to use.
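
The stage-then-transform pattern described here can be sketched in a few lines. This is only an illustration under stated assumptions: Python's built-in `sqlite3` stands in for Redshift, an in-memory list stands in for files in S3, and every table name, column name, and sample row is made up for the example. On Redshift itself, the load step would typically be a `COPY` from S3 rather than an `INSERT`.

```python
import sqlite3

# Made-up sample events, standing in for raw log files in object storage.
raw_events = [
    {"user_id": 1, "name": "Ada", "song": "Song A", "ts": 1000},
    {"user_id": 1, "name": "Ada", "song": "Song B", "ts": 1010},
    {"user_id": 2, "name": "Lin", "song": "Song A", "ts": 1020},
]

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
conn.executescript("""
    CREATE TABLE staging_events (user_id INTEGER, name TEXT, song TEXT, ts INTEGER);
    CREATE TABLE dim_users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_songs (song_id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT UNIQUE);
    CREATE TABLE fact_songplays (user_id INTEGER, song_id INTEGER, ts INTEGER);
""")

# Load: copy the raw rows into the staging table unchanged.
conn.executemany(
    "INSERT INTO staging_events VALUES (:user_id, :name, :song, :ts)",
    raw_events,
)

# Transform: derive dimension and fact tables with SQL inside the warehouse.
conn.executescript("""
    INSERT INTO dim_users SELECT DISTINCT user_id, name FROM staging_events;
    INSERT INTO dim_songs (title) SELECT DISTINCT song FROM staging_events;
    INSERT INTO fact_songplays
        SELECT s.user_id, d.song_id, s.ts
        FROM staging_events AS s
        JOIN dim_songs AS d ON d.title = s.song;
""")
conn.commit()


def count(table: str) -> int:
    """Return the number of rows in a table (table names here are trusted constants)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


for table in ("staging_events", "dim_users", "dim_songs", "fact_songplays"):
    print(table, count(table))
```

Keeping the raw rows in a staging table means the transformation can be rerun or changed later without going back to the source files.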

Project 3: STEDI Human Balance Analytics
Students will build a data lakehouse solution for sensor data that is used to train a machine learning model for a fictitious human balance company, STEDI. The project includes building an ELT pipeline for a lakehouse architecture: loading data from an AWS S3 data lake, processing it into analytics tables using Spark and AWS Glue, and finally loading those tables back into the lakehouse.

Project 4: Data Pipelines with Airflow
Students will build high-grade data pipelines for Sparkify. First, students will take user data logs and metadata as JSON from S3 and process them in a data warehouse (Amazon Redshift). To complete the project, students must create their own custom operators to perform tasks such as staging the data, filling the data warehouse, and running final checks on data.
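
As a rough idea of what "running final checks on data" might involve, here is a minimal sketch of a data quality check written as a plain Python function. It is not the project's solution. In Airflow, logic like this would usually live in the `execute` method of a custom operator that subclasses `BaseOperator`; here `sqlite3` stands in for Redshift so the example runs anywhere, and the `users` table, the check queries, and the `DataQualityError` name are assumptions made for illustration.

```python
import sqlite3


class DataQualityError(Exception):
    """Raised when a check query returns an unexpected value."""


def run_quality_checks(conn, checks):
    """Run each (sql, expected) pair and raise on the first mismatch.

    Returns the number of checks that passed.
    """
    passed = 0
    for sql, expected in checks:
        actual = conn.execute(sql).fetchone()[0]
        if actual != expected:
            raise DataQualityError(f"{sql!r}: expected {expected}, got {actual}")
        passed += 1
    return passed


# A tiny made-up table to run the checks against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])

checks = [
    ("SELECT COUNT(*) FROM users WHERE user_id IS NULL", 0),  # no missing keys
    ("SELECT COUNT(*) > 0 FROM users", 1),                    # table is not empty
]
passed = run_quality_checks(conn, checks)
print(f"{passed} checks passed")
```

Expressing each check as a query plus an expected value keeps the function generic, so new checks can be added as data rather than as new code.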

Learn from top data professionals.

To develop this program’s world-class curriculum, we collaborated with professionals from top-rated tech companies, like Amazon, DataStax, and SpotHero. Each of these collaborators contributed guidance and feedback to focus the program on the most in-demand skills. Each of the instructors has extensive data and teaching experience.

Instructors

Amanda Moran, Developer Advocate at DataStax
Ben Goldberg, Staff Engineer at SpotHero
Valerie Scarlata, Curriculum Manager at Udacity
Matt Swaffer, Solutions Architect and Adjunct Lecturer at the University of Northern Colorado
Sean Murdock, Software Engineer and Professor at Brigham Young University, Idaho

Enroll in the Data Engineering with AWS Nanodegree program today.

If you’re an engineer interested in specializing in data engineering, or if you have some data engineering experience but want to learn all about the various AWS data tools, this is the program for you.

There’s never been a better time to get into data engineering. In fact, data engineering is among the fastest-growing roles in the tech industry. Plus, salaries for data engineers average well into the six-figure range.

With Udacity’s hands-on project-centric learning, there’s no better way to meet the demand than by registering today for the Data Engineering with AWS Nanodegree program. Enroll now to learn more!
