Data Engineering with AWS

Nanodegree Program

Learn to design data models, build data warehouses and data lakes, automate data pipelines, and work with massive datasets.

Intermediate

4 months

Real-world Projects

Completion Certificate

Last Updated January 30, 2024

Skills you'll learn:
Database fundamentals • CassandraDB • PostgreSQL • Database normalization
Prerequisites:
Relational data models • Command line interface basics • Intermediate Python

Courses In This Program

Course 1 • 45 minutes

Welcome to the Data Engineering with AWS Nanodegree Program

Welcome!

Lesson 1

An Introduction to Your Nanodegree Program

Welcome! We're so glad you're here. Join us in learning a bit more about what to expect and ways to succeed.

Lesson 2

Getting Help

You are starting a challenging but rewarding journey! Take 5 minutes to read how to get help with projects and content.

Course 2 • 4 weeks

Data Modeling

Learn to create relational and NoSQL data models to fit the diverse needs of data consumers. Use ETL to build databases in PostgreSQL and Apache Cassandra.

Lesson 1

Introduction to Data Modeling

In this lesson, students will learn the basic difference between relational and non-relational databases, and how each type of database fits the diverse needs of data consumers.

Lesson 2

Relational Data Models

In this lesson, students will learn the purpose of data modeling, explore the strengths and weaknesses of relational databases, and create schemas and tables in PostgreSQL.
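
For a sense of what this looks like in practice, here is a minimal sketch (illustrative only, not course material) of creating a table in PostgreSQL from Python with psycopg2. The database name, credentials, and table design are hypothetical placeholders.

    # Minimal sketch: create a dimension-style table in PostgreSQL with psycopg2.
    # Database name, credentials, and columns are hypothetical placeholders.
    import psycopg2

    conn = psycopg2.connect(host="127.0.0.1", dbname="music_app_db",
                            user="student", password="student")
    conn.autocommit = True
    cur = conn.cursor()

    # A simple star-schema style dimension table for song metadata.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS songs (
            song_id   VARCHAR PRIMARY KEY,
            title     VARCHAR NOT NULL,
            artist_id VARCHAR,
            year      INT,
            duration  NUMERIC
        );
    """)

    cur.close()
    conn.close()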

Lesson 3

NoSQL Data Models

Students will learn when to use non-relational databases based on business needs, understand their strengths and weaknesses, and learn how to create tables in Apache Cassandra.

Lesson 4 • Project

Project: Data Modeling with Apache Cassandra

Students will model event data to create a non-relational database and ETL pipeline for a music streaming app. They will define queries and tables for a database built using Apache Cassandra.
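
To give a flavor of the query-first modeling this project practices, here is a minimal sketch (illustrative only, not the project's starter code) using the cassandra-driver package. The keyspace, table, columns, and sample values are hypothetical.

    # Minimal sketch: design a Cassandra table around the query it must answer.
    # Keyspace, table, and sample data are hypothetical placeholders.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS music_events
        WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)
    session.set_keyspace("music_events")

    # The primary key mirrors the query:
    # "which song was played for a given session and item in that session?"
    session.execute("""
        CREATE TABLE IF NOT EXISTS songs_by_session (
            session_id      INT,
            item_in_session INT,
            artist          TEXT,
            song_title      TEXT,
            PRIMARY KEY (session_id, item_in_session)
        )
    """)

    session.execute(
        "INSERT INTO songs_by_session (session_id, item_in_session, artist, song_title) "
        "VALUES (%s, %s, %s, %s)",
        (42, 3, "Example Artist", "Example Song"),
    )

    rows = session.execute(
        "SELECT artist, song_title FROM songs_by_session "
        "WHERE session_id = %s AND item_in_session = %s",
        (42, 3),
    )
    for row in rows:
        print(row.artist, row.song_title)

    cluster.shutdown()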

Course 3 • 4 weeks

Cloud Data Warehouses

In this course, you’ll learn to create cloud-based data warehouses. You’ll sharpen your data warehousing skills, deepen your understanding of data infrastructure, and be introduced to data engineering on the cloud using Amazon Web Services (AWS).

Lesson 1

Introduction to Cloud Data Warehouses

Welcome to Cloud Data Warehouse with Amazon Web Services. In this lesson, you'll learn more about the course and set yourself up for success!

Lesson 2

Introduction to Data Warehouses

In this lesson, you'll be introduced to the business case for data warehouses as well as architecture, extracting, transforming, and loading data, data modeling, and data warehouse technologies.

Lesson 3

ELT and Data Warehouse Technology in the Cloud

In this lesson, you'll learn about ELT, the differences between ETL and ELT, and general cloud data warehouse technologies.

Lesson 4

AWS Data Warehouse Technologies

In this lesson, you'll learn about AWS Services and how to set up Amazon S3, IAM, VPC, EC2, and RDS. You'll build a Redshift data warehouse cluster and learn how to interact with it.
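
As a rough illustration of provisioning a cluster from code rather than the console, here is a minimal boto3 sketch. The region, cluster identifier, node type, credentials, and IAM role ARN are hypothetical placeholders, not values used in the course.

    # Minimal sketch: create a Redshift cluster with boto3 (infrastructure as code).
    # All identifiers, credentials, and the role ARN below are hypothetical.
    import boto3

    redshift = boto3.client("redshift", region_name="us-west-2")

    redshift.create_cluster(
        ClusterIdentifier="example-dwh-cluster",
        ClusterType="multi-node",
        NodeType="dc2.large",
        NumberOfNodes=4,
        DBName="dwh",
        MasterUsername="awsuser",
        MasterUserPassword="<choose-a-strong-password>",
        IamRoles=["arn:aws:iam::123456789012:role/exampleRedshiftS3ReadRole"],
    )

    # Wait for the cluster to become available, then look up its endpoint.
    redshift.get_waiter("cluster_available").wait(
        ClusterIdentifier="example-dwh-cluster"
    )
    cluster = redshift.describe_clusters(
        ClusterIdentifier="example-dwh-cluster"
    )["Clusters"][0]
    print(cluster["Endpoint"]["Address"])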

Lesson 5

Implementing a Data Warehouse on AWS

In this lesson, you'll learn to implement a data warehouse on AWS.

Lesson 6 • Project

Project: Data Warehouse

In this project, you'll build an ETL pipeline that extracts data from S3, stages data in Redshift, and transforms data into a set of dimensional tables for an analytics team.
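
To make the stage-then-transform pattern concrete, here is a minimal sketch (illustrative only, not the project's actual SQL or resources): COPY raw JSON from S3 into a staging table, then insert distinct rows into a dimensional table. The cluster endpoint, bucket, IAM role, and table definitions are hypothetical.

    # Minimal sketch: S3 -> Redshift staging -> dimensional table with psycopg2.
    # Endpoint, credentials, bucket, role ARN, and tables are hypothetical.
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-west-2.redshift.amazonaws.com",
        port=5439, dbname="dwh", user="awsuser", password="<password>")
    cur = conn.cursor()

    # Stage raw JSON events from S3 using Redshift's COPY command.
    cur.execute("""
        COPY staging_events
        FROM 's3://example-bucket/log_data'
        IAM_ROLE 'arn:aws:iam::123456789012:role/exampleRedshiftRole'
        FORMAT AS JSON 'auto'
        REGION 'us-west-2';
    """)

    # Transform staged rows into a star-schema dimension table.
    cur.execute("""
        INSERT INTO dim_users (user_id, first_name, last_name, level)
        SELECT DISTINCT user_id, first_name, last_name, level
        FROM staging_events
        WHERE user_id IS NOT NULL;
    """)

    conn.commit()
    cur.close()
    conn.close()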

Course 4 • 4 weeks

Spark and Data Lakes

In this course, you will learn about the big data ecosystem and how to use Spark to work with massive datasets. You’ll also learn about how to store big data in a data lake and query it with Spark.

Lesson 1

Introduction to Spark and Data Lakes

In this course, you'll learn how Spark evaluates code and uses distributed computing to process and transform data. You'll work in the big data ecosystem to build data lakes and lakehouses.

Lesson 2

Big Data Ecosystem, Data Lakes, and Spark

In this lesson, you will learn about the problems that Apache Spark is designed to solve. You'll also learn about the greater Big Data ecosystem and how Spark fits into it.

Lesson 3

Spark Essentials

In this lesson, we'll dive into how to use Spark for wrangling, filtering, and transforming distributed data with PySpark and Spark SQL.
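
For a flavor of that workflow, here is a minimal PySpark sketch (illustrative only) showing filtering and aggregation expressed once with the DataFrame API and once with Spark SQL. The input path and column names are hypothetical.

    # Minimal sketch: wrangle a hypothetical JSON log file with PySpark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("spark-essentials-sketch").getOrCreate()

    # Load the (hypothetical) event log into a distributed DataFrame.
    logs = spark.read.json("data/example_log_data.json")

    # DataFrame API: filter and project without collecting data to the driver.
    song_plays = (logs
                  .filter(col("page") == "NextSong")
                  .select("userId", "song", "artist", "length"))
    song_plays.show(5)

    # Spark SQL: ask a different question over the same data via a temporary view.
    logs.createOrReplaceTempView("logs")
    top_artists = spark.sql("""
        SELECT artist, COUNT(*) AS plays
        FROM logs
        WHERE page = 'NextSong'
        GROUP BY artist
        ORDER BY plays DESC
        LIMIT 10
    """)
    top_artists.show()

    spark.stop()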

Lesson 4

Using Spark in AWS

In this lesson, you will learn to use Spark and work with data lakes on Amazon Web Services using S3, AWS Glue, and AWS Glue Studio.

Lesson 5

Ingesting and Organizing Data in a Lakehouse

In this lesson, you'll work with lakehouse zones, building and configuring them in AWS.

Lesson 6 • Project

Project: STEDI Human Balance Analytics

In this project, you'll work with sensor data that trains a machine learning model. You'll load S3 JSON data from a data lake into Athena tables using Spark and AWS Glue.
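
As a rough sketch of what such a Glue job can look like (assumed bucket names, zone paths, and consent field; not the project's actual schema), reading raw JSON from a landing zone and writing curated records to a trusted zone that an Athena table can query:

    # Minimal sketch of an AWS Glue PySpark job: landing zone -> trusted zone.
    # Bucket names, paths, and the consent field are hypothetical placeholders.
    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read raw JSON records from the (hypothetical) landing zone.
    raw = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-lake/landing/customer/"]},
        format="json",
    )

    # Filter with Spark: keep only records that carry a research-consent value.
    raw_df = raw.toDF()
    trusted_df = raw_df.filter(raw_df["shareWithResearchAsOfDate"].isNotNull())

    # Write curated records to the trusted zone, where an Athena table can point.
    trusted_df.write.mode("overwrite").json("s3://example-lake/trusted/customer/")

    job.commit()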

Taught By The Best

Photo of Sean Murdock

Sean Murdock

Professor at Brigham Young University Idaho

Sean currently teaches cybersecurity and DevOps courses at Brigham Young University Idaho. He has been a software engineer for over 16 years. Some of the most exciting projects he has worked on involved data pipelines for DNA processing and vehicle telematics.

Photo of Matt Swaffer

Matt Swaffer

General Manager, MBS

Matt has been working in software development and data science for over 20 years. Matt's career is centered on the intersection of technology, data, and human psychology. He is passionate about using data science to have a meaningful impact on our people and our planet.

Photo of Ben Goldberg

Ben Goldberg

Staff Engineer at SpotHero

In his career as an engineer, Ben Goldberg has worked in fields ranging from computer vision to natural language processing. At SpotHero, he founded and built out their data engineering team, using Airflow as one of the key technologies.

Photo of Amanda Moran

Amanda Moran

Developer Advocate at DataStax

Amanda is a developer advocate for DataStax after spending the last 6 years as a software engineer on 4 different distributed databases. Her passion is bridging the gap between customers and engineering. She has degrees from the University of Washington and Santa Clara University.

Photo of Valerie Scarlata

Valerie Scarlata

Senior Technical Content Developer at Udacity

Valerie is a Sr. Technical Content Developer at Udacity who has developed and taught a broad range of computing curricula for multiple colleges and universities. She is a former professor and spent over 10 years as a software engineer, specializing in web, mobile, voice assistant, and full-stack application development.

Ratings & Reviews

Average Rating: 4.6 Stars

1,132 Reviews

The Udacity Difference

Combine technology training for employees with industry experts, mentors, and projects to build the critical thinking that pushes innovation. Our proven upskilling system goes after success, relentlessly.

Demonstrate proficiency with practical projects

Projects are based on real-world scenarios and challenges, allowing you to apply the skills you learn to practical situations, while giving you real hands-on experience.

  • Gain proven experience

  • Retain knowledge longer

  • Apply new skills immediately

Top-tier services to ensure learner success

Reviewers provide timely and constructive feedback on your project submissions, highlighting areas of improvement and offering practical tips to enhance your work.

  • Get help from subject matter experts

  • Learn industry best practices

  • Gain valuable insights and improve your skills

Unlock access to Data Engineering with AWS and the rest of our best-in-class catalog

  • Unlimited access to our top-rated courses

  • Real-world projects

  • Personalized project reviews

  • Program certificates

  • Proven career outcomes

Full Catalog Access

One subscription opens up this course and our entire catalog of projects and skills.

Get Started Today

Data Engineering with AWS

Month-To-Month


  • Unlimited access to our top-rated courses
  • Real-world projects
  • Personalized project reviews
  • Program certificates
  • Proven career outcomes

4 Months

Average time to complete a Nanodegree program

  • All the same great benefits as our month-to-month plan
  • Most cost-effective way to acquire a new set of skills

Discount applies to the first 4 months of membership, after which plans are converted to month-to-month.

About Data Engineering with AWS

Our Data Engineering Nanodegree program is a comprehensive data engineering course designed to teach you how to design data models, build data warehouses and data lakes, automate data pipelines, and work with massive datasets. Skills covered include Database fundamentals, CassandraDB, PostgreSQL, and database normalization. This program is ideal for those with a basic understanding of Python, SQL, and command-line interfaces. You'll learn from industry experts like Sean Murdock, Matt Swaffer, Ben Goldberg, Amanda Moran, and Valerie Scarlata, gaining hands-on experience with real-world projects. At Udacity, we offer an empowering learning environment where you gain practical skills through our data engineering training, reinforced with top-tier support and expert feedback. This course will equip you with the knowledge and tools to excel in the field of data engineering.

© 2011-2024 Udacity, Inc. "Nanodegree" is a registered trademark of Udacity.