Automate Data Pipelines

Course

In this course, you'll build pipelines leveraging Airflow DAGs to organize your tasks along with AWS resources such as S3 and Redshift.

Skills

Apache Airflow

Data pipeline DAGs

Data pipeline partitioning

Amazon S3

Intermediate

4 weeks

Real-world Projects

Completion Certificate

Last Updated August 29, 2023

Prerequisites:

Data modeling basics

Intermediate Python

Course Lessons

Lesson 1

Introduction to Automating Data Pipelines

Welcome to Automating Data Pipelines. In this lesson, you'll be introduced to the topic, prerequisites for the course, and the environment and tools you'll be using to build data pipelines.

Lesson 2

Data Pipelines

In this lesson, you'll learn about the components of a data pipeline, including Directed Acyclic Graphs (DAGs). You'll practice creating data pipelines with DAGs and Apache Airflow.
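As a rough illustration of what a DAG means for a pipeline, the sketch below orders a few tasks by their dependencies using only the Python standard library. It is not Airflow code and is not taken from the course; the task names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names (assumptions for illustration only).
# Each key maps to the set of tasks that must finish before it can run.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "quality_check": {"load"},
}


def run_order(deps):
    """Return one valid execution order for the tasks in a DAG.

    TopologicalSorter raises graphlib.CycleError if the graph
    contains a cycle, which is what "acyclic" rules out.
    """
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

A scheduler such as Airflow works from the same idea: a task only runs once everything it depends on has completed.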

Lesson 3

Airflow and AWS

This lesson connects Airflow to AWS: you'll create credentials, copy S3 data, use connections and hooks, and build a DAG that loads S3 data into Redshift.

Lesson 4

Data Quality

Students will learn how to track data lineage, set up data pipeline schedules, partition data to optimize pipelines, investigate data quality issues, and write tests to ensure data quality.
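To give a sense of what a data quality test can look like, here is a minimal sketch that runs two checks against an in-memory SQLite table. It is an assumption-laden stand-in, not the course's implementation: the `songplays` table, its columns, and both helper functions are hypothetical, and a real pipeline would run such checks against the warehouse rather than SQLite.

```python
import sqlite3


def check_has_rows(conn, table):
    """Raise if the table is empty; otherwise return the row count."""
    # Table names cannot be bound as SQL parameters, so only pass trusted names.
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if count < 1:
        raise ValueError(f"Data quality check failed: {table} has no rows")
    return count


def check_no_nulls(conn, table, column):
    """Raise if any row has NULL in the given column; otherwise return True."""
    nulls = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()[0]
    if nulls > 0:
        raise ValueError(
            f"Data quality check failed: {table}.{column} has {nulls} NULL value(s)"
        )
    return True


# Hypothetical sample data standing in for a loaded table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songplays (id INTEGER, user_id TEXT)")
conn.executemany("INSERT INTO songplays VALUES (?, ?)", [(1, "a"), (2, "b")])

print(check_has_rows(conn, "songplays"))
print(check_no_nulls(conn, "songplays", "user_id"))
```

Raising an exception on failure is a deliberate choice here: in an orchestrated pipeline, a failed check should stop downstream tasks rather than pass silently.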

Lesson 5

Production Data Pipelines

In this last lesson, students will learn how to build pipelines with maintainability and reusability in mind. They will also learn about pipeline monitoring.

Lesson 6 • Project

Project: Data Pipelines

Students work on a music streaming company’s data infrastructure by creating and automating a set of data pipelines with Airflow, and by monitoring and debugging production pipelines.

Taught By The Best

Photo of Sean Murdock

Sean Murdock

Professor at Brigham Young University Idaho

Sean currently teaches cybersecurity and DevOps courses at Brigham Young University Idaho. He has been a software engineer for over 16 years. Some of the most exciting projects he has worked on involved data pipelines for DNA processing and vehicle telematics.
