Data Engineering with Microsoft Azure

Introducing the Data Engineering with Microsoft Azure Nanodegree

Data engineering is the practice of designing and building systems for gathering, organizing, analyzing, and processing data in various forms. Data engineers tend to be analytical thinkers who can identify patterns in data and use them to make predictions. They are also responsible for designing databases and data structures. Data engineers work in many industries, including finance, retail marketing, and even healthcare.

With the Data Engineering with Microsoft Azure Nanodegree, you will learn to design data models, build data warehouses, build data lakes and lakehouse architecture, create data pipelines, and work with large datasets on the Azure platform using Azure Synapse Analytics, Azure Databricks, and Azure Data Factory.

COURSE 1: Data Modeling

Instructor: Amanda Moran

After spending the last 6 years as a software engineer on 4 different distributed databases, Amanda is a developer advocate for DataStax. Her passion is bridging the gap between customers and engineering. She has degrees from the University of Washington and Santa Clara University.

In this course, you’ll learn to create relational and NoSQL data models to fit the diverse needs of data consumers. You’ll understand how data models differ and how to choose the appropriate one for a given situation. You’ll also build fluency in PostgreSQL and Apache Cassandra.

PROJECT 1: Data Modeling with Postgres

In this project, you’ll model user activity data for a music streaming app called Sparkify. You’ll create a relational database and ETL pipeline designed to optimize queries for understanding what songs users are listening to. In PostgreSQL, you will also define fact and dimension tables and insert data into your new tables.
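To make the idea of fact and dimension tables concrete, here is a minimal sketch. It is not the course's project code: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a database server, and every table name, column name, and sample row is a hypothetical chosen for illustration.

```python
import sqlite3


def build_star_schema(conn):
    """Create two dimension tables and one fact table (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE songs (song_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        -- Fact table: one row per listening event, pointing at the dimensions.
        CREATE TABLE songplays (
            songplay_id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(user_id),
            song_id INTEGER NOT NULL REFERENCES songs(song_id),
            played_at TEXT NOT NULL
        );
        """
    )


def load_sample_rows(conn):
    """Insert a few made-up rows; a real ETL step would read log files."""
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    conn.executemany("INSERT INTO songs VALUES (?, ?)", [(10, "Song A"), (11, "Song B")])
    conn.executemany(
        "INSERT INTO songplays (user_id, song_id, played_at) VALUES (?, ?, ?)",
        [
            (1, 10, "2024-01-01T10:00:00"),
            (2, 10, "2024-01-01T10:05:00"),
            (1, 11, "2024-01-01T10:10:00"),
        ],
    )


def play_counts(conn):
    """Answer 'what songs are users listening to?' by joining fact to dimension."""
    return conn.execute(
        """
        SELECT s.title, COUNT(*) AS plays
        FROM songplays AS sp
        JOIN songs AS s ON s.song_id = sp.song_id
        GROUP BY s.title
        ORDER BY plays DESC, s.title
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load_sample_rows(conn)
print(play_counts(conn))
```

The fact table stores only keys and the event timestamp, while descriptive attributes live in the dimension tables, which is what keeps the "what are users listening to" query a simple join and count.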

PROJECT 2: Data Modeling with Apache Cassandra

In this project, you’ll model user activity data for a music streaming app called Sparkify. You’ll create a database and ETL pipeline in both Postgres and Apache Cassandra, designed to optimize queries for understanding what songs users are listening to. For PostgreSQL, you will also define fact and dimension tables and insert data into your new tables. For Apache Cassandra, you will model your data so you can run specific queries provided by the analytics team at Sparkify.

COURSE 2: Cloud Data Warehouses with Azure

Instructor: Matt Swaffer

Matt is a data science professional whose career spans software development, user experience design, and data visualization. He earned his PhD in cognitive psychology in human learning and is an adjunct professor teaching software design courses.

In this course, you will learn how to create cloud-based data warehouses. You will sharpen your data warehousing skills, deepen your knowledge of data infrastructure, and be introduced to data engineering on the cloud using Azure. You will start with an introduction to data warehouses and ETL, followed by an introduction to ELT and data warehouse technology in the cloud. After this, you will learn about cloud data warehouse technology in Azure, including Azure Synapse Analytics.

PROJECT 3: Building an Azure Data Warehouse for Bikeshare Data Analytics

In this project, you’ll create a data warehouse solution using Azure Synapse Analytics to better understand Divvy, a bike-sharing program. You’ll start by importing data into Synapse Analytics, then transform the data into a star schema and view reports from Analytics to identify how much time and money is spent per ride.
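As a rough illustration of the "transform into a star schema, then report time and money per ride" step, the sketch below does the same thing in plain Python rather than in Azure Synapse Analytics. The record layout, field names, and sample values are assumptions made up for this example and do not reflect the actual Divvy dataset.

```python
from datetime import datetime

# Hypothetical flat trip records, as they might look right after import.
RAW_TRIPS = [
    {"rider": "r1", "start": "2024-01-01T08:00:00", "end": "2024-01-01T08:30:00", "amount_cents": 300},
    {"rider": "r2", "start": "2024-01-01T09:00:00", "end": "2024-01-01T09:10:00", "amount_cents": 100},
    {"rider": "r1", "start": "2024-01-02T08:00:00", "end": "2024-01-02T08:20:00", "amount_cents": 200},
]


def to_star_schema(raw_trips):
    """Split flat records into a rider dimension and a trip fact table."""
    rider_dim = {}   # natural rider id -> surrogate key
    fact_trips = []  # one row per ride, holding keys and measures only
    for trip in raw_trips:
        # Reuse the rider's surrogate key if seen before, otherwise assign the next one.
        rider_key = rider_dim.setdefault(trip["rider"], len(rider_dim) + 1)
        start = datetime.fromisoformat(trip["start"])
        end = datetime.fromisoformat(trip["end"])
        fact_trips.append(
            {
                "rider_key": rider_key,
                "duration_minutes": (end - start).total_seconds() / 60,
                "amount_cents": trip["amount_cents"],
            }
        )
    return rider_dim, fact_trips


def averages_per_ride(fact_trips):
    """Return (average minutes per ride, average cents per ride)."""
    rides = len(fact_trips)
    total_minutes = sum(row["duration_minutes"] for row in fact_trips)
    total_cents = sum(row["amount_cents"] for row in fact_trips)
    return total_minutes / rides, total_cents / rides


rider_dim, fact_trips = to_star_schema(RAW_TRIPS)
print(rider_dim)
print(averages_per_ride(fact_trips))
```

Money is kept in integer cents here so the totals stay exact; in a warehouse the same role would typically be played by a fixed-precision decimal column.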

COURSE 3: Data Lakes and Lakehouse with Spark and Azure Databricks

Instructor: Matt Swaffer

Matt is a data science professional whose career spans software development, user experience design, and data visualization. He earned his PhD in cognitive psychology in human learning and is an adjunct professor teaching software design courses.

In this course, you’ll learn about the big data ecosystem and how to use Spark to work with massive datasets. You will also store big data in a data lake and develop lakehouse architecture on the Azure Databricks platform. 

PROJECT 4: Building an Azure Data Warehouse for Bikeshare Data Analytics

In this project, you’ll create a data warehouse solution using Azure Synapse Analytics to better understand Divvy, a bike-sharing program. You’ll start by importing data into Synapse Analytics, then transform the data into a star schema and view reports from Analytics to identify how much time and money is spent per ride.

COURSE 4: Data Pipelines with Azure 

Instructor: Vishnu (Lucky) Pamula

Lucky is a data & AI evangelist with a track record of successfully helping organizations build analytics solutions. Besides his day job, he teaches as an adjunct professor, delivers lunch & learns, mentors students, and evangelizes Azure Quantum as an ambassador.

In this course, you’ll learn to build, orchestrate, automate, and monitor data pipelines in Azure using Azure Data Factory and pipelines in Azure Synapse Analytics. You’ll build, trigger, and monitor data pipelines on the Azure platform for analytical workloads and run data transformations, optimize data flows, and work with data pipelines in production.

PROJECT 5: Data Integration Pipelines for NYC Payroll Data Analytics

The City of New York would like to develop a data analytics platform using Azure Synapse Analytics to analyze how the city’s financial resources are allocated and how much of the city’s budget is being devoted to overtime.

You have been hired as a data engineer to create high-quality data pipelines that are dynamic, can be automated, and can be monitored for efficient operation.

The source data resides in Azure Data Lake, and you will build pipelines using Azure Data Factory so that historical and new data can be processed in an NYC data warehouse in Azure Synapse Analytics.

Start Your Data Engineering Career Today

With the Data Engineering with Microsoft Azure Nanodegree, you will be prepared to start a career in data engineering and make an impact in the field of big data.

Eraina Ferguson
Eraina Ferguson is the Marketing and Communications Manager at Udacity. Her recent monologue, Listen to Her, was read by actress Marla Gibbs and featured at the WACO Theatre’s 50in50 event. Her writing has been featured on NBC Universal, Red Tricycle, LA Parents Magazine, and the LA Times. Eraina lives in California with her husband and children.