For #BacktoSkills month, we’re spotlighting a series of skills that supercharge careers. Last week we looked at Python, and this week we’re talking about the skill that’s driving the ‘data-driven’ revolution: Data Engineering.

We’ve talked in the past about what a career in Data Engineering looks like. Professionals in Data Engineering roles architect and manage intricate data pipelines, and devise innovative solutions to extract, transform, and load data seamlessly. They live at the intersection of cutting-edge technology and business strategy, succeeding thanks to a number of core competencies including meticulous attention to detail, problem-solving acumen, and an ability to collaborate effectively with cross-functional teams. It’s an enticing role for those who relish technical challenges and possess an unwavering commitment to staying ahead of the curve.

Still not sure if Data Engineering is right for you? Perhaps the best way to really wrap your head around what it looks like to work in data engineering is, well, to do some data engineering work! All of Udacity’s programs include hands-on projects, designed by industry professionals to reflect the real day-to-day work of a technical practitioner. This post highlights a few example projects from some of our most popular Data Engineering Nanodegree programs.

Build Disaster Response Pipelines with Figure Eight

Figure Eight, a company focused on creating datasets for AI applications, has crowdsourced the tagging and translation of messages to improve disaster relief efforts. In this project, you’ll build a data pipeline to prepare message data from major natural disasters around the world. You’ll build a machine learning pipeline to categorize emergency messages based on the needs communicated by the sender.
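To give a flavor of the transform-and-load step this project involves, here’s a minimal, self-contained sketch using toy messages and made-up category names (the real project works with Figure Eight’s actual message and category files, where labels arrive as a single delimited string per message):

```python
import sqlite3

# Toy stand-ins for the crowdsourced data; category names are illustrative.
messages = [
    {"id": 1, "message": "We need water and food urgently"},
    {"id": 2, "message": "The storm destroyed the bridge"},
]
categories = [
    {"id": 1, "categories": "water-1;food-1;medical_help-0"},
    {"id": 2, "categories": "water-0;food-0;medical_help-1"},
]

def transform(msg_rows, cat_rows):
    """Merge messages with categories and split the label string
    into one integer column per category."""
    cats_by_id = {row["id"]: row["categories"] for row in cat_rows}
    cleaned = []
    for msg in msg_rows:
        record = {"id": msg["id"], "message": msg["message"]}
        for pair in cats_by_id[msg["id"]].split(";"):
            name, value = pair.rsplit("-", 1)
            record[name] = int(value)
        cleaned.append(record)
    return cleaned

def load(rows, db_path=":memory:"):
    """Load the cleaned records into a SQLite table for the ML stage."""
    conn = sqlite3.connect(db_path)
    cols = [c for c in rows[0] if c not in ("id", "message")]
    conn.execute(
        "CREATE TABLE messages (id INTEGER, message TEXT, %s)"
        % ", ".join(f"{c} INTEGER" for c in cols)
    )
    for r in rows:
        conn.execute(
            "INSERT INTO messages VALUES (%s)" % ", ".join("?" * (len(cols) + 2)),
            [r["id"], r["message"]] + [r[c] for c in cols],
        )
    return conn

conn = load(transform(messages, categories))
print(conn.execute("SELECT message FROM messages WHERE water = 1").fetchall())
# → [('We need water and food urgently',)]
```

The machine learning pipeline in the project then trains a classifier on exactly this kind of cleaned, categorized table.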

Found in the Data Scientist Nanodegree Program.

STEDI Human Balance Analytics

In this project, learners will act as a data engineer for the STEDI team, building a data lakehouse solution for sensor data that trains a machine learning model. They will build an ELT (Extract, Load, Transform) pipeline for the lakehouse architecture: loading data from an AWS S3 data lake, processing it into analytics tables using Spark and AWS Glue, and loading the results back into the lakehouse.
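As a rough illustration of that landing → trusted → curated flow, here is the idea in plain Python rather than Spark, with made-up field names rather than the actual STEDI schema:

```python
# Landing zone: raw records as they arrive (field names are illustrative).
customers_landing = [
    {"email": "a@example.com", "consented": True},
    {"email": "b@example.com", "consented": False},
]
accelerometer_landing = [
    {"user": "a@example.com", "x": 0.1, "y": 0.2, "z": 0.3},
    {"user": "b@example.com", "x": 0.4, "y": 0.5, "z": 0.6},
]

# Trusted zone: keep only customers who agreed to share their data.
customers_trusted = [c for c in customers_landing if c["consented"]]

# Curated zone: join sensor readings to trusted customers, producing
# the analytics table the machine learning model trains on.
trusted_emails = {c["email"] for c in customers_trusted}
accelerometer_trusted = [
    r for r in accelerometer_landing if r["user"] in trusted_emails
]
print(len(accelerometer_trusted))  # → 1: only consenting customers remain
```

In the project itself, each of these stages is a Spark job running on AWS Glue over S3 data rather than an in-memory list comprehension.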

Found in the Data Engineering with AWS Nanodegree Program.

Data Integration Pipelines for NYC Payroll Data Analytics

Analyze how the city’s financial resources are allocated and how much of the city’s budget is devoted to overtime. Create high-quality, dynamic data pipelines that can be automated and monitored for efficiency. Build pipelines using Azure Data Factory to process historical and new data into an NYC data warehouse in Azure Synapse Analytics.
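The core overtime question can be sketched in a few lines. In the project you answer it with SQL over payroll tables in Azure Synapse Analytics; this toy version uses illustrative records and field names:

```python
# Toy payroll records; the real data comes from NYC payroll tables.
payroll = [
    {"agency": "FDNY", "base_pay": 70000.0, "overtime_pay": 15000.0},
    {"agency": "FDNY", "base_pay": 65000.0, "overtime_pay": 5000.0},
    {"agency": "DOE",  "base_pay": 60000.0, "overtime_pay": 1000.0},
]

def overtime_share(rows):
    """Fraction of total payroll spent on overtime."""
    total = sum(r["base_pay"] + r["overtime_pay"] for r in rows)
    overtime = sum(r["overtime_pay"] for r in rows)
    return overtime / total

print(round(overtime_share(payroll), 3))  # → 0.097
```

The pipeline work in the project is about delivering this kind of aggregate reliably and repeatedly as new payroll data arrives.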

Found in the Data Engineering with Microsoft Azure Nanodegree Program.

Build a Scalable Data Strategy

Once a product has been launched into the market, the amount of data collected typically dramatically increases, and requires the appropriate infrastructure to support such growth. In this project, you will act as a data product manager for Flyber, a flying-taxi service that has been massively successful in New York City after its first product launch, and create a data strategy to not only handle the massive amount of incoming data, but also process it to get the business insights needed to grow the business. You will define the data needs of primary business stakeholders within the organization and create a data model to ensure the data collected supports those needs. Then, you will perform the necessary extraction and transformation of the data to make the data relevant to answer business questions. Finally, you will interpret data visualizations to understand the scale of Flyber’s data growth and choose an appropriate data warehouse to enable that growth.

Found in the Data Product Manager Nanodegree Program.

Build an ML Pipeline for Short-term Rental Prices in NYC

Students will write a machine learning pipeline to solve the following problem: a property management company is renting rooms and properties in New York for short periods on various rental platforms. They need to estimate the typical price for a given property based on the price of similar properties. The company receives new data in bulk every week, so the model needs to be retrained with the same cadence, necessitating a reusable pipeline. Students will build an end-to-end pipeline covering data fetching, validation, segregation, training and validation, testing, and release. They will run it on an initial data sample, and then re-run it on a new data sample simulating a new data delivery.
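As a rough sketch of those stages chained together, here’s a minimal version with a trivial mean-price stand-in for the real regression model, and randomly generated toy listings in place of the weekly delivery:

```python
import random

def fetch():
    """Fetch the weekly data delivery (toy listings here)."""
    random.seed(0)
    return [
        {"rooms": random.randint(1, 4), "price": 50.0 + random.random() * 200}
        for _ in range(100)
    ]

def validate(rows):
    """Data checks: reject a delivery with out-of-range prices."""
    assert all(0 < r["price"] < 1000 for r in rows), "price outside expected range"
    return rows

def segregate(rows, test_frac=0.2):
    """Split the delivery into train and held-out test sets."""
    cut = int(len(rows) * (1 - test_frac))
    return rows[:cut], rows[cut:]

def train(rows):
    """'Train' a trivial model that predicts the mean price."""
    mean_price = sum(r["price"] for r in rows) / len(rows)
    return lambda r: mean_price

def evaluate(model, rows):
    """Mean absolute error on held-out data."""
    return sum(abs(model(r) - r["price"]) for r in rows) / len(rows)

train_set, test_set = segregate(validate(fetch()))
model = train(train_set)
mae = evaluate(model, test_set)
print(f"MAE on held-out data: {mae:.1f}")
# A weekly re-run simply calls the same chain on the new delivery.
```

Because every stage is a callable step, retraining on next week’s data means re-running the chain, which is exactly the reusability the project asks for.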

Found in the Machine Learning DevOps Engineer Nanodegree Program.

Personalized Project Feedback

All of these projects are reviewed and assessed by experienced project reviewers who provide students with personalized feedback. Whether you’re on the path to becoming a Data Engineer, becoming a better one, or simply exploring whether it’s a field you want to pursue, these Nanodegree programs and projects will help you develop practical, job-ready skills.

Browse our full suite of courses that cover Data Engineering here.

Patrick Donovan
Head of Consumer Marketing at Udacity