Data Engineering


In data engineering for data scientists, you will practice building ETL, NLP, and machine learning pipelines. This will prepare you for the project with our industry partner Figure 8.


1 month

Real-world Projects

Completion Certificate

Last Updated October 18, 2022

Skills you'll learn:
scikit-learn • Data cleaning • Machine learning pipeline creation • Part-of-speech tagging
Basic SQL • Python for data science • JSON

Course Lessons

Lesson 1

Introduction to Data Engineering

This lesson introduces the data engineering for data scientists course and its project. The lessons that follow cover ETL pipelines, natural language processing pipelines, and machine learning pipelines.

Lesson 2

ETL Pipelines

ETL stands for extract, transform, and load. This is the most common type of data pipeline, and you will practice each step in this lesson.
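As a rough illustration of the three steps, here is a minimal sketch using only the Python standard library. The sample rows, the `messages` table, and the function names `extract`, `transform`, and `load` are assumptions made for this example, not material from the course.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for a CSV file or an API response.
RAW_CSV = """id,message,genre
1,  Need water and food ,direct
2,Roads are blocked,news
2,Roads are blocked,news
3,,direct
"""


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: strip whitespace, drop empty messages and repeated ids."""
    seen = set()
    cleaned = []
    for row in rows:
        message = row["message"].strip()
        if not message or row["id"] in seen:
            continue
        seen.add(row["id"])
        cleaned.append((int(row["id"]), message, row["genre"]))
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into a SQLite table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(id INTEGER PRIMARY KEY, message TEXT, genre TEXT)"
    )
    connection.executemany("INSERT INTO messages VALUES (?, ?, ?)", records)
    connection.commit()


# Run the three steps in order against an in-memory database.
connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
rows = connection.execute(
    "SELECT id, message, genre FROM messages ORDER BY id"
).fetchall()
print(rows)
```

Keeping each step in its own function makes it possible to test the cleaning logic without touching a database.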

Lesson 3

NLP Pipelines

In order to complete the project at the end of the course, you will need some natural language processing skills. Here you will practice engineering machine learning features from text data.
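One common way to turn text into machine learning features is a bag-of-words count vector. The sketch below uses only the Python standard library; the stop-word list, the sample messages, and the helper names are assumptions chosen for illustration, and the course itself may use different tools.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "and", "are", "is", "the", "we"}


def tokenize(text):
    """Lowercase the text, keep alphanumeric runs, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def build_vocabulary(documents):
    """Map every distinct token to a column index, in sorted order."""
    tokens = sorted({token for doc in documents for token in tokenize(doc)})
    return {token: index for index, token in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Turn one document into a vector of token counts."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for token, count in counts.items():
        if token in vocabulary:
            vector[vocabulary[token]] = count
    return vector


messages = ["We need water, water and food!", "The roads are blocked."]
vocabulary = build_vocabulary(messages)
features = [bag_of_words(message, vocabulary) for message in messages]
print(vocabulary)
print(features)
```

Each message becomes a fixed-length list of integers, which is the shape a classifier expects as input.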

Lesson 4

Machine Learning Pipelines

You'll use the scikit-learn package to code a machine learning pipeline. With these skills, you can ingest data, create features, and train a machine learning algorithm in just one step.
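To make the "one step" idea concrete, here is a minimal sketch built on scikit-learn's `Pipeline` class. The training texts, the labels, the step names, and the choice of `LogisticRegression` are assumptions for this example; the lesson may use different components.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy training data: short messages and the need they express.
texts = [
    "we need drinking water",
    "no water left in the village",
    "please send bottled water",
    "our house collapsed, we need shelter",
    "families are sleeping outside without shelter",
    "tents and shelter needed urgently",
]
labels = ["water", "water", "water", "shelter", "shelter", "shelter"]

# Chain feature extraction and the classifier into a single estimator.
pipeline = Pipeline([
    ("counts", CountVectorizer()),       # text -> token count matrix
    ("tfidf", TfidfTransformer()),       # counts -> TF-IDF weights
    ("classifier", LogisticRegression()),
])

# One call fits every step in order on the raw text.
pipeline.fit(texts, labels)

# One call applies the same transformations before predicting.
prediction = pipeline.predict(["we need clean water"])
print(prediction)
```

Because the transformers live inside the pipeline, the same preprocessing is applied at training and prediction time, which helps avoid mismatches between the two.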

Lesson 5 • Project

Project: Disaster Response Pipeline

You’ll build a machine learning pipeline to categorize emergency messages based on the needs communicated by the sender.

Taught By The Best

Andrew Paster


Andrew has an engineering degree from Yale and has used his data science skills to build a jewelry business from the ground up. He has also created courses for Udacity's Self-Driving Car Engineer Nanodegree program.

Juno Lee

Curriculum Lead at Udacity

Juno is the curriculum lead for the School of Data Science. She has been sharing her passion for data and teaching, building several courses at Udacity. As a data scientist, she built recommendation engines, computer vision and NLP models, and tools to analyze user behavior.

Arpan Chakraborty


Arpan is a computer scientist with a PhD from North Carolina State University. He teaches at Georgia Tech (within the Master's in Computer Science program) and is a coauthor of the book Practical Graph Mining with R.

The Udacity Difference

Combine technology training for employees with industry experts, mentors, and projects, for critical thinking that pushes innovation. Our proven upskilling system goes after success—relentlessly.

Demonstrate proficiency with practical projects

Projects are based on real-world scenarios and challenges, allowing you to apply the skills you learn to practical situations, while giving you real hands-on experience.

  • Gain proven experience

  • Retain knowledge longer

  • Apply new skills immediately

Top-tier services to ensure learner success

Reviewers provide timely and constructive feedback on your project submissions, highlighting areas of improvement and offering practical tips to enhance your work.

  • Get help from subject matter experts

  • Learn industry best practices

  • Gain valuable insights and improve your skills
