Real-world projects from industry experts
With real-world projects and immersive content built in partnership with top-tier companies, you’ll master the tech skills companies want.
Learn how to wrangle data on a massive scale! By the end of this course, you’ll be able to pull data from a wide range of sources, store it in a database, and create data pipelines (ETL, NLP, machine learning) that power real-world web applications.
Get access to the classroom immediately on enrollment.
For many companies, data scientists who can also tackle data-engineering problems are worth their weight in gold. In this course, you’ll learn how to unlock data silos, pulling data from multiple sources and pipelining it into usable forms for analysts and top-level decision makers. At the end, you’ll even build an impressive machine-learning-powered web application that has real-world, life-saving significance.
Python, SQL, Statistics, Machine Learning.
Understand what ETL pipelines are, and access and combine data from CSV, JSON, logs, APIs and databases.
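The sketch below roughly illustrates the extract, transform and load steps. It reads records from a CSV source and a JSON source, joins them on a shared id, and loads the result into SQLite, using only the Python standard library. The inline sample data, the field names and the `records` table are hypothetical stand-ins for real files, APIs or databases.

```python
import csv
import io
import json
import sqlite3

# Hypothetical inline sources; a real pipeline would read files, call APIs,
# or query databases instead.
CSV_SOURCE = "id,name\n1,alpha\n2,beta\n"
JSON_SOURCE = '[{"id": 1, "score": 0.5}, {"id": 2, "score": 0.75}]'


def extract():
    """Extract: parse rows from the CSV source and records from the JSON source."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Transform: cast CSV ids to int and join the two sources on id."""
    score_by_id = {record["id"]: record["score"] for record in json_rows}
    return [
        (int(row["id"]), row["name"], score_by_id.get(int(row["id"])))
        for row in csv_rows
    ]


def load(rows, conn):
    """Load: write the combined rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(id INTEGER PRIMARY KEY, name TEXT, score REAL)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load(transform(*extract()), connection)
    print(connection.execute("SELECT * FROM records ORDER BY id").fetchall())
```

Keeping each stage in its own function makes it easy to swap one source or destination for another without touching the other two steps.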
Prepare text data for analysis with tokenization, lemmatization, and removing stop words. Use scikit-learn to transform and vectorize text data and build features with bag of words and tf-idf.
Understand the advantages of using machine learning pipelines to streamline the data preparation and modeling process. Use feature unions to perform steps in parallel and create more complex workflows and complete a case study to build a full machine learning pipeline that prepares data and creates a model for a dataset.
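The sketch below shows one way to combine these ideas in scikit-learn. A `FeatureUnion` computes tf-idf features and a hypothetical `MessageLength` feature side by side, and a `Pipeline` feeds the combined features to a classifier. The custom transformer, the choice of `LogisticRegression` and the toy messages and labels are illustrative assumptions. They are not the course's case study.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline


class MessageLength(BaseEstimator, TransformerMixin):
    """Hypothetical custom feature: the number of words in each message."""

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        # Return a 2-D array with one column so it can be stacked with tf-idf.
        return np.array([[len(text.split())] for text in X], dtype=float)


def build_model():
    # FeatureUnion applies each branch to the same input and concatenates
    # the resulting feature columns (branches can run in parallel via n_jobs).
    features = FeatureUnion([
        ("text", Pipeline([
            ("counts", CountVectorizer()),
            ("tfidf", TfidfTransformer()),
        ])),
        ("length", MessageLength()),
    ])
    # Pipeline chains the steps: feature preparation first, model last.
    return Pipeline([
        ("features", features),
        ("clf", LogisticRegression()),
    ])


messages = [
    "We need water",
    "Please send food and water",
    "Need medical help now",
    "The weather is nice today",
    "Thanks for the update",
    "Road reopened this morning",
]
labels = [1, 1, 1, 0, 0, 0]  # toy labels: 1 = message asks for help

model = build_model()
model.fit(messages, labels)
print(model.predict(["Send water please"]))
```

The whole workflow is a single estimator, so calling `fit` or `predict` runs every preparation step in order. The same object can also be passed to `GridSearchCV` to tune parameters of any step, for example `features__text__counts__ngram_range`.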
In this project, you’ll build a data pipeline to prepare the message data from major natural disasters around the world. You’ll build a machine learning pipeline to categorize emergency text messages based on the need communicated by the sender.
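On the data side, such a pipeline might need to turn a combined label field into one binary column per category before storing the result. The sketch below assumes a hypothetical raw format in which each message's categories are encoded as a string like `request-1;water-0`. It splits that string into columns, drops duplicate messages by id, and saves the cleaned table to SQLite using only the standard library. The column names and sample messages are made up for illustration.

```python
import sqlite3

# Hypothetical raw records: (id, message, "name-value;name-value;...").
RAW = [
    (1, "We need water and tents", "request-1;water-1;shelter-1;offer-0"),
    (2, "Is the storm over?", "request-0;water-0;shelter-0;offer-0"),
    (2, "Is the storm over?", "request-0;water-0;shelter-0;offer-0"),  # duplicate
]


def parse_categories(encoded):
    """Split 'name-value;...' into a {name: int(value)} dict."""
    labels = {}
    for pair in encoded.split(";"):
        name, value = pair.rsplit("-", 1)
        labels[name] = int(value)
    return labels


def clean(raw_rows):
    """Expand the category string into columns and drop duplicate ids."""
    seen = set()
    cleaned = []
    for msg_id, message, encoded in raw_rows:
        if msg_id in seen:
            continue
        seen.add(msg_id)
        cleaned.append({"id": msg_id, "message": message, **parse_categories(encoded)})
    return cleaned


def save(rows, conn, table="messages"):
    """Store the cleaned rows; column names come from trusted, known labels."""
    columns = list(rows[0].keys())
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        [tuple(row[col] for col in columns) for row in rows],
    )
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    save(clean(RAW), connection)
    print(connection.execute("SELECT * FROM messages ORDER BY id").fetchall())
```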
On-demand help. Receive instant help with your learning directly in the classroom. Stay on track and get unstuck.
Validate your understanding of concepts learned by checking the output and quality of your code in real time.
Tailor a learning plan that fits your busy life. Learn at your own pace and reach your personal goals on the schedule that works best for you.
We provide services customized for your needs at every step of your learning journey to ensure your success.
Juno is the curriculum lead for the School of Data Science. She has been sharing her passion for data and teaching, building several courses at Udacity. As a data scientist, she built recommendation engines, computer vision and NLP models, and tools to analyze user behavior.
Andrew has an engineering degree from Yale, and has used his data science skills to build a jewelry business from the ground up. He has additionally created courses for Udacity’s Self-Driving Car Engineer Nanodegree program.
Arpan is a computer scientist with a PhD from North Carolina State University. He teaches at Georgia Tech (within the Master’s in Computer Science program), and is a coauthor of the book Practical Graph Mining with R.
How to pull data, store it, and build ETL, NLP and machine-learning data pipelines with Python.
On average, successful students take 1 month to complete this program.
No. This course accepts all applicants regardless of experience and specific background.
The Data Engineering for Data Scientists course comprises content and curriculum that support one project. We estimate that students can complete the program in 1 month.
The project will be reviewed by the Udacity reviewer network and platform. You will receive feedback, and if your project does not pass, you will be asked to resubmit it until it does.
Access to this course runs for the length of time specified in the payment card above. If you do not graduate within that time period, you will continue learning with month-to-month payments. See the Terms of Use and FAQs for other policies regarding the terms of access to our programs.
Please see the Udacity Program Terms of Use and FAQs for policies on enrollment in our programs.
You’ll need Internet access and a 64-bit computer. Additional software: you’ll need to be able to download and run Python 3.7.
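If you are unsure whether your setup meets these requirements, a quick check along the following lines may help. This is only a sketch. It treats Python 3.7 as a minimum version, which is an assumption. It also checks whether the running interpreter is a 64-bit build. A 64-bit build implies a 64-bit computer, but a 32-bit Python running on a 64-bit machine would also be flagged.

```python
import platform
import struct
import sys


def check_environment(min_version=(3, 7)):
    """Return a list of problems with the current interpreter (empty if none)."""
    problems = []
    if sys.version_info < min_version:
        required = ".".join(map(str, min_version))
        problems.append(
            f"Python {required}+ required, found {platform.python_version()}"
        )
    # A pointer ("P") is 8 bytes, i.e. 64 bits, on a 64-bit build of Python.
    if struct.calcsize("P") * 8 != 64:
        problems.append("64-bit Python interpreter not detected")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "\n".join(issues))
```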