Course

Data Engineering

Learn how to wrangle data on a massive scale! By the end of this course, you’ll be able to pull data from a wide range of sources, store it in a database, and create data pipelines (ETL, NLP, machine learning) that power real-world web applications.
  • Estimated Time
    28 hours

  • Enroll by
    September 22, 2021

    Get access to the classroom immediately upon enrollment

  • Prerequisites
    Python, SQL, Statistics, Machine Learning
In collaboration with
  • Appen

What You Will Learn

Syllabus

Data Engineering

For many companies, data scientists who can also tackle data-engineering problems are worth their weight in gold. In this course, you’ll learn how to unlock data silos, pulling data from multiple sources and pipelining it into usable forms for analysts and top-level decision makers. At the end, you’ll even build an impressive machine-learning-powered web application that has real-world, life-saving significance.

Prerequisite Knowledge

Python, SQL, Statistics, Machine Learning.

  • ETL Pipelines

    Understand what ETL pipelines are, and access and combine data from CSV files, JSON, logs, APIs, and databases.

  • Natural Language Processing

    Prepare text data for analysis with tokenization, lemmatization, and stop-word removal. Use scikit-learn to transform and vectorize text data, and build features with bag of words and tf-idf.

  • Machine Learning Pipelines

    Understand the advantages of using machine learning pipelines to streamline data preparation and modeling. Use feature unions to perform steps in parallel and create more complex workflows, then complete a case study in which you build a full machine learning pipeline that prepares a dataset and fits a model to it.

  • Course Project: Build Disaster Response Pipelines

    In this project, you’ll build a data pipeline to prepare the message data from major natural disasters around the world. You’ll build a machine learning pipeline to categorize emergency text messages based on the need communicated by the sender.
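The extract, transform, and load steps described in the ETL module can be sketched with the Python standard library alone. This is a minimal illustration, not course material: it assumes two small hypothetical sources (an inline CSV string standing in for a file and a JSON string standing in for an API response), joins them on a shared `id`, normalizes the message text, and loads the result into an in-memory SQLite database. The names `extract`, `transform`, `load`, `MESSAGES_CSV`, and `CATEGORIES_JSON` are illustrative only.

```python
import csv
import io
import json
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical inputs: a CSV "file" and a JSON "API response" held in strings.
MESSAGES_CSV = """id,message
1,We need water and food
2,  Roads are blocked near the river
"""
CATEGORIES_JSON = '[{"id": 1, "category": "aid"}, {"id": 2, "category": "infrastructure"}]'


def extract(csv_text: str, json_text: str) -> Tuple[List[Dict[str, str]], List[dict]]:
    """Extract: parse rows from the CSV source and records from the JSON source."""
    messages = list(csv.DictReader(io.StringIO(csv_text)))
    categories = json.loads(json_text)
    return messages, categories


def transform(messages: List[Dict[str, str]], categories: List[dict]) -> List[tuple]:
    """Transform: join both sources on id and normalize the message text."""
    category_by_id = {int(rec["id"]): rec["category"] for rec in categories}
    rows = []
    for row in messages:
        msg_id = int(row["id"])
        text = row["message"].strip().lower()
        # Messages without a matching JSON record get a NULL (None) category.
        rows.append((msg_id, text, category_by_id.get(msg_id)))
    return rows


def load(rows: List[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the combined rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(id INTEGER PRIMARY KEY, message TEXT, category TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO messages VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(*extract(MESSAGES_CSV, CATEGORIES_JSON)), conn)
print(conn.execute("SELECT id, message, category FROM messages ORDER BY id").fetchall())
# → [(1, 'we need water and food', 'aid'), (2, 'roads are blocked near the river', 'infrastructure')]
```

Replacing the inline strings with real file handles or an HTTP response body changes only the extract step; keeping the three stages as separate functions lets each one be tested on its own.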
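The bag-of-words and tf-idf features listed under Natural Language Processing can be computed by hand, which shows what a text vectorizer actually produces. The sketch below is a hypothetical, standard-library-only illustration rather than the scikit-learn API the course uses: the stop-word list is a small made-up set, lemmatization is omitted, and the idf term follows the smoothed form ln((1 + n) / (1 + df)) + 1 that scikit-learn's `TfidfVectorizer` uses by default, without the L2 row normalization that class also applies by default. The function names `tokenize` and `tf_idf` and the sample messages are illustrative only.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical stop-word list for illustration; real projects use a fuller list.
STOP_WORDS = {"a", "and", "are", "is", "near", "of", "the", "to", "we"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    return [tok for tok in re.findall(r"[a-z]+", text.lower()) if tok not in STOP_WORDS]


def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per document: raw count * smoothed idf."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)  # bag of words for this document
        weights.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


docs = [
    "We need water and food",
    "Water is rising near the river",
    "The river bridge is blocked",
]
for doc, w in zip(docs, tf_idf(docs)):
    print(doc, "->", {term: round(v, 3) for term, v in w.items()})
```

In the first message, `need` and `food` appear in only one document and so receive a higher weight than `water`, which appears in two; that down-weighting of common terms is the point of the idf factor.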

Data Scientist roles are growing by 45% year over year!

Introducing new Udacity Single Courses

Our students asked and we listened. You can now get the in-demand tech skills you need faster and for less money by enrolling in one of our new, one-month Single Courses. You’ll get the specific job-ready skills you need in as little as four weeks and for a fraction of the cost.

Of course, if you are looking for a more robust, in-depth education, you can still enroll in one of our 3- to 6-month Nanodegree programs.

Both programs are part-time and online, and they both offer 24/7 support, quality Udacity-produced content, courses created with the help of top tech companies, and more. You can always start with a Single Course and upgrade to a full Nanodegree program if you like.

All Our Courses Include

Real-world projects from industry experts

With real-world projects and immersive content built in partnership with top-tier companies, you’ll master the tech skills companies want.
Technical mentor support

Our knowledgeable mentors guide your learning and are focused on answering your questions, motivating you and keeping you on track.
Workspaces to see your code in action

Validate your understanding of the concepts you’ve learned by checking the output and quality of your code in real time.
Flexible learning program

Tailor a learning plan that fits your busy life. Learn at your own pace and reach your personal goals on the schedule that works best for you.
Course Offerings
Full list of offerings included with enrollment:

Class Content
  • Real-world projects
  • Project reviews
  • Project feedback from experienced reviewers

Student Services
  • Technical mentor support (New)
  • Student community (Improved)
Succeed with Personalized Services
We provide services customized for your needs at every step of your learning journey to ensure your success!
Get timely feedback on your projects
Reviews by the numbers
1,400+ project reviewers
2.7M projects reviewed
88/100 reviewer rating
1.1 hours average project review turnaround time
Reviewer Services
  • Personalized feedback
  • Unlimited submissions and feedback loops
  • Practical tips and industry best practices
  • Additional suggested resources to improve
Mentors available to answer your questions
Mentors by the numbers
1,400+ technical mentors
0.85 hours median response time
Mentorship Services
  • Support for all your technical questions
  • Questions answered quickly by our team of technical mentors

Data Engineering

Get Started Today

  • Monthly Access

    Pay as you go
    • Maximum flexibility to learn at your own pace.
    • Cancel anytime.
  • Learn

    How to pull data, store it, and build ETL, NLP and machine-learning data pipelines with Python.
  • Average Time

    On average, successful students take 28 hours to complete this course.
  • Benefits include

    • Real-world projects from industry experts
    • Technical mentor support

Program Details

  • Do I need to apply? What are the admission criteria?
    No. This course accepts all applicants regardless of experience and specific background.
  • What are the prerequisites for enrollment?
    Machine Learning:
    • Supervised and unsupervised methods equivalent to those taught in the Intro to Machine Learning Nanodegree Program.
    Python:
    • Experience with Python programming, including writing functions, building basic applications, and using common libraries like NumPy and pandas
    • SQL programming, including querying databases and using joins, aggregations, and subqueries
    • Comfortable using the terminal and GitHub
    Probability and Statistics:
    • Descriptive statistics, including calculating measures of center and spread
    • Inferential statistics, including sampling distributions and hypothesis testing
  • How is this course structured?
    The Data Engineering course consists of content and curriculum to support one project. We estimate that students can complete the course in 28 hours.

    The project will be reviewed by the Udacity reviewer network and platform. You will receive feedback, and if the project does not pass, you will be asked to resubmit it until it does.
  • How long is this course?
    Access to this course runs for the length of time specified in the payment card above. If you do not graduate within that time period, you will continue learning with month-to-month payments. See the Terms of Use and FAQs for other policies regarding the terms of access to our programs.
  • Can I switch my start date? Can I get a refund?
    Please see the Udacity Program Terms of Use and FAQs for policies on enrollment in our programs.
  • What software and versions will I need in this course?
    You’ll need access to the Internet and a 64-bit computer. You will also need to be able to download and run Python 3.7.
