Real-world projects from industry experts
With real-world projects and immersive content built in partnership with top-tier companies, you’ll master the tech skills companies want.
Learn more about the big data ecosystem and how to use Spark to work with massive datasets.
Get access to the classroom immediately on enrollment
Build a data lake on AWS and a data catalog following the principles of data lakehouse architecture. Learn about the big data ecosystem and the power of Apache Spark for data wrangling and transformation. Work with AWS data tools and services to extract, load, process, query, and transform semi-structured data in data lakes.
Intermediate Python, Intermediate SQL
Identify what constitutes the big data ecosystem for data engineering. Explain the purpose and evolution of data lakes in the big data ecosystem. Compare the Spark framework with the Hadoop framework. Identify when to use Spark and when not to use it, and describe the features of lakehouse architecture.
Wrangle data with Spark and functional programming to scale across distributed systems. Process data with Spark DataFrames and Spark SQL. Process data in common formats such as CSV and JSON. Use the Spark RDD API to wrangle, transform, and filter data.
Use distributed data storage with Amazon S3 and identify properties of Amazon S3 data lakes. Identify service options for using Spark in AWS and configure AWS Glue. Create and run Spark jobs with AWS Glue.
Use Spark with AWS Glue to run ELT processes on data from diverse sources, structures, and vintages in lakehouse architecture. Create a Glue Data Catalog and Glue tables. Use Amazon Athena for ad hoc queries in a lakehouse. Use Glue to run SQL queries against Amazon S3 and to perform ELT. Ingest data into lakehouse zones. Transform and filter data into curated lakehouse zones with Spark and AWS Glue. Join and process data into lakehouse zones with Spark and AWS Glue.
Act as a data engineer for the STEDI team to build a data lakehouse solution for sensor data that trains a machine learning model. Build an ELT (Extract, Load, Transform) pipeline for lakehouse architecture, load data from an AWS S3 data lake, process the data into analytics tables using Spark and AWS Glue, and load them back into lakehouse architecture.
On-demand help. Receive instant help with your learning directly in the classroom. Stay on track and get unstuck.
Validate your understanding of concepts learned by checking the output and quality of your code in real time.
Tailor a learning plan that fits your busy life. Learn at your own pace and reach your personal goals on the schedule that works best for you.
We provide services customized for your needs at every step of your learning journey to ensure your success.
Sean currently teaches cybersecurity and DevOps courses at Brigham Young University-Idaho. He has been a software engineer for over 16 years. Some of the most exciting projects he has worked on involved data pipelines for DNA processing and vehicle telematics.
On average, successful students take 1 month to complete this program.
No. This course accepts all applicants regardless of experience and specific background.
A well-prepared learner has experience in relational database design, SQL, basic dimensional modeling, data modeling basics, Amazon Web Services basics, and Python.
This course consists of content and curriculum to support one project. We estimate that students can complete the program in one month.
The project will be reviewed by the Udacity reviewer network and platform, and you will receive feedback. If you do not pass, you will be asked to resubmit the project until it passes.
Access to this course runs for the length of time specified in the payment card above. If you do not graduate within that time period, you will continue learning with month-to-month payments. See the Terms of Use and FAQs for other policies regarding the terms of access to our programs.
Please see the Udacity Program Terms of Use and FAQs for policies on enrollment in our programs.
There are no software or version requirements to complete this course. All coursework and projects can be completed via Student Workspaces in the Udacity online classroom. Udacity’s full technical requirements are listed here.