Lesson 1
An Introduction to Your Nanodegree Program
Welcome! We're so glad you're here. Join us in learning a bit more about what to expect and ways to succeed.
Nanodegree Program
The goal of the Site Reliability Engineer (SRE) Nanodegree program is to equip software developers with the engineering and operational skills required to build automation tools and responses, so that designed solutions meet non-functional requirements such as availability, performance, security, and maintainability. The content focuses both on designing systems that automate responses to issues with software sites and on responding to common on-call situations.
Intermediate
3 months
Real-world Projects
Completion Certificate
Last Updated July 19, 2024
Course 1 • 1 day
Welcome! We're so glad you're here. Join us in learning a bit more about what to expect in this program and ways to succeed.
Lesson 1
Welcome! We're so glad you're here. Join us in learning a bit more about what to expect and ways to succeed.
Lesson 2
You are starting a challenging but rewarding journey! Take 5 minutes to read how to get help with projects and content.
Course 2 • 3 weeks
In this course, we will learn about the foundational concepts of Observability in terms of people and tools.
Lesson 1
This lesson will introduce you to the course, including what SRE is and why it matters.
Lesson 2
In this lesson, we will learn how to distinguish unique SRE roles and responsibilities within an enterprise.
Lesson 3
In this lesson, we will investigate enterprise workflows that can be improved with common SRE practices using cost-benefit analysis.
Lesson 4
In this lesson, we will learn how to define an optimal SRE team structure and work allocation given business needs.
Lesson 5
By the end of this lesson, you will have a fully-functional monitoring system that uses some of the most popular tools in the industry.
Lesson 6 • Project
In this project, you will apply the skills you have acquired in the Establish a Foundation in Observability course to configure a monitoring software stack.
Course 3 • 3 weeks
In this course, we will look at how SREs view availability and reliability for their infrastructure. We'll learn how to create effective monitoring using SLOs and SLIs, and we will create dashboards in Grafana. Next, we'll identify all our IT assets and ensure they are configured for high availability. Then we will craft a disaster recovery plan to make sure failover is seamless and automated. After that, we'll deploy the infrastructure to AWS using Terraform, learn the benefits of infrastructure as code, and see how easy it is to deploy to multiple regions. Finally, we'll learn how to make databases highly available and disaster recovery ready. We'll look at recovery strategies and implement them in AWS via Terraform.
Lesson 1
Introduction to the course. We will look at how the topics all tie into being an SRE and what skills we'll learn and apply.
Lesson 2
In this lesson, we will learn how SREs monitor using SLOs and SLIs. We will create queries in Prometheus and dashboards in Grafana.
Lesson 3
In this lesson, we will identify all IT assets, make those assets highly available, and put together a disaster recovery plan for those assets.
Lesson 4
In this lesson, we will deploy our HA/DR infrastructure using Terraform to AWS.
Lesson 5
In this lesson, we'll learn about database reliability and availability and how we can make databases more available. We will then deploy a replicated database cluster to AWS and also see a failover.
Lesson 6 • Project
In this project, you will apply the skills you've learned in this course by defining and implementing a resilient infrastructure in a cloud platform.
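To make the SLO and SLI vocabulary from this course concrete, here is a minimal sketch of how an availability SLI and the remaining error budget could be computed from request counts. The names `RequestCounts`, `availability_sli`, and `error_budget_remaining` are hypothetical and are not taken from the course materials; in practice these numbers would typically come from a monitoring system rather than being passed in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestCounts:
    """Hypothetical container for request totals over one SLO window."""
    total: int
    successful: int


def availability_sli(counts: RequestCounts) -> float:
    """Fraction of requests that succeeded (the SLI)."""
    if counts.total == 0:
        # Assumption: with no traffic, treat the service as fully available.
        return 1.0
    return counts.successful / counts.total


def error_budget_remaining(counts: RequestCounts, slo_target: float) -> float:
    """Share of the error budget still unspent (1.0 = untouched, < 0 = overspent)."""
    allowed_failures = (1.0 - slo_target) * counts.total
    actual_failures = counts.total - counts.successful
    if allowed_failures == 0:
        # No budget at all: any failure means the budget is gone.
        return 0.0 if actual_failures else 1.0
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    window = RequestCounts(total=10_000, successful=9_990)
    print(f"SLI: {availability_sli(window):.4f}")
    print(f"Budget remaining: {error_budget_remaining(window, slo_target=0.995):.2f}")
```

The SLO is the target (here an assumed 99.5%), the SLI is the measured value, and the error budget is the gap between perfect availability and the target.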
Course 4 • 2 weeks
Self-healing architecture is resilient enough to withstand failure and resolve issues without human intervention through automation. In this course, you'll gain skills in self-healing architecture design strategies, deployment strategies, and cloud automation.
Lesson 1
Welcome to Self-healing Architectures! In this lesson, you'll learn more about the course and the topic.
Lesson 2
In this lesson, you'll learn about self-healing system design fundamentals like single points of failure, tiered architecture, automation strategies, and microservice design.
Lesson 3
In this lesson, you'll learn about and implement several self-healing deployment strategies.
Lesson 4
In this lesson, you'll learn about several different self-healing cloud automation configurations for microservices and virtual machines.
Lesson 5 • Project
In this project, you'll put everything you learned in the course into practice by playing the role of an SRE fixing and deploying applications using self-healing strategies.
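As a rough illustration of the self-healing idea described in this course (detect a failure, then recover without human intervention), the sketch below runs a health check and restarts the service a bounded number of times. The `heal` function and the `FlakyService` stand-in are hypothetical and exist only for this example; real platforms usually provide this behavior through their own health probes and restart policies.

```python
from typing import Callable


def heal(check: Callable[[], bool],
         restart: Callable[[], None],
         max_restarts: int = 3) -> bool:
    """Return True once the check passes, restarting at most max_restarts times."""
    for attempt in range(max_restarts + 1):
        if check():
            return True
        if attempt < max_restarts:
            restart()
    # Still unhealthy after all allowed restarts: escalate to a human.
    return False


class FlakyService:
    """Hypothetical service that becomes healthy after a set number of restarts."""

    def __init__(self, restarts_needed: int) -> None:
        self.restarts_needed = restarts_needed
        self.restarts = 0

    def is_healthy(self) -> bool:
        return self.restarts >= self.restarts_needed

    def restart(self) -> None:
        self.restarts += 1


if __name__ == "__main__":
    service = FlakyService(restarts_needed=2)
    recovered = heal(service.is_healthy, service.restart)
    print(f"recovered={recovered} after {service.restarts} restarts")
```

Bounding the number of restarts is a deliberate choice in this sketch: automation handles the common case, and a persistent failure is surfaced instead of being retried forever.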
Site Reliability Engineer
Travis has been working in IT for over 10 years and has taught as an adjunct for over 5 years. He loves technology and sharing his knowledge with students. Travis brings his industry experience as an SRE to the classes he teaches, blending industry expertise with step-by-step teaching to help students excel. Seeing students succeed is what he likes best.
CTO of Mechanicode.io
Emmanuel is a co-founder of the Black Code Collective and a recipient of DC's Technical.ly RealLIST Engineer award. An AWS Certified DevSecOps specialist with 12 years of experience, he has spent his career developing innovative solutions using DevSecOps and site reliability best practices.
Site Reliability Engineer
Sonny is an SRE with a varied background. He dabbled in research at Lawrence Berkeley National Labs before moving into site reliability engineering to have a more hands-on role. He has been published in several computing journals and has taught introductory programming courses.
Global Cloud Architect
Nathan is a Certified Six Sigma Black Belt and has 10+ years of IT experience across multiple industries. He is also the instructor for two other Udacity courses: Ensuring Quality Releases and Azure Performance.
Average Rating: 4.5 Stars
9 Reviews
David A.
June 14, 2022
Pretty fine, very demanding.
Artiom D.
May 31, 2022
Great program
Marius T.
April 29, 2022
good so far
Yi J.
March 8, 2022
The project design is decent, but course instructions can be improved. More explanation of the architecture of the project will help to understand how the application works. There are some mistakes in the instructions as well, making the course completion very confusing.
Felipe F.
March 2, 2022
The program is more challenging than I expected; however, I'm really enjoying the program.
Combine technology training for employees with industry experts, mentors, and projects, for critical thinking that pushes innovation. Our proven upskilling system goes after success—relentlessly.
Demonstrate proficiency with practical projects
Projects are based on real-world scenarios and challenges, allowing you to apply the skills you learn to practical situations, while giving you real hands-on experience.
Gain proven experience
Retain knowledge longer
Apply new skills immediately
Top-tier services to ensure learner success
Reviewers provide timely and constructive feedback on your project submissions, highlighting areas of improvement and offering practical tips to enhance your work.
Get help from subject matter experts
Learn industry best practices
Gain valuable insights and improve your skills
Unlimited access to our top-rated courses
Real-world projects
Personalized project reviews
Program certificates
Proven career outcomes
Full Catalog Access
One subscription opens up this course and our entire catalog of projects and skills.