Udacity Logo
Log InJoin for Free

Establishing a Culture of Reliability

Course

This course is all about how to foster a culture that is based on reliability. We will learn how to utilize best practices for several key areas of being a Site Reliability Engineer (SRE) and how they contribute to a culture of reliability. We will cover how to have balanced and effective on-call rotations as well as how to handle incidents. Next, we will discuss how to review your system throughout its lifecycle to find and mitigate any potential risk factors. Managing system capacity at all phases of a system's lifecycle is another major component to ensuring that everything is operating at maximum reliability. We will round out this course by discussing a thorn in every SRE's side: toil. We will discuss how to identify and reduce toil to maximize time spent performing operational work.

This course is all about how to foster a culture that is based on reliability. We will learn how to utilize best practices for several key areas of being a Site Reliability Engineer (SRE) and how they contribute to a culture of reliability. We will cover how to have balanced and effective on-call rotations as well as how to handle incidents. Next, we will discuss how to review your system throughout its lifecycle to find and mitigate any potential risk factors. Managing system capacity at all phases of a system's lifecycle is another major component to ensuring that everything is operating at maximum reliability. We will round out this course by discussing a thorn in every SRE's side: toil. We will discuss how to identify and reduce toil to maximize time spent performing operational work.

Intermediate

4 weeks

Real-world Projects

Completion Certificate

Last Updated May 15, 2023

Prerequisites:

No experience required

Course Lessons

Lesson 1

Introduction to Establishing a Culture of Reliability

In this lesson, we cover some introductory material to help you start with a solid foundation.

Lesson 2

Improving On-Call Effectiveness

Having a solid on-call is very important to achieving peak reliability. This lesson discusses how to have balanced on-call shifts with a solid incident management process that your team can follow.

Lesson 3

Reliability Reviews

In this lesson, we learn how to review your system from the start to prepare for a release. It is important that you have systems in place to find potential risks and develop mitigations for them.

Lesson 4

Managing System Capacity

System capacity is an essential part of ensuring reliability. This lesson discusses how to balance system capacity with costs to ensure that resources and money are not being wasted.

Lesson 5

Toil Reduction

Toil is the bane of every SRE team, and this lesson is all about how to reduce toil to allow your team to focus on operational work that improves reliability.

Lesson 6 • Project

Project: Plan, Reduce, Repeat

To wrap everything up, you will complete the final project, where you will be participating in three scenarios that will tie everything you have learned together.

Taught By The Best

Photo of Sonny Sevin

Sonny Sevin

Site Reliability Engineer

Sonny is an SRE with a varied background. He has dabbled in research at Lawrence Berkeley National Labs before moving into site reliability engineering to have a more hands on role. He has been published in several computing journals, as well as taught introductory programming courses.

The Udacity Difference

Combine technology training for employees with industry experts, mentors, and projects, for critical thinking that pushes innovation. Our proven upskilling system goes after success—relentlessly.

Demonstrate proficiency with practical projects

Projects are based on real-world scenarios and challenges, allowing you to apply the skills you learn to practical situations, while giving you real hands-on experience.

  • Gain proven experience

  • Retain knowledge longer

  • Apply new skills immediately

Top-tier services to ensure learner success

Reviewers provide timely and constructive feedback on your project submissions, highlighting areas of improvement and offering practical tips to enhance your work.

  • Get help from subject matter experts

  • Learn industry best practices

  • Gain valuable insights and improve your skills

Unlock access to Establishing a Culture of Reliability and the rest of our best-in-class catalog

  • Unlimited access to our top-rated courses

  • Real-world projects

  • Personalized project reviews

  • Program certificates

  • Proven career outcomes

Full Catalog Access

One subscription opens up this course and our entire catalog of projects and skills.

Month-To-Month

4 Months

Average time to complete a Nanodegree program

*Discount applies to the first 4 months of membership, after which plans are converted to month-to-month.

Get Started Today

Establishing a Culture of Reliability

Month-To-Month


  • Unlimited access to our top-rated courses
  • Real-world projects
  • Personalized project reviews
  • Program certificates
  • Proven career outcomes

4 Months

Average time to complete a Nanodegree program

  • All the same great benefits in our month-to-month plan
  • Most cost-effective way to acquire a new set of skills
Discount applies to the first 4 months of membership, after which plans are converted to month-to-month.