Site Reliability Engineer

Nanodegree Program

Master the job-ready skills you need to be a successful site reliability engineer and start designing systems to automate responses to software site issues.

  • Estimated time
    4 Months

    At 5-10 hours/week

  • Enroll by
    March 22, 2023

    Get access to the classroom immediately upon enrollment

  • Prerequisites
    Python or Java, Bash or PowerShell, Linux, UNIX shell, and SQL

What you will learn

  1. Site Reliability Engineer

    Estimated 4 months to complete

    Master the skills necessary to become a successful site reliability engineer. Learn to build automation tools that ensure designed solutions respond to requirements such as availability, performance, security, and maintainability.

    Prerequisite knowledge

    Python or Java, Bash or PowerShell, Linux, UNIX shell, and SQL.

    1. Foundations of Observability

      Get a practical introduction to what observability requires in terms of people and tools. Learn about site reliability engineering, its roles and responsibilities, and how those differ from other teams. See how the role helps an enterprise improve, discuss the associated costs, and learn about the types of team members and the tools a team may use.

    2. Planning for High Availability and Incident Response

      This course covers monitoring, high availability (HA) and disaster recovery (DR), infrastructure as code, and database recovery and availability. Learn the basics of service level objectives (SLOs) and service level indicators (SLIs), how to translate them into queries, and how to turn those queries into graphs. Also learn how to design and deploy highly available databases on AWS.

    3. Self-Healing Architecture

      Learn how to deploy microservices or cloud architecture that is resilient enough to withstand failures, and predictable enough to resolve issues via automation without human intervention. Understand self-healing system design fundamentals, deployment strategies, implementation steps, and use cases. Learn cloud automation to increase the resiliency of systems.

    4. Establishing a Culture of Reliability

      Learn how to develop processes and frameworks that drive workplaces toward putting reliability first by working through the incident management process and running effective on-call rotations. Understand how to perform reliability reviews at various phases of your system, how to manage system capacity effectively, and how to reduce toil.
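
As a rough illustration of the SLO and SLI material mentioned in course 2, the sketch below computes an availability SLI from request counts and compares it with an SLO target. It is not taken from the course itself: the function name `availability_report`, the `SloReport` fields, and the sample numbers are all hypothetical, and a real setup would usually pull the counts from a monitoring query rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float              # measured fraction of good requests
    error_budget: float     # allowed fraction of bad requests (1 - SLO target)
    budget_consumed: float  # share of the error budget already spent


def availability_report(good_requests: int, total_requests: int,
                        slo_target: float) -> SloReport:
    """Compare a request-based availability SLI with an SLO target.

    Hypothetical helper for illustration only.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good_requests / total_requests
    error_budget = 1.0 - slo_target
    budget_consumed = (1.0 - sli) / error_budget
    return SloReport(sli, error_budget, budget_consumed)


# Made-up sample numbers: 999,500 good requests out of 1,000,000,
# measured against a 99.9% availability target.
report = availability_report(good_requests=999_500,
                             total_requests=1_000_000,
                             slo_target=0.999)
print(f"SLI: {report.sli:.4%}")
print(f"Error budget consumed: {report.budget_consumed:.1%}")
```

With these sample numbers, about half of the error budget is spent. A value above 100% would mean the SLO has been missed for the measurement window.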

All our programs include

  • Real-world projects from industry experts

    With real-world projects and immersive content built in partnership with top-tier companies, you’ll master the tech skills companies want.

  • Technical mentor support

    Our knowledgeable mentors guide your learning and are focused on answering your questions, motivating you, and keeping you on track.

  • Career services

    You’ll have access to GitHub portfolio review and LinkedIn profile optimization to help you advance your career and land a high-paying role.

  • Flexible learning program

    Tailor a learning plan that fits your busy life. Learn at your own pace and reach your personal goals on the schedule that works best for you.

Program offerings

  • Class content

    • Real-world projects
    • Project reviews
    • Project feedback from experienced reviewers
  • Student services

    • Technical mentor support
    • Student community
  • Career services

    • GitHub review
    • LinkedIn profile optimization

Succeed with personalized services.

We provide services customized for your needs at every step of your learning journey to ensure your success.

Get timely feedback on your projects.

  • Personalized feedback
  • Unlimited submissions and feedback loops
  • Practical tips and industry best practices
  • Additional suggested resources to improve

  • 1,400+ project reviewers
  • 2.7M projects reviewed
  • 88/100 reviewer rating
  • 1.1 hours average project review turnaround time

Learn with the best.

  • Nathan Anderson, MBA

    Global Cloud Architect

    Nathan is a Certified Six Sigma Black Belt and has 10+ years of experience in IT across multiple industries. He is also the instructor for two other Udacity courses: Ensuring Quality Releases and Azure Performance.

  • Travis Scotto

    Site Reliability Engineer

    Travis Scotto has worked in technology for 10 years. He has worked in various infrastructure roles: virtualization, databases, and monitoring. As an SRE, he employs automation and monitoring daily. He has also taught IT classes as an adjunct for 4.5 years.

  • Emmanuel Apau

    CTO of Mechanicode.io

    Emmanuel is co-founder of the Black Code Collective and DC's Technical.ly RealLIST Engineer award recipient. An AWS Certified DevSecOps specialist with 12 years of experience, he has spent his career developing innovative solutions using DevSecOps and site reliability best practices.

  • Sonny Sevin

    Site Reliability Engineer

    Sonny is an SRE with a varied background. He dabbled in research at Lawrence Berkeley National Labs before moving into site reliability engineering to take a more hands-on role. He has been published in several computing journals and has taught introductory programming courses.

Site Reliability Engineer

Get started today

    • Learn

      How to create and implement technical solutions by utilizing site reliability engineering principles.

    • Average Time

      On average, successful students take 4 months to complete this program.

    • Benefits include

      • Real-world projects from industry experts
      • Technical mentor support
      • Career services

    Program details

    Program overview: Why should I take this program?
    • Why should I enroll?

      This program is designed to help you take advantage of the growing need for skilled site reliability engineers. Prepare to meet the demand for qualified site reliability engineers who can respond to real-life, high-stakes workplace challenges.

    • What jobs will this program prepare me for?

      The skills you will gain from this Nanodegree program will qualify you for jobs in several industries as countless companies are trying to incorporate better site reliability practices into their organizations.

    • How do I know if this program is right for me?

      The program is for individuals who are looking to advance their site reliability engineering careers with skills in a burgeoning field.

    Enrollment and admission
    • Do I need to apply? What are the admission criteria?

      No. This Nanodegree program accepts all applicants regardless of experience and specific background.

    • What are the prerequisites for enrollment?

      A well-prepared learner is already able to:

      • Write basic functions in an object-oriented language (Python or Java) using for loops, conditionals, other control flow, and methods.
      • Write basic shell scripts in Bash or PowerShell, including for loops and conditionals.
      • Work with the Linux command line (Bash or another shell) and the UNIX shell.
      • Create simple SQL queries using SELECT, JOIN, and GROUP BY.
      • Display networking skills, including knowledge of virtual networks, DNS, subnets, and basic network troubleshooting techniques.
      • Perform DevOps tasks, such as setting up monitoring, rolling out features, and troubleshooting production issues, ideally for large systems.
      • Work with Kubernetes and basic kubectl commands, such as kubectl apply, kubectl create, and kubectl config.
    • If I do not meet the requirements to enroll, what should I do?

      Students who do not feel comfortable with the above may consider taking any of the web development Nanodegrees (Cloud Developer, Cloud Developer using Microsoft Azure, or Full Stack Web Developer).

    Tuition and term of program
    • How is this Nanodegree program structured?

      The Site Reliability Engineer Nanodegree program consists of content and curriculum to support 4 projects. We estimate that students can complete the program in 4 months working 5-10 hours per week.

      Each project will be reviewed by the Udacity reviewer network. Feedback will be provided, and if you do not pass the project, you will be asked to resubmit it until it passes.

    • How long is this Nanodegree program?

      Access to this Nanodegree program runs for the length of time specified above. If you do not graduate within that time period, you will continue learning with month-to-month payments. See the Terms of Use and FAQs for other policies regarding the terms of access to our Nanodegree programs.

    • Can I switch my start date? Can I get a refund?

      Please see the Udacity Program FAQs for policies on enrollment in our programs.

    Software and hardware: What do I need for this program?
    • What software and versions will I need in this program?

      There are no software or version requirements to complete this Nanodegree program. All coursework and projects can be completed in the Udacity online classroom. See Udacity’s basic tech requirements for details.
