Introduction to SRE and Essential Tools

SRE    |    Intermediate
  • 16 videos | 1h 34m 42s
  • Includes Assessment
  • Earns a Badge
Site reliability engineering (SRE) is based on a set of principles and practices used to monitor and observe software reliability in a production environment. In this course, you will dive into the fundamentals of SRE and its evolution over the years. Next, you will examine the site reliability engineering role and learn how to find, place, bootstrap, and distribute site reliability engineers. You will discover the SRE principles that organizations should strive for, key SRE metrics, the importance of error budgeting, and the essential tools used in SRE. Then you will compare and contrast SRE with traditional IT operations, explore the SRE lifecycle from planning to operation, and investigate the process of incident response and postmortem analysis. Finally, you will focus on the cultural impacts of SRE within an organization, set up and configure a basic monitoring tool, and create a simple dashboard using Grafana.

WHAT YOU WILL LEARN

  • Discover the key concepts covered in this course
  • Define the role and responsibilities of a site reliability engineer
  • Provide an overview of the history and evolution of SRE
  • Describe how to find, place, bootstrap, and distribute site reliability engineers
  • Outline the site reliability engineering principles that organizations should strive for
  • Identify key SRE metrics, such as service-level objective (SLO), service-level agreement (SLA), and service-level indicator (SLI)
  • Describe the importance of error budgets in SRE practice
  • List essential tools used in SRE
  • Compare and contrast SRE to traditional IT operations
  • Summarize the steps of the SRE lifecycle from planning to operation
  • Outline the process of incident response and postmortem analysis
  • Illustrate how automation is utilized in SRE to maintain system reliability
  • Provide an overview of the cultural impact of SRE within an organization
  • Set up and configure a basic monitoring tool
  • Create a simple dashboard to visualize key metrics using Grafana
  • Summarize the key concepts covered in this course

IN THIS COURSE

  • 1m 5s
    In this video, we will discover the key concepts covered in this course. FREE ACCESS
  • 6m 34s
    After completing this video, you will be able to define the role and responsibilities of a site reliability engineer. FREE ACCESS
  • 3. The Evolution of Site Reliability Engineering (SRE)
    5m 48s
    Upon completion of this video, you will be able to provide an overview of the history and evolution of SRE. FREE ACCESS
  • 4. Site Reliability Engineering Role
    6m 23s
    After completing this video, you will be able to describe how to find, place, bootstrap, and distribute site reliability engineers. FREE ACCESS
  • 5. Site Reliability Engineering Principles
    7m 33s
    Upon completion of this video, you will be able to outline the site reliability engineering principles that organizations should strive for. FREE ACCESS
  • 6. Key Site Reliability Engineering Metrics
    6m 27s
    After completing this video, you will be able to identify key SRE metrics, such as service-level objective (SLO), service-level agreement (SLA), and service-level indicator (SLI). FREE ACCESS
  • 7. Error Budgeting
    6m 57s
    Upon completion of this video, you will be able to describe the importance of error budgets in SRE practice. FREE ACCESS
  • 8. Essential Site Reliability Engineering Tools
    7m 27s
    After completing this video, you will be able to list essential tools used in SRE. FREE ACCESS
  • 9. SRE vs. IT Tools
    6m 4s
    Upon completion of this video, you will be able to compare and contrast SRE to traditional IT operations. FREE ACCESS
  • 10. Site Reliability Engineering Lifecycle
    5m 34s
    After completing this video, you will be able to summarize the steps of the SRE lifecycle from planning to operation. FREE ACCESS
  • 11. Incident Response and Postmortem Analysis
    6m 31s
    Upon completion of this video, you will be able to outline the process of incident response and postmortem analysis. FREE ACCESS
  • 12. Automation and System Reliability
    6m 36s
    After completing this video, you will be able to illustrate how automation is utilized in SRE to maintain system reliability. FREE ACCESS
  • 13. Cultural Impacts of SRE
    6m
    Upon completion of this video, you will be able to provide an overview of the cultural impact of SRE within an organization. FREE ACCESS
  • 14. Using Monitoring Tools
    7m 3s
    Find out how to set up and configure a basic monitoring tool. FREE ACCESS
  • 15. Using Dashboards in Grafana
    7m 49s
    In this video, you will learn how to create a simple dashboard to visualize key metrics using Grafana. FREE ACCESS
  • 16. Course Summary
    52s
    In this video, we will summarize the key concepts covered in this course. FREE ACCESS

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses. You can share the badge on any social network or business platform.

Digital badges are yours to keep, forever.