SRE Incident Management: Deep Dives, Postmortems, & Continuous Improvement

SRE    |    Intermediate
  • 12 videos | 1h 42m 8s
  • Includes Assessment
  • Earns a Badge
Rating: 4.8 from 8 users
Site reliability engineering (SRE) incident management is the practice of responding to incidents effectively and learning from them through postmortems and continuous improvement processes. In this course, explore advanced techniques for incident analysis and root cause identification, including best practices for conducting effective, blameless postmortems. Next, discover methods for translating postmortem findings into actionable improvements and strategies for fostering a culture of transparency and continuous learning. Finally, learn about approaches for measuring and tracking the effectiveness of improvements. After completing this course, you will be able to implement advanced incident analysis and root cause identification methods.

WHAT YOU WILL LEARN

  • Discover the key concepts covered in this course
  • Identify how to conduct deep dive analyses to uncover the root causes of incidents
  • Outline how to design and facilitate blameless postmortem meetings and translate postmortem outcomes into clear, actionable items
  • Recognize how to facilitate a blameless postmortem meeting following a simulated incident
  • Describe how to implement continuous improvement mechanisms within incident management processes
  • Outline how to develop key metrics and KPIs to measure incident management effectiveness
  • Recognize how to utilize psychological safety techniques to encourage open communication
  • Identify how to integrate incident management insights with broader organizational learning
  • Outline how to enhance tooling and automation based on incident learnings
  • List strategies for sharing incident learnings and best practices across the organization
  • Recognize how to evaluate and refine incident response strategies over time
  • Summarize the key concepts covered in this course

IN THIS COURSE

  • 41s
    In this video, we will discover the key concepts covered in this course. FREE ACCESS
  • 9m 31s
    After completing this video, you will be able to identify how to conduct deep dive analyses to uncover the root causes of incidents. FREE ACCESS
  • 3. SRE Incident Response and Postmortem Analysis
    10m 40s
    Upon completion of this video, you will be able to outline how to design and facilitate blameless postmortem meetings and translate postmortem outcomes into clear, actionable items. FREE ACCESS
  • 4. Postmortem Meeting Facilitation
    10m 20s
    Through this video, you will be able to recognize how to facilitate a blameless postmortem meeting following a simulated incident. FREE ACCESS
  • 5. SRE Incident Continuous Improvement
    10m 38s
    In this video, we will describe how to implement continuous improvement mechanisms within incident management processes. FREE ACCESS
  • 6. SRE Response Effectiveness Measurements
    12m 5s
    After completing this video, you will be able to outline how to develop key metrics and KPIs to measure incident management effectiveness. FREE ACCESS
  • 7. SRE Psychological Safety and Communication
    7m 46s
    Through this video, you will be able to recognize how to utilize psychological safety techniques to encourage open communication. FREE ACCESS
  • 8. SRE Incident Response Awareness Training
    7m 35s
    In this video, we will identify how to integrate incident management insights with broader organizational learning. FREE ACCESS
  • 9. SRE Tool and Automation Enhancement
    12m 2s
    After completing this video, you will be able to outline how to enhance tooling and automation based on incident learnings. FREE ACCESS
  • 10. SRE Incident Response Organizational Awareness
    10m 37s
    Upon completion of this video, you will be able to list strategies for sharing incident learnings and best practices across the organization. FREE ACCESS
  • 11. SRE Incident Responsiveness Improvements
    9m 12s
    Through this video, you will be able to recognize how to evaluate and refine incident response strategies over time. FREE ACCESS
  • 12. Course Summary
    59s
    In this video, we will summarize the key concepts covered in this course. FREE ACCESS

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
