SRE Team Management: Managing Operational Loads
SRE | Intermediate
- 17 videos | 54m 39s
- Includes Assessment
- Earns a Badge
To keep a system in a functional state, site reliability engineers (SREs) must learn how to identify, calculate, and manage its operational load, which generally falls into three categories: ongoing operational activities, tickets, and pages. In this course, you'll explore these categories in detail. You'll start by outlining methods for managing operational load at the team level and for using support ticketing systems and service level objectives (SLOs). Next, you'll investigate 'toil,' a term used to describe the operational work associated with running and maintaining a production service. You'll outline steps for identifying, calculating, and eliminating toil and examine the adverse effects toil can have on a team. Additionally, you'll outline how to work with interrupts and distinguish between the crucial metrics used for managing them. Lastly, you'll identify the human factors to consider when dealing with interrupts, including efficiency, distractibility, and respect.
WHAT YOU WILL LEARN
- Discover the key concepts covered in this course
- Describe what is meant by operational load and outline the three general categories of operational load
- Outline how on-call engineers depend on pages to respond to incidents and outages
- Outline the steps involved in responding to emergency incidents
- Outline the purpose of customer request support tickets and provide examples of simple and complex tickets
- Describe the essential components of a typical ticketing system
- Recognize how to use service level objectives (SLOs) to ensure timely responses and resolutions
- Describe what is meant by toil and provide examples of toil, such as applying schema changes to a database
- Differentiate between types of toil, including automated, manual, repetitive, and tactical
- Outline steps to track and identify toil and describe why less toil is better
- Describe how to measure and calculate toil
- Outline steps to minimize or eliminate toil completely
- Differentiate between toil and complexity and describe approaches to address complexity
- Describe how toil can negatively affect staff, including through low morale and confusion among SREs
- List key metrics used for managing interrupts, such as the severity of the interrupt
- Outline human element factors to consider when dealing with interrupts, such as distractibility
- Summarize the key concepts covered in this course
IN THIS COURSE
- 1m 44s
- 3m 35s: After completing this video, you will be able to describe what is meant by operational load and outline the three general categories of operational load. FREE ACCESS
- 2m 53s: In this video, you will outline how on-call engineers depend on being paged to respond to incidents and outages. FREE ACCESS
- 3m 29s: In this video, learn how to outline the steps involved in responding to emergency incidents. FREE ACCESS
- 3m 29s: Learn how to outline the purpose of customer request support tickets and provide examples of simple and complex tickets. FREE ACCESS
- 4m 36s: After completing this video, you will be able to describe the essential components of a typical ticketing system. FREE ACCESS
- 3m 25s: After completing this video, you will be able to recognize how to use service level objectives (SLOs) to ensure timely responses and resolutions. FREE ACCESS
- 3m 8s: Upon completion of this video, you will be able to describe what is meant by toil and provide examples of toil. FREE ACCESS
- 3m 41s: In this video, you will differentiate between types of toil, including automated, manual, repetitive, and tactical. FREE ACCESS
- 3m 7s: Find out how to outline steps to track and identify toil and describe why having less toil is better. FREE ACCESS
- 3m 15s: After completing this video, you will be able to describe how to measure and calculate toil. FREE ACCESS
- 3m 21s: Find out how to outline steps to minimize or eliminate toil. FREE ACCESS
- 2m 52s: In this video, learn how to differentiate between toil and complexity and describe approaches to address complexity. FREE ACCESS
- 3m 18s: Upon completion of this video, you will be able to describe how toil can negatively affect staff, including through low morale and confusion among SREs. FREE ACCESS
- 3m 29s: Upon completion of this video, you will be able to list key metrics used for managing interrupts, such as the severity of the interrupt. FREE ACCESS
- 4m 4s: In this video, learn how to outline human element factors to consider when dealing with interrupts, such as distractibility. FREE ACCESS
- 1m 13s
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.
Digital badges are yours to keep, forever.