SKILL BENCHMARK
SRE Proficiency (Advanced Level)
- 32m
- 32 questions
The SRE Proficiency benchmark measures whether a learner has had extensive exposure to SRE technologies, practices, and principles across multiple platforms. A learner who scores high on this benchmark demonstrates professional proficiency in all of the major areas of SRE operations across a variety of platforms and deployments.
Topics covered
- define the concept of criticality, name four criticality values, and identify the purpose of criticality and each value
- define the mean time between failures (MTBF) metric and outline when and how to use it for SRE work
- define the mean time to resolve (MTTR) metric and outline when and how to use it for SRE work
- define the mean time to respond (MTTR) metric and describe why it might be used in SRE
- define what is meant by cascading failures and identify situations in which this term is used
- define what is meant by operational loads, list their types, and describe how they relate to optimal performance
- define what is meant by resource exhaustion and describe its consequences
- describe how automation processes can vary
- describe how server overloads can lead to cascading failures
- describe the features and benefits of the mean time to failure (MTTF) metric and outline how to use it in SRE work
- describe the purpose and characteristics of utilization signals
- determine which factors are the root cause of a problem
- differentiate between load shedding and graceful degradation
- differentiate between SRE and DevOps
- list CPU considerations as they relate to failures and overutilization
- list factors that can contribute to memory exhaustion
- list the core tenets of SRE
- list the potential consequences of overloads, including serious illness to staff
- outline how to prevent server overloads
- outline processes for working with overload errors
- outline steps to ensure efficient queue management
- outline steps to mitigate overloads
- provide an overview of common pitfalls associated with troubleshooting systems
- provide an overview of Service Level Agreements
- provide an overview of Service Level Objectives
- provide an overview of Site Reliability Engineering
- provide an overview of the primary goals of a post-mortem philosophy
- provide an overview of Service Level Indicators
- recognize how file descriptors and threads can directly lead to failures
- recognize how resource exhaustion can lead to service unavailability
- recognize how resource exhaustion can travel from one resource to another
- recognize the nine principles of Site Reliability Engineering