SRE Emergency & Incident Response: Responding to Emergencies
SRE | Intermediate
- 18 videos | 1h 12m 46s
- Includes Assessment
- Earns a Badge
Site Reliability Engineers (SREs) are responsible for assigning the appropriate resources and responsibilities to deal effectively with unexpected emergencies. To do this, SREs should ensure the proper processes and teams are in place before an emergency occurs. In this course, you'll explore the different emergency types and outline how to plan for them. You'll examine the causes of test-induced, change-induced, and process-induced emergencies, how to respond to them, and what's involved in proactive approaches to emergency testing and planning. You'll then outline the critical steps for correctly documenting emergencies, including keeping a history of outages and mistakes. Next, you'll differentiate between business continuity and disaster recovery planning, outline how to create both types of plans, and see how to conduct a business impact analysis. Lastly, you'll explore some IT recovery strategies.
WHAT YOU WILL LEARN
- Discover the key concepts covered in this course
- Outline the fundamental emergency response principles SREs need to be familiar with and recognize the critical steps to take when a system breaks
- Recognize the benefits of performing test-induced emergencies and outline what this involves
- Name the causes and outcomes of change-induced emergencies and outline how to respond to these emergencies
- Define what is meant by a process-induced emergency, describe its effects, and outline how to respond to one
- Describe why it is vital to keep a history of outages and mistakes and outline best practices when doing so
- Recognize the importance of asking important, relevant, and challenging questions
- Define what is meant by proactive testing, compare it to reactive testing, recognize the importance of encouraging proactive testing, and name best practices when carrying out this type of testing
- Define what is meant by business continuity and describe why this type of planning matters
- Outline the six steps involved in developing a business continuity plan
- Outline methods to test a business continuity plan, recognize the importance of testing this type of plan, and describe some tips when testing
- Recognize the importance of ongoing efforts to review and improve a business continuity plan and outline how to go about doing it
- Recognize the importance of having 'top-level' support for business plans and promoting user awareness, and outline how to achieve these goals
- Define what is meant by a business impact analysis, outline how to conduct one and its typical structure, and name the possible effects on business operations
- Recognize the importance of developing an IT disaster recovery plan, list the goals of this type of plan, and describe what to consider when developing one
- Outline key steps to creating a working disaster recovery plan
- Name some types of IT recovery strategies and recognize the importance of recovery strategies developed for IT systems, applications, and data
- Summarize the key concepts covered in this course
IN THIS COURSE
-
1m 47s
-
5m 39s | Site reliability engineers (SREs) need to be familiar with the fundamental emergency response principles in order to respond effectively to system failures. The video discusses post-mortem philosophy, triggers for a post-mortem, and steps an SRE should take when a system breaks.
-
3m 31s | In this video, you'll learn about the three stages of test-induced emergencies. You'll learn how to induce an emergency and how to respond to it. You'll also learn what outcomes to expect from test-induced emergencies. Whether you induce an emergency in a staging environment or in production, it's a form of testing that affects others.
-
4m 24s | In this video, you'll learn more about change-induced emergencies. These are emergencies that result directly from internal changes to the system, such as configuration pushes and code pushes. You'll learn the various causes of change-induced emergencies, identify the kinds of changes that occur in a complex environment, and see how they're managed. You'll also discover how to respond to an emergency efficiently and look at the possible outcomes of change-induced emergencies.
-
3m 16s | In this video, you'll learn about process-induced emergencies. You'll examine what constitutes a process-induced emergency, discuss how to respond to one appropriately, and look at appropriate outcomes. The video shows a diagram of a large organization with many services clustered within it, grouped by category and color coded, along with a large clock labeled 1:00 PM.
-
3m 29s | In this video, you'll learn why it is vital to keep a history of outages and mistakes and outline best practices when doing so. You'll learn that as an SRE it's your responsibility to learn from incidents so you can avoid them in the future. This means documenting incidents is crucial.
-
4m 31s | This video is about how to ask questions effectively in order to gain insights and improve systems. Open-ended questions are better than closed-ended questions because they often invite more thought and are less predictable. Tough questions are especially useful in the technical arena because they force people to think deeply about their understanding of the subject.
-
5m 10s | In this video, you will learn about proactive testing. You will define what is meant by proactive testing, compare it to reactive testing, recognize the importance of encouraging proactive testing, and name best practices when carrying out this type of testing.
-
4m 8s | In this video, you'll learn more about what a business continuity plan is, why it matters to your business, and the benefits of business continuity planning. You'll also learn that a business continuity plan outlines procedures and instructions to follow should a disaster occur. This typically includes a business impact analysis that outlines the cost of such a disaster, broken down by different aspects of your business.
-
4m 55s | In this video, you'll learn more about developing a business continuity plan, which helps you maintain business functionality and weather the storm when disaster strikes. You'll first discuss the process of developing your business continuity plan. What do you have to take into consideration? What aspects of your business should you focus on? Then, you'll learn about common business continuity threats and how to identify acceptable downtime for each critical function.
-
3m 53s | In this video, you'll learn more about how to test a business continuity plan. You'll learn that there are several methods you can use to test a plan, including a table-top exercise and a structured walk-through. The video also outlines some of the benefits of testing your business continuity plan.
-
3m 49s | This video discusses the importance of reviewing and improving a business continuity plan on a regular basis. It covers things to consider before the review, the key aspects of the review, and how to improve the plan following a review.
-
3m | In this video, you'll learn about the importance of having 'top-level' support for business plans and promoting user awareness. You'll also outline how to achieve these goals. You'll learn that when an emergency strikes, most areas of a business are vulnerable to some sort of emergency incident. This means it's essential that all business areas are aware of the company's business continuity plan and their role in that plan.
-
4m 41s | In this video, you'll learn about the importance of a business impact analysis, or BIA. You'll learn that a BIA is used to predict the effects of certain disruptions on your business. It does this by gathering and analyzing data relevant to your business.
-
7m 2s | In this video, you'll learn more about the IT disaster recovery plan. You'll learn what a disaster recovery plan is, what the structure of a typical IT disaster recovery plan looks like, what some of the benefits of an IT disaster recovery plan are, and how to develop an effective data backup plan. You'll also learn about some of the options you have for data backup.
-
4m 7s | During this video, you will learn more about the details of creating a disaster recovery plan. You will explore how complex IT systems can involve many assets in the form of software and hardware, critical servers, data, and cloud services. The first step in a disaster recovery plan is to identify all of these assets, their locations, and how they interact with one another. You will also learn how to identify the context of these assets, such as how they function and how they relate to one another.
-
4m 5s | When it comes to an IT recovery plan, there are different strategies to consider for implementing it. There are also some common ways that IT fails, which any strategy you choose should take into consideration. In this video, we're going to look at IT recovery strategies. We'll look at an internal recovery strategy, where a business manages its own internal systems, data backups, and access to them.
-
1m 19s
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft offers you the opportunity to earn a digital badge upon successful completion of some of its courses. Badges can be shared on any social network or business platform.
Digital badges are yours to keep, forever.