SRE Troubleshooting: Tools
SRE | Intermediate
- 13 videos | 41m 1s
- Includes Assessment
- Earns a Badge
Site reliability engineers (SREs) are typically good problem solvers. They need to think logically to identify problems, correct them, and prevent them from happening again. In this course, you'll explore several built-in and open-source troubleshooting tools that SREs can use to resolve system issues. You'll start by examining logging, whitebox monitoring, and blackbox monitoring as techniques for tracking system events. You'll then work with several built-in Windows troubleshooting tools, namely Event Viewer, Resource Monitor, and System Information. Next, you'll use Google Cloud Dataflow to process logs, before outlining the purpose and benefits of the StatsD standard and the /api/search endpoint. Lastly, you'll identify how Google's Dapper is used to troubleshoot distributed systems, and how the open standards tool Prometheus is used to instrument software and expose metrics.
WHAT YOU WILL LEARN
- Discover the key concepts covered in this course
- Outline the process and purpose of logging and name the benefits of text logs
- Describe the characteristics and purpose of whitebox monitoring
- Describe the characteristics and purpose of blackbox monitoring
- Access and navigate the Windows Event Viewer
- Open the System Information panel in Windows and use it to view and collect system information
- Use Windows Resource Monitor to display real-time hardware and software usage information
- Summarize the characteristics of Dapper and outline how it can be used to troubleshoot a distributed system
- Process logs using the Google Cloud Dataflow workflow tool
- Recognize how the StatsD standard is used for instrumenting software and exposing metrics
- Outline the characteristics, components, and purpose of the Prometheus open-source systems monitoring and alerting toolkit
- Outline how to manually send a request to the /api/search endpoint to identify failures
- Summarize the key concepts covered in this course
IN THIS COURSE
- 1m 27s
- 3m 44s
- 3m 46s
- 3m 7s: In this video, you'll learn about blackbox monitoring. Whereas whitebox monitoring focuses on information from specific applications, blackbox monitoring focuses on the system itself. It makes sure the system is functioning properly by monitoring key indicators of the environment your applications run on. For example, blackbox monitoring can check that you're not running out of disk space.
- 3m 2s: In this video, you'll learn how to access and navigate the Windows Event Viewer. You'll discover where to find the Event Viewer and what data each section contains. The Windows Event Viewer has been around for a long time, logging system information for both Windows itself and applications.
- 2m 33s: In this video, you'll learn how to open, view, and collect system information in Windows. You'll discover where to find the System Information window and what data it contains. The System Information window is useful for troubleshooting specific issues on your system.
- 3m 27s: In this video, you'll learn how to use Resource Monitor to display metrics about your system in real time. You'll learn where to find Resource Monitor and how to identify both the most CPU-intensive application and the application using the most memory.
- 5m 6s: In this video, you'll learn more about Dapper. Modern software tends toward complex, distributed microservice architectures, which make building applications easier, but each microservice is effectively an independent module. Each module can be developed by a different team, in a different language, and with different requirements. Some modules perform simple tasks with modest compute needs, while others might require global distribution across thousands of machines.
- 6m 13s: In this demo, you'll learn how to process logs using Google's Dataflow workflow tool. To do that, you'll run a simple Python script through Google Cloud Shell and explore the information that was sent. You'll begin by navigating to your Dataflow instance and opening Cloud Shell. With the Google Cloud Platform Dataflow service displayed, make sure you've selected the appropriate project at the top of the screen.
- 1m 59s: In this video, you'll learn about instrumenting code. Instrumentation helps gather information about what's happening in your code, as well as metrics about application health. One popular instrumentation tool is StatsD, an open standard originally written by Etsy as a metric aggregation daemon.
- 3m 12s: In this video, you'll learn more about instrumenting code, which helps gather information about what's happening in your code as well as metrics about application health. You'll see a diagram of how Prometheus works, listing its components and their functions: the server, client libraries, Pushgateway, exporters, Alertmanager, and other tools.
- 2m 28s
- 58s
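The blackbox monitoring video gives checking that you're not running out of disk space as an example. A minimal sketch of such a check follows, using Python's standard shutil.disk_usage; the function name check_disk_space and the 10% free-space threshold are assumptions for illustration, not part of the course.

```python
import shutil

# Assumed alert threshold for this sketch: alert when less than 10% is free.
FREE_RATIO_THRESHOLD = 0.10


def check_disk_space(path="/", threshold=FREE_RATIO_THRESHOLD):
    """Return (ok, free_ratio) for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_ratio = usage.free / usage.total
    return free_ratio >= threshold, free_ratio


if __name__ == "__main__":
    ok, ratio = check_disk_space()
    status = "OK" if ok else "ALERT"
    print(f"{status}: {ratio:.1%} free")
```

A scheduler such as cron or Windows Task Scheduler could run a check like this periodically and forward ALERT results to whatever alerting system you use.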
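StatsD clients send each metric as a short plain-text line of the form name:value|type over UDP, where the type is, for example, c for a counter, ms for a timer, or g for a gauge. The sketch below assumes a StatsD-compatible daemon listening on 127.0.0.1, UDP port 8125 (the conventional default); the helper functions and metric names are hypothetical.

```python
import socket

# Assumption: a StatsD-compatible daemon listens here (conventional default port).
STATSD_ADDR = ("127.0.0.1", 8125)


def format_metric(name, value, metric_type, sample_rate=None):
    """Build a StatsD line such as 'api.requests:1|c' or 'api.latency:320|ms|@0.5'.

    The '|@rate' suffix only tells the daemon how to scale the value;
    the caller is responsible for actually sending just that fraction of events.
    """
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None and sample_rate < 1:
        line += f"|@{sample_rate}"
    return line


def send_metric(line, addr=STATSD_ADDR):
    """Send one metric line as a single UDP datagram (fire-and-forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), addr)


if __name__ == "__main__":
    # Hypothetical metric names.
    send_metric(format_metric("checkout.requests", 1, "c"))    # counter
    send_metric(format_metric("checkout.latency", 320, "ms"))  # timer, in ms
    send_metric(format_metric("worker.queue_depth", 42, "g"))  # gauge
```

Using UDP is a deliberate design choice: the send returns immediately whether or not the daemon is running, so instrumentation cannot stall the application.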
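A Prometheus server pulls (scrapes) metrics over HTTP from instrumented applications, which expose them in a plain-text format. The sketch below hand-renders one counter in that text exposition format and serves it at /metrics using only Python's standard library. The metric name, labels, and values are hypothetical; in practice, the official Prometheus client libraries generate this output for you.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters keyed by (method, status code) label values.
REQUEST_COUNTS = {("get", "200"): 1027, ("post", "500"): 3}


def render_metrics(counts):
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, code), value in sorted(counts.items()):
        lines.append(f'http_requests_total{{method="{method}",code="{code}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve GET /metrics so that a Prometheus server can scrape it."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it, run HTTPServer(("127.0.0.1", 8000), MetricsHandler).serve_forever() and add http://127.0.0.1:8000/metrics as a scrape target; port 8000 is an arbitrary choice for this sketch.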
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.
Digital badges are yours to keep, forever.