Monitoring Distributed Systems

SRE    |    Intermediate
  • 14 videos | 30m 16s
  • Includes Assessment
  • Earns a Badge
Rating 4.7 of 289 users
Principles and techniques are key in building a successful monitoring and alerting system. In this course, you'll explore the 'four golden signals' of monitoring while learning how to differentiate between symptoms and causes. You'll also learn about the guidelines for designing a monitoring system, questions to ask when creating rules for monitoring, and how to monitor for the long term.

WHAT YOU WILL LEARN

  • Discover the key concepts covered in this course
  • Provide an overview of nodes and machines
  • Provide an overview of root cause
  • Differentiate between symptoms and causes
  • Provide an overview of the 'four golden signals' of monitoring
  • Describe the importance of focusing on monitoring traffic and how it applies to the four golden signals
  • Describe the importance of focusing on the 'errors' metric and how it applies to the four golden signals
  • Provide an overview of saturation and how it applies to the four golden signals
  • Recognize strategies for effective monitoring and addressing mean values
  • Choose an appropriate resolution for measurements
  • List guidelines to keep in mind when designing a monitoring system
  • Determine the appropriate questions to ask when creating rules for your monitoring solution
  • Recognize the importance of making monitoring decisions with long-term goals in mind
  • Summarize the key concepts covered in this course

IN THIS COURSE

  • 1m 19s
  • 2m 15s
  • 3.  Root Cause
    2m 30s
    In this video, you'll learn more about the root cause of an issue and how to identify and deal with it when a problem arises. You'll discover that although the word "root" suggests a single cause, an issue can have more than one root cause. For example, defects in hardware or software could be the root cause of a problem.
  • 4.  Symptoms vs. Causes
    1m 43s
    In this video, you'll learn about symptoms versus causes. This involves asking what's wrong and why. Asking what's wrong equates to identifying the symptoms, and asking why equates to uncovering the root cause.
  • 5.  Latency
    3m 42s
  • 6.  Traffic
    2m 13s
    In this video, you'll learn more about traffic, that is, the number of requests flowing across your network, and the importance of monitoring it. You'll learn that one of the most significant values to look for when monitoring traffic is the peak value, but not just what that value is. You also need to know how often it's hit, and for how long traffic remains at or around that value.
  • 7.  Errors
    1m 43s
    In this video, you'll learn more about errors and their importance with respect to the four golden signals of monitoring. An error can describe many different situations, including misconfigurations in your overall infrastructure, bugs in your code, or broken dependencies between system components, among many other things. Any single error may be insufficient for understanding the problem as a whole, and recognizing that is key when it comes to monitoring errors.
  • 8.  Saturation
    1m 53s
    In this video, you'll learn more about saturation and its overall importance in terms of the four golden signals. Saturation refers to how close you are to reaching the maximum capacity of your resources. Every resource has a limit, but some are more readily increased than others. For example, storage capacity is relatively easy to increase by adding a disk to an array.
  • 9.  Mean Values
    2m 7s
    In this video, you'll learn about potential issues that can arise when a monitoring system is built around mean-value (average) metrics. If you're designing a new monitoring system from scratch, it can often be tempting to base your measurements on the mean value of a given metric, such as the average latency for a service or the average CPU usage for a node.
  • 10.  Measurements
    2m 25s
    In this video, you'll learn more about the considerations for determining an appropriate measurement resolution for a system. This means finding the balance between how frequently your measurements are taken and the level of detail required to yield useful data. You'll also learn not to over-monitor, but not to under-monitor either. You'll learn that maintaining a balance is key to an effective monitoring solution.
  • 11.  Designing a Monitoring System
    1m 57s
  • 12.  Creating Rules for Monitoring
    2m 49s
    In this video, you'll learn more about determining the appropriate questions to ask when creating rules for your monitoring solution. This can help you avoid false positives and avoid wasting time and energy chasing down issues that are of little consequence. You'll begin by asking whether a rule detects an otherwise undetected condition. This sounds simple enough, but as monitoring solutions evolve, it can be easy to create rules that duplicate existing ones.
  • 13.  Long Term Monitoring
    2m 34s
    In this video, you'll learn about the importance of making decisions about your approach to monitoring with long-term goals in mind. Monitoring in and of itself often has a connotation of keeping track of what is going on right now. However, monitoring in any form should not simply be about keeping an eye on things, because the goal for your system as a whole should be to continually improve it. You'll discover that your approach to monitoring should have the same goal: improving the system.
  • 14.  Course Summary
    1m 4s
    In this video, you'll summarize what you've learned in the course. You've examined the monitoring of distributed systems, including nodes and machines in a distributed system and the differences between symptoms and causes. You also learned about strategies for dealing with mean values and how to choose an appropriate resolution for measurements. Finally, you learned guidelines for designing a monitoring system and how to create appropriate rules for monitoring.
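
The caution about mean values raised in video 9 can be illustrated with a minimal sketch. It is not taken from the course: the latency figures are made up, and `percentile` is a hypothetical helper that uses the simple nearest-rank method. It compares the mean of a latency sample with its median and 99th percentile.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile (hypothetical helper, not from the course).

    Returns the smallest sample such that at least pct percent of the
    samples are less than or equal to it.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in milliseconds: 95 fast requests and 5 very slow ones.
latencies_ms = [20] * 95 + [2000] * 5

summary = {
    "mean": mean(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p99": percentile(latencies_ms, 99),
}
print(summary)
```

With this sample the mean is 119 ms, a latency that no request actually experienced. The median shows that the typical request took 20 ms, and the 99th percentile exposes the 2000 ms tail that the mean hides.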

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
