SRE Data Pipelines & Integrity: Data Pipelines

SRE    |    Intermediate
  • 21 videos | 1h 11m 12s
  • Includes Assessment
  • Earns a Badge
Rating 4.6 of 20 users
Site reliability engineers often find data processing complex as demands for faster, more reliable, and more cost-effective results continue to evolve. In this course, you'll explore techniques and best practices for managing a data pipeline. You'll start by examining the various pipeline application models and their recommended uses. You'll then learn how to define and measure service-level objectives, plan for dependency failures, and create and maintain pipeline documentation. Next, you'll outline the phases of a typical release flow in the pipeline development lifecycle before investigating more challenging topics such as managing data processing pipelines, using big data with simple data pipelines, and using periodic pipeline patterns. Lastly, you'll delve into the components of Google Workflow and recognize how to work with this system.

WHAT YOU WILL LEARN

  • Discover the key concepts covered in this course
  • Describe the characteristics of and rationale for using data processing pipelines
  • Recognize characteristics of the extract transform load (ETL) pipeline model
  • Define business intelligence and data analytics in the context of data processing and give an example data analytics use case
  • List characteristics of machine learning (ML) applications
  • Define what is meant by service-level objectives (SLOs) and describe how they relate to pipeline data
  • Outline how to plan for dependency failures
  • Recognize how to create and maintain pipeline documentation
  • Outline the stages of a typical development lifecycle
  • Describe how to reduce hotspotting
  • Recognize how to implement autoscaling to handle spikes in workloads
  • Describe how best to adhere to access control and security policies
  • Plan escalation paths that ensure quick and proactive communication
  • Describe the effect big data can have on simple pipeline patterns
  • List the challenges with using the periodic pipeline pattern
  • Describe the issues that can occur due to uneven work distribution
  • List the potential drawbacks of periodic pipelines in distributed environments
  • Describe what comprises Google Workflow and outline how it works
  • Outline the stages of execution in Google Workflow, describing what they entail
  • Recognize the key factors in ensuring business continuity in big data pipelines using Google Workflow
  • Summarize the key concepts covered in this course

IN THIS COURSE

  • 1m 53s
  • 4m 38s
    Data processing pipelines are software that helps organize and transform large datasets into usable information. They play an important role in modern business operations by providing fast, reliable, and accurate results. Data is collected today in far greater volumes than in the past, and data processing pipelines are essential to transforming these massive amounts of data into useful information.
    3.  The Extract Transform Load (ETL) Pipeline Model
    3m 8s
    The Extract Transform Load (ETL) model is a data transformation model used in business intelligence and IT. ETL pipelines extract data from a source, transform it from one format to another, and load it into a destination. They can be used for a variety of purposes, such as preparing data for analysis or serving it to another application.
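
The three ETL stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not material from the course: the CSV sample, the field names (`order_id`, `amount_cents`), and the in-memory sink are all hypothetical.

```python
import csv
import io
import json

# Hypothetical raw input; a real pipeline would read from a file or service.
RAW_CSV = "order_id,amount_cents\n1001,2599\n1002,1250\n"


def extract(text):
    """Extract: parse CSV text into a list of string-valued dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert field types and change cents into currency units."""
    return [
        {"order_id": int(row["order_id"]), "amount": int(row["amount_cents"]) / 100}
        for row in rows
    ]


def load(records, sink):
    """Load: write each record to the sink as one JSON line."""
    for record in records:
        sink.write(json.dumps(record) + "\n")


# Chain the stages; an in-memory buffer stands in for the destination store.
sink = io.StringIO()
load(transform(extract(RAW_CSV)), sink)
etl_output = sink.getvalue()
```

Keeping each stage as a separate function makes it possible to test or replace one stage without touching the others.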
    4.  Business Intelligence and Data Processing
    2m 52s
    In this video, you'll learn more about business intelligence (BI), which refers to the technologies, tools, and practices for gathering, integrating, analyzing, and presenting large volumes of data to support better business decision making. You'll learn that BI uses databases to store and link data from various areas of a business so that you can run reports and analyze that data to make key business decisions.
    5.  Features of Machine Learning Apps
    4m 18s
    In this video, you'll learn more about machine learning, an application of artificial intelligence that gives systems the ability to learn and improve from experience and data. Machine learning involves supplying a computer with data and having it learn patterns from that data. There are different types of machine learning, including supervised, unsupervised, and reinforcement learning.
    6.  Service-level Objectives (SLOs) and Data Pipelines
    4m 35s
    In this video, you'll learn what is meant by service-level objectives, or SLOs. You'll discover that an SLO is a target value or range of values for a service level that is measured by a service-level indicator, or SLI. The SLI is what we measure, such as latency, availability, or uptime, and the SLO is the target we want that measurement to meet.
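
The relationship between an SLI (the measurement) and an SLO (the target) can be sketched as follows. This is an illustrative example only: the choice of "fraction of pipeline runs that succeeded" as the SLI, the 99% target, and the function names are assumptions, not definitions taken from the course.

```python
def availability_sli(successful_runs, total_runs):
    """SLI: the measured fraction of pipeline runs that succeeded."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return successful_runs / total_runs


def meets_slo(sli, slo_target):
    """SLO check: does the measured SLI reach the target value?"""
    return sli >= slo_target


# Hypothetical target: 99% of runs should succeed.
SLO_TARGET = 0.99

good_week = availability_sli(successful_runs=995, total_runs=1000)  # 0.995
bad_week = availability_sli(successful_runs=980, total_runs=1000)   # 0.98

good_week_ok = meets_slo(good_week, SLO_TARGET)  # True
bad_week_ok = meets_slo(bad_week, SLO_TARGET)    # False
```

The same comparison applies to other indicators such as latency, with the inequality reversed when lower values are better.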
    7.  Planning for Dependency Failure
    3m 32s
    This session is about planning for dependency failures. The speaker discusses how to design for the largest failure that a service-level agreement (SLA) promises and how to stage planned outages in order to be proactive rather than reactive.
    8.  Managing System Documentation
    5m 3s
    This session covers how to create and maintain system documentation and how to identify when a pipeline is running slowly. The presenter also discusses ways to document processes and how to automate them.
    9.  Development Lifecycle Stages
    5m 39s
    In this video, you'll learn how to outline the stages of a typical development lifecycle. You'll learn that prototyping is the first phase of pipeline development and is used to verify the pipeline's semantics. It lets us make sure we can implement the business logic needed to execute the pipeline. This may mean choosing one programming language over another because it integrates with existing libraries.
    10.  Reducing Hotspotting
    4m 8s
    In this video, you'll learn about the concept of hotspotting. Hotspotting occurs when a resource becomes overloaded from excessive access, which often results in operational failure. Pipelines are susceptible to workload patterns in their reads and writes that cause delays in isolated regions of data. You'll learn that when data for a particular query is concentrated on a limited number of nodes, those nodes can become a hotspot.
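
One common way to reduce hotspotting is to spread adjacent keys across nodes by hashing them. The sketch below contrasts that with range partitioning. It is an assumption-laden illustration rather than the course's own method: the node count, the key range, and the function names are hypothetical.

```python
import hashlib
from collections import Counter

NUM_NODES = 4  # hypothetical cluster size


def range_node(key, keys_per_node=1000):
    """Range partitioning: adjacent keys are stored on the same node."""
    return min(key // keys_per_node, NUM_NODES - 1)


def hashed_node(key):
    """Hash partitioning: a stable hash scatters adjacent keys across nodes."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


# Hypothetical burst of reads against the 200 most recent sequential IDs.
recent_keys = range(3000, 3200)

range_load = Counter(range_node(k) for k in recent_keys)
hashed_load = Counter(hashed_node(k) for k in recent_keys)
# With range partitioning every request lands on node 3;
# with hashing the same requests are shared among all four nodes.
```

The trade-off is that hashed keys no longer sit next to each other, so queries that scan a contiguous key range become more expensive.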
    11.  Implementing Autoscaling for Workload Spikes
    3m 38s
    In this video, you'll learn about autoscaling, which can help handle workload spikes. Autoscaling is useful when your pipeline needs additional server resources to keep up with the number of processing jobs. You'll learn how to implement autoscaling to handle spikes in workloads.
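
A simple autoscaling rule sizes the worker pool from the current job backlog and clamps the result between a minimum and a maximum. The sketch below assumes a hypothetical policy; the per-worker capacity and the limits are illustrative values, and real autoscalers typically add smoothing and cooldown periods.

```python
import math


def desired_workers(queued_jobs, jobs_per_worker, min_workers=1, max_workers=20):
    """Return how many workers are needed to cover the backlog, within limits."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    needed = math.ceil(queued_jobs / jobs_per_worker)
    # Clamp so the pool never drops to zero or grows without bound.
    return max(min_workers, min(max_workers, needed))


idle = desired_workers(queued_jobs=0, jobs_per_worker=10)      # 1
busy = desired_workers(queued_jobs=95, jobs_per_worker=10)     # 10
spike = desired_workers(queued_jobs=1000, jobs_per_worker=10)  # 20
```

The upper limit matters as much as the scaling itself, since it caps cost and protects downstream dependencies during a spike.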
    12.  Adhering to Security Policies
    2m 59s
    In this session, we will discuss security policies and how to adhere to them. One of the most important aspects of security is data privacy. Security policies are necessary to protect data from unauthorized access and alteration. Adhering to them protects data by limiting who has access to it, when they have access to it, and how they use it.
    13.  Planning Escalation Paths
    2m 16s
    In this video, you will learn about designing pipelines for resiliency. Resilient data pipelines adapt in the event of failure. You will learn that a resilient data pipeline needs to detect failures, recover from them, and return accurate data to the customer.
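
The three requirements named above (detect failures, recover from them, return accurate data) can be sketched as a retry wrapper with a validation step. This is one possible approach under stated assumptions, not the course's implementation: `run_with_recovery`, `flaky_step`, and the validation rule are hypothetical.

```python
def run_with_recovery(step, validate, max_attempts=3):
    """Run a pipeline step, retrying on errors or on invalid output."""
    last_error = None
    for _ in range(max_attempts):
        try:
            result = step()  # detect: an exception signals a failure
        except Exception as exc:
            last_error = exc
            continue  # recover: try the step again
        if validate(result):  # accuracy: only validated data is returned
            return result
        last_error = ValueError("step output failed validation")
    # Out of attempts: surface the failure so that it can be escalated.
    raise RuntimeError(f"step failed after {max_attempts} attempts") from last_error


# Hypothetical step that fails twice before succeeding.
calls = {"count": 0}


def flaky_step():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("upstream unavailable")
    return [1, 2, 3]


result = run_with_recovery(flaky_step, validate=lambda rows: len(rows) > 0)
```

When every attempt fails, the raised error is the natural point at which to page an on-call engineer or follow an escalation path.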
    14.  Big Data and Simple Pipelines
    2m 23s
    This session discusses the effect big data has on simple pipelines. Multiphase pipelines are typically used when processing big data because splitting the work into chained phases, which increases the pipeline's depth, makes the system easier to reason about and troubleshoot.
    15.  The Periodic Pipeline Pattern
    2m 25s
    16.  Issues with Uneven Work Distribution
    2m 51s
    17.  Periodic Pipelines in Distributed Environments
    3m 21s
    18.  Google Workflow's Composition
    3m 17s
    19.  Google Workflow's Stages of Execution
    1m 43s
    20.  Ensuring Business Continuity with Google Workflow
    4m 50s
    21.  Course Summary
    1m 42s

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft offers you the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
