SRE Data Pipelines & Integrity: Data Pipelines
SRE | Intermediate
- 21 videos | 1h 11m 12s
- Includes Assessment
- Earns a Badge
Site reliability engineers often find data processing complex as demands for faster, more reliable, and more cost-effective results continue to evolve. In this course, you'll explore techniques and best practices for managing a data pipeline. You'll start by examining the various pipeline application models and their recommended uses. You'll then learn how to define and measure service-level objectives, plan for dependency failures, and create and maintain pipeline documentation. Next, you'll outline the phases of a pipeline development lifecycle's typical release flow before investigating more challenging topics such as managing data processing pipelines, using big data with simple data pipelines, and using periodic pipeline patterns. Lastly, you'll delve into the components of Google Workflow and recognize how to work with this system.
WHAT YOU WILL LEARN
- Discover the key concepts covered in this course
- Describe the characteristics of and rationale for using data processing pipelines
- Recognize characteristics of the extract transform load (ETL) pipeline model
- Define business intelligence and data analytics in the context of data processing and give an example data analytics use case
- List characteristics of machine learning (ML) applications
- Define what is meant by service-level objectives (SLOs) and describe how they relate to pipeline data
- Outline how to plan for dependency failures
- Recognize how to create and maintain pipeline documentation
- Outline the stages of a typical development lifecycle
- Describe how to reduce hotspotting
- Recognize how to implement autoscaling to handle spikes in workloads
- Describe how to best adhere to access control and security policies
- Plan escalation paths that ensure quick and proactive communication
- Describe the effect big data can have on simple pipeline patterns
- List the challenges with using the periodic pipeline pattern
- Describe the issues that can occur due to uneven work distribution
- List the potential drawbacks of periodic pipelines in distributed environments
- Describe what comprises Google Workflow and outline how it works
- Outline the stages of execution in Google Workflow, describing what they entail
- Recognize the key factors in ensuring business continuity in big data pipelines using Google Workflow
- Summarize the key concepts covered in this course
IN THIS COURSE
-
1m 53s
-
4m 38s
Data processing pipelines are software systems that organize and transform large datasets into usable information. They play an important role in modern business operations by providing fast, reliable, and accurate results. Data collection today differs greatly from the past, and data processing pipelines are essential for turning these massive amounts of data into useful information.
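To make the idea concrete, here is a minimal sketch (illustrative code, not taken from the course) of a pipeline as a chain of transformation steps that turns raw records into usable information:

```python
# A minimal data-pipeline sketch: a chain of steps applied in order.
# All names and data here are illustrative.

def clean(records):
    # Drop records that are missing required fields.
    return [r for r in records if r.get("user") and r.get("ms") is not None]

def enrich(records):
    # Derive a new, more useful field from the raw data.
    return [{**r, "slow": r["ms"] > 500} for r in records]

def run_pipeline(records, steps):
    for step in steps:
        records = step(records)
    return records

raw = [
    {"user": "a", "ms": 120},
    {"user": None, "ms": 80},   # invalid record: dropped by clean()
    {"user": "b", "ms": 900},
]
print(run_pipeline(raw, [clean, enrich]))
```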
-
3m 8s
The Extract Transform Load (ETL) model is a data transformation model used in business intelligence and IT. ETL pipelines transform data from one format to another and can be used for a variety of purposes, such as preparing data for analysis or serving it to another application.
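As an illustration (an assumed example, not the course's own code), an ETL pipeline can be expressed as three distinct stages; the CSV source and in-memory "warehouse" below are stand-ins for real systems:

```python
import csv, io

# Sketch of the ETL model: extract, transform, load as separate stages.

RAW = "region,amount\neast,100\nwest,250\neast,75\n"

# Extract: pull raw rows out of the source.
rows = list(csv.DictReader(io.StringIO(RAW)))

# Transform: convert types and reshape into the destination's format.
totals = {}
for row in rows:
    totals[row["region"]] = totals.get(row["region"], 0) + int(row["amount"])

# Load: write the transformed records into the destination.
warehouse = [{"region": r, "total": t} for r, t in sorted(totals.items())]
print(warehouse)  # [{'region': 'east', 'total': 175}, {'region': 'west', 'total': 250}]
```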
-
2m 52s
In this video, you'll learn more about business intelligence, or BI. BI refers to the technologies, tools, and practices for collecting, integrating, analyzing, and presenting large volumes of data to support better business decision-making. You'll learn that BI uses databases to store and link data from various areas of a business so that you can run reports, analyze that data, and make key business decisions.
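For a toy illustration (made-up table and figures, not from the course), BI in miniature is storing business data in a database and running a report over it:

```python
import sqlite3

# Sketch: store linked business data and run a report to aid a decision.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [("east", 100), ("west", 250), ("east", 75)])

# The "report": revenue by region, highest first.
report = db.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY 2 DESC"
).fetchall()
print(report)  # [('west', 250), ('east', 175)]
```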
-
4m 18s
In this video, you'll learn more about machine learning, an application of artificial intelligence that gives systems the ability to learn and improve from experience and data. Machine learning involves giving the computer some data and telling it to learn from that data. There are different types of machine learning, including supervised, unsupervised, and reinforcement learning.
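As a toy example of supervised learning (an assumed sketch, not taken from the course), a classifier can "learn" from labeled examples and then label new data:

```python
# A toy supervised-learning sketch: a 1-nearest-neighbor classifier.
# The model "learns" by remembering labeled examples, then predicts the
# label of the closest known example. Data here is illustrative.

training = [  # (feature, label): request latency in ms -> verdict
    (100, "ok"), (150, "ok"), (800, "slow"), (1200, "slow"),
]

def predict(latency_ms):
    # Find the training example closest to the input and reuse its label.
    nearest = min(training, key=lambda ex: abs(ex[0] - latency_ms))
    return nearest[1]

print(predict(130))   # -> "ok"
print(predict(950))   # -> "slow"
```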
-
4m 35s
In this video, you'll learn what is meant by service-level objectives, or SLOs. You'll discover that an SLO is a target value or range of values for a service level that is measured by an SLI, or service-level indicator. An SLI is what we measure; it could be latency, availability, or uptime, for example, while the service-level objective is the target we want to meet.
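For example (a hypothetical sketch with made-up numbers), an availability SLI can be computed from request counts and checked against an SLO target:

```python
# Sketch: compute an availability SLI and compare it to an SLO target.
# All numbers here are illustrative.

total_requests = 1_000_000
failed_requests = 420

sli = (total_requests - failed_requests) / total_requests  # what we measure
slo = 0.999                                                # what we want to meet

print(f"SLI: {sli:.5f}  SLO: {slo}  meeting SLO: {sli >= slo}")
```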
-
3m 32s
This session is about planning for dependency failures. The speaker discusses how to design for the largest failure that a service-level agreement promises and how to stage planned outages in order to be proactive instead of reactive.
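One common way to plan for a dependency failure (a sketch under assumed names; fetch_rates() is a stand-in for a real dependency) is to retry with backoff and then degrade gracefully to a fallback:

```python
import random, time

# Sketch: retries with exponential backoff, then a degraded fallback,
# so a dependency failure does not take the pipeline down with it.

FALLBACK_RATES = {"usd_eur": 0.90}   # last-known-good values

def fetch_rates():
    if random.random() < 0.5:        # simulate an unreliable dependency
        raise ConnectionError("upstream unavailable")
    return {"usd_eur": 0.92}

def get_rates(retries=3):
    for attempt in range(retries):
        try:
            return fetch_rates()
        except ConnectionError:
            time.sleep(0.1 * 2 ** attempt)  # exponential backoff
    return FALLBACK_RATES                   # degrade gracefully

print(get_rates())
```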
-
5m 3s
The objectives of this session are to discuss how to create and maintain pipeline documentation and how to identify when a pipeline is running slow. The presenter also discusses ways to document processes and how to automate them.
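One simple way to notice a slow pipeline (an illustrative sketch, not the course's code; the stage and threshold are assumptions) is to time each stage and flag anything over an expected duration:

```python
import time

# Sketch: flag pipeline stages that exceed an expected duration, so a
# slow pipeline is detected rather than discovered by users.

SLOW_THRESHOLD_S = 0.5

def timed_stage(name, fn, *args):
    start = time.monotonic()
    result = fn(*args)
    elapsed = time.monotonic() - start
    if elapsed > SLOW_THRESHOLD_S:
        print(f"WARNING: stage {name!r} took {elapsed:.2f}s (> {SLOW_THRESHOLD_S}s)")
    return result

def transform(data):
    time.sleep(0.6)                 # simulate a slow stage
    return [x * 2 for x in data]

print(timed_stage("transform", transform, [1, 2, 3]))
```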
-
5m 39s
In this video, you'll watch a demo outlining the stages of a typical development lifecycle. You'll learn that prototyping is the first phase of development for a pipeline and is used to verify its semantics; it allows us to make sure we can implement the business logic the pipeline needs to execute. This may mean choosing one programming language over another because it integrates with existing libraries.
-
4m 8s
In this video, you'll learn how to describe the concept of hotspotting. Hotspotting occurs when resources become overloaded by excessive access, which often results in operational failure. Pipelines are susceptible to workload patterns in their reads and writes, causing delays in isolated regions of data. You'll learn that when the data for a particular query is concentrated on a limited number of nodes, it can hotspot.
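One common mitigation (a minimal sketch with assumed node and bucket counts, not the course's own example) is to "salt" a hot key so its traffic fans out across nodes instead of concentrating on one:

```python
# Sketch: reduce hotspotting by salting a hot key so its writes spread
# across several nodes instead of overloading one. Values are illustrative.

NUM_NODES = 4
SALT_BUCKETS = 8

def node_for(key):
    return hash(key) % NUM_NODES

# Without salting, every write for "popular_user" lands on a single node.
hot_key = "popular_user"
print("unsalted node:", node_for(hot_key))

# With salting, the same logical key spreads over several nodes;
# readers must query all salt buckets and merge the results.
salted = [f"{hot_key}#{i}" for i in range(SALT_BUCKETS)]
print("salted nodes:", sorted({node_for(k) for k in salted}))
```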
-
3m 38s
In this video, you'll learn the concept of autoscaling, which can help handle workload spikes. Autoscaling is useful when your pipeline needs additional server resources to keep up with the number of processing jobs. You'll learn how to implement autoscaling to handle spikes in workloads.
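At its core, an autoscaling decision can be as simple as sizing the worker pool to the backlog (a sketch with made-up capacity numbers and limits, not a production policy):

```python
# Sketch: a simple autoscaling decision, adding or removing workers based
# on the backlog of pending processing jobs. Thresholds are illustrative.

JOBS_PER_WORKER = 50          # assumed capacity per worker
MIN_WORKERS, MAX_WORKERS = 2, 20

def desired_workers(pending_jobs):
    needed = -(-pending_jobs // JOBS_PER_WORKER)   # ceiling division
    return max(MIN_WORKERS, min(MAX_WORKERS, needed))

for backlog in (40, 300, 5000):                    # quiet, busy, spike
    print(f"backlog={backlog:5d} -> workers={desired_workers(backlog)}")
```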
-
2m 59s
In this session, we will discuss security policies and how to adhere to them. One of the most important aspects of security is data privacy, the practice of protecting data from unauthorized exposure. Security policies are necessary to protect data from unauthorized access and alteration; adhering to them protects data by limiting who has access to it, when they have access to it, and how they use it.
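As a minimal illustration of limiting who can do what (assumed roles and permissions, not the course's policy), a role-based access check might look like this:

```python
# Sketch: a minimal role-based access-control check limiting who can read
# or alter pipeline data. Roles and permissions here are illustrative.

PERMISSIONS = {
    "viewer":   {"read"},
    "operator": {"read", "write"},
    "admin":    {"read", "write", "delete"},
}

def authorize(role, action):
    if action not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"role {role!r} may not {action!r}")

authorize("operator", "write")       # allowed
try:
    authorize("viewer", "delete")    # denied
except PermissionError as e:
    print(e)
```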
-
2m 16s
In this video, you will learn about pipeline design and why you should ensure pipeline resiliency when designing your pipelines. Resilient data pipelines adapt in the event of failure: a resilient data pipeline needs to detect failures, recover from them, and still return accurate data to the customer.
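One ingredient of that resiliency (a sketch under assumed names; the in-memory checkpoint stands in for durable storage) is checkpointing progress so a restarted run resumes cleanly and still produces complete, accurate output:

```python
# Sketch: a pipeline step that checkpoints progress so it can recover
# from a failure without losing or duplicating work.

checkpoint = {"done": set()}           # stands in for durable storage

def process(records):
    out = []
    for rec in records:
        if rec["id"] in checkpoint["done"]:
            continue                   # already processed before the crash
        out.append({**rec, "ok": True})
        checkpoint["done"].add(rec["id"])
    return out

data = [{"id": 1}, {"id": 2}, {"id": 3}]
first = process(data[:2])              # run "crashes" after two records
resumed = process(data)                # recovery: processes only what's left
print(first + resumed)                 # complete results, no duplicates
```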
-
2m 23s
The objective of this session is to discuss the effect big data has on simple pipelines. Multiphase pipelines are typically used when processing big data: chaining several simple phases increases the pipeline depth, and because each phase handles one well-defined step, the pipeline becomes easier to reason about and troubleshoot.
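For illustration (an assumed sketch, not from the course), a multiphase pipeline chains small phases, each of which can be tested and debugged on its own:

```python
# Sketch: a multiphase pipeline. Each phase does one small step and hands
# its output to the next; depth = number of phases. Data is illustrative.

def phase_parse(lines):                 # phase 1: parse raw text
    return [line.split(",") for line in lines]

def phase_filter(rows):                 # phase 2: keep only valid rows
    return [r for r in rows if len(r) == 2]

def phase_aggregate(rows):              # phase 3: summarize
    return {key: sum(1 for k, _ in rows if k == key) for key, _ in rows}

phases = [phase_parse, phase_filter, phase_aggregate]
data = ["east,100", "bad-row", "east,75", "west,250"]
for phase in phases:                    # each phase is easy to test alone
    data = phase(data)
print(data)                             # {'east': 2, 'west': 1}
```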
-
2m 25s
-
2m 51s
-
3m 21s
-
3m 17s
-
1m 43s
-
4m 50s
-
1m 42s
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.
Digital badges are yours to keep, forever.