Data Engineering on Microsoft Azure: Designing Data Storage Structures
Azure | Beginner
- 11 videos | 1h 8m 8s
- Includes Assessment
- Earns a Badge
Planning the structure of data storage is integral to performance in big data operations. In this course, you'll learn about key considerations for data lakes and how to determine which file type and file format are most appropriate for your use case. Then, you'll explore how to design table storage for efficient querying and how data pruning can skip unnecessary data to accelerate queries. You'll examine folder structures and data lake zones for organizing data effectively. Finally, you'll learn how to define storage tiers and how to manage the life cycle of data. This course is one in a collection that prepares learners for the Data Engineering on Microsoft Azure (DP-203) exam.
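As a rough illustration of the folder structure and zone ideas mentioned above, the following Python sketch builds a date-partitioned path inside a data lake zone. The zone names follow the zones listed later on this page; the `zone/source/dataset/year=/month=/day=` layout and the `lake_path` helper are assumptions made for this example, not a layout prescribed by the course.

```python
from datetime import date

# Zone names taken from the course outline; the path layout itself is an assumption.
ZONES = ("raw", "structured", "curated", "serving", "exploratory")


def lake_path(zone: str, source: str, dataset: str, day: date) -> str:
    """Return a hypothetical folder path for one day's data in a given zone."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    # Keeping the date parts as separate key=value folders lets query engines
    # skip whole folders when a filter on year, month, or day is applied.
    return (
        f"{zone}/{source}/{dataset}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


print(lake_path("raw", "sales", "orders", date(2024, 3, 5)))
```

A shallow, predictable layout like this is one way to keep the lake navigable; the right depth and partition keys depend on how the data is actually queried.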
WHAT YOU WILL LEARN
- Discover the key concepts covered in this course
- Describe key considerations for designing a data lake
- Identify and evaluate criteria for selecting a file format for big data applications
- Recognize the defining characteristics of the supported file formats in Azure Data Lake
- Describe steps for efficient read operations for a table storage service
- Describe the dynamic data pruning feature in Databricks at the file and partition level
- Recognize an efficient folder structure design
- Define the zones within a data lake for organizing data distribution
- Describe the data access tiers in Azure Blob storage and how data can be moved between them for efficient and cost-effective storage
- Describe the steps to archive data in an Azure Blob storage container, rehydrate blob data, and automate access tiers using life cycle management
- Summarize the key concepts covered in this course
IN THIS COURSE
- 1m 30s: This course will introduce you to data lakes and the file types that are most appropriate for your use case. See how to design table storage that supports efficient queries, how to prune unneeded data, and how to organize data effectively.
- 7m 12s: Study how to design a data lake. Consider the reasons to use a data lake and compare its purpose with that of a data warehouse. Examine use cases for data lakes, such as descriptive, diagnostic, predictive, and prescriptive analysis. Review the characteristics of a data lake and some of its challenges.
- 7m 17s: Identify and evaluate criteria for selecting a file format for big data applications, such as text or binary encoding, data types, schema, and OLTP or OLAP workloads. Review big data storage considerations, such as splittability, compression support, batch or streaming, organizational standards, and data catalog needs.
- 8m 46s: Explore the defining characteristics of the supported file formats in Azure Data Lake. Examine benefits and issues with the comma-separated values (CSV) format, Extensible Markup Language (XML), Apache Avro, Apache Parquet, and Optimized Row Columnar (ORC). Review Protocol Buffers.
- 8m 31s: In this video, you will study how to design efficient table storage for queries. Discover the benefits of denormalized data, point queries, and the long tail pattern. Consider alternate approaches to domain models.
- 6m 25s: Examine the dynamic data pruning feature in Databricks at the file and partition level. Review nested filters, static partition pruning, pruning challenges, and dynamic partition pruning considerations.
- 7m 3s: Watch how to design a folder structure. Discover why structure is important for a data lake, so that it does not become a data swamp. Review governance practices for metadata management. See why nested elements should be avoided.
- 5m 51s: Explore how to define the zones within a data lake to organize data distribution. Review the roles of data separation, governance, service level agreements, and security. Consider the requirements of the raw zone, the structured zone, the curated zone, the serving zone, and the exploratory zone.
- 6m 47s: Examine the data access tiers in Azure Blob storage and how data can be moved between them for efficient and cost-effective storage. Look at blob types, access tiers, blob life cycle management, the archive tier, and immutable blobs.
- 7m 52s: Walk through the steps to archive data in an Azure Blob storage container, rehydrate blob data, and automate access tiers using life cycle management. See how to move data between tiers for cost-effective data management.
- 54s: This course introduced you to data lakes and the file types that are most appropriate for your use case. You learned how to design table storage that supports efficient queries, how to prune unneeded data, and how to organize data effectively.
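To make the life cycle management topic above more concrete, here is a minimal Python sketch that assembles a rule document in the general shape used by Azure Blob storage life cycle management policies (a `rules` list whose `definition` holds `actions` and `filters`). The day thresholds, the rule name, the `mycontainer/logs/` prefix, and the `build_lifecycle_policy` helper are all assumptions chosen for illustration; check the current Azure documentation before applying a policy to a real storage account.

```python
import json


def build_lifecycle_policy(prefix: str, cool_after_days: int = 30,
                           archive_after_days: int = 90,
                           delete_after_days: int = 365) -> dict:
    """Build a hypothetical policy that tiers block blobs down as they age."""
    # The thresholds must increase so that each stage follows the previous one.
    if not 0 <= cool_after_days < archive_after_days < delete_after_days:
        raise ValueError("thresholds must satisfy cool < archive < delete")

    def after(days: int) -> dict:
        # Age is measured in days since the blob was last modified.
        return {"daysAfterModificationGreaterThan": days}

    return {
        "rules": [
            {
                "enabled": True,
                "name": "age-out-example",  # illustrative rule name
                "type": "Lifecycle",
                "definition": {
                    "actions": {
                        "baseBlob": {
                            "tierToCool": after(cool_after_days),
                            "tierToArchive": after(archive_after_days),
                            "delete": after(delete_after_days),
                        }
                    },
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": [prefix],
                    },
                },
            }
        ]
    }


# Print the policy as JSON, the format in which such rules are usually stored.
print(json.dumps(build_lifecycle_policy("mycontainer/logs/"), indent=2))
```

Blobs moved to the archive tier are offline and must be rehydrated to an online tier before they can be read, so the archive threshold is usually set well beyond the period in which the data is still queried.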
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft offers you the opportunity to earn a digital badge upon successful completion of some of our courses. The badge can be shared on any social network or business platform.
Digital badges are yours to keep, forever.