Using Hive to Optimize Query Executions with Partitioning
Apache Hive 2.3.2 | Intermediate
- 10 videos | 1h 47s
- Includes Assessment
- Earns a Badge
Continue to explore the versatility of Apache Hive, one of today's most popular data warehouses, in this 10-video Skillsoft Aspire course. Learners are shown ways to optimize query execution, including the powerful technique of partitioning data sets. This hands-on course assumes previous experience working with Hive tables using the Hive query language and processing complex data types, along with a theoretical understanding of how partitioning very large data sets improves query performance. Demonstrations focus on the basics of partitioning: how to create partitions and how to load data into them. Learners work with both Hive-managed tables and external tables to see how partitioning works for each, then watch how to navigate to the shell of the Hadoop master node and create new directories in the Hadoop file system. Observe dynamic partitioning of tables and how it simplifies loading data into partitions. Finally, explore how multiple columns in a table can be used to partition its data. By the end of the course, learners will have a sound understanding of how large data sets can be partitioned into smaller chunks to improve query performance.
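As a rough illustration of the idea behind partitioning (not material taken from the course), the Python sketch below models how Hive lays out a partitioned table as nested `column=value` directories, and why a filter on a partition column lets a query skip whole directories. The table path, column names, sample rows, and helper functions are all hypothetical, and the model ignores file formats and the metastore.

```python
from collections import defaultdict


def partition_path(table_dir, partition_values):
    """Build one partition directory, Hive-style: <table>/<col>=<value>/..."""
    parts = [f"{col}={val}" for col, val in partition_values]
    return "/".join([table_dir, *parts])


def load_rows(table_dir, rows, partition_cols):
    """Route each row to the directory named after its partition column values."""
    layout = defaultdict(list)
    for row in rows:
        key = [(col, row[col]) for col in partition_cols]
        # Partition column values are encoded in the path, not in the data files.
        data = {k: v for k, v in row.items() if k not in partition_cols}
        layout[partition_path(table_dir, key)].append(data)
    return dict(layout)


def scan(layout, table_dir, **filters):
    """Return rows only from partitions matching every col=value filter."""
    wanted = {f"{col}={val}" for col, val in filters.items()}
    hits = []
    for path, rows in layout.items():
        segments = set(path[len(table_dir):].strip("/").split("/"))
        if wanted <= segments:  # non-matching directories are never read
            hits.extend(rows)
    return hits


# Hypothetical table directory and sample data, for illustration only.
TABLE_DIR = "/warehouse/sales"
rows = [
    {"id": 1, "amount": 10.0, "country": "US", "year": 2017},
    {"id": 2, "amount": 5.5, "country": "DE", "year": 2017},
    {"id": 3, "amount": 7.0, "country": "US", "year": 2018},
]

layout = load_rows(TABLE_DIR, rows, ["country", "year"])
us_rows = scan(layout, TABLE_DIR, country="US")
```

Here `load_rows` stands in for what dynamic partitioning does on insert (deriving the target partition from each row's values), and `scan` stands in for partition pruning: a filter on `country` touches two of the three directories and never reads the third.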
WHAT YOU WILL LEARN
- Use the Google Cloud Platform's Dataproc service to provision a Hadoop cluster (not required if you already have a Hadoop environment set up with Hive)
- Define a table which will contain data partitioned based on the value in one of its columns
- Insert data into partitions of a Hive table and explore the partition and its data on HDFS
- Load data into table partitions from files
- Create and populate partitions in an external table
- Alter the definition of a partition to modify its contents
- Define and work with dynamic partitions on your Hive tables
- Configure a table to use more than one column to define partitions and explore the partition on HDFS
- Use partitioning to boost query performance in HDFS
IN THIS COURSE
- 2m 26s
- 4m 52s: Learn how to use the Google Cloud Platform's Dataproc service to provision a Hadoop cluster. This is not required if you already have a Hadoop environment set up with Hive. (Free access)
- 6m 16s: Learn how to define a table which will contain data partitioned based on the value in one of its columns. (Free access)
- 7m 2s: In this video, you will learn how to insert data into partitions of a Hive table and explore the partition and its data on HDFS. (Free access)
- 7m 43s: In this video, you will learn how to load data into table partitions from files. (Free access)
- 7m 21s: In this video, find out how to create and populate partitions in an external table. (Free access)
- 4m 28s: In this video, you will change the definition of a partition to modify its contents. (Free access)
- 7m 12s: In this video, you will define and work with dynamic partitions on your Hive tables. (Free access)
- 7m 47s: Find out how to configure a table to use more than one column to define partitions and explore the partitions on HDFS. (Free access)
- 5m 41s: In this video, you will use partitioning to improve query performance in HDFS. (Free access)
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of its courses. Badges can be shared on any social network or business platform.
Digital badges are yours to keep, forever.