Hadoop HDFS Getting Started
Apache Hadoop
Beginner
- 12 videos | 1h 14m 36s
- Includes Assessment
- Earns a Badge
Explore the concepts of analyzing large data sets in this 12-video Skillsoft Aspire course on Hadoop and its Hadoop Distributed File System (HDFS), which enables efficient parallel processing of big data in a distributed cluster. The course assumes a conceptual understanding of Hadoop and its components; it is purely theoretical and contains no labs, providing just enough information to understand how Hadoop and HDFS allow big data to be processed in parallel. The course opens by explaining vertical and horizontal scaling, then discusses the functions Hadoop serves in horizontally scaling data processing tasks. Learners explore the functions of YARN, MapReduce, and HDFS, covering how HDFS keeps track of where all the pieces of large files are distributed, how data is replicated, and how HDFS is used with Zookeeper, a tool maintained by the Apache Software Foundation that provides coordination and synchronization in distributed systems, along with other distributed-computing services such as a naming service and configuration management. Finally, learn about Spark, a data analytics engine for distributed data processing.
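To make the idea of HDFS tracking the pieces of a large file more concrete, here is a minimal Python sketch of the kind of bookkeeping involved. It is an illustration only, not Hadoop code: the function `plan_blocks` and the node names are hypothetical, the 128 MiB block size and replication factor of 3 are the commonly cited HDFS defaults, and the round-robin placement is a simplifying assumption rather than the rack-aware policy a real cluster uses.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MiB
REPLICATION = 3                 # assumed replication factor


def plan_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a toy block map: block index -> (block length, nodes holding a replica).

    Hypothetical helper for illustration; it mimics the metadata a
    coordinator would keep, not how HDFS actually places replicas.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")

    num_blocks = -(-file_size // block_size)  # ceiling division
    block_map = {}
    for i in range(num_blocks):
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - i * block_size)
        # Simplified round-robin placement: each replica lands on a distinct node.
        replicas = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        block_map[i] = (length, replicas)
    return block_map


if __name__ == "__main__":
    mib = 1024 * 1024
    layout = plan_blocks(300 * mib, ["node-a", "node-b", "node-c", "node-d"])
    for index, (length, replicas) in layout.items():
        print(f"block {index}: {length // mib} MiB on {replicas}")
```

Under these assumptions, a 300 MiB file is split into two full blocks and one shorter final block, and each block is stored on three different nodes, so losing any single node leaves every block still readable.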
WHAT YOU WILL LEARN
- Recognize the need to process massive datasets at scale
- Describe the benefits of horizontal scaling for processing big data and the challenges of this approach
- Recall the features of a distributed cluster which address the challenges of horizontal scaling
- Identify the features of HDFS which enable large datasets to be distributed across a cluster
- Describe the simple and high-availability architectures of HDFS and the implementations for each of them
- Identify the role of Hadoop's MapReduce in processing chunks of big datasets in parallel
- Recognize the role of the YARN resource negotiator in enabling Map and Reduce operations to execute on a cluster
- Describe the steps involved in resource allocation and job execution for operations on a Hadoop cluster
- Recall how Apache Zookeeper enables the HDFS NameNode and YARN ResourceManager to run in high-availability mode
- Identify various technologies which integrate with Hadoop and simplify the task of big data processing
- Recognize the key features of distributed clusters, HDFS, and the inputs and outputs of the Map and Reduce phases
IN THIS COURSE
- 2m 17s
- 4m 29s: After completing this video, you will be able to recognize the need to process massive datasets quickly. FREE ACCESS
- 7m 12s: After completing this video, you will be able to describe the benefits of horizontal scaling for processing big data and the challenges of this approach. FREE ACCESS
- 8m 1s: After completing this video, you will be able to recall the features of a distributed cluster that address the challenges of horizontal scaling. FREE ACCESS
- 4m 52s: In this video, find out how to identify the features of HDFS which enable large datasets to be distributed across a cluster. FREE ACCESS
- 6m 51s: Upon completion of this video, you will be able to describe the simple and high-availability architectures of HDFS and the implementations for each of them. FREE ACCESS
- 8m 24s: In this video, you will identify the role of Hadoop's MapReduce in processing chunks of big datasets in parallel. FREE ACCESS
- 6m 49s: Upon completion of this video, you will be able to recognize the role of the YARN resource negotiator in enabling Map and Reduce operations to execute on a cluster. FREE ACCESS
- 2m 43s: Upon completion of this video, you will be able to describe the steps involved in resource allocation and job execution for operations on a Hadoop cluster. FREE ACCESS
- 8m 25s: Upon completion of this video, you will be able to recall how Apache Zookeeper enables the HDFS NameNode and YARN ResourceManager to run in a high-availability mode. FREE ACCESS
- 8m 9s: In this video, you will identify various technologies that integrate with Hadoop and simplify the task of big data processing. FREE ACCESS
- 6m 26s: After completing this video, you will be able to recognize the key features of distributed clusters, HDFS, and the inputs and outputs of the Map and Reduce phases. FREE ACCESS
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft offers you the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.
Digital badges are yours to keep, forever.