Apache Hadoop
Technology: Apache Hadoop 1.0
Expertise: Intermediate
Apache Hadoop is an open source framework for the storage and processing of big data. Come explore the ins and outs of Hadoop.
GETTING STARTED
Managing Big Data Using HDInsight Hadoop
- 6m 37s
- 5m 25s
COURSES INCLUDED
Fundamentals & Installation
Apache Hadoop is a set of algorithms for distributed storage and distributed processing of very large data sets. Get started with Hadoop by learning about big data, and how to install and use Hadoop.
12 videos | 40m
Assessment
Badge
Storage & MapReduce
MapReduce is a framework for writing applications to process huge amounts of data. Let's look at Hadoop storage, MapReduce, and how to use MapReduce with associated development tools.
11 videos | 46m
Assessment
Badge
Programming with MapReduce
You must have a good understanding of MapReduce to be able to program with it. Here we look at MapReduce in detail, and demonstrate the basics of programming in MapReduce.
16 videos | 1h 6m
Assessment
Badge
Using Hive & Pig with Hadoop
There are components other than MapReduce that let you write code to process large data sets stored in Hadoop. Let's see how to work with two such components - Hive and Pig.
7 videos | 32m
Assessment
Badge
Introduction to Hadoop
Hadoop is an open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Explore Hadoop, its key tools, and applications.
10 videos | 41m
Assessment
Badge
Ecosystem & MapReduce
Hadoop is a framework providing for distributed storage and processing of large data sets. Explore the Hadoop ecosystem and Java MapReduce.
10 videos | 38m
Assessment
Badge
Introduction to Data Modeling
Discover various data genres, data management tools, and analytical tools, and explore why so many new big data platforms are emerging from the perspective of big data management systems.
15 videos | 1h 1m
Assessment
Badge
COURSES INCLUDED
Hadoop HDFS Getting Started
Explore the concepts behind analyzing large data sets in this 12-video Skillsoft Aspire course, which covers Hadoop and the Hadoop Distributed File System (HDFS), which together enable efficient parallel processing of big data across a distributed cluster. The course assumes a conceptual understanding of Hadoop and its components; it is purely theoretical and contains no labs, providing just enough information to understand how Hadoop and HDFS process big data in parallel. The course opens by explaining vertical and horizontal scaling, then discusses how Hadoop horizontally scales data processing tasks. Learners explore the functions of YARN, MapReduce, and HDFS, including how HDFS keeps track of where all the pieces of large files are distributed, how data is replicated, and how HDFS is used with ZooKeeper, a tool maintained by the Apache Software Foundation that provides coordination and synchronization in distributed systems along with related services such as naming and configuration management. Finally, learn about Spark, a data analytics engine for distributed data processing. A short code sketch illustrating HDFS block placement follows this listing.
12 videos | 1h 14m
Assessment
Badge
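As a companion to the course above, here is a minimal Java sketch (not part of the course) that asks the NameNode how a file has been split into blocks and replicated across the cluster. It assumes a Hadoop client configuration on the classpath, and the path /data/logs/large_file.txt is a hypothetical placeholder.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockReport {
        public static void main(String[] args) throws Exception {
            // Picks up fs.defaultFS from core-site.xml on the classpath.
            FileSystem fs = FileSystem.get(new Configuration());

            // Hypothetical file; substitute any large file already stored on HDFS.
            Path file = new Path("/data/logs/large_file.txt");
            FileStatus status = fs.getFileStatus(file);

            // Replication factor and block size recorded by the NameNode.
            System.out.println("Replication: " + status.getReplication());
            System.out.println("Block size : " + status.getBlockSize());

            // Where each block (and its replicas) physically lives in the cluster.
            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("Offset " + block.getOffset() + ", length " + block.getLength()
                        + ", hosts " + String.join(",", block.getHosts()));
            }
            fs.close();
        }
    }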
Introduction to the Shell for Hadoop HDFS
In this Skillsoft Aspire course, learners discover how to set up a Hadoop cluster on the cloud and explore the bundled web apps: the YARN Cluster Manager app and the HDFS (Hadoop Distributed File System) NameNode UI. This 9-video course assumes a good understanding of what Hadoop is and how HDFS enables processing of big data in parallel by distributing large data sets across a cluster; learners should also be comfortable running commands from the Linux shell, with some fluency in basic Linux file system commands. The course opens by exploring the two web applications packaged with Hadoop: the UI for the YARN cluster manager and the NameNode UI for HDFS. Learners then explore two shells that can be used to work with HDFS, the Hadoop FS shell and the Hadoop DFS shell. Next, explore the basic commands used to navigate HDFS, discuss their similarities with Linux file system commands, and discuss distributed computing. In the closing exercise, practice identifying the web applications used to explore and monitor Hadoop. A brief code sketch showing equivalent operations through the HDFS Java API follows this listing.
9 videos | 52m
Assessment
Badge
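The course works in the Hadoop FS and DFS shells; the hedged sketch below performs the same kind of navigation through the HDFS Java API, with the roughly equivalent shell command noted in the comments. The /user path is only an example.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListHdfs {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Roughly equivalent to "hdfs dfs -ls /user" from the shell.
            for (FileStatus entry : fs.listStatus(new Path("/user"))) {
                System.out.printf("%s\t%d bytes\t%s%n",
                        entry.isDirectory() ? "dir " : "file",
                        entry.getLen(),
                        entry.getPath());
            }

            // The shell's notion of a working directory maps to the user's home directory on HDFS.
            System.out.println("Home directory: " + fs.getHomeDirectory());
            fs.close();
        }
    }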
Working with Files in Hadoop HDFS
In this Skillsoft Aspire course, learners encounter basic Hadoop file system operations such as viewing the contents of directories and creating new ones. This 8-video course assumes a good understanding of what Hadoop is and how HDFS (Hadoop Distributed File System) enables processing of big data in parallel by distributing large data sets across a cluster; learners should also be comfortable running commands from the Linux shell, with some fluency in basic Linux file system commands. Begin by working with files in various ways, including transferring files between a local file system and HDFS, and explore ways to create and delete files on HDFS. Then examine different ways to modify files on HDFS. After reviewing the distributed computing concepts involved, prepare to begin working with HDFS in a production setting. In the closing exercise, write a command to create a directory /data/products/files on HDFS, where /data/products may not yet exist, and list the commands for two copy operations: one from the local file system to HDFS, and one in reverse, from HDFS to the local host. A short code sketch of these file operations follows this listing.
8 videos | 47m
Assessment
Badge
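A minimal sketch of the file operations described above, done through the HDFS Java API rather than the shell; the local path /tmp/products.csv is a hypothetical placeholder, and the roughly equivalent shell commands appear in the comments.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsFileOps {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Like "hdfs dfs -mkdir -p": creates missing parent directories as needed.
            fs.mkdirs(new Path("/data/products/files"));

            // Local file system to HDFS (like "hdfs dfs -put" or -copyFromLocal).
            fs.copyFromLocalFile(new Path("/tmp/products.csv"),
                                 new Path("/data/products/files/products.csv"));

            // HDFS back to the local file system (like "hdfs dfs -get" or -copyToLocal).
            fs.copyToLocalFile(new Path("/data/products/files/products.csv"),
                               new Path("/tmp/products_copy.csv"));

            // Recursive delete (like "hdfs dfs -rm -r"); the boolean enables recursion.
            fs.delete(new Path("/data/products/files"), true);
            fs.close();
        }
    }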
Hadoop & MapReduce Getting Started
In this course, learners explore the theory behind big data analysis using Hadoop and how MapReduce enables parallel processing of large data sets distributed across a cluster of machines. Begin with an introduction to big data and the various sources and characteristics of data available today. Look at the challenges involved in processing big data and the options available to address them. Next, get a brief overview of Hadoop, its role in processing big data, and the functions of its components, such as the Hadoop Distributed File System (HDFS), MapReduce, and YARN (Yet Another Resource Negotiator). Explore how Hadoop's MapReduce framework processes data in parallel on a cluster of machines. Recall the steps involved in building a MapReduce application and the specifics of the Map phase in processing each row of the input file's data. Recognize the functions of the Shuffle and Reduce phases in sorting and interpreting the output of the Map phase to produce a meaningful result. To conclude, complete an exercise on the fundamentals of Hadoop and MapReduce. A small code sketch of the three phases follows this listing.
8 videos | 1h 3m
Assessment
Badge
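The sketch below uses plain Java with no Hadoop dependency at all; it simply mimics the Map, Shuffle, and Reduce phases on an in-memory word count so the flow of key-value pairs described above is easy to follow.

    import java.util.AbstractMap;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class PhasesDemo {
        public static void main(String[] args) {
            List<String> lines = Arrays.asList("to be or not to be", "to see or not to see");

            // Map phase: every input line is turned into (word, 1) pairs.
            List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
            for (String line : lines) {
                for (String word : line.split("\\s+")) {
                    mapped.add(new AbstractMap.SimpleEntry<>(word, 1));
                }
            }

            // Shuffle phase: pairs are grouped by key, so each key sees all of its values.
            Map<String, List<Integer>> shuffled = mapped.stream()
                    .collect(Collectors.groupingBy(Map.Entry::getKey,
                            Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

            // Reduce phase: the values for each key are combined into a single result.
            shuffled.forEach((word, counts) ->
                    System.out.println(word + " = " + counts.stream().mapToInt(Integer::intValue).sum()));
        }
    }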
Developing a Basic MapReduce Hadoop Application
In this Skillsoft Aspire course, discover how to use Hadoop's MapReduce: provision a Hadoop cluster on the cloud and build an application with MapReduce to calculate word frequencies in a text document. To start, create a Hadoop cluster on the Google Cloud Platform using its Cloud Dataproc service, then work with the YARN Cluster Manager and HDFS (Hadoop Distributed File System) NameNode web applications that come packaged with Hadoop. Use Maven to create a new Java project for the MapReduce application, and develop a Mapper for the word frequency application. Create a Reducer that collects the Mapper output and calculates word frequencies in the input text files, and identify the configuration of MapReduce applications in the Driver program and the project's pom.xml file. Next, build the MapReduce word frequency application with Maven to produce a jar file and prepare it for execution from the master node of the Hadoop cluster. Finally, run the application and examine the generated output to get the word frequencies in the input text document. The exercise involves developing a basic MapReduce application; a condensed sketch of the Mapper, Reducer, and Driver follows this listing.
10 videos | 1h 13m
Assessment
Badge
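Below is a condensed, hedged sketch of a word frequency application along the lines described above, with the Mapper, Reducer, and Driver gathered into one class. It follows the standard Hadoop MapReduce API; the input and output locations are supplied as command-line arguments.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordFrequency {

        // Mapper: emits (word, 1) for every token in every line of the input.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer tokens = new StringTokenizer(value.toString());
                while (tokens.hasMoreTokens()) {
                    word.set(tokens.nextToken().toLowerCase());
                    context.write(word, ONE);
                }
            }
        }

        // Reducer: receives every count emitted for a word and sums them.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        // Driver: wires the Mapper and Reducer together and submits the job.
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word frequency");
            job.setJarByClass(WordFrequency.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

After packaging with Maven, a jar like this would typically be run from the cluster's master node with hadoop jar, passing the input and output HDFS paths as arguments.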
Filtering Data Using Hadoop MapReduce
Extracting meaningful information from a very large dataset can be painstaking. In this Skillsoft Aspire course, learners examine how Hadoop's MapReduce can be used to speed up this operation. In a new project, code the Mapper for an application that counts the number of passengers in each Titanic class in the input data set. Then develop a Reducer and Driver to generate the final passenger counts per class. Build the project using Maven and run it on the Hadoop master node to check that the output correctly shows the passenger counts per class. Apply MapReduce to filter only surviving Titanic passengers from the input data set; execute the application and verify that the filtering has worked correctly, then examine the job and output files with the YARN cluster manager and HDFS (Hadoop Distributed File System) NameNode web user interfaces. Finally, use MapReduce on a restaurant app's data set to obtain the distinct set of cuisines offered; build and run the application and confirm the output with HDFS from both the command line and the web application. The exercise involves filtering data using MapReduce; a short sketch of a filtering Mapper follows this listing.
9 videos | 58m
Assessment
Badge
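A hedged sketch of a filtering Mapper in the spirit of the course above. The comma-delimited layout and the position of the survival column are assumptions about the input file, not details taken from the course.

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits only the rows whose assumed "survived" column (index 1) equals "1".
    public class SurvivorFilterMapper extends Mapper<Object, Text, Text, NullWritable> {

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length > 1 && "1".equals(fields[1].trim())) {
                // No aggregation is needed, so the matching row itself is the output key.
                context.write(value, NullWritable.get());
            }
        }
    }

Because nothing needs to be aggregated, the Driver can call job.setNumReduceTasks(0) to make this a map-only job that writes the filtered rows straight to HDFS.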
COURSES INCLUDED
Ecosystem for Hadoop
Hadoop is a framework providing for distributed storage and processing of large data sets. Introduce yourself to a big data model and the Hadoop ecosystem.
8 videos | 32m
Assessment
Badge
Hadoop Design Principles
Hadoop's HDFS is a highly fault-tolerant distributed file system suitable for applications that have large data sets. Explore the principles of supercomputing and Hadoop's open source software components.
11 videos | 42m
Assessment
Badge
Selecting & Creating an Environment
Learn how to prepare your environment for a Hadoop installation. Here we review the minimum system requirements, create a development environment, install Java, and set up SSH for Hadoop.
4 videos | 26m
Badge
Installation & Configuration
Once your environment is set up, you are ready to install Hadoop. Follow the step-by-step instructions for installing Hadoop in pseudo-distributed mode, and learn more about the Hadoop architecture.
9 videos | 1h 1m
Assessment
Badge
Configuration & Troubleshooting
After installation, there are tasks you need to perform before using Hadoop. Take a first look at HDFS, WordCount, & the web UIs, then learn how to make configuration changes and troubleshoot installation errors.
8 videos | 35m
Assessment
Badge
Data Repository with HDFS & HBase
It is vital you understand the Hadoop Distributed File System (HDFS). Explore the server architecture, and learn about the command line interface and common HDFS administration issues facing all end users.
13 videos | 1h
Assessment
Badge
HBase & ZooKeeper
Hadoop is all about big data. Explore the theory of HBase as another data repository built alongside or on top of HDFS. Also, learn how to install and configure HBase and ZooKeeper, and use the HBase command line.
7 videos | 50m
Assessment
Badge
Data Repository with Flume
Flume is a tool for dealing with the extraction and loading of unstructured data. Learn about the theory of Flume, its functional parts, and how to install Flume for use.
12 videos | 47m
Assessment
Badge
Timestamps, Sources, & Troubleshooting
Flume is a tool for dealing with the extraction and loading of unstructured data. Learn how to work with Flume sinks, sources, & agents, and how to troubleshoot Flume agents & failures.
12 videos | 47m
Assessment
Badge
Data Repository with Sqoop
Sqoop is a tool for transferring structured data between Hadoop and an RDBMS. Explore the architecture and installation of Sqoop, how to perform imports and exports, Hive SQL statements, and more.
16 videos | 1h 11m
Badge
Data Refinery with YARN
YARN is a parallel processing framework that provides the resources for data computations. Explore the theory of parallel processing and the architecture of the YARN framework.
7 videos | 23m
Assessment
Badge
Data Refinery with MapReduce
MapReduce is a set of classes that abstracts away the complexity of parallel processing. Learn how MapReduce can take a single compute job and run it on the supercomputing platform.
13 videos | 54m
Badge
Hive Joining, Partitioning, & Troubleshooting
Hive is a SQL-like tool for interfacing with Hadoop. Learn how to use Hive joins and views, partition Hive data, create Hive buckets, and troubleshoot errors.
10 videos | 40m
Badge
Data Factory with Pig
Pig is a data flow language for interfacing with Hadoop to extract, transform, and load data. Learn how to install & configure Pig, and use the command line to write and execute Pig scripts.
12 videos | 47m
Badge
Pig Functions & Troubleshooting
Pig is a data flow language for interfacing with Hadoop to extract, transform, and load data. Learn how to work with Pig joins, groups, & user-defined functions, and troubleshoot & debug with Pig.
8 videos | 47m
Badge
HiveServer2 & HCatalog
Oozie is a workflow tool for coordinating other components in Hadoop. To use Oozie, a number of other components must be installed first. Learn the purpose of and how to install and configure the Hive metastore, HiveServer2, and HCatalog.
6 videos | 52m
Badge
Data Factory with Oozie
Oozie is a workflow tool for coordinating other components of the Hadoop ecosystem. Learn how to install, configure, & use Oozie to create and run workflows.
10 videos | 55m
Badge
Data Factory with Hue
Hue is an easy-to-use web UI for interfacing with HDFS, MapReduce, Hive, Pig, & Oozie. Learn how to install, configure, & use Hue to work with Hadoop components.
6 videos | 31m
Assessment
Badge
Data Flow for the Hadoop Ecosystem
Data must move into and through Hadoop for it to function. Here we look at Hadoop and data life cycle management, and use Sqoop and Hive to move data through the system.
12 videos | 59m
Badge
COURSES INCLUDED
Designing Clusters
Hadoop is a framework providing fast and reliable analysis of large data sets. Introduce yourself to supercomputing, and explore the design principles of using Hadoop as a supercomputing platform.
6 videos | 32m
Assessment
Badge
Hadoop Cluster Architecture
Learn how to design a Hadoop cluster by taking an in-depth look at the hardware, network concepts, and the architecture that make up the cluster.
11 videos | 52m
Assessment
Badge
Hadoop in the Cloud
Amazon Web Services (AWS) is a secure cloud-computing platform offered by Amazon.com. Explore the key services offered by AWS and learn how to set up a Hadoop cluster.
16 videos | 1h 31m
Assessment
Badge
Data Migration & EMR
Discover how to use the AWS command line interface, examine AWS Elastic MapReduce (EMR), learn how to set up an EMR cluster, and explore the various ways to run EMR jobs.
10 videos | 1h 11m
Badge
Cluster Deployment Tools & Images
To deploy a Hadoop Cluster, you must ensure networks, disks, and hosts are configured correctly. Examine the configuration management tools, learn how to create configuration items, and set up a CM environment.
6 videos | 47m
Badge
Cluster Architecture Configuration
To deploy a Hadoop Cluster, you must ensure networks, disks, and hosts are configured correctly. Explore the Hadoop cluster architecture, learn how to start, stop, & configure Hadoop clusters, and configure logging & MySQL databases.
8 videos | 57m
Assessment
Badge
Cluster Deployment
To deploy a Hadoop Cluster, you must ensure networks, disks, and hosts are configured correctly. Learn how to set up some of the common open-source software used to create and deploy a Hadoop ecosystem.
8 videos | 1h 4m
Badge
Cluster Availability
Nothing is more important than having your Hadoop cluster available for use. Discover how Hadoop leverages fault tolerance, and explore a number of the reliability features that have been designed into Hadoop.
10 videos | 1h 6m
Assessment
Badge
Availability Configuration
To be useful, your Hadoop cluster must be available. Here we discuss and demonstrate high availability for HDFS NameNode and how to recover from failures.
6 videos | 51m
Badge
Securing Clusters
Hadoop puts big data technologies within reach of companies of all sizes, but as adoption grows, so do the security concerns. Examine the risks and learn how to implement security groups and work with Kerberos.
8 videos | 1h 4m
Assessment
Badge
Securing with Kerberos
Hadoop puts big data technologies within reach of companies of all sizes, but as adoption grows, so do the security concerns. Examine the risks and learn how to secure HDFS, YARN, Hive, and other components with Kerberos.
10 videos | 1h 11m
Assessment
Badge
Managing Security
Hadoop puts big data technologies within reach of companies of all sizes, but as adoption grows, so do the security concerns. Examine the risks and learn how to manage user security, access control lists, and other features.
9 videos | 1h 2m
Assessment
Badge
Operating Hadoop Clusters
Hadoop is a framework for running applications on large clusters of commodity hardware. Discover service levels, Hadoop releases, change management, and rack awareness.
5 videos | 36m
Assessment
Badge
Cluster Administration
Hadoop is a framework for running applications on large clusters of commodity hardware. Discover HDFS administration, quotas, DataNodes, HDFS scaling, and more.
10 videos | 1h 7m
Badge
Stabilizing Clusters
Tuning Hadoop clusters is vital to improve cluster performance. Explore the importance of incident management and working with Nagios.
8 videos | 1h 24m
Badge
Monitoring & Troubleshooting
Tuning Hadoop clusters is vital to improve cluster performance. Explore log management, problem management, and best practices for root cause analysis.
10 videos | 1h 8m
Assessment
Badge
Capacity Management Strategies
Apache Hadoop is an open-source software framework for storage and large-scale processing of datasets on clusters of commodity hardware. Explore capacity management of Hadoop clusters, including strategies and schedulers.
4 videos | 27m
Assessment
Badge
Capacity Management
Apache Hadoop is an open-source software framework for storage and large-scale processing of datasets on clusters of commodity hardware. Explore resource management through scheduling, the Fair Scheduler tool, and how to plan for scaling.
16 videos | 1h 38m
Assessment
Badge
Performance Tuning Best Practices
Hadoop can scale up from single servers to thousands of machines, each offering local computation and storage. Discover performance tuning concepts, including compression, tuning options, and memory optimization.
11 videos | 1h 14m
Assessment
Badge
Cluster Performance Tuning
Hadoop can scale up from single servers to thousands of machines, each offering local computation and storage. Examine tuning options and best practices for performance tuning across HDFS, YARN, and MapReduce. A short sketch of commonly tuned MapReduce settings follows this listing.
13 videos | 1h 17m
Assessment
Badge
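To give a feel for the kinds of settings such tuning touches, the sketch below sets a few commonly adjusted MapReduce properties programmatically. The values are placeholders only; in practice these usually live in mapred-site.xml and must be sized against the cluster's actual resources.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class TunedJobSetup {
        public static Job buildJob() throws Exception {
            Configuration conf = new Configuration();

            // Container memory for map and reduce tasks (illustrative values).
            conf.set("mapreduce.map.memory.mb", "2048");
            conf.set("mapreduce.reduce.memory.mb", "4096");

            // Sort buffer used for the map-side spill.
            conf.set("mapreduce.task.io.sort.mb", "256");

            // Compress intermediate map output to reduce shuffle traffic.
            conf.setBoolean("mapreduce.map.output.compress", true);

            return Job.getInstance(conf, "tuned job");
        }
    }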
Cloudera Manager & Hadoop Clusters
Cloudera Manager is a simple automated customizable management tool for Hadoop clusters. Explore web consoles for Cloudera Manager, cluster management tools, and cluster deployment.
6 videos | 55m
Assessment
Badge
Cloudera Manager Administration
Cloudera Manager is a simple automated customizable management tool for Hadoop clusters. Discover Cloudera Manager administration, including cluster management, services, and resource management.
7 videos | 1h 4m
Badge
Cloudera Manager Tools & Configuration
Cloudera Manager is a simple automated customizable management tool for Hadoop clusters. Discover Cloudera Manager tools and configuration, including performance tweaking, Impala, Sentry, Hive, Hue with MySQL, and Oozie workflows.
12 videos | 1h 52m
Badge
COURSES INCLUDED
Managing Big Data Using HDInsight Hadoop
Explore the fundamentals of Azure HDInsight and the essential architectural components.
12 videos | 1h 5m
Assessment
Badge
Microsoft Analytics Platform System & Hive
Explore the Microsoft Analytics Platform System and using Hive to manage data from a data warehouse perspective.
17 videos | 1h 27m
Assessment
Badge
HDInsight & Retail Sales Implementation Using Hive
This course covers the implementation of data warehousing for retail sales. Learn to design and implement data warehousing solutions using Hive and Power BI on HDInsight.
11 videos | 45m
Assessment
Badge
Working with Spark Using HDInsight & Cluster Management
Discover how to work with Spark and its in-memory data management capabilities. Also covered is how to manage and troubleshoot HDInsight clusters using Ambari and the Azure CLI tool.
12 videos | 55m
Assessment
Badge
COURSES INCLUDED
Hadoop HDFS File Permissions
When managing a data warehouse, explore the reasons why not all users should have free rein over all data sets. In this 9-video Skillsoft Aspire course, learners explore how file permissions can be viewed and configured in HDFS (Hadoop Distributed File System) and how the NameNode UI is used to monitor and explore HDFS. For this course, you need a good understanding of Hadoop and HDFS, familiarity with the HDFS shells, and confidence in working with and manipulating files on HDFS and exploring it from the command line. The course focuses on the different ways to view the permissions linked to files and directories, and how these can be modified. Learners explore how to automate many HDFS tasks by simply scripting them, and how to use the HDFS NameNode UI to monitor the distributed file system and explore its contents. Review distributed computing and big data. The closing exercise involves writing a command for the HDFS dfs shell to count the number of files within a directory on HDFS, along with related tasks. A short sketch of permission operations through the HDFS Java API follows this listing.
9 videos | 48m
Assessment
Badge
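A minimal sketch of the permission operations discussed above, done through the HDFS Java API instead of the shell; the /data/reports path and the analyst:analytics owner are hypothetical, and changing ownership normally requires HDFS superuser rights.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class HdfsPermissions {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path reports = new Path("/data/reports");   // hypothetical directory

            // Show the current permissions, owner, and group (what "hdfs dfs -ls" displays).
            FileStatus status = fs.getFileStatus(reports);
            System.out.println(status.getPermission() + " " + status.getOwner() + ":" + status.getGroup());

            // Equivalent of "hdfs dfs -chmod 750": owner rwx, group r-x, others none.
            fs.setPermission(reports, new FsPermission((short) 0750));

            // Equivalent of "hdfs dfs -chown analyst:analytics".
            fs.setOwner(reports, "analyst", "analytics");
            fs.close();
        }
    }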
Hadoop MapReduce Applications With Combiners
In this Skillsoft Aspire course, explore the use of Combiners to make MapReduce applications more efficient by minimizing data transfers. Start by learning why Combiners are needed to optimize the execution of a MapReduce application by minimizing data transfers within a cluster. Recall the steps to process data in a MapReduce application, and look at using a Combiner to perform a partial reduction of the data output from the Mapper. Then create a new Maven project for a MapReduce application that calculates average automobile prices. Next, develop the Mapper and Reducer to calculate the average price for each automobile make in the input data set. Create a Driver program for the MapReduce application, run it, and check the output to get the average price per automobile make. Learn how to code a Combiner for the MapReduce application, fix the bug that prevents it from correctly calculating the average price, then run the fixed application to verify that the prices are calculated correctly. The concluding exercise concerns optimizing MapReduce with Combiners; a sketch of the sum-and-count approach follows this listing.
13 videos | 1h 23m
Assessment
Badge
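A hedged sketch of the sum-and-count pattern the course builds toward: the Combiner only pre-aggregates partial sums and counts, never averages, so the final average computed in the Reducer stays correct no matter how the input is split. The CSV column positions (make in column 0, price in column 5) are assumptions.

    import java.io.IOException;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class AveragePrice {

        // Mapper: emits (make, "price,1") for every parsable record.
        public static class PriceMapper extends Mapper<Object, Text, Text, Text> {
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length > 5) {
                    try {
                        double price = Double.parseDouble(fields[5].trim());
                        context.write(new Text(fields[0]), new Text(price + ",1"));
                    } catch (NumberFormatException ignored) {
                        // skip the header row and malformed records
                    }
                }
            }
        }

        // Combiner: performs a partial reduction, emitting a running (sum, count) per make.
        public static class SumCountCombiner extends Reducer<Text, Text, Text, Text> {
            @Override
            protected void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                double sum = 0;
                long count = 0;
                for (Text value : values) {
                    String[] parts = value.toString().split(",");
                    sum += Double.parseDouble(parts[0]);
                    count += Long.parseLong(parts[1]);
                }
                context.write(key, new Text(sum + "," + count));
            }
        }

        // Reducer: turns the accumulated (sum, count) pairs into the final average per make.
        public static class AverageReducer extends Reducer<Text, Text, Text, DoubleWritable> {
            @Override
            protected void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                double sum = 0;
                long count = 0;
                for (Text value : values) {
                    String[] parts = value.toString().split(",");
                    sum += Double.parseDouble(parts[0]);
                    count += Long.parseLong(parts[1]);
                }
                context.write(key, new DoubleWritable(sum / count));
            }
        }
    }

In the Driver you would register the Combiner with job.setCombinerClass(AveragePrice.SumCountCombiner.class) and set the map output key and value classes to Text, since a Combiner's input and output types must match the Mapper's output types.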
Advanced Operations Using Hadoop MapReduce
In this Skillsoft Aspire course, explore how MapReduce can be used to extract the five most expensive vehicles in a data set and then build an inverted index for the words appearing in a set of text files. Begin by defining a vehicle type that can represent the automobiles to be stored in a Java PriorityQueue, then configure a Mapper to use a PriorityQueue to store the five most expensive automobiles it has processed from the data set. Learn how to use a PriorityQueue in the Reducer to receive the five most expensive automobiles from each Mapper and write the overall top five to the output, then execute the application to verify the results. Next, explore how the MapReduce framework can be used to generate an inverted index, and configure the Reducer and Driver for the inverted index application. Then run the application and examine the inverted index on HDFS (Hadoop Distributed File System). The concluding exercise involves advanced operations using MapReduce; a sketch of the top-five Mapper follows this listing.
9 videos | 48m
Assessment
Badge
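A hedged sketch of the top-five Mapper pattern described above: a Java PriorityQueue acts as a min-heap of the most expensive vehicles seen so far, and the results are emitted in cleanup() so very little data is shuffled to the Reducer, which applies the same logic across all Mappers' output. The column positions are assumptions about the data set.

    import java.io.IOException;
    import java.util.Comparator;
    import java.util.PriorityQueue;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class TopPricesMapper extends Mapper<Object, Text, Text, DoubleWritable> {

        private static final int TOP_N = 5;

        // A tiny vehicle type holding just what the ranking needs.
        private static class Vehicle {
            final String model;
            final double price;
            Vehicle(String model, double price) { this.model = model; this.price = price; }
        }

        // Min-heap ordered by price: the cheapest of the current top five sits at the head.
        private final PriorityQueue<Vehicle> topVehicles =
                new PriorityQueue<>(Comparator.comparingDouble((Vehicle v) -> v.price));

        @Override
        protected void map(Object key, Text value, Context context) {
            // Assumed CSV layout: model in column 2, price in column 5.
            String[] fields = value.toString().split(",");
            if (fields.length > 5) {
                try {
                    topVehicles.offer(new Vehicle(fields[2], Double.parseDouble(fields[5])));
                    if (topVehicles.size() > TOP_N) {
                        topVehicles.poll();   // discard the cheapest once more than five are held
                    }
                } catch (NumberFormatException ignored) {
                    // skip the header row and malformed records
                }
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Emit this mapper's local top five after all of its input has been processed.
            for (Vehicle vehicle : topVehicles) {
                context.write(new Text(vehicle.model), new DoubleWritable(vehicle.price));
            }
        }
    }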
COURSES INCLUDED
Hadoop Distributed File System
Discover the HDFS architecture and its main building blocks. In addition, explore data replication, communication protocols, and accessibility. A short code sketch of reading a file over HDFS follows this listing.
11 videos | 32m
Assessment
Badge
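A hedged sketch of client access to HDFS: the client contacts the NameNode for metadata and then streams the file's bytes from the DataNodes that hold its blocks. The NameNode address and port (8020) and the file path are assumptions; in practice fs.defaultFS normally comes from core-site.xml.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadFromHdfs {
        public static void main(String[] args) throws Exception {
            // Hypothetical NameNode address; normally taken from the cluster configuration.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());

            // Open the file through the NameNode; the data itself streams from the DataNodes.
            Path path = new Path("/data/sample.txt");   // hypothetical file
            try (FSDataInputStream in = fs.open(path);
                 BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
            fs.close();
        }
    }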
Clusters
Clusters are used to store and analyze large volumes of data in a distributed computer environment. Explore the best practices to follow when implementing clusters in Hadoop.
8 videos | 48m
Assessment
Badge
Hadoop on Amazon EMR
Hadoop can be used with Amazon EMR to process vast amounts of data. Explore how to use Hadoop with Amazon EMR.
10 videos | 47m
Assessment
Badge
Hadoop Ranger
Apache Ranger is used to provide data security across a Hadoop implementation. Explore the installation of Ranger and Ranger authentication considerations, as well as customizing services to run Ranger alongside Hadoop.
9 videos | 51m
Assessment
Badge
Maintenance & Distributions
Distributions provide performance and functionality enhancements over the base open source code Apache provides. Explore the various distributions available and common maintenance tasks in a Hadoop environment.
10 videos | 37m
Assessment
Badge
EARN A DIGITAL BADGE WHEN YOU COMPLETE THESE COURSES
Skillsoft is providing you with the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.
Digital badges are yours to keep, forever.
BOOKS INCLUDED
Book
Professional Hadoop
Serving as the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings, this guide details every aspect of Hadoop technology to enable optimal processing of large data sets, and gets you acquainted with the framework's processes and capabilities right away.
3h 47m
By Benoy Antony, et al.
Book
Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools
From setting up the environment to running sample applications, this step-by-step resource is a practical tutorial on using the Apache Hadoop ecosystem project.
4h 56m
By Deepak Vohra
Book
Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset
Approaching the problem of managing massive data sets from a systems perspective, this book explains the roles for each project (like architect and tester, for example) and shows how the Hadoop toolset can be used at each system stage - and then explains, in an easily understood manner and through numerous examples, how to use each tool.
5h 27m
By Michael Frampton
Book
Pro Hadoop
Written from the perspective of a principal engineer with down-in-the-trenches knowledge of what to do wrong with Hadoop, this book shows how to avoid the common, expensive first errors that everyone makes with creating their own Hadoop system.
7h
By Jason Venner
Book
Hadoop Architecture and SQL: The Best HiveQL Book in the Universe
Including hundreds of pages of SQL examples and explanations, this book is perfect for anyone who wants to query Hadoop with SQL and educates readers on how to create tables, how the data is distributed, and how the system processes the data.
1h 32m
By Jason Nolander, Tom Coffing
Book
Hadoop for Dummies
Showing you how to harness the power of your data and rein in the information overload, this detailed guide will help you understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.
6h 39m
By Dirk deRoos, et al.
Book
Processing Big Data with Azure HDInsight: Building Real-World Big Data Systems on Azure HDInsight Using the Hadoop Ecosystem
As most Hadoop and Big Data projects are written in either Java, Scala, or Python, this book minimizes the effort to learn another language and is written from the perspective of a .NET developer.
3h 4m
By Vinit Yadav
BOOKS INCLUDED
Book
Big Data and Hadoop: Learn by Example
Containing the latest trends in big data and Hadoop, this learn-by-doing resource explains how big Big Data is and why everybody is trying to implement it into their IT projects.
4h 17m
By Mayank Bhushan
Book
Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools
From setting up the environment to running sample applications, this step-by-step resource is a practical tutorial on using the Apache Hadoop ecosystem project.
4h 56m
By Deepak Vohra
Book
Professional Hadoop
Serving as the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings, this guide details every aspect of Hadoop technology to enable optimal processing of large data sets, and gets you acquainted with the framework's processes and capabilities right away.
3h 47m
By Benoy Antony, et al.
Book
Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset
Approaching the problem of managing massive data sets from a systems perspective, this book explains the roles for each project (like architect and tester, for example) and shows how the Hadoop toolset can be used at each system stage - and then explains, in an easily understood manner and through numerous examples, how to use each tool.
5h 27m
By Michael Frampton
BOOKS INCLUDED
Book
Professional Hadoop
Serving as the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings, this guide details every aspect of Hadoop technology to enable optimal processing of large data sets, and gets you acquainted with the framework's processes and capabilities right away.
3h 47m
By Benoy Antony, et al.
Book
Big Data and Hadoop: Learn by Example
Containing the latest trends in big data and Hadoop, this learn-by-doing resource explains how big Big Data is and why everybody is trying to implement it into their IT projects.
4h 17m
By Mayank Bhushan
Book
Pro Apache Hadoop, Second Edition
Taking you quickly to the seasoned pro level on the hottest cloud-computing framework, this book covers everything you need to build your first Hadoop cluster and begin analyzing and deriving value from your business and scientific data.
7h 26m
By Jason Venner, Madhu Siddalingaiah, Sameer Wadkar
Book
Hadoop for Dummies
Showing you how to harness the power of your data and rein in the information overload, this detailed guide will help you understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.
6h 39m
By Dirk deRoos, et al.
Book
Pro Hadoop Data Analytics: Designing and Building Big Data Systems using the Hadoop Ecosystem
Emphasizing best practices to ensure coherent, efficient development, this book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of classification, clustering, and recommendation.
3h 4m
By Kerry Koitzsch
Book
Professional Hadoop Solutions
With in-depth code examples in Java and XML and the latest on recent additions to the Hadoop ecosystem, this complete resource also covers the use of APIs, exposing their inner workings and allowing architects and developers to better leverage and customize them.
8h 2m
By Alexey Yakubovich, Boris Lublinsky, Kevin T. Smith
Book
Big Data Processing Beyond Hadoop and MapReduce
Authored by EMC Proven Professionals, Knowledge Sharing articles present ideas, expertise, unique deployments, and best practices. This article provides an overview of various new and upcoming alternatives to Hadoop MR.
23m
By Ravi Sharda
Book
Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset
Approaching the problem of managing massive data sets from a systems perspective, this book explains the roles for each project (like architect and tester, for example) and shows how the Hadoop toolset can be used at each system stage - and then explains, in an easily understood manner and through numerous examples, how to use each tool.
5h 27m
By Michael Frampton
BOOKS INCLUDED
Book
Professional Hadoop
Serving as the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings, this guide details every aspect of Hadoop technology to enable optimal processing of large data sets, and gets you acquainted with the framework's processes and capabilities right away.
3h 47m
By Benoy Antony, et al.
Book
Big Data and Hadoop: Learn by Example
Containing the latest trends in big data and Hadoop, this learn-by-doing resource explains how big Big Data is and why everybody is trying to implement it into their IT projects.
4h 17m
By Mayank Bhushan
Book
Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset
Approaching the problem of managing massive data sets from a systems perspective, this book explains the roles for each project (like architect and tester, for example) and shows how the Hadoop toolset can be used at each system stage - and then explains, in an easily understood manner and through numerous examples, how to use each tool.
5h 27m
By Michael Frampton
Book
Practical Hadoop Security
For administrators planning a production Hadoop deployment who want to secure their Hadoop clusters, this resource takes you through a comprehensive study of how to implement defined security within a Hadoop cluster in a hands-on way.
3h 40m
By Bhushan Lakhe
Book
Professional Hadoop Solutions
With in-depth code examples in Java and XML and the latest on recent additions to the Hadoop ecosystem, this complete resource also covers the use of APIs, exposing their inner workings and allowing architects and developers to better leverage and customize them.
8h 2m
By Alexey Yakubovich, Boris Lublinsky, Kevin T. Smith
Book
Pro Hadoop Data Analytics: Designing and Building Big Data Systems using the Hadoop Ecosystem
Emphasizing best practices to ensure coherent, efficient development, this book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of classification, clustering, and recommendation.
3h 4m
By Kerry Koitzsch
Book
Hadoop for Dummies
Showing you how to harness the power of your data and rein in the information overload, this detailed guide will help you understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.
6h 39m
By Dirk deRoos, et al.
Book
Pro Apache Hadoop, Second Edition
Taking you quickly to the seasoned pro level on the hottest cloud-computing framework, this book covers everything you need to build your first Hadoop cluster and begin analyzing and deriving value from your business and scientific data.
7h 26m
By Jason Venner, Madhu Siddalingaiah, Sameer Wadkar
BOOKS INCLUDED
Book
Processing Big Data with Azure HDInsight: Building Real-World Big Data Systems on Azure HDInsight Using the Hadoop Ecosystem
As most Hadoop and Big Data projects are written in either Java, Scala, or Python, this book minimizes the effort to learn another language and is written from the perspective of a .NET developer.
3h 4m
By Vinit Yadav
BOOKS INCLUDED
Book
Big Data and Hadoop: Learn by Example
Containing the latest trends in big data and Hadoop, this learn-by-doing resource explains how big Big Data is and why everybody is trying to implement it into their IT projects.
4h 17m
By Mayank Bhushan
Book
Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools
From setting up the environment to running sample applications, this step-by-step resource is a practical tutorial on using the Apache Hadoop ecosystem project.
4h 56m
By Deepak Vohra
Book
Professional Hadoop
Serving as the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings, this guide details every aspect of Hadoop technology to enable optimal processing of large data sets, and gets you acquainted with the framework's processes and capabilities right away.
3h 47m
By Benoy Antony, et al.
Book
Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset
Approaching the problem of managing massive data sets from a systems perspective, this book explains the roles for each project (like architect and tester, for example) and shows how the Hadoop toolset can be used at each system stage - and then explains, in an easily understood manner and through numerous examples, how to use each tool.
5h 27m
By Michael Frampton
BOOKS INCLUDED
Book
Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools
From setting up the environment to running sample applications, this step-by-step resource is a practical tutorial on using the Apache Hadoop ecosystem project.
4h 56m
By Deepak Vohra
Book
Pro Hadoop Data Analytics: Designing and Building Big Data Systems using the Hadoop Ecosystem
Emphasizing best practices to ensure coherent, efficient development, this book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of classification, clustering, and recommendation.
3h 4m
By Kerry Koitzsch
Book
Professional Hadoop
Serving as the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings, this guide details every aspect of Hadoop technology to enable optimal processing of large data sets, and gets you acquainted with the framework's processes and capabilities right away.
3h 47m
By Benoy Antony, et al.
Book
Practical Hive: A Guide to Hadoop's Data Warehouse System
From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez and other big data technologies, this go-to resource gives you a detailed treatment of the software.
3h 57m
By Andreas François Vermeulen, Ankur Gupta, David Kjerrumgaard, Scott Shaw
Book
Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset
Approaching the problem of managing massive data sets from a systems perspective, this book explains the roles for each project (like architect and tester, for example) and shows how the Hadoop toolset can be used at each system stage - and then explains, in an easily understood manner and through numerous examples, how to use each tool.
5h 27m
By Michael Frampton
Book
Hadoop Architecture and SQL: The Best HiveQL Book in the Universe
Including hundreds of pages of SQL examples and explanations, this book is perfect for anyone who wants to query Hadoop with SQL and educates readers on how to create tables, how the data is distributed, and how the system processes the data.
1h 32m
By Jason Nolander, Tom Coffing
Book
Pro Apache Hadoop, Second Edition
Taking you quickly to the seasoned pro level on the hottest cloud-computing framework, this book covers everything you need to build your first Hadoop cluster and begin analyzing and deriving value from your business and scientific data.
7h 26m
By Jason Venner, Madhu Siddalingaiah, Sameer Wadkar
Book
Hadoop for Dummies
Showing you how to harness the power of your data and rein in the information overload, this detailed guide will help you understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.
6h 39m
By Dirk deRoos, et al.
YOU MIGHT ALSO LIKE
Channel
Wintellect Apache Hadoop