Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools

  • 4h 56m
  • Deepak Vohra
  • Apress
  • 2016

This book is a practical guide to using the Apache Hadoop projects, including MapReduce, HDFS, Apache Hive, Apache HBase, Apache Kafka, Apache Mahout, and Apache Solr. From setting up the environment to running sample applications, each chapter is a practical tutorial on using an Apache Hadoop ecosystem project. While several books on Apache Hadoop are available, most are based on the main projects, MapReduce and HDFS, and none discusses the other Apache Hadoop ecosystem projects and how they all work together as a cohesive big data development platform.

What you'll learn

  • How to set up the environment in Linux for Hadoop projects using the Cloudera Hadoop Distribution CDH 5
  • How to run a MapReduce job
  • How to store data with Apache Hive and Apache HBase
  • How to index data in HDFS with Apache Solr
  • How to develop a Kafka messaging system
  • How to develop a Mahout User Recommender System
  • How to stream logs to HDFS with Apache Flume
  • How to transfer data from a MySQL database to Hive, HDFS, and HBase with Sqoop
  • How to create a Hive table over Apache Solr

Who this book is for:

The primary audience is Apache Hadoop developers. Prerequisite knowledge of Linux and some knowledge of Hadoop are required.

About the Author

Deepak Vohra is a consultant and a principal member of the NuBean.com software company. Vohra is a Sun-certified Java programmer and web component developer. He has worked in the fields of XML, Java programming, and Java EE for over seven years. Vohra is the coauthor of Pro XML Development with Java Technology (Apress, 2006). He is also the author of JDBC 4.0 and Oracle JDeveloper for J2EE Development, Processing XML Documents with Oracle JDeveloper 11g, EJB 3.0 Database Persistence with Oracle Fusion Middleware 11g, and Java EE Development in Eclipse IDE (Packt Publishing). He also served as the technical reviewer on WebLogic: The Definitive Guide (O'Reilly Media, 2004) and Ruby Programming for the Absolute Beginner (Cengage Learning PTR, 2007).

In this Book

  • Introduction
  • HDFS and MapReduce
  • Apache Hive
  • Apache HBase
  • Apache Sqoop
  • Apache Flume
  • Apache Avro
  • Apache Parquet
  • Apache Kafka
  • Apache Solr
  • Apache Mahout