Processing Data: Introducing Apache Spark
Apache Spark
Intermediate
- 13 videos | 1h 44m 10s
- Includes Assessment
- Earns a Badge
Apache Spark is a powerful distributed data processing engine that can handle petabytes of data by partitioning it and distributing the work across a cluster of resources. In this course, explore Spark's structured streaming engine and work hands-on with tools such as the PySpark shell. Begin by downloading and installing Apache Spark. Then create a Spark cluster, run a job from the PySpark shell, and monitor applications and job runs from the Spark web user interface. Next, set up a streaming environment, reading and manipulating the contents of files as they are added to a folder in real time. Finally, run apps in both Spark standalone and local deployment modes.
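The hands-on work follows this pattern: bring up a standalone master and worker, attach a PySpark shell to the cluster, and watch the resulting job in the web UI. A minimal sketch, assuming a local Spark 3.x install with SPARK_HOME set (hosts and ports are the defaults):

# Start a standalone master and worker first (shell commands, shown as comments):
#   $SPARK_HOME/sbin/start-master.sh                         # master web UI: http://localhost:8080
#   $SPARK_HOME/sbin/start-worker.sh spark://localhost:7077  # worker registers with the master
# Launch the PySpark shell against the cluster:
#   $SPARK_HOME/bin/pyspark --master spark://localhost:7077
# Inside the shell, a SparkSession named `spark` is already created:
df = spark.range(1_000_000)                # a distributed DataFrame with one column, "id"
print(df.selectExpr("sum(id)").collect())  # triggers a job; details appear at http://localhost:4040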
WHAT YOU WILL LEARN
- Discover the key concepts covered in this course
- Describe how Apache Hadoop and Spark work
- Recall the architecture and features of Apache Spark
- Recognize the use cases of Spark in general and, specifically, of its structured streaming engine
- Install and configure Apache Spark
- Create a Spark cluster with a master and a worker
- Run a job on the PySpark shell and view its details from the Spark web user interface (UI)
- Execute Spark commands and monitor jobs with the Spark web UI
- Configure a Spark cluster using the spark-env.sh file
- Set up an environment to stream files, and build an app to process files in real time (see the streaming sketch after this list)
- Execute apps on a Spark standalone cluster
- Distinguish between Spark standalone and local deployment modes
- Summarize the key concepts covered in this course
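To illustrate the streaming objective above, here is a minimal sketch of a file-based Structured Streaming app, assuming Spark 3.x; the input folder path and the CSV schema are illustrative placeholders:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("FileStreamDemo").getOrCreate()

# Streaming file sources require an explicit schema up front.
schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType()),
])

# Every CSV file dropped into the folder is picked up as part of a new micro-batch.
stream = spark.readStream.schema(schema).csv("/tmp/stream_input")

adults = stream.where("age >= 18")  # transform the stream like a regular DataFrame

# Write each micro-batch to the console; runs until interrupted.
query = adults.writeStream.outputMode("append").format("console").start()
query.awaitTermination()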
IN THIS COURSE
- 1m 23s
- 12m 30s
- 13m 7s
- 8m 11s
- 6m 50s
- 9m 53s
- 11m 9s
- 7m 31s
- 6m 33s
- 9m 50s
- 8m 29s
- 6m 14s
- 2m 30s
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft is providing you with the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.
Digital badges are yours to keep, forever.