Processing Data: Introducing Apache Spark
Apache Spark
Intermediate
- 13 videos | 1h 44m 10s
- Includes Assessment
- Earns a Badge
Apache Spark is a powerful distributed data processing engine that can handle petabytes of data by partitioning it and distributing the work across a cluster of resources. In this course, explore Spark's structured streaming engine and work hands-on with tools such as the PySpark shell. Begin by downloading and installing Apache Spark. Then create a Spark cluster, run a job from the PySpark shell, and monitor applications and job runs from the Spark web user interface. Next, set up a streaming environment, reading and manipulating the contents of files as they are added to a folder in real time. Finally, run apps in both Spark standalone and local deployment modes.
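The hands-on work follows this pattern: bring up a standalone master and worker, attach a PySpark shell to the cluster, and watch the resulting job in the web UI. A minimal sketch, assuming a local Spark 3.x install with SPARK_HOME set (hosts and ports are the defaults):

# Start a standalone master and worker first (shell commands, shown as comments):
#   $SPARK_HOME/sbin/start-master.sh                         # master web UI: http://localhost:8080
#   $SPARK_HOME/sbin/start-worker.sh spark://localhost:7077  # worker registers with the master
# Launch the PySpark shell against the cluster:
#   $SPARK_HOME/bin/pyspark --master spark://localhost:7077
# Inside the shell, a SparkSession named `spark` is already created:
df = spark.range(1_000_000)                # a distributed DataFrame with one column, "id"
print(df.selectExpr("sum(id)").collect())  # triggers a job; details appear at http://localhost:4040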
WHAT YOU WILL LEARN
- Discover the key concepts covered in this course
- Describe how Apache Hadoop and Spark work
- Recall the architecture and features of Apache Spark
- Recognize the use cases of Spark in general and, specifically, of its structured streaming engine
- Install and configure Apache Spark
- Create a Spark cluster with a master and a worker
- Run a job on the PySpark shell and view its details from the Spark web user interface (UI)
- Execute Spark commands and monitor jobs with the Spark web UI
- Configure a Spark cluster using the spark-env.sh file
- Set up an environment to stream files, and build an app to process files in real time (see the streaming sketch after this list)
- Execute apps on a Spark standalone cluster
- Distinguish between Spark standalone and local deployment modes
- Summarize the key concepts covered in this course
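To illustrate the streaming objective above, here is a minimal sketch of a file-based Structured Streaming app, assuming Spark 3.x; the input folder path and the CSV schema are illustrative placeholders:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("FileStreamDemo").getOrCreate()

# Streaming file sources require an explicit schema up front.
schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType()),
])

# Every CSV file dropped into the folder is picked up as part of a new micro-batch.
stream = spark.readStream.schema(schema).csv("/tmp/stream_input")

adults = stream.where("age >= 18")  # transform the stream like a regular DataFrame

# Write each micro-batch to the console; runs until interrupted.
query = adults.writeStream.outputMode("append").format("console").start()
query.awaitTermination()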
IN THIS COURSE
- 1m 23s
- 12m 30s
- 13m 7s
- 8m 11s
- 6m 50s
- 9m 53s
- 11m 9s
- 7m 31s
- 6m 33s
- 9m 50s
- 8m 29s
- 6m 14s
- 2m 30s
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft is providing you with the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.
Digital badges are yours to keep, forever.