Apache Spark: Apache Spark 3.2 intermediate


3 Courses | 3h 11m 7s
8 Books | 30h 58m

1 Course | 1h 51m 32s
2 Books | 9h 5m

5 Courses | 4h 19m 12s
5 Books | 19h 45m

2 Courses | 1h 13m 17s


Explore Apache Spark, the open-source cluster computing framework that provides a fault-tolerant programming interface for clusters.


COURSES INCLUDED

Apache Spark Getting Started

Explore the basics of Apache Spark, an analytics engine used for big data processing. It's an open-source cluster-computing framework that can run on top of Hadoop. Discover how it allows operations on data with both its own library methods and SQL, while delivering great performance. Learn the characteristics, components, and functions of Spark, Hadoop, RDDs, the Spark session, and master and worker nodes. Install PySpark. Then, initialize a Spark Context, and create a Spark DataFrame from the contents of an RDD or another DataFrame. Configure a DataFrame with a map function. Retrieve and transform data. Finally, convert between Spark and pandas DataFrames.

15 videos | 1h 6m Assessment Badge

Data Analysis Using the Spark DataFrame API

An open-source cluster-computing framework used for data science, Apache Spark has become the de facto big data framework. In this Skillsoft Aspire course, learners explore how to analyze real data sets by using DataFrame API methods. Discover how to optimize operations with shared variables and combine data from multiple DataFrames using joins. Explore the Spark 2.x version features that make it significantly faster than Spark 1.x. Other topics include how to create a Spark DataFrame from a CSV file; apply DataFrame transformations, grouping, and aggregation; and perform operations on a DataFrame to analyze categories of data in a data set. Visualize the contents of a Spark DataFrame with Matplotlib. Conclude by studying how to broadcast variables and DataFrame contents in text file format.

16 videos | 1h 10m Assessment Badge

Data Analysis using Spark SQL

Analyze an Apache Spark DataFrame as though it were a relational database table. During this Aspire course, you will discover the different stages involved in optimizing any query or method call on the contents of a Spark DataFrame. Discover how to create views out of a Spark DataFrame's contents and run queries against them, and how to trim and clean a DataFrame. Next, learn how to perform an analysis of data by running different SQL queries, how to configure a DataFrame with an explicitly defined schema, and what a window is in the context of Spark. Finally, observe how to create and analyze categories of data in a data set by using windows.

9 videos | 54m Assessment Badge


COURSES INCLUDED

Graph Modeling on Apache Spark: Working with Apache Spark GraphFrames

Apache Spark, which is a widely used analytics engine, also helps anyone modeling graphs to perform powerful graph analytics. GraphFrames, a Spark package, aids this process by providing various graph algorithm implementations. Use this course to learn about GraphFrames and the application of graph algorithms on data to extract insights. Explore how GraphFrames complements the Apache Hadoop ecosystem in processing graph data. Getting hands-on, construct and visualize a GraphFrame. Practice querying nodes and relationships in a graph and finding motifs in it. Moving along, work with the breadth-first search and the shortestPaths functions to find paths between graph nodes. And finally, apply the PageRank algorithm to arrive at the most relevant nodes in a network. Upon completion, you'll be able to use GraphFrames to analyze and generate insights from graph data.

13 videos | 1h 51m Assessment Badge


COURSES INCLUDED

Introduction to Apache Spark

Apache Spark is an open-source big data processing framework. Explore how to download and install Apache Spark, and also build, configure, and initialize Spark.

10 videos | 54m Assessment Badge

Apache Spark SQL

Apache Spark SQL is used for structured data processing in Spark. Explore features of Spark SQL such as SparkSessions, DataFrames, and Datasets.

16 videos | 1h Assessment Badge

Structured Streaming

Discover the concepts of Structured Streaming such as windowing, DataFrame, and SQL operations, and explore file sinks, deduplication, and checkpointing.

12 videos | 1h 5m Assessment Badge

Spark Monitoring & Tuning

Explore various ways to monitor Spark applications such as web UIs, metrics, and other monitoring tools, and examine memory tuning.

14 videos | 49m Assessment Badge

Spark Security

Discover Spark security! Explore how to secure the Spark UI and event logs and configure SSL settings, and examine YARN deployments, SASL encryption, and network security.

8 videos | 29m Assessment Badge


COURSES INCLUDED

Introducing Apache Spark for AI Development

Apache Spark provides a robust framework for implementing machine learning and deep learning. It takes advantage of resilient distributed datasets (RDDs) to provide a fault-tolerant platform well-suited to developing big data applications. Because many large companies are actively using this framework, AI developers should be familiar with the basics of implementing AI with Apache Spark and Spark ML. In this course, you'll explore the concept of distributed computing. You'll identify the benefits of using Spark for AI development, examining the advantages and disadvantages of using Spark over other big data AI platforms. Next, you'll describe how to implement machine learning, deep learning, natural language processing, and computer vision using Spark. Finally, you'll use Spark ML to create a movie recommendation system of the kind commonly used by Netflix and YouTube.

15 videos | 36m Assessment Badge

Using Apache Spark for AI Development

Spark is a leading open-source cluster-computing framework that is used for distributed databases and machine learning. Although not primarily designed for AI, Spark allows you to take advantage of data parallelism and the large distributed systems used in AI development. AI practitioners should recognize when to use Spark for a particular application. In this course, you'll explore advanced techniques for working with Apache Spark and identify the key advantages of using Spark over other platforms. You'll define resilient distributed datasets (RDDs) and explore several workflows related to them. You'll move on to recognize how to work with a Spark DataFrame, identifying its features and use cases. Finally, you'll learn how to create a machine learning pipeline using Spark ML Pipelines.

13 videos | 36m Assessment Badge


EARN A DIGITAL BADGE WHEN YOU COMPLETE THESE COURSES

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.

Digital badges are yours to keep, forever.

BOOKS INCLUDED

Book

Beginning Apache Spark 2: With Resilient Distributed Datasets, Spark SQL, Structured Streaming and Spark Machine Learning Library

A tutorial on the Apache Spark platform written by an expert engineer and trainer, this book will give you the fundamentals to become proficient in using Apache Spark and know when and how to apply it to your big data applications.

5h 7m By Hien Luu

Book

Practical Apache Spark: Using the Scala API

Following a learn-to-do-by-yourself approach to teaching Apache Spark using Scala, this book will help you learn the concepts, practice the code snippets in Scala, and complete the assignments given to get an overall exposure.

1h 53m By Dharanitharan Ganesan, Subhashini Chellappan

Book

Next-Generation Big Data: A Practical Guide to Apache Kudu, Impala, and Spark

Utilize this practical and easy-to-follow guide to modernize traditional enterprise data warehouse and business intelligence environments with next-generation big data technologies.

4h 13m By Butch Quinto

Book

PySpark Recipes: A Problem-Solution Approach with PySpark2

Taking you on an interesting journey to learn about PySpark and big data, this book uses a problem-solution approach where every problem is followed by a detailed, step-by-step answer which will improve your thought process for solving big data problems with PySpark.

3h 2m By Raju Kumar Mishra

Book

Big Data SMACK: A Guide to Apache Spark, Mesos, Akka, Cassandra, and Kafka

Explaining each of the full-stack technologies and, more importantly, how to best integrate them, this book provides detailed coverage of the practical benefits of these technologies and incorporates real-world examples in every situation.

3h 56m By Isaac Ruiz, Raul Estrada

Book

Pro Spark Streaming: The Zen of Real-Time Analytics Using Apache Spark

Introducing use cases in each chapter from a specific industry, and using publicly available datasets from that domain to unravel the intricacies of production-grade design and implementation, this book walks you through end-to-end real-time application development using real-world applications, data, and code.

4h 16m By Zubair Nabi

Book

Spark: Big Data Cluster Computing in Production

With real-world production insight and expert guidance, tips, and tricks, this incredibly useful resource goes beyond general Spark overviews to provide targeted guidance toward using lightning-fast big data clustering in production.

3h 35m By Brennon York, Ema Orhian, Ilya Ganelin, Kai Sasaki

Book

Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large-Scale Data Processing, Machine Learning, and Graph Analytics, and High-Velocity Data Stream Processing

Helping you become a much sought-after Spark expert, this step-by-step guide shows you how to use Spark for different types of big data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning.

4h 56m By Mohammed Guller


BOOKS INCLUDED

Book

Beginning Apache Spark 3

This book begins by explaining different ways of interacting with Apache Spark, such as Spark Concepts and Architecture, and Spark Unified Stack.

5h 19m By Hien Luu

Book

Hands-on Guide to Apache Spark 3: Build Scalable Computing Engines for Batch and Stream Data Processing

This book explains how to scale Apache Spark 3 to handle massive amounts of data, either via batch or streaming processing. It covers how to use Spark's structured APIs to perform complex data transformations and analyses you can use to implement end-to-end analytics workflows.

3h 46m By Alfonso Antolínez García


BOOKS INCLUDED

Book

Big Data SMACK: A Guide to Apache Spark, Mesos, Akka, Cassandra, and Kafka

3h 56m By Isaac Ruiz, Raul Estrada

Book

Pro Spark Streaming: The Zen of Real-Time Analytics Using Apache Spark

4h 16m By Zubair Nabi

Book

Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large-Scale Data Processing, Machine Learning, and Graph Analytics, and High-Velocity Data Stream Processing

4h 56m By Mohammed Guller

Book

Spark: Big Data Cluster Computing in Production

3h 35m By Brennon York, Ema Orhian, Ilya Ganelin, Kai Sasaki

Book

PySpark Recipes: A Problem-Solution Approach with PySpark2

3h 2m By Raju Kumar Mishra


SKILL BENCHMARKS INCLUDED

Apache Spark Competency (Intermediate Level)

The Apache Spark Competency (Intermediate Level) benchmark measures your knowledge of deploying, using, and streaming with Apache Spark. You will be evaluated on Spark clusters, jobs, streaming, and transforming with Spark SQL. Learners scoring high on this benchmark demonstrate the skills necessary to use Apache Spark in their data streaming applications.

20m | 15 questions
