Apache Spark: Apache Spark 2.4 intermediate
Technology:
Expertise:
- 3 Courses | 3h 11m 7s
- 8 Books | 30h 58m
- 1 Course | 1h 51m 32s
- 2 Books | 9h 5m
- 5 Courses | 4h 19m 12s
- 5 Books | 19h 45m
- 2 Courses | 1h 13m 17s
Explore Apache Spark, the open-source cluster computing framework that provides a fault-tolerant programming interface for clusters.
COURSES INCLUDED
Apache Spark Getting Started
Explore the basics of Apache Spark, an analytics engine used for big data processing. It's an open-source cluster computing framework that can run on top of Hadoop. Discover how it allows operations on data with both its own library methods and with SQL, while delivering great performance. Learn the characteristics, components, and functions of Spark, Hadoop, RDDs, the Spark session, and master and worker nodes. Install PySpark. Then, initialize a Spark Context and a Spark DataFrame from the contents of an RDD and a DataFrame. Configure a DataFrame with a map function. Retrieve and transform data. Finally, convert between Spark and Pandas DataFrames.
15 videos |
1h 6m
Assessment
Badge
Data Analysis Using the Spark DataFrame API
An open-source cluster-computing framework used for data science, Apache Spark has become the de facto big data framework. In this Skillsoft Aspire course, learners explore how to analyze real data sets by using DataFrame API methods. Discover how to optimize operations with shared variables and combine data from multiple DataFrames using joins. Explore the Spark 2.x version features that make it significantly faster than Spark 1.x. Other topics include how to create a Spark DataFrame from a CSV file; apply DataFrame transformations, grouping, and aggregation; and perform operations on a DataFrame to analyze categories of data in a data set. Visualize the contents of a Spark DataFrame with Matplotlib. Conclude by studying how to broadcast variables and save DataFrame contents in text file format.
16 videos |
1h 10m
Assessment
Badge
Data Analysis using Spark SQL
Analyze an Apache Spark DataFrame as though it were a relational database table. During this Aspire course, you will discover the different stages involved in optimizing any query or method call on the contents of a Spark DataFrame. Discover how to create views out of a Spark DataFrame's contents and run queries against them, and how to trim and clean a DataFrame. Next, learn how to perform an analysis of data by running different SQL queries, how to configure a DataFrame with an explicitly defined schema, and what a window is in the context of Spark. Finally, observe how to create and analyze categories of data in a data set by using windows.
9 videos |
54m
Assessment
Badge
COURSES INCLUDED
Graph Modeling on Apache Spark: Working with Apache Spark GraphFrames
Apache Spark, which is a widely used analytics engine, also helps anyone modeling graphs to perform powerful graph analytics. GraphFrames, a Spark package, aids this process by providing various graph algorithm implementations. Use this course to learn about GraphFrames and the application of graph algorithms on data to extract insights. Explore how GraphFrames complements the Apache Hadoop ecosystem in processing graph data. Getting hands-on, construct and visualize a GraphFrame. Practice querying nodes and relationships in a graph and finding motifs in it. Moving along, work with the breadth-first search and the shortestPaths functions to find paths between graph nodes. And finally, apply the PageRank algorithm to arrive at the most relevant nodes in a network. Upon completion, you'll be able to use GraphFrames to analyze and generate insights from graph data.
13 videos |
1h 51m
Assessment
Badge
COURSES INCLUDED
Introduction to Apache Spark
Apache Spark is an open-source big data processing framework. Explore how to download and install Apache Spark, and also build, configure, and initialize Spark.
10 videos |
54m
Assessment
Badge
Apache Spark SQL
Apache Spark SQL is used for structured data processing in Spark. Explore features of Spark SQL such as SparkSessions, DataFrames, and Datasets.
16 videos |
1h
Assessment
Badge
Structured Streaming
Discover the concepts of Structured Streaming, such as windowing and DataFrame and SQL operations, and explore file sinks, deduplication, and checkpointing.
12 videos |
1h 5m
Assessment
Badge
Spark Monitoring & Tuning
Explore various ways to monitor Spark applications such as web UIs, metrics, and other monitoring tools, and examine memory tuning.
14 videos |
49m
Assessment
Badge
Spark Security
Discover Spark security! Explore how to secure the Spark UI and event logs and configure SSL settings, and examine YARN deployments, SASL encryption, and network security.
8 videos |
29m
Assessment
Badge
COURSES INCLUDED
Introducing Apache Spark for AI Development
Apache Spark provides a robust framework for implementing machine learning and deep learning. It takes advantage of resilient distributed datasets to provide a fault-tolerant platform well-suited to developing big data applications. Because many large companies are actively using this framework, AI developers should be familiar with the basics of implementing AI with Apache Spark and Spark ML. In this course, you'll explore the concept of distributed computing. You'll identify the benefits of using Spark for AI development, examining the advantages and disadvantages of using Spark over other big data AI platforms. Next, you'll describe how to implement machine learning, deep learning, natural language processing, and computer vision using Spark. Finally, you'll use Spark ML to create a movie recommendation system of the kind commonly used by Netflix and YouTube.
15 videos |
36m
Assessment
Badge
Using Apache Spark for AI Development
Spark is a leading open-source cluster-computing framework that is used for distributed data processing and machine learning. Although not primarily designed for AI, Spark allows you to take advantage of data parallelism and the large distributed systems used in AI development. AI practitioners should recognize when to use Spark for a particular application. In this course, you'll explore advanced techniques for working with Apache Spark and identify the key advantages of using Spark over other platforms. You'll define resilient distributed datasets (RDDs) and explore several workflows related to them. You'll move on to recognize how to work with a Spark DataFrame, identifying its features and use cases. Finally, you'll learn how to create a machine learning pipeline using Spark ML Pipelines.
13 videos |
36m
Assessment
Badge
EARN A DIGITAL BADGE WHEN YOU COMPLETE THESE COURSES
Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.
Digital badges are yours to keep, forever.
BOOKS INCLUDED
Book
Beginning Apache Spark 2: With Resilient Distributed Datasets, Spark SQL, Structured Streaming and Spark Machine Learning Library
A tutorial on the Apache Spark platform written by an expert engineer and trainer, this book will give you the fundamentals to become proficient in using Apache Spark and know when and how to apply it to your big data applications.
5h 7m
By Hien Luu
Book
Practical Apache Spark: Using the Scala API
Following a learn-to-do-by-yourself approach to teaching Apache Spark using Scala, this book will help you learn the concepts, practice the code snippets in Scala, and complete the assignments given to get an overall exposure.
1h 53m
By Dharanitharan Ganesan, Subhashini Chellappan
Book
Next-Generation Big Data: A Practical Guide to Apache Kudu, Impala, and Spark
Utilize this practical and easy-to-follow guide to modernize traditional enterprise data warehouse and business intelligence environments with next-generation big data technologies.
4h 13m
By Butch Quinto
Book
PySpark Recipes: A Problem-Solution Approach with PySpark2
Taking you on an interesting journey to learn about PySpark and big data, this book uses a problem-solution approach where every problem is followed by a detailed, step-by-step answer which will improve your thought process for solving big data problems with PySpark.
3h 2m
By Raju Kumar Mishra
Book
Big Data SMACK: A Guide to Apache Spark, Mesos, Akka, Cassandra, and Kafka
Explaining each of the full-stack technologies and, more importantly, how to best integrate them, this book provides detailed coverage of the practical benefits of these technologies and incorporates real-world examples in every situation.
3h 56m
By Isaac Ruiz, Raul Estrada
Book
Pro Spark Streaming: The Zen of Real-Time Analytics Using Apache Spark
Introducing use cases in each chapter from a specific industry, and using publicly available datasets from that domain to unravel the intricacies of production-grade design and implementation, this book walks you through end-to-end real-time application development using real-world applications, data, and code.
4h 16m
By Zubair Nabi
Book
Spark: Big Data Cluster Computing in Production
With real-world production insight and expert guidance, tips, and tricks, this incredibly useful resource goes beyond general Spark overviews to provide targeted guidance toward using lightning-fast big data clustering in production.
3h 35m
By Brennon York, Ema Orhian, Ilya Ganelin, Kai Sasaki
Book
Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large-Scale Data Processing, Machine Learning, and Graph Analytics, and High-Velocity Data Stream Processing
Helping you become a much sought-after Spark expert, this step-by-step guide shows you how to use Spark for different types of big data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning.
4h 56m
By Mohammed Guller
BOOKS INCLUDED
Book
Beginning Apache Spark 3
This book begins by explaining different ways of interacting with Apache Spark, such as Spark Concepts and Architecture, and Spark Unified Stack.
5h 19m
By Hien Luu
Book
Hands-on Guide to Apache Spark 3: Build Scalable Computing Engines for Batch and Stream Data Processing
This book explains how to scale Apache Spark 3 to handle massive amounts of data, either via batch or streaming processing. It covers how to use Spark's structured APIs to perform complex data transformations and analyses you can use to implement end-to-end analytics workflows.
3h 46m
By Alfonso Antolínez García
BOOKS INCLUDED
Book
Big Data SMACK: A Guide to Apache Spark, Mesos, Akka, Cassandra, and Kafka
Explaining each of the full-stack technologies and, more importantly, how to best integrate them, this book provides detailed coverage of the practical benefits of these technologies and incorporates real-world examples in every situation.
3h 56m
By Isaac Ruiz, Raul Estrada
Book
Pro Spark Streaming: The Zen of Real-Time Analytics Using Apache Spark
Introducing use cases in each chapter from a specific industry, and using publicly available datasets from that domain to unravel the intricacies of production-grade design and implementation, this book walks you through end-to-end real-time application development using real-world applications, data, and code.
4h 16m
By Zubair Nabi
Book
Big Data Analytics with Spark: A Practitioner's Guide to Using Spark for Large-Scale Data Processing, Machine Learning, and Graph Analytics, and High-Velocity Data Stream Processing
Helping you become a much sought-after Spark expert, this step-by-step guide shows you how to use Spark for different types of big data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning.
4h 56m
By Mohammed Guller
Book
Spark: Big Data Cluster Computing in Production
With real-world production insight and expert guidance, tips, and tricks, this incredibly useful resource goes beyond general Spark overviews to provide targeted guidance toward using lightning-fast big data clustering in production.
3h 35m
By Brennon York, Ema Orhian, Ilya Ganelin, Kai Sasaki
Book
PySpark Recipes: A Problem-Solution Approach with PySpark2
Taking you on an interesting journey to learn about PySpark and big data, this book uses a problem-solution approach where every problem is followed by a detailed, step-by-step answer which will improve your thought process for solving big data problems with PySpark.
3h 2m
By Raju Kumar Mishra
SKILL BENCHMARKS INCLUDED
Apache Spark Competency (Intermediate Level)
The Apache Spark Competency (Intermediate Level) benchmark measures your knowledge of deploying, using, and streaming with Apache Spark. You will be evaluated on Spark clusters, jobs, streaming, and transforming with Spark SQL. Learners scoring high on this benchmark demonstrate the skills necessary to use Apache Spark in their data streaming applications.
20m
| 15 questions