Apache Spark Getting Started
Apache Spark | Beginner
- 15 videos | 1h 6m 15s
- Includes Assessment
- Earns a Badge
Explore the basics of Apache Spark, an analytics engine used for big data processing. It is an open-source cluster computing framework that can run on top of Hadoop. Discover how it allows operations on data with both its own library methods and SQL, while delivering strong performance. Learn the characteristics, components, and functions of Spark, Hadoop, RDDs, the Spark Session, and master and worker nodes. Install PySpark. Then, initialize a Spark Context and create a Spark DataFrame from the contents of an RDD. Configure a DataFrame with a map function. Retrieve and transform data. Finally, convert Spark DataFrames to Pandas DataFrames and vice versa.
WHAT YOU WILL LEARN
- Recognize where Spark fits in with Hadoop and its components
- Describe Spark RDDs and their characteristics, including what makes them resilient and distributed
- Identify the types of operations which are permitted on an RDD and describe how RDD transformations are lazily evaluated
- Distinguish between RDDs and DataFrames and describe the relationship between the two
- List the crucial components of Spark and the relationships between them and recognize the functions of the Spark Session, master and worker nodes
- Install PySpark and initialize a Spark Context
- Create and load data into an RDD
- Initialize a Spark DataFrame from the contents of an RDD
- Work with Spark DataFrames containing both primitive and structured data types
- Define the contents of a DataFrame using the SQLContext
- Apply the map() function on an RDD to configure a DataFrame with column headers
- Retrieve required data from within a DataFrame and define and apply transformations on a DataFrame
- Convert Spark DataFrames to Pandas DataFrames and vice versa
- Describe basic Spark concepts
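The objective on lazily evaluated RDD transformations can be illustrated without a cluster. The following is a minimal pure-Python sketch, not Spark code: LazyDataset, double, and calls are hypothetical names invented for illustration. The class only imitates the idea that transformations such as map and filter are recorded, and that nothing runs until an action such as collect is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not the Spark API).

    Transformations only record a step; the action collect() runs them.
    """

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: return a new dataset with one more recorded step.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also recorded, not executed.
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: replay every recorded step over the source data.
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls = []  # records every element that double() actually processes


def double(x):
    calls.append(x)
    return x * 2


pipeline = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
ran_before_action = len(calls)  # double() has not been called yet
result = pipeline.collect()     # the action triggers the recorded steps
```

Because each transformation returns a new object that remembers its source and steps, the result can be recomputed from that recipe at any time, which is loosely the idea behind an RDD being resilient.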
IN THIS COURSE
- 2m 20s
- 5m 17s | After completing this video, you will be able to recognize where Spark fits in with Hadoop and its components. (Free access)
- 2m 15s | Upon completion of this video, you will be able to describe Spark RDDs and their characteristics, including what makes them resilient and distributed. (Free access)
- 7m 22s | In this video, you will identify the types of operations which are permitted on an RDD and describe how RDD transformations are evaluated lazily. (Free access)
- 2m 32s | In this video, you will learn how to distinguish between RDDs and DataFrames, and describe the relationship between the two. (Free access)
- 6m 24s | Upon completion of this video, you will be able to list the crucial components of Spark and the relationships between them. You will also be able to recognize the functions of the Spark Session, master and worker nodes. (Free access)
- 3m 43s | During this video, you will learn how to install PySpark and initialize a Spark Session. (Free access)
- 3m 16s | During this video, you will learn how to create and load data into an RDD. (Free access)
- 5m 47s | In this video, you will learn how to initialize a Spark DataFrame from the contents of an RDD. (Free access)
- 4m 22s | In this video, find out how to work with Spark DataFrames containing both primitive and complex data types. (Free access)
- 5m 7s | In this video, find out how to define the contents of a DataFrame using the SQLContext. (Free access)
- 3m 52s | During this video, you will learn how to apply the map() function to an RDD to configure a DataFrame with column headers. (Free access)
- 7m 51s | In this video, you will retrieve required data from within a DataFrame and define and apply transformations to a DataFrame. (Free access)
- 1m 57s | In this video, you will convert Spark DataFrames to Pandas DataFrames and Pandas DataFrames to Spark DataFrames. (Free access)
- 4m 12s | After completing this video, you will be able to describe basic Spark concepts. (Free access)
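The hands-on steps listed above (starting a session, loading an RDD, using map() to give a DataFrame column headers, transforming it, and converting to and from Pandas) could look roughly like the following PySpark sketch. It is not taken from the course. It assumes recent versions of the pyspark and pandas packages plus a Java runtime, and it runs against a local master. The function name run_demo, the app name, and the sample rows are made up for illustration.

```python
def run_demo():
    """Hypothetical end-to-end walk-through; needs pyspark, pandas, and Java."""
    # Imported here so that defining the function does not require pyspark.
    from pyspark.sql import Row, SparkSession

    # Start (or reuse) a local Spark Session and get its Spark Context.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("getting-started-sketch")  # assumed name
        .getOrCreate()
    )
    sc = spark.sparkContext

    # Create an RDD from an in-memory list of (name, age) tuples.
    rdd = sc.parallelize([("Alice", 34), ("Bob", 45), ("Carol", 29)])

    # Use map() to turn each tuple into a Row so the DataFrame gets headers.
    rows = rdd.map(lambda t: Row(name=t[0], age=t[1]))
    df = spark.createDataFrame(rows)

    # Retrieve and transform data: filter rows, then add a derived column.
    over_30 = df.filter(df.age > 30).withColumn("age_next_year", df.age + 1)

    # Convert Spark -> Pandas, then Pandas -> Spark.
    pandas_df = over_30.toPandas()
    spark_df_again = spark.createDataFrame(pandas_df)
    row_count = spark_df_again.count()

    spark.stop()
    return pandas_df, row_count
```

Calling run_demo() starts a local Spark job, so it only works on a machine where PySpark is installed. The sketch goes through SparkSession alone; the SQLContext mentioned in the course is the older entry point for the same DataFrame functionality.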
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.
Digital badges are yours to keep, forever.