Data Analysis Using the Spark DataFrame API
Apache Spark | Beginner
- 16 videos | 1h 10m 46s
- Includes Assessment
- Earns a Badge
An open-source cluster-computing framework used for data science, Apache Spark has become the de facto standard framework for big data processing. In this Skillsoft Aspire course, learners explore how to analyze real datasets by using DataFrame API methods. Discover how to optimize operations with shared variables and how to combine data from multiple DataFrames using joins. Explore the features of Spark 2.x versions that make them significantly faster than Spark 1.x. Other topics include how to create a Spark DataFrame from a CSV file; apply DataFrame transformations, grouping, and aggregation; and perform operations on a DataFrame to analyze categories of data in a dataset. Visualize the contents of a Spark DataFrame with Matplotlib. Conclude by studying how to work with broadcast variables and how to store DataFrame contents in text file format.
WHAT YOU WILL LEARN
- Recognize the features that make Spark 2.x versions significantly faster than Spark 1.x
- Specify the reasons for using shared variables in your Spark application and distinguish between the two options available for sharing variables (broadcast variables and accumulators)
- Create a Spark DataFrame from the contents of a CSV file and apply some simple transformations to the DataFrame
- Define a transformation to view a random sample of data from a large DataFrame
- Apply grouping and aggregation operations on a DataFrame to analyze categories of data in a dataset
- Use Matplotlib to visualize the contents of a Spark DataFrame
- Perform operations to prepare your dataset for analysis by trimming unnecessary columns and rows containing missing data
- Define and apply a generic transformation to a DataFrame
- Apply complex transformations to a DataFrame to extract meaningful information from a dataset
- Work with broadcast variables and perform a join operation with a DataFrame that has been broadcast
- Use a Spark accumulator as a counter
- Store the contents of a DataFrame in a text file for archiving or sharing
- Define and work with a custom accumulator to count a vector of values
- Perform different join operations on Spark DataFrames to combine data from multiple sources
- Analyze data using the DataFrame API
IN THIS COURSE
- 2m 25s
- 6m 14s | After completing this video, you will be able to recognize the features that make Spark 2.x versions significantly faster than Spark 1.x versions.
- 4m 54s | Upon completion of this video, you will be able to specify the reasons for using shared variables in your Spark application and distinguish between the two options available for sharing variables.
- 6m 11s | In this video, you will learn how to create a Spark DataFrame from the contents of a CSV file and apply some simple transformations to the DataFrame.
- 4m 9s | In this video, you will learn how to define a transformation to view a random sample of data from a large DataFrame.
- 6m 23s | To analyze categories of data in a dataset, find out how to apply grouping and aggregation operations on a DataFrame.
- 7m 34s | In this video, you will learn how to use Matplotlib to visualize the contents of a Spark DataFrame.
- 4m 32s | Learn how to perform operations to prepare your dataset for analysis by trimming unnecessary columns and rows that contain missing data.
- 4m 36s | Learn how to define and apply a generic transformation to a DataFrame.
- 3m 31s | In this video, you will learn how to apply complex transformations to a DataFrame to extract meaningful information from a dataset.
- 3m 39s | In this video, you will learn how to work with broadcast variables and perform a join operation with a DataFrame that has been broadcast.
- 3m 59s | In this video, you will learn how to use a Spark accumulator as a counter.
- 2m 15s | In this video, you will learn how to store the contents of a DataFrame in a text file for archiving or sharing.
- 2m 56s | In this video, you will learn how to define and work with a custom accumulator to count a vector of values.
- 3m 28s | In this video, you will learn how to perform different join operations on Spark DataFrames to combine data from multiple sources.
- 4m 1s | In this video, you will analyze data using the DataFrame API.
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.
Digital badges are yours to keep, forever.