Final Exam: Data Wrangler
Intermediate
- 1 video | 32s
- Includes Assessment
- Earns a Badge
Final Exam: Data Wrangler will test your knowledge and application of the topics presented throughout the Data Wrangler track of the Skillsoft Aspire Data Analyst to Data Scientist Journey.
WHAT YOU WILL LEARN
- change column values by applying functions
- recognize the capabilities of Microsoft machine learning tools
- implement deep learning using Keras
- perform statistical operations on DataFrames
- create and configure pandas Series objects
- load multiple sheets from an Excel document
- change date formats to the ISO 8601 standard
- identify and troubleshoot missing data
- recognize the machine learning tools provided by AWS for data analysis
- identify and work with time-series data
- plot pie charts, box plots, and scatter plots using pandas
- create and configure pandas DataFrame objects
- use a regular expression to extract data into a new column
- extract subsets of data using filtering
- handle common errors encountered when reading CSV data
- work with scikit-learn to implement machine learning
- identify kinds of masking operations
- build and run the application and confirm the output using HDFS from both the command line and the web application
- describe the different primitive and complex data types available in Hive
- apply grouping and aggregation operations on a DataFrame to analyze categories of data in a dataset
- perform create, read, update, and delete operations on a MongoDB document
- work with data in the form of key-value pairs - map data structures in Hive
- use a Spark accumulator as a counter
- list the various frameworks that can be used to process data from data lakes
- install MongoDB and implement data partitioning using MongoDB
- recognize the read and write optimizations in MongoDB
- use createIndex to build an index on a collection
- create and analyze categories of data in a dataset using windows
- split columns based on a pattern
- create and instantiate a directed acyclic graph in Airflow
- load a few rows of data into a table and query it with simple SELECT statements
- use the find operation to select documents from a collection
- define the mapper for a MapReduce application to build an inverted index from a set of text files
- describe the data processing strategies provided by MapReduce v2, Hive, Pig, and YARN for processing data with data lakes
- configure the reducer and the driver for the inverted index application
- describe the beneficial features that we can achieve using serverless and lambda architectures
- create the driver program for the MapReduce application
- configure and test PyMongo in a Python program
- describe data ingestion approaches and compare Avro and Parquet file format benefits
- test Airflow tasks using the Airflow command line utility
- define and run a join query involving two related tables
- use the ALTER TABLE statement to change the definition of a Hive table
- use the UNION and UNION ALL operations on table data and distinguish between the two
- apply a group by transformation to aggregate with a conditional value
- use the mongoexport tool to export data from MongoDB to JSON and CSV
- define what a window is in the context of Spark DataFrames and when they can be used
- set up and install Apache Airflow
- define a vehicle type that can be used to represent automobiles to be stored in a Java PriorityQueue
- implement data lakes using AWS
- create a Spark DataFrame from the contents of a CSV file and apply some simple transformations on the DataFrame
- list the prominent distributed data models along with their associative implementation benefits
- use Maven to create a new project for a MapReduce application and plan out the map and reduce phases by examining the auto prices dataset
- code up a combiner for the MapReduce application and configure the driver to use it for a partial reduction on the mapper nodes of the cluster
- recall the prominent data pattern implementation in microservices
- demonstrate how to ingest data using Sqoop
- implement a multi-stage aggregation pipeline
- flatten multi-dimensional data structures by chaining lateral views
- trim and clean a DataFrame before a view is created as a precursor to running SQL queries on it
- use the mongoimport tool to import from JSON and CSV
- compare managed and external tables in Hive and how they relate to the underlying data
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.
Digital badges are yours to keep, forever.