Final Exam: Data Wrangler

Intermediate
  • 1 video | 32s
  • Includes Assessment
  • Earns a Badge
Rating 3.0 of 6 users (6)
Final Exam: Data Wrangler will test your knowledge and application of the topics presented throughout the Data Wrangler track of the Skillsoft Aspire Data Analyst to Data Scientist Journey.

WHAT YOU WILL LEARN

  • Change column values by applying functions
  • Recognize the capabilities of Microsoft machine learning tools
  • Implement deep learning using Keras
  • Perform statistical operations on DataFrames
  • Create and configure a pandas Series object (see the pandas sketch after this list)
  • Load multiple sheets from an Excel document
  • Change date formats to the ISO 8601 standard
  • Identify and troubleshoot missing data
  • Recognize the machine learning tools provided by AWS for data analysis
  • Identify and work with time-series data
  • Plot pie charts, box plots, and scatter plots using pandas
  • Create and configure pandas DataFrame objects
  • Use a regular expression to extract data into a new column
  • Extract subsets of data using filtering
  • Handle common errors encountered when reading CSV data
  • Work with scikit-learn to implement machine learning
  • Identify kinds of masking operations
  • Build and run the application and confirm the output using HDFS from both the command line and the web application
  • Describe the different primitive and complex data types available in Hive
  • Apply grouping and aggregation operations on a DataFrame to analyze categories of data in a dataset
  • Perform create, read, update, and delete operations on a MongoDB document (see the PyMongo sketch after this list)
  • Work with data in the form of key-value pairs: map data structures in Hive
  • Use a Spark accumulator as a counter
  • List the various frameworks that can be used to process data from data lakes
  • Install MongoDB and implement data partitioning
  • Recognize the read and write optimizations in MongoDB
  • Use createIndex to build an index on a collection
  • Create and analyze categories of data in a dataset using windows
  • Split columns based on a pattern
  • Create and instantiate a directed acyclic graph in Airflow (see the Airflow sketch after this list)
  • Load a few rows of data into a table and query it with simple SELECT statements
  • Use the find operation to select documents from a collection
  • Define the mapper for a MapReduce application to build an inverted index from a set of text files
  • Describe the data processing strategies provided by MapReduce v2, Hive, Pig, and YARN for processing data with data lakes
  • Configure the reducer and the driver for the inverted index application
  • Describe the beneficial features that can be achieved using serverless and lambda architectures
  • Create the driver program for the MapReduce application
  • Configure and test PyMongo in a Python program
  • Describe data ingestion approaches and compare Avro and Parquet file format benefits
  • Test Airflow tasks using the Airflow command line utility
  • Define and run a join query involving two related tables
  • Use the ALTER TABLE statement to change the definition of a Hive table
  • Use the UNION and UNION ALL operations on table data and distinguish between the two
  • Apply a group by transformation to aggregate with a conditional value
  • Use the mongoexport tool to export data from MongoDB to JSON and CSV
  • Define what a window is in the context of Spark DataFrames and when windows can be used
  • Set up and install Apache Airflow
  • Define a Vehicle type that can be used to represent automobiles to be stored in a Java PriorityQueue
  • Implement data lakes using AWS
  • Create a Spark DataFrame from the contents of a CSV file and apply some simple transformations on the DataFrame (see the PySpark sketch after this list)
  • List the prominent distributed data models along with their associated implementation benefits
  • Use Maven to create a new project for a MapReduce application and plan out the map and reduce phases by examining the auto prices dataset
  • Code up a combiner for the MapReduce application and configure the driver to use it for a partial reduction on the mapper nodes of the cluster
  • Recall the prominent data pattern implementations in microservices
  • Demonstrate how to ingest data using Sqoop
  • Implement a multi-stage aggregation pipeline
  • Flatten multi-dimensional data structures by chaining lateral views
  • Trim and clean a DataFrame before a view is created as a precursor to running SQL queries on it
  • Use the mongoimport tool to import from JSON and CSV
  • Compare managed and external tables in Hive and how they relate to the underlying data
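The sketches below illustrate a few of these objectives; each is a minimal example under stated assumptions, not course material. First, creating a pandas Series and applying grouping and aggregation to a DataFrame; the item and category values are made up for illustration.

    import pandas as pd

    # Create and configure a Series with a custom index and name.
    prices = pd.Series([9.99, 14.50, 3.25],
                       index=["pen", "notebook", "eraser"], name="price")

    # Build a DataFrame, then group by category and aggregate prices.
    df = pd.DataFrame({
        "category": ["stationery", "stationery", "electronics"],
        "item": ["pen", "notebook", "charger"],
        "price": [9.99, 14.50, 19.00],
    })
    summary = df.groupby("category")["price"].agg(["mean", "count"])
    print(summary)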
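For the MongoDB objectives, a minimal PyMongo sketch of the four CRUD operations plus index creation; it assumes a MongoDB instance on the default local port, and the testdb database and customers collection names are hypothetical.

    from pymongo import MongoClient

    # Assumes a MongoDB instance listening on the default local port.
    client = MongoClient("mongodb://localhost:27017/")
    collection = client["testdb"]["customers"]  # hypothetical names

    # Create, read, update, and delete a single document.
    result = collection.insert_one({"name": "Ada", "plan": "basic"})
    doc = collection.find_one({"_id": result.inserted_id})
    collection.update_one({"_id": result.inserted_id},
                          {"$set": {"plan": "premium"}})
    collection.delete_one({"_id": result.inserted_id})

    # Build an index on the collection (createIndex in the mongo shell).
    collection.create_index("name")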
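For the Airflow objectives, a minimal sketch of defining and instantiating a directed acyclic graph, assuming Airflow 2.x; the DAG id, schedule, and bash commands are placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Instantiate a DAG with two dependent tasks.
    with DAG(
        dag_id="wrangle_daily",          # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="echo extract")
        load = BashOperator(task_id="load", bash_command="echo load")
        extract >> load  # load runs only after extract succeeds

A task defined this way can be exercised in isolation with the Airflow command line utility, e.g. airflow tasks test wrangle_daily extract 2023-01-01.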
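For the Spark objectives, a sketch that creates a DataFrame from a CSV file, applies a few simple transformations, and registers a temporary view as a precursor to SQL queries; the auto_prices.csv path and the price column are assumptions.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("csv-wrangling").getOrCreate()

    # Read a CSV file into a DataFrame (path and columns are hypothetical).
    df = spark.read.csv("auto_prices.csv", header=True, inferSchema=True)

    # Trim and clean: drop rows missing a price, cast it, filter bad values.
    cleaned = (df.dropna(subset=["price"])
                 .withColumn("price", F.col("price").cast("double"))
                 .filter(F.col("price") > 0))

    # Register a view and query it with a simple SELECT statement.
    cleaned.createOrReplaceTempView("autos")
    spark.sql("SELECT COUNT(*) AS n FROM autos").show()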

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
