Data science has become the de facto field for computational and predictive statistical analysis, and Python has become an indispensable tool for enabling it. Explore the key tools and libraries Python offers for data work, including NumPy and pandas.
GETTING STARTED
Analyzing Data Using Python: Data Analytics Using Pandas
Built on the Python programming language, pandas provides a flexible, open source tool for data manipulation. In this course, you'll develop the skills you need to get started with this library. You'll begin by installing pandas from a Jupyter notebook using pip. Next, you'll instantiate pandas objects, including Series and DataFrames, and practice several ways of instantiating DataFrames - for instance, from lists, dictionaries of lists, and tuples created from lists using the zip function. You'll round out this course by performing filter operations on DataFrames using the loc and iloc accessors - fundamental techniques used to access specific rows and columns. You'll use loc to identify rows based on labels and iloc to access rows based on their index offset position, starting from 0.
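As an illustration, here is a minimal sketch of those instantiation and lookup patterns; the column names and values below are hypothetical, not drawn from the course:

```python
import pandas as pd

# Build a DataFrame from tuples created by zipping two lists together
names = ["Oslo", "Lima", "Pune"]
pops = [0.7, 10.7, 3.1]
df = pd.DataFrame(list(zip(names, pops)),
                  columns=["city", "population_m"],
                  index=["a", "b", "c"])

print(df.loc["b"])    # label-based access: the row labeled "b"
print(df.iloc[0])     # position-based access: the first row (offset 0)
```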
You can analyze a myriad of data formats through pandas - all you need to know is how. In this course, you'll bring various data types into pandas and perform several operations on the data. You'll practice using common file types such as CSV, Excel, JSON, and HTML through pandas. You'll not only learn how to open and read files of different types, but you'll also serialize objects and copy them to the in-memory clipboard. You'll move on to perform various fundamental operations on DataFrame objects. Lastly, you'll learn to compute basic statistics, access metadata, and modify and sort data in rows.
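For example, a minimal sketch of reading and inspecting data with pandas, assuming hypothetical file names such as sales.csv:

```python
import pandas as pd

# Hypothetical file names; any files in these formats behave the same way
df = pd.read_csv("sales.csv")            # delimited text
# df = pd.read_excel("sales.xlsx")       # Excel (requires openpyxl)
# df = pd.read_json("sales.json")        # structured JSON

print(df.describe())          # basic statistics
print(df.dtypes, df.shape)    # metadata
df.to_clipboard()             # serialize to the in-memory clipboard
```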
Not all data is useful. Luckily, there are some powerful filtering operations available in pandas. The course begins with a detailed look at how loc and iloc can be used to access specific data from a DataFrame. You'll move on to filter data using the classic pandas lookup syntax and the pandas filter and query methods. You'll illustrate how the filter function accepts wildcards as well as regular expressions and use various methods such as the .isin method to filter data. Furthermore, you'll filter data using either two pairs of square brackets - in which case the resulting subset is itself a DataFrame - or a single pair of square brackets, in which case the returned data takes the form of a Series. You'll drop rows and columns from a pandas DataFrame and see how rows can be filtered out of a DataFrame. Lastly, you'll identify a possible gotcha that arises when you drop rows in-place but neglect to reset the index labels in your object.
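A brief sketch of these filtering idioms, using a made-up three-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cal"], "score": [88, 92, 75]})

print(df[df["score"] > 80])                 # classic lookup syntax
print(df.query("score > 80"))               # the query method
print(df.filter(regex="^na"))               # filter columns by regex
print(df[df["name"].isin(["Ann", "Cal"])])  # membership filter

print(type(df[["score"]]))   # double brackets -> DataFrame
print(type(df["score"]))     # single brackets -> Series

df.drop(index=1, inplace=True)
df.reset_index(drop=True, inplace=True)     # avoid the stale-label gotcha
```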
For data analysis to be useful and accurate, the analyzed data needs to be cleaned and curated, and there are copious methods to achieve this in pandas. In this course, you'll learn how to identify and eliminate duplicates in pandas. You'll start by using the pandas cut method to discretize data into bins, using bins to plot histograms and identify outliers using box-and-whisker plots. You'll parse and work with datetime objects read in from strings and convert string columns to datetime using the dateutil Python library. Moving on, you'll master different pandas methods for aggregating data - including the groupby, pivot, and pivot_table methods. Lastly, you'll perform various joins - inner, left outer, right outer, and full outer - using both the merge and join methods.
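For instance, a short sketch of binning, datetime parsing, grouping, and joining; the departments and dates are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({"dept": ["A", "A", "B"], "sales": [10, 40, 25],
                   "when": ["2023-01-05", "2023-02-11", "2023-01-20"]})

df["bin"] = pd.cut(df["sales"], bins=3)        # discretize into bins
df["when"] = pd.to_datetime(df["when"])        # strings -> datetime
print(df.groupby("dept")["sales"].sum())       # aggregate per department

other = pd.DataFrame({"dept": ["A", "B"], "lead": ["Kim", "Raj"]})
print(df.merge(other, on="dept", how="left"))  # left outer join
```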
This course will get you familiar with the building blocks of Altair visualizations and some of the important chart settings. You will touch upon some of the fundamentals of plotting graphs in Altair. You'll start off by learning about the basic data structures that can form the basis of Altair visualizations, including JSON data and Pandas DataFrames in both wide-form and long-form. You'll then move on to plotting one of the simpler graphs, histograms, to visualize the distribution of values for a quantitative field in your dataset. While doing so, you'll get to explore the different ways in which Altair graphs can be customized including augmenting your chart with text, layering histograms to view two distributions together, and making histograms interactive.
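As a taste of the syntax, here is a minimal Altair histogram over a hypothetical value column, with interactivity enabled:

```python
import altair as alt
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 2, 3, 3, 3, 4, 5]})

# Bin the quantitative field on the x-axis and count records on the y-axis
chart = alt.Chart(df).mark_bar().encode(
    alt.X("value:Q", bin=True),
    alt.Y("count()"),
).interactive()              # enable pan/zoom interactivity

chart.save("histogram.html")
```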
This course will introduce you to a breadth of charts available in Altair and how you can use them to get an all-round understanding of your data. The focus is to get you familiar with the wide variety of graphs that are available. You'll begin by visualizing a distribution of numeric values using box plots and violin charts, each of which has its own strengths and limitations when analyzing distributions. You'll then move on to bar charts to analyze numbers associated with categories in your data. While doing so, you will get to explore a variety of aggregate operations that are available in Altair in order to calculate a sum, mean, median, and so on. You'll then use line charts to visualize the changes in a particular value over a period of time and also its related visual - the area chart. Finally, you'll produce scatter plots to visualize the relationship between a pair of fields in your data. Throughout this course, you'll delve into a number of customizations which are available in Altair for each of the graphs which you plot.
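For example, a compact sketch of a bar chart (with a mean aggregate), a line chart, and a scatter plot over a hypothetical sales dataset:

```python
import altair as alt
import pandas as pd

df = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "sales": [10, 14, 9],
                   "ads": [3, 5, 2]})

bar = alt.Chart(df).mark_bar().encode(x="month:N", y="mean(sales):Q")
line = alt.Chart(df).mark_line().encode(x="month:N", y="sales:Q")
scatter = alt.Chart(df).mark_point().encode(x="ads:Q", y="sales:Q")

(bar | line | scatter).save("charts.html")   # horizontal concatenation
```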
This course introduces you to the use of Altair visualizations which can convey very detailed information for specialized datasets. You will cover some of the graphs that can be used to convey the information in very specific kinds of datasets, while also giving you some hands-on experience with advanced chart configurations. You'll begin by plotting information on a map, both to mark locations of places as well as to convey numerical information about regions. You'll then build a heatmap to analyze the numbers associated with a combination of two categorical variables. Next, you'll implement candlestick charts to visualize stock price movements, dot plots to analyze the range of movement for some values, and Gantt charts to view a project plan. Finally, you'll explore the use of window functions to analyze the top K elements in each category of your dataset.
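As one example, a heatmap in Altair can be sketched with mark_rect over two hypothetical categorical fields and a numeric color encoding:

```python
import altair as alt
import pandas as pd

df = pd.DataFrame({"day": ["Mon", "Mon", "Tue", "Tue"],
                   "shift": ["AM", "PM", "AM", "PM"],
                   "orders": [12, 30, 8, 25]})

# Two categorical axes plus a color scale produce a heatmap
heatmap = alt.Chart(df).mark_rect().encode(
    x="day:N", y="shift:N", color="orders:Q"
)
heatmap.save("heatmap.html")
```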
Pandas, a popular Python library, is part of the open-source PyData stack. In this 10-video Skillsoft Aspire course, you will learn that Pandas represents data in a tabular format, which makes it easy and intuitive to perform data manipulation, cleaning, and exploration. You will use the pandas DataFrame, a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). To take this course, you should already be familiar with the Python programming language; all code writing is in Jupyter notebooks. You will work with basic Pandas data structures, starting with Pandas Series objects, which represent a single column of data and can store numerical values, strings, Booleans, and more complex data types. Learn how to use the Pandas DataFrame, which represents data in table form. Finally, learn to append and sort Series values, handle missing data, add columns, and aggregate data in a DataFrame. The closing exercise involves instantiating a Pandas Series object by using both a list and a dictionary, changing the Series index to something other than the default value, and practicing sorting Series values in place.
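For instance, a minimal sketch of instantiating, re-indexing, sorting, and appending to a Series (the labels and values are illustrative):

```python
import pandas as pd

# Instantiate a Series from a dict, then override the default index
s = pd.Series({"b": 2, "a": 1, "c": 3})
s.index = ["x", "y", "z"]          # change the index after creation
s.sort_values(inplace=True)        # sort values in place

s = pd.concat([s, pd.Series([4], index=["w"])])   # append a value
print(s)
```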
Probability distributions are statistical models that show the possible outcomes and statistical likelihood of any given event and are often useful for making business decisions. Get familiar with the theoretical concepts around statistics and probability distributions through this course and delve into applying statistical concepts to analyze your data using Python. Start by exploring statistical concepts and terminology that will help you understand the data you want to use for estimations on a population. You'll then examine probability distributions - the different forms of distributions, the types of events they model, and the various functions available to analyze distributions. Finally, you'll learn how to use Python to calculate and visualize confidence intervals, as well as the skewness and kurtosis of a distribution. After completing this course, you'll have a foundational understanding of statistical analysis and probability distributions.
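As a taste of the Python side, here is a minimal sketch of computing a confidence interval, skewness, and kurtosis with SciPy over synthetic data:

```python
import numpy as np
from scipy import stats

data = np.random.default_rng(0).normal(loc=50, scale=5, size=1000)

# 95% confidence interval for the mean, using the t distribution
ci = stats.t.interval(0.95, df=len(data) - 1,
                      loc=np.mean(data), scale=stats.sem(data))
print("95% CI:", ci)
print("skewness:", stats.skew(data))
print("kurtosis:", stats.kurtosis(data))
```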
Graphs are used to model a large number of real-world scenarios, including professional networks, flight networks, and schedules. Working in these problem domains involves a deep understanding of how graphs are represented and how graph algorithms work. Learn the basic components of a graph and how nodes and edges can be used to model relationships. Examine how domains such as social networks, purchases on an e-commerce platform, and connected devices can be modeled using graphs. Next, explore how to use an organizing principle to add semantic meaning and context to graphs. Discover how to apply higher-level organizing principles to knowledge graphs using taxonomies and ontologies. Finally, get hands-on experience creating and manipulating graphs, and running graph algorithms using the NetworkX library in Python. When you have completed this course, you will have a solid understanding of how graphs model entities and relationships in the real world.
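For example, a minimal NetworkX sketch that models a tiny professional network and runs two common algorithms (the names are invented):

```python
import networkx as nx

# Nodes are people; edges are professional connections
g = nx.Graph()
g.add_edges_from([("Ana", "Ben"), ("Ben", "Cara"), ("Cara", "Ana"),
                  ("Cara", "Dev")])

print(nx.shortest_path(g, "Ana", "Dev"))   # ['Ana', 'Cara', 'Dev']
print(nx.degree_centrality(g))             # who is most connected?
```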
Extract, Transform, and Load (ETL) tasks help in collecting and manipulating data from diverse sources to fit the user's requirements. In this course, you'll explore different interfaces available in the petl library and perform basic ETL tasks using petl. You will begin by examining how to import data from various data sources, including delimited text files, Microsoft Excel, and structured JSON data. You'll also recognize how to load and save data in these formats. Next, you'll outline how to integrate petl with a relational database using SQLAlchemy and SQLite3. Finally, you'll perform transform operations on data using different petl features to filter specific data needed by you. Once you have completed this course, you'll have a clear understanding of the role played by petl in simplifying ETL tasks.
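A minimal petl extraction sketch, assuming hypothetical file names like orders.csv:

```python
import petl as etl

# petl reads lazily from each source
table = etl.fromcsv("orders.csv")        # delimited text
# table = etl.fromxlsx("orders.xlsx")    # Excel (requires openpyxl)
# table = etl.fromjson("orders.json")    # structured JSON

etl.tocsv(etl.head(table, 5), "sample.csv")   # save the first five rows
```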
Software development often requires manipulation of data that has been extracted from different data sources to make it compatible with the user's specifications and requirements. petl's data transformation features can help achieve this. In this course, you'll investigate fundamental data transformations that can be performed using the petl library. You'll demonstrate how to load data into a petl table, filter columns, and combine multiple tables using different forms of concatenation operations. Next, you'll outline how to convert data in a petl table into a form that is compatible with your requirements. This includes transforming strings to numbers, applying calculations to numeric fields, and replacing specific values in the table. Lastly, you'll explore ways to filter content in petl tables using the facet() function and different select operations.
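For instance, a short sketch of concatenation, conversion, and selection in petl over two made-up tables:

```python
import petl as etl

t1 = [["name", "price"], ["pen", "2"], ["book", "12"]]
t2 = [["name", "price"], ["bag", "30"]]

merged = etl.cat(t1, t2)                                # concatenate tables
merged = etl.convert(merged, "price", int)              # strings -> numbers
merged = etl.convert(merged, "price", lambda p: p * 2)  # apply a calculation
cheap = etl.select(merged, lambda row: row.price < 30)  # select operation
print(etl.look(cheap))
```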
Petl facilitates and streamlines tasks related to data extraction and manipulation, often required by software developers to make data fit for actionable business intelligence (BI). In this course, you'll work with complex operations in petl and outline how to extract data from a source and convert it to a format that complies with your requirements. You'll begin by investigating the use of regular expressions to analyze, search, and extract specific rows and columns in a petl table. You'll then create transform functions and apply them to your data. These include operations on numeric as well as string fields. Moving on, you'll implement sort operations to organize data in a petl table and arrange it in a sequence that suits your purposes. Finally, you'll investigate how to perform joins and set operations on data tables and meaningfully reduce the data in them using aggregation functions.
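As an illustration, a compact sketch of regex search, join, sort, and aggregation in petl over invented staff data:

```python
import petl as etl

staff = [["name", "dept"], ["Ana", "eng"], ["Ben", "ops"], ["Cara", "eng"]]
pay = [["name", "salary"], ["Ana", 90], ["Ben", 70], ["Cara", 80]]

eng = etl.search(staff, "dept", r"^eng$")      # regex row filter
joined = etl.join(staff, pay, key="name")      # inner join on a field
ranked = etl.sort(joined, "salary", reverse=True)
totals = etl.aggregate(joined, "dept", sum, "salary")  # total pay per dept
print(etl.look(totals))
```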
Learn the fundamentals of streaming data with Apache Spark. During this course, you will discover the differences between batch and streaming data. Observe the types of streaming data sources. Learn how to process streaming data, transform the stream, and materialize the results. Decouple a streaming application from the data sources with a message transport. Next, learn about techniques used in Spark 1.x to work with streaming data and how they contrast with processing batch data; how structured streaming in Spark 2.x eases the task of stream processing for the app developer; and how stream processing works in both Spark 1.x and 2.x. Finally, learn how triggers can be set up to periodically process streaming data, and the key aspects of working with structured streaming in Spark.
Process streaming data with Spark, the analytic engine built on Hadoop. In this course, you will discover how to develop applications in Spark to work with streaming data and generate output. Topics include the following: Configure a streaming data source; Use Netcat and write applications to process the data stream; Learn the effects of using the Update mode on your stream processing application's output; Write a monitoring application that listens for new files added to a directory; Compare the append output with the update mode; Develop applications to limit files processed in each trigger; Use Spark's Complete mode for output; Perform aggregation operations on streaming data with the DataFrame API; Process streaming data with Spark SQL queries.
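For example, a minimal Structured Streaming sketch that reads from a Netcat socket and writes aggregated counts to the console in Complete mode (the host and port are assumptions):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-demo").getOrCreate()

# Listen on a Netcat socket; run `nc -lk 9999` first
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())

counts = lines.groupBy("value").count()       # aggregate the stream

# Complete mode re-emits the full aggregated result on every trigger
query = (counts.writeStream.outputMode("complete")
         .format("console").start())
query.awaitTermination()
```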
The wealth of Python data visualization libraries makes it hard to decide the best choice for each use case. However, if you're looking for statistical plots that are easy to build and visually appealing, Seaborn is the obvious choice. You'll begin this course by using Seaborn to construct simple univariate histograms and use kernel density estimation, or KDE, to visualize the probability distribution of your data. You'll then work with bivariate histograms and KDE curves. Next, you'll use box plots to concisely represent the median and the inter-quartile range (IQR) and define outliers in data. You'll work with boxen plots, which are conceptually similar to box plots but employ percentile markers rather than whiskers. Finally, you'll use Violin plots to represent the entire probability density function, obtained via a KDE estimation, for your data.
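A brief sketch of these plot types using the tips sample dataset that ships with Seaborn:

```python
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")    # sample dataset bundled with Seaborn

sns.histplot(data=tips, x="total_bill", kde=True)   # histogram + KDE curve
plt.figure()
sns.boxplot(data=tips, x="day", y="total_bill")     # median, IQR, outliers
plt.figure()
sns.violinplot(data=tips, x="day", y="total_bill")  # full density estimate
plt.show()
```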
Seaborn's smartly designed interface lets you illuminate data through aesthetically pleasing statistical graphics that are incredibly easy to build. In this course, you'll discover Seaborn's capabilities. You'll begin using strip plots and swarm plots and recognizing how they work together using low-intensity noise. You'll then work with time series data through various techniques, like resampling data at different time frequencies and plotting with confidence intervals and other types of error bars. Next, you'll visualize both logistic and linear regression curves. Moving on, you'll use the pairplot function to visualize the relationships between columns in your data, taken two at a time, in a grid format. You'll change the chart type being visualized and create pair plots with multiple chart types in each plot. Lastly, you'll create and format a heatmap of a correlation matrix to identify relationships between dataset columns.
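For instance, a compact sketch of a swarm plot, a pair plot, and a correlation heatmap, again using Seaborn's bundled tips dataset:

```python
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

sns.swarmplot(data=tips, x="day", y="tip")          # non-overlapping points
sns.pairplot(tips[["total_bill", "tip", "size"]])   # pairwise grid of plots

plt.figure()
corr = tips[["total_bill", "tip", "size"]].corr()
sns.heatmap(corr, annot=True, cmap="coolwarm")      # correlation heatmap
plt.show()
```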
Matplotlib is a Python plotting library used to create dynamic visualizations using pyplot, a state-based interface. You'll learn how to correctly install and use Matplotlib to build line charts, bar charts, and histograms in this course. You'll create basic line charts out of randomly generated data. You'll learn how to use the plt.subplots() function, import data from a CSV file using pandas, and create and customize various line charts. Additionally, you'll create figures holding more than one axes object, learn why and how to use the twinx() function, and create multiple lines in the same line chart with different y-axes for each line. Moving on, you'll construct histograms that visualize multiple variables and approximate the cumulative probability density function. Lastly, you'll create some bar charts to represent categorical data.
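As a taste of the API, a minimal sketch of plt.subplots() and twinx() with synthetic data:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
fig, ax = plt.subplots()
ax.plot(x, x ** 2, label="squares")

ax2 = ax.twinx()                      # second y-axis sharing the same x-axis
ax2.plot(x, np.log1p(x), color="red", label="log")

ax.set_xlabel("x")
ax.set_ylabel("squares")
ax2.set_ylabel("log(1 + x)")
plt.show()
```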
Matplotlib can be used to create box-and-whisker plots to display statistics. These dense visualizations pack much information into a compact form, including the median, 25th and 75th percentiles, interquartile range, and outliers. In this course, you'll learn how to work with all aspects of box-and-whisker plots, such as the use of confidence-interval notches, mean markers, and fill color. You'll also build grouped box-and-whisker plots. Next, you'll create scatter plots and heatmaps, powerful tools in exploratory data analysis. You'll build standard scatter plots before customizing various aspects of their appearance. You'll then examine the ideal uses of scatter plots and correlation heatmaps. You'll move on to visualizing composition, first using pie charts, building charts that explode out specific slices. Lastly, you'll build treemaps to visualize data with multiple levels of hierarchy.
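For example, a short sketch of a notched box-and-whisker plot with mean markers and fill color, alongside a pie chart with an exploded slice (all data is synthetic):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, size=100) for m in (0, 1, 2)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.boxplot(groups, notch=True, showmeans=True,  # CI notches + mean markers
            patch_artist=True)                   # enables box fill color
ax2.pie([40, 35, 25], labels=["A", "B", "C"],
        explode=[0.1, 0, 0])                     # pull slice A outward
plt.show()
```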
This Skillsoft Aspire course explores NumPy, a Python library used in data science and big data. NumPy provides a framework to express data in the form of arrays and is the fundamental building block for several other Python libraries. For this course, you will need to know the basics of programming in Python 3 and should have some familiarity with Jupyter notebooks. You will learn how to create NumPy arrays and perform basic mathematical operations on them. Next you will see how to modify, index, slice, and reshape the arrays, and examine the NumPy library's universal array functions that operate on an element-by-element basis. Conclude by learning the various options for iterating through NumPy arrays.
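A minimal sketch of these array operations:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(a * 10)            # element-wise arithmetic
print(np.sqrt(a))        # universal function, applied element by element
print(a[:, 1])           # slice: second column of every row
print(a.reshape(3, 2))   # reshape into 3 rows x 2 columns

for value in np.nditer(a):   # iterate over every element
    print(value, end=" ")
```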
NumPy is one of the fundamental packages for scientific computing, allowing data to be represented in multidimensional arrays. This course covers array operations such as image manipulation, fancy indexing, and broadcasting. To take this Skillsoft Aspire course, you should be comfortable creating, indexing, and slicing NumPy arrays and applying aggregate and universal functions. Among the topics, you will learn about the several options available in NumPy to split arrays. You will learn how to use NumPy to work with digital images, which are multidimensional arrays. Next, you will observe how to manipulate a color image, perform slicing operations to view sections of the image, and use a SciPy package for image manipulation. You will learn how to use masks and arrays of index values to access multiple elements of an array simultaneously - a technique referred to as fancy indexing. Finally, this course covers broadcasting to perform operations between mismatched arrays.
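For instance, a short sketch of masking, fancy indexing, and broadcasting:

```python
import numpy as np

a = np.arange(10) ** 2

print(a[[1, 4, 7]])    # fancy indexing: pick elements by an index array
mask = a > 25          # boolean mask
print(a[mask])         # mask-based selection

row = np.array([1, 2, 3])
col = np.array([[10], [20]])
print(row + col)       # broadcasting a (3,) array against a (2, 1) array
```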
Simplify data analysis with Pandas DataFrames. Pandas is a Python library that enables you to work with series and tabular data, including initializing and populating those structures. For this course, learners do not need prior experience working with Pandas but should be familiar with Python 3 and Jupyter Notebooks. Topics include the following: Define your own index for a Pandas Series object; load data from a CSV (comma-separated values) file to create a Pandas DataFrame; add and remove data from your Pandas DataFrame; analyze a portion of your DataFrame; examine how to reshape or reorient data and create a pivot table. Finally, represent multidimensional data in two-dimensional DataFrames with multi- or hierarchical indexes.
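As an illustration, a minimal sketch of building a DataFrame and reorienting it with a pivot table (the column names are invented):

```python
import pandas as pd

df = pd.DataFrame({"region": ["N", "N", "S", "S"],
                   "quarter": ["Q1", "Q2", "Q1", "Q2"],
                   "sales": [10, 12, 7, 9]})

# Reorient the data: regions become rows, quarters become columns
pivot = df.pivot_table(index="region", columns="quarter", values="sales")
print(pivot)

df["cost"] = [4, 5, 3, 4]       # add a column
print(df.head(2))               # inspect a portion of the DataFrame
```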
Explore advanced data manipulation and analysis with Pandas DataFrames, Python data structures that share similarities with relational database tables. To take this course, prior basic experience is needed with Pandas DataFrames, data loading, and Jupyter Notebook data manipulation. You will learn to iterate over data in your DataFrame. See how to export data to Excel files, JSON (JavaScript Object Notation) files, and CSV (comma-separated values) files. Sort the contents of a DataFrame and manage missing data. Group data with a multi-index. Merge disparate data into a single DataFrame through join and concatenate operations. Finally, you will determine when and where to integrate data with structured queries, similar to SQL.
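For example, a compact sketch of merging, handling missing data, multi-index grouping, and exporting (the file names are hypothetical):

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2], "name": ["Ana", "Ben"]})
right = pd.DataFrame({"id": [1, 2], "score": [88, None]})

df = left.merge(right, on="id")              # join disparate data
df["score"] = df["score"].fillna(0)          # manage missing data
grouped = df.groupby(["name", "id"]).sum()   # multi-index grouping
print(grouped)

df.to_csv("out.csv", index=False)            # export to CSV
df.to_json("out.json")                       # ...and to JSON
```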
Explore Seaborn, a Python library used in data science that provides an interface for drawing graphs that convey a lot of information and are also visually appealing. To take this course, learners should be comfortable programming in Python and using Jupyter notebooks; familiarity with Pandas or NumPy would be helpful but is not required. The course explores how Seaborn provides higher-level abstractions over Python's Matplotlib, how it is tightly integrated with the PyData stack, and how it integrates with other data structure libraries such as NumPy and Pandas. You will learn to visualize the distribution of a single column of data in a Pandas DataFrame by using histograms and the kernel density estimation curve, and then slowly begin to customize the aesthetics of the plot. Next, learn to visualize bivariate distributions - data with two variables in the same plot - and see the various ways to do so in Seaborn. Finally, you will explore different ways to generate regression plots in Seaborn.
Explore Seaborn, a Python library used in data science that provides an interface for drawing graphs that convey a lot of information and are also visually appealing. To take this course, learners should be comfortable programming in Python, have some experience using Seaborn for basic plots and visualizations, and should be familiar with plotting distributions as well as simple regression plots. You will work with continuous variables, modify plots, and put them into a context that can be shared. Next, learn how to plot categorical variables by using box plots, violin plots, swarm plots, and FacetGrids (lattice or trellis plotting). You will learn to plot a grid of graphs for each category of your data. Learners will explore Seaborn's standard aesthetic configurations, including the color palette and style elements. Finally, this course teaches learners how to tweak displayed data to convey more information from the graphs.
This 12-video Skillsoft Aspire course uses Python, the preferred programming language for data science, to explore data in Pandas with popular chart types such as the bar graph, histogram, pie chart, and box plot. Discover how to work with time series and string data in data sets. Pandas represents data in a tabular format, which makes it easy to perform data manipulation, cleaning, and data exploration - all important parts of any data engineer's toolkit. You will learn how to use Matplotlib, a multiplatform data visualization library built on NumPy, the Python library used to work with multidimensional data. Learners will use Pandas features to work with specific kinds of data, such as time series and string data. This course includes a real-world demonstration using Pandas to analyze stock market returns for Amazon. Finally, you will learn how to make data transformations to clean, format, and transform the data into a useful form for further analysis.
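A brief sketch of the time series techniques involved, using a synthetic datetime-indexed price series rather than real Amazon data:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily prices; any datetime-indexed Series behaves the same way
idx = pd.date_range("2023-01-01", periods=90, freq="D")
prices = pd.Series(range(1, 91), index=idx)

returns = prices.pct_change()             # day-over-day returns
monthly = prices.resample("M").mean()     # downsample to monthly means

prices.plot(kind="line")
plt.figure()
returns.plot(kind="hist")
plt.show()
```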
This course uses Python, the preferred programming language for data science, to explore Pandas, a popular Python library that is part of the open-source PyData stack. In this 11-video Skillsoft Aspire course, learners will use the Pandas DataFrame to perform advanced category grouping, aggregation, and filtering operations. You will see how to use Pandas to retrieve a subset of your data by performing filtering operations on rows as well as columns. You will perform analysis on multilevel data by using the groupby operation on DataFrames. You will then learn to use data masking, or data obfuscation, to protect classified or commercially sensitive data. Learners will work with duplicate data, an important part of data cleaning. You will examine the two broad categories of data: continuous data, which comprises a continuous range of values, and categorical data, which has discrete, finite values. Pandas automatically generates indexes for each DataFrame row, and here you will learn to perform different types of reindexing operations on DataFrames.
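For instance, a minimal sketch of multilevel grouping, simple masking, de-duplication, and index regeneration (the data is invented):

```python
import pandas as pd

df = pd.DataFrame({"team": ["A", "A", "B"], "city": ["X", "Y", "X"],
                   "pts": [3, 1, 3]})

print(df.groupby(["team", "city"])["pts"].sum())   # multilevel grouping

df["city"] = df["city"].map(lambda c: "***")       # mask sensitive values
df = df.drop_duplicates()                          # remove duplicate rows
df = df.reset_index(drop=True)                     # regenerate row indexes
print(df)
```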
In this course, you will explore machine learning predictive modeling and commonly used models like regressions, clustering, and Decision Trees that are applied in Python with the scikit-learn package. Begin this 13-video course with an overview of predictive modeling and recognize its characteristics. You will then use Python and related data analysis libraries including NumPy, Pandas, Matplotlib, and Seaborn, to perform exploratory data analysis. Next, you will examine regression methods, recognizing the key features of Linear and Logistic regressions, then apply both a linear and a logistic regression with Python. Learn about clustering methods, including the key features of hierarchical clustering and K-Means clustering, then learn how to apply hierarchical clustering and K-Means clustering with Python. Examine the key features of Decision Trees and Random Forests, then apply a Decision Tree and a Random Forest with Python. In the concluding exercise, learners will be asked to apply linear regression, logistic regression, hierarchical clustering, Decision Trees, and Random Forests with Python.
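As a taste of the workflow, a minimal scikit-learn sketch fitting a logistic regression on synthetic data; the other models mentioned follow the same fit/predict pattern:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic classification data in place of a real dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))

# DecisionTreeClassifier, RandomForestClassifier, KMeans, and
# AgglomerativeClustering can be swapped in with the same pattern.
```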
Python libraries, such as NumPy and SciPy, are used for mathematical and numerical analysis. Through this course, learn how to generate uniform, binomial, and Poisson distributions using these libraries. Begin by exploring uniform distributions and delve into continuous and discrete distributions. You will then explore binomial distributions in depth, including real-life situations where they can be applied. This course will also help you learn more about Poisson distributions and recognize their use cases. While examining these distributions, you will use functions such as the probability density or probability mass functions and cumulative distribution functions, among others, to make estimations from your data. Upon completion of this course, you'll have the skills and knowledge to implement and visualize uniform, binomial, and Poisson distributions in Python.
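For example, a short sketch of querying these distributions through SciPy's distribution functions (the parameters are illustrative):

```python
from scipy import stats

# P(X <= 3) for a binomial with 10 trials and success probability 0.5
print(stats.binom.cdf(3, n=10, p=0.5))

# Probability of exactly 2 arrivals when the mean rate is 4 per interval
print(stats.poisson.pmf(2, mu=4))

# Density of a uniform distribution on [0, 10] at x = 5
print(stats.uniform.pdf(5, loc=0, scale=10))
```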
This course dives deep into normal distributions, also known as Gaussian distributions, while also introducing you to the law of large numbers and the Central Limit Theorem. You will begin by using Python's SciPy library to generate a normal distribution and examine the use of several available functions that allow you to make estimations on normally distributed data. This course will also help you understand and visualize the law of large numbers and explore the Central Limit Theorem by generating multiple samples and analyzing them. After you are done with this course, you'll have the skills and knowledge to analyze data and build your own models.
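A minimal sketch of a normal-distribution query and a Central Limit Theorem experiment with SciPy and NumPy:

```python
import numpy as np
from scipy import stats

# P(X <= 110) for a normal with mean 100 and standard deviation 15
print(stats.norm.cdf(110, loc=100, scale=15))

# Central Limit Theorem: means of many samples from a skewed (exponential)
# population are themselves approximately normally distributed
rng = np.random.default_rng(0)
sample_means = [rng.exponential(scale=2, size=50).mean() for _ in range(1000)]
print(np.mean(sample_means), np.std(sample_means))
```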
In order to really understand how graphs work, it is important to know how they are implemented. There are multiple ways to represent graphs in code and each representation has its own advantages and disadvantages. In this course, you will implement graphs using three different representations - the adjacency matrix, the adjacency list, and the adjacency set. Learn how the adjacency matrix representation uses a square matrix to represent connections between the nodes of a graph and also edge weights. Next, explore how the adjacency list suffers from a major drawback: the same graph can have multiple representations. Finally, discover how the adjacency set representation has exactly one way in which a graph is represented. When you are finished with this course, you will be able to create and work with your own graph structures and optimize them for different purposes.
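For illustration, a minimal sketch of the three representations for the same small weighted graph:

```python
# The same 3-node undirected graph, with edge 0-1 (weight 5) and
# edge 1-2 (weight 2), in each representation

adjacency_matrix = [
    [0, 5, 0],     # row i, column j holds the weight of edge i - j
    [5, 0, 2],
    [0, 2, 0],
]

adjacency_list = {0: [1], 1: [0, 2], 2: [1]}   # neighbor order can vary, so
                                               # one graph has many encodings
adjacency_set = {0: {1}, 1: {0, 2}, 2: {1}}    # sets remove that ambiguity
```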
What makes the graph data structure very interesting and powerful is the large number of algorithms that can be run on graphs to extract insights. Common graph algorithms include traversing a graph and computing the shortest path between nodes. Implementing these algorithms is a great way to learn how graphs are explored and optimized. In this course, learn how graphs can be traversed by studying both depth-first and breadth-first graph traversal and discover how they can be implemented using a stack and a queue respectively. Next, explore how to compute the shortest path in an unweighted graph. And finally, use Dijkstra's algorithm to compute the shortest path in a weighted graph. Upon completion of this course, you will be able to implement optimal algorithms on graphs.
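As an example, a compact sketch of breadth-first traversal with a queue and Dijkstra's algorithm with a priority queue, over a small invented weighted graph:

```python
from collections import deque
import heapq

graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5},
         "C": {"D": 1}, "D": {}}

def bfs(start):                      # breadth-first traversal uses a queue
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        yield node
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)

def dijkstra(start):                 # shortest paths in a weighted graph
    dist, heap = {start: 0}, [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                 # stale heap entry
        for nbr, w in graph[node].items():
            if d + w < dist.get(nbr, float("inf")):
                dist[nbr] = d + w
                heapq.heappush(heap, (d + w, nbr))
    return dist

print(list(bfs("A")))      # ['A', 'B', 'C', 'D']
print(dijkstra("A"))       # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```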
EARN A DIGITAL BADGE WHEN YOU COMPLETE THESE COURSES
Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.
Understand and implement big data analysis solutions in pandas with an emphasis on performance. This book strengthens your intuition for working with pandas, the Python data analysis library, by exploring its underlying implementation and data structures.
Whether you are dealing with sales data, investment data, medical data, web page usage, or other data sets, this book explores the latest Python tools and techniques to help you tackle the world of data acquisition and analysis.
By expertly showing the strength of the Python programming language when applied to processing, managing and retrieving information, this book will help you tackle the world of data acquisition and analysis using the power of the Python language.
The book will start with quick introductions to Python and its ecosystem libraries for data science, such as JupyterLab, NumPy, Pandas, SciPy, Matplotlib, and Seaborn.
If you're looking to expedite a data science or sophisticated data analysis project, you've come to the perfect place. Each data analysis topic is covered step-by-step with real-world examples.
Helping you build the foundational data science skills necessary to work with and better understand complex data science algorithms, this book provides complete Python coding examples to complement and clarify data science concepts, and enrich the learning experience.
Written for the general reader with no previous analytics or programming experience, this step-by-step book will show you how to produce hands-on solutions to real-world business problems, using state-of-the-art techniques.
Written for people who are new to data analysis, this book provides the statistical background needed to get started in data science programming, including probability, random distributions, hypothesis testing, confidence intervals, and building regression models for prediction.
Short on theory and long on actionable analytics, this definitive guide provides readers with a detailed comparative introduction and overview of both languages and features concise tutorials with command-by-command translations of R to Python and Python to R.
Providing insight into essential data science skills in a holistic manner, this book will empower you to analyze data, formulate proper questions, and produce actionable insights, three core stages in most data science endeavors.
Using a problem-solution approach, this authoritative resource will teach you how to carry out data analysis with PySpark SQL, graphframes, and graph data processing.
Leverage the numerical and mathematical modules in Python and its standard library as well as popular open source numerical Python packages like NumPy, SciPy, FiPy, matplotlib and more.
Featuring a detailed business case on effective strategies on data visualization, this book looks at Python from a data science point of view and teaches proven techniques for data visualization as used in making critical business decisions.
The Python Data Visualization benchmark will measure your ability to apply data visualization techniques in Python using Python statistical plots, Python with Altair, and Dash Python frameworks. You will be evaluated on your ability to recognize the visual and analytical features of Python. A learner who scores high on this benchmark demonstrates that they have the skills to develop interactive Python applications with visual representations of plots, graphs, and charts.
The Graph Analytics Literacy benchmark will measure your ability to recall, recognize, and understand graph concepts, fundamentals of graph databases, and graph data structures. You will be evaluated on your ability to recognize the basic concepts of graph data structures and algorithms. A learner who scores high on this benchmark demonstrates that they have the required foundation of graph data structures.
The Python ETL with petl Literacy (Beginner Level) benchmark measures your ability to process data belonging to various file formats, connect to a database, and perform basic Extract, Transform, and Load (ETL) tasks using petl. You will be evaluated on your ability to perform fundamental transform operations on numbers, strings, and tables using petl. A learner who scores high on this benchmark demonstrates that they have the skills to perform basic data transformations with petl.
The Python ETL with petl Competency (Intermediate Level) benchmark measures your ability to perform data operations by implementing replace and type change operations, querying data in petl data tables, and defining filters. You will be evaluated on your ability to extract data using regular expressions, implement joins and set operations on tables, and aggregate data using petl. A learner who scores high on this benchmark demonstrates that they have the skills to implement advanced extractions and transformations with petl.
The Data Visualization in Python with seaborn and Altair Literacy (Beginner Level) benchmark measures your ability to use Seaborn to build univariate and bivariate histograms and kernel density estimation (KDE) curves and plots, as well as box, boxen, and violin plots. You will be evaluated on your ability to recognize the types of data that can be visualized in Altair and plot some of the basic charts available in this tool. A learner who scores high on this benchmark demonstrates that they have the skills to visualize and analyze data using seaborn and Altair.
The Data Visualization in Python with seaborn and Altair Competency (Intermediate Level) benchmark measures your ability to use seaborn to work with strip and swarm plots, time series data, error bars, logistic and linear regression curves, pair plots, and heatmaps. You will be evaluated on your ability to plot different forms of charts using Altair in order to analyze a variety of datasets and visualize specialized data using a variety of Altair charts. A learner who scores high on this benchmark demonstrates that they have the skills to visualize data with representations of plots, graphs, and charts using the seaborn and Altair frameworks.
The Data Visualization Proficiency benchmark will measure your ability to recall, relate, demonstrate, and apply the data visualization concepts and techniques in Excel, QlikView, and various Python visualization libraries. You will be evaluated on your ability to recognize and apply the concepts of data visualization techniques, tools, and functions in Excel, Qlikview, Infographics, and Python. A learner who scores high on this benchmark demonstrates that they have the required data visualization skills to understand, apply, and work independently on the visualizations in their projects.
The Data Visualization in Python with Matplotlib Literacy (Beginner Level) benchmark will measure your ability to recall and relate underlying data visualization concepts using Python and Matplotlib. You will be evaluated on your ability to recognize the foundational concepts of data visualization, its uses, and best practices. A learner who scores high on this benchmark demonstrates that they have the basic data visualization skills to understand and grasp visualization techniques and their uses.
The Data Visualization in Python with Matplotlib Competency (Intermediate Level) benchmark will measure your ability to recall, relate, demonstrate, and apply data visualization concepts and techniques in Python using the Matplotlib library. You will be evaluated on your ability to recognize and apply data visualization concepts, techniques, tools, and functions in Matplotlib. A learner who scores high on this benchmark demonstrates that they have the required data visualization skills to understand, apply, and work independently on visualizations in their projects.
The Data Analysis with Python Literacy benchmark will measure your ability to recall and relate Python concepts, including using the NumPy library and its arrays for manipulating and analyzing data, and a basic understanding of Python libraries such as pandas, Matplotlib, and seaborn for data analysis. A learner who scores high on this benchmark demonstrates that they have a basic understanding of Python libraries, visualization libraries such as Matplotlib and seaborn, and basic skills for performing data analysis using NumPy and pandas.
The Data Visualization with Python Literacy benchmark will measure your ability to recall and relate the underlying data visualization concepts in Python. You will be evaluated on your ability to recognize the foundational concepts of data visualization, representation, charting, and plotting in Python using libraries such as Matplotlib, Plotly, and Seaborn. A learner who scores high on this benchmark demonstrates that they have basic data visualization skills using Python.
The Data Analysis with Python Competency benchmark will measure your ability to recall and relate Python concepts, including NumPy and pandas for manipulating, analyzing, and transforming the data, as well as Matplotlib and seaborn for visualizing data. A learner who scores high on this benchmark demonstrates that they have good Python data analysis, visualization, and data wrangling skills and can work on data analysis projects with minimal supervision.
The Data Visualization with Python competency benchmark will measure your ability to recall and relate underlying data visualization concepts in Python. You will be evaluated on your ability to recognize the concepts of data visualization and advanced data visualization, as well as data representation, charting, and plotting in Python using pandas, Matplotlib, and Plotly libraries. A learner who scores high on this benchmark demonstrates that they have data visualization skills using Python.
The Data Visualization with Python Proficiency benchmark will measure your ability to perform data visualizations in Python using advanced plotting and charting techniques, as well as various visualization libraries such as Matplotlib, Plotly, seaborn, and Bokeh. A learner who scores high on this benchmark demonstrates that they can independently work on data visualization in Python.
The Data Analysis with Python Competency benchmark will measure your ability to recall and relate Python concepts, including using NumPy and pandas for manipulating, analyzing, and transforming the data, as well as Matplotlib and seaborn for visualizing data. A learner who scores high on this benchmark demonstrates that they have very good Python data analysis, visualization, and data wrangling skills and can work independently on data analysis projects.
The Data Visualization in Python with Seaborn Literacy (Beginner Level) benchmark will measure your ability to recall and relate underlying data visualization concepts using Python and seaborn. You will be evaluated on your ability to recognize the foundational concepts of data visualization, its uses, and best practices. A learner who scores high on this benchmark demonstrates that they have the basic data visualization skills to understand and grasp visualization techniques and their uses.
The Data Visualization in Python with Seaborn Competency (Intermediate Level) benchmark will measure your ability to recall, relate, demonstrate, and apply data visualization concepts and techniques in Python using the seaborn library. You will be evaluated on your ability to recognize and apply data visualization concepts, techniques, tools, and functions in Seaborn. A learner who scores high on this benchmark demonstrates that they have the required data visualization skills to understand, apply, and work independently on visualizations in their projects.
The Graph Analytics Proficiency benchmark will measure your ability to recall, recognize, and understand graph analytics concepts, graph databases, Cypher Query Language for querying graph data, and graph data science for identifying hidden relationships. A learner who scores high on this benchmark demonstrates that they have the required skills in Neo4j graph analytics, graph data science, and graph data science with Spark, and can work independently on their projects.