Statistical Analysis and Modeling in R: Performing Classification
R Programming | Expert
- 13 videos | 1h 36m 32s
- Includes Assessment
- Earns a Badge
Classification models are used to classify, or categorize, data points into two or more categories. Learn how these models work and how to evaluate them using the confusion matrix and metrics such as accuracy, precision, and recall. During this course, you'll perform classification using logistic regression, including on an imbalanced dataset, and you'll examine why precision or recall may be better metrics than accuracy for evaluating such models. You'll also build a classification model using decision trees, visualize the tree structure, and explore the variable importance assigned by the tree to understand and interpret the model. When you've finished this course, you'll be able to confidently use logistic regression and decision trees to build classification models and evaluate them using accuracy, precision, and recall.
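A minimal sketch of the confusion matrix and the three metrics, written in base R. The label vectors are made-up toy values (1 = positive class), not data from the course, and the variable names are illustrative.

```r
# Toy actual and predicted labels for a binary classifier (made-up values)
actual    <- factor(c(1, 0, 1, 1, 0, 0, 1, 0, 0, 0), levels = c(0, 1))
predicted <- factor(c(1, 0, 0, 1, 0, 1, 1, 0, 0, 0), levels = c(0, 1))

# Confusion matrix: rows = predicted class, columns = actual class
confusion <- table(Predicted = predicted, Actual = actual)
print(confusion)

tp <- confusion["1", "1"]  # predicted 1, actually 1
tn <- confusion["0", "0"]  # predicted 0, actually 0
fp <- confusion["1", "0"]  # predicted 1, actually 0
fn <- confusion["0", "1"]  # predicted 0, actually 1

accuracy  <- (tp + tn) / sum(confusion)  # share of all records labeled correctly
precision <- tp / (tp + fp)              # of predicted positives, share that are real
recall    <- tp / (tp + fn)              # of real positives, share that were found

cat("accuracy:", accuracy, " precision:", precision, " recall:", recall, "\n")
```

In this toy example, a classifier that predicted 0 for every record would still reach an accuracy of 0.6, but its recall would be 0. That gap is why precision and recall are often more informative than accuracy when one class is rare.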
WHAT YOU WILL LEARN
- Discover the key concepts covered in this course
- Recall the key metrics to evaluate classifiers
- Fit and interpret the S-curve of logistic regression
- Train and evaluate a logistic regression model
- Train and evaluate a logistic model using all predictors
- Train a model on an imbalanced dataset
- Interpret the significance of coefficients, confidence intervals, and odds ratios
- Evaluate a model built using an imbalanced dataset
- Use resampling techniques to improve the model
- Recall the basic structure of decision tree models
- Explore and pre-process data before model fitting
- Use decision tree models for prediction
- Summarize the key concepts covered in this course
IN THIS COURSE
(2m 9s) In this video, you'll learn more about your instructor and this course. In this course, you'll learn how classification models work and how to evaluate your classification models using the confusion matrix and metrics such as accuracy, precision, and recall. Then, you'll perform classification using logistic regression and compute probabilities for outcomes. You'll also perform classification on an imbalanced dataset. Finally, you'll build a classification model using decision trees.
(8m 21s) In this video, you'll learn more about classification models. Classification models are used to categorize, or classify, data points into output categories that are discrete in nature. The output of a classification model is a categorical variable: it cannot take any value in a range, only one of a fixed set of allowed values.
(8m 12s) In this video, you'll watch a demo of how to train and use a logistic regression model for classification. Logistic regression is a classification model that assigns data points to categories. You'll see that logistic regression fits an S-curve to your data, and this S-curve can be used to predict binary outcomes. Logistic regression can also be extended to perform multiclass classification.
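A minimal sketch of the S-curve idea, assuming R's built-in `mtcars` dataset as a stand-in for the course data: it predicts transmission type (`am`: 0 = automatic, 1 = manual) from weight (`wt`). `glm()` with the binomial family fits the curve, and the hypothetical helper `s_curve()` evaluates the logistic function from the fitted coefficients.

```r
# Fit a logistic regression on stand-in data (mtcars, not the course's dataset)
model <- glm(am ~ wt, data = mtcars, family = binomial)

b0 <- coef(model)[["(Intercept)"]]
b1 <- coef(model)[["wt"]]

# The S-curve: p(x) = 1 / (1 + exp(-(b0 + b1 * x)))
s_curve <- function(x) 1 / (1 + exp(-(b0 + b1 * x)))

weights <- seq(1.5, 5.5, by = 0.5)
probs <- s_curve(weights)
print(round(data.frame(wt = weights, p_manual = probs), 3))

# Turn probabilities into binary outcomes with a 0.5 cutoff
print(ifelse(probs > 0.5, "manual", "automatic"))
```

The probabilities from `s_curve()` match what `predict(model, type = "response")` returns for the same inputs; writing the formula out only makes the shape of the curve explicit.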
(10m 12s) In this video, you'll watch a demo in which you train your logistic regression model to perform classification. First, you'll split your data into training data and test data: the training data is used to train your model, and the test data is used to evaluate it. To make the split replicable, you'll set the random seed to 3. Next, you'll invoke the sample.split function to split your data.
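Why the seed matters can be sketched in a few lines. The course uses `sample.split` (from the caTools package); this sketch uses base R's `sample()` instead so it runs without extra packages. The helper `split_rows()`, the `mtcars` stand-in data, and the 0.8 ratio are assumptions for illustration.

```r
# Hypothetical helper: reproducible row indices for a training set
split_rows <- function(n, ratio, seed) {
  set.seed(seed)                      # fix the random number generator
  sample(n, size = round(ratio * n))  # draw training row indices without replacement
}

all_data  <- mtcars                   # stand-in data, not the course's dataset
train_idx <- split_rows(nrow(all_data), ratio = 0.8, seed = 3)

training.data <- all_data[train_idx, ]   # used to fit the model
test.data     <- all_data[-train_idx, ]  # held out to evaluate it

# Same seed, same split: rerunning reproduces the indices exactly
print(identical(train_idx, split_rows(nrow(all_data), ratio = 0.8, seed = 3)))
```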
(6m 23s) In this video, you'll watch a demo in which you build another logistic regression model using all the predictors you have available. You'll invoke the glm function, which fits a generalized linear model. The glm function can be used to build different families of models; onscreen, this is a logistic regression model because you've specified family = "binomial". It fits the logistic function, the S-curve, to your data and outputs probability scores.
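A minimal sketch of fitting on all predictors with `glm()` and reading off odds ratios and confidence intervals, assuming a two-predictor slice of `mtcars` as stand-in data. The formula `am ~ .` means "use every other column as a predictor". The intervals here are Wald intervals from `confint.default()`; `confint()` would give profile-likelihood intervals instead.

```r
# Stand-in data: outcome am plus two predictors from mtcars
cars <- mtcars[, c("am", "wt", "hp")]

# "am ~ ." uses all remaining columns as predictors
logistic.model <- glm(am ~ ., data = cars, family = "binomial")
print(summary(logistic.model))  # coefficients, standard errors, z and p-values

# Coefficients are on the log-odds scale; exponentiate to get odds ratios
odds_ratios <- exp(coef(logistic.model))

# Wald 95% confidence intervals, converted to the odds-ratio scale
ci <- exp(confint.default(logistic.model, level = 0.95))

print(round(cbind(odds_ratio = odds_ratios, ci), 4))
```

An odds ratio below 1 means the odds of the outcome fall as that predictor increases; an interval that excludes 1 lines up with a coefficient that is significant at the 5% level.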
(8m 31s) In this video, you'll watch a demo about imbalanced data. Imbalanced data is one kind of skewed data. Skewness is a measure of the asymmetry of the probability distribution of a random variable. If a probability distribution is symmetric about its center, it's a symmetric distribution; if it tilts to the left or the right, it's a skewed distribution. You'll see two examples of skewed distributions onscreen.
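Both ideas can be checked numerically. Base R has no built-in skewness function, so `skewness()` below is a hypothetical helper implementing the moment-based sample skewness; the label vector is made-up toy data with a 95/5 class split.

```r
# Hypothetical helper: moment-based sample skewness
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

print(skewness(c(1, 2, 3, 4, 5)))   # symmetric values: 0
print(skewness(c(1, 1, 1, 2, 10)))  # long right tail: positive

# Imbalanced classes: toy labels with 95 "no" and 5 "yes"
labels <- factor(c(rep("no", 95), rep("yes", 5)))
print(prop.table(table(labels)))    # no 0.95, yes 0.05
```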
(7m 9s) In this video, you'll watch a demo in which you split your dataset into training and test data to build and evaluate your model. So that you can replicate your splits, you'll set the seed to 3. Once that's done, you'll invoke the sample.split function to split the data. The SplitRatio onscreen is 80%, so you'll use 80% of your data to train your model and 20% to evaluate it.
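On imbalanced data, how the split is drawn matters: `sample.split` preserves the relative class ratios of the outcome vector it is given, so a rare class keeps the same share in both sets. A minimal base R stand-in for that behavior, assuming toy labels with 95 negatives and 5 positives; `stratified_split()` is a hypothetical helper, not the caTools implementation.

```r
# Hypothetical helper: sample the same fraction from each class separately
stratified_split <- function(y, ratio, seed) {
  set.seed(seed)
  in_train <- logical(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)                     # rows belonging to this class
    n_train <- round(ratio * length(idx))      # how many of them go to training
    in_train[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  in_train                                     # TRUE = training row, FALSE = test row
}

y <- c(rep(0, 95), rep(1, 5))                  # toy imbalanced outcome
in_train <- stratified_split(y, ratio = 0.8, seed = 3)

print(prop.table(table(y[in_train])))   # minority share in training data
print(prop.table(table(y[!in_train])))  # minority share in test data
```

With a plain random split, the 20-record test set could easily end up with zero positives, leaving nothing to measure recall against.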
(6m 27s) In this video, you'll watch a demo about prediction. You'll invoke the predict function, passing in logistic.model, specifying test.data, and setting type = "response". This gives you the prediction.probabilities you'll use to compute the predicted labels. First, you'll look at the prediction.probabilities onscreen.
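A minimal sketch of that prediction step, assuming `mtcars` as stand-in data and a 0.5 probability cutoff (a common default; the course may use a different threshold). `type = "response"` asks `predict()` for probabilities rather than log-odds.

```r
# Stand-in data and a seeded split (not the course's dataset)
set.seed(3)
train_idx     <- sample(nrow(mtcars), 24)
training.data <- mtcars[train_idx, ]
test.data     <- mtcars[-train_idx, ]

logistic.model <- glm(am ~ wt, data = training.data, family = "binomial")

# Probabilities on the test data (type = "link" would return log-odds)
prediction.probabilities <- predict(logistic.model, newdata = test.data,
                                    type = "response")

# Convert probabilities to class labels with a 0.5 cutoff
prediction.labels <- ifelse(prediction.probabilities > 0.5, 1, 0)

print(table(Predicted = prediction.labels, Actual = test.data$am))
```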
(10m 35s) In this video, you'll watch a demo of resampling techniques for imbalanced data in R. You'll see that your current model is not able to identify the cases you need it to find. You'll fix this by resampling your original data with replacement, which artificially increases the number of records you have to work with and gives you a larger sample.
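One simple resampling technique is random oversampling: draw extra minority-class rows with replacement until the classes are the same size. The course may use a different method; `oversample()` and the toy data frame are hypothetical, for illustration only.

```r
# Hypothetical helper: random oversampling of the minority class
oversample <- function(data, label_col, seed = 3) {
  set.seed(seed)
  counts   <- table(data[[label_col]])
  minority <- names(counts)[which.min(counts)]
  target   <- max(counts)

  minority_rows <- which(data[[label_col]] == minority)
  # Draw row positions with replacement so the same record can appear repeatedly
  picks <- sample.int(length(minority_rows), target - length(minority_rows),
                      replace = TRUE)
  rbind(data, data[minority_rows[picks], ])
}

# Toy imbalanced data: 95 "no" rows and 5 "yes" rows
toy <- data.frame(x = seq_len(100), y = factor(c(rep("no", 95), rep("yes", 5))))
balanced <- oversample(toy, "y")
print(table(balanced$y))  # no 95, yes 95
```

Resampling is usually applied to the training data only, so that the test data still reflects the real class balance when you evaluate the model.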
(8m 1s) In this video, you'll learn to recognize decision tree models. The logistic regression algorithm you already know fits an S-curve to your data, and this S-curve is used to compute the probabilities of the outcome variable. Different machine learning algorithms use different techniques to predict outcomes. Now, you'll see how to perform classification using decision trees.
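Where logistic regression fits a curve, a classification tree repeatedly splits the data into purer groups. A minimal sketch of how a split is scored with Gini impurity (rpart's default criterion for classification), using the built-in `iris` data; `gini()` and `split_impurity()` are hypothetical helpers.

```r
# Gini impurity of a node: 0 when every record shares one class
gini <- function(labels) {
  p <- prop.table(table(labels))
  1 - sum(p^2)
}

# Weighted impurity of the two child nodes produced by a candidate split;
# a tree prefers the split with the lowest value
split_impurity <- function(labels, goes_left) {
  n <- length(labels)
  left  <- labels[goes_left]
  right <- labels[!goes_left]
  (length(left) / n) * gini(left) + (length(right) / n) * gini(right)
}

print(gini(iris$Species))                                     # 3 balanced classes: 2/3
print(split_impurity(iris$Species, iris$Petal.Length < 2.45)) # setosa isolated: 1/3
```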
(6m 56s) In this video, you'll learn how to use R to explore and pre-process data. First, you'll invoke ggplot to draw a bar plot of the different styles of wine; onscreen, you'll see that there are red wines and white wines. You'll build a classification model that uses the attributes of the wine to predict its quality. Onscreen, there are seven categories for quality, ranging from 3 through 9.
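A minimal sketch of typical exploration and pre-processing steps. The data frame is a tiny made-up stand-in, and the column names `style`, `alcohol`, and `quality` are assumptions, not the course's actual dataset. The bar plot uses ggplot2 only if it is installed.

```r
# Made-up stand-in for a wine table (values and column names are assumptions)
wine <- data.frame(
  style   = c("red", "white", "white", "red", "white", "white"),
  alcohol = c(9.4, 10.5, 12.8, 9.8, 11.0, 10.2),
  quality = c(5, 6, 8, 5, 7, 6)
)

# Records per style: the counts behind a bar plot
print(table(wine$style))

# Treat quality as categories rather than numbers, so a tree classifies it
wine$quality <- factor(wine$quality)
wine$style   <- factor(wine$style)
print(levels(wine$quality))

# Missing values per column
print(colSums(is.na(wine)))

# Bar plot of wine styles with ggplot2, if available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  style_plot <- ggplot2::ggplot(wine, ggplot2::aes(x = style)) +
    ggplot2::geom_bar()
  # print(style_plot) draws the plot
}
```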
(11m 11s) In this video, you'll train your decision tree model. Training the decision tree model involves invoking the rpart function. rpart stands for recursive partitioning and regression trees, which is a particular type of decision tree. This fits tree.model on your data. You'll use this model to predict quality, with all remaining variables as predictors, as shown by the formula onscreen. The data you'll use to train this model is training.data.
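A minimal sketch of the same workflow with `rpart`, assuming the built-in `iris` data as a stand-in for the wine data (predicting `Species` instead of `quality`). The seed and the 120/30 split are illustrative choices; `method = "class"` requests a classification tree.

```r
library(rpart)  # recommended package shipped with standard R installations

# Stand-in data and split (not the course's wine dataset)
set.seed(3)
train_idx     <- sample(nrow(iris), 120)
training.data <- iris[train_idx, ]
test.data     <- iris[-train_idx, ]

# "Species ~ ." predicts Species from all remaining variables
tree.model <- rpart(Species ~ ., data = training.data, method = "class")

print(tree.model)                        # text view of the splits
print(tree.model$variable.importance)    # importance score per predictor

predicted <- predict(tree.model, newdata = test.data, type = "class")
print(mean(predicted == test.data$Species))  # test-set accuracy
```

For a quick picture of the tree structure, `plot(tree.model)` followed by `text(tree.model)` draws it with rpart's base-graphics methods.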
(2m 25s) In this video, you'll summarize what you've learned in this course. You've learned that classification models are used to predict the output categories of data points, classifying records into classes or categories. You also saw how classification models can be evaluated using metrics such as accuracy, precision, and recall. You also performed classification on a real-world dataset using a logistic regression model. You learned that linear regression fits a straight line to your data.
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft is providing you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.
Digital badges are yours to keep, forever.