Fundamentals of NLP: Representing Text as Numeric Features
Natural Language Processing | Intermediate
- 15 videos | 2h 0m 17s
- Includes Assessment
- Earns a Badge
When performing sentiment classification with machine learning, text must be encoded in a numeric format, because machine learning models operate on numbers, not raw text. There are a number of encoding techniques for text data, such as one-hot encoding, count vector encoding, and word embeddings. In this course, you will learn how to use one-hot encoding, a simple technique that builds a vocabulary from all the words in your text corpus. Next, you will move on to count vector encoding, which tracks the frequency of each word in every document, and then explore term frequency-inverse document frequency (TF-IDF) encoding, which also builds a vocabulary and document vectors but represents each word by its TF-IDF score. Finally, you will perform sentiment analysis on encoded text: you will encode your input data as count vectors, set up a Gaussian Naive Bayes model, train it, and evaluate its metrics. You will also explore how to improve model performance by stemming words, removing stopwords, and using n-grams.
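The pipeline this description outlines can be sketched in a few lines of scikit-learn. This is a minimal, illustrative sketch rather than the course's actual code: the tiny `reviews` and `labels` lists are made-up placeholders, and default parameters stand in for whatever settings the course uses.

```python
# Minimal sketch: count vector encoding + Gaussian Naive Bayes for sentiment.
# The reviews/labels below are invented placeholder data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

reviews = [
    "loved the plot and the acting",
    "a dull, predictable movie",
    "brilliant soundtrack and pacing",
    "terrible script, I walked out",
]
labels = [1, 0, 1, 0]  # 1 = positive sentiment, 0 = negative

# Encode the text as count vectors (per-document word frequencies)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews).toarray()  # GaussianNB needs dense input

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, random_state=42
)

# Train the model and evaluate a basic metric
model = GaussianNB()
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

With only four documents the accuracy number is meaningless; the point is the shape of the workflow the course follows: encode, split, fit, evaluate.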
WHAT YOU WILL LEARN
- Discover the key concepts covered in this course
- Understand one-hot encoding representations
- Perform one-hot encoding on text
- Use the CountVectorizer object for one-hot encoding
- Outline how to encode text based on frequencies
- Encode text as count vectors
- Explore bag-of-words and bag-of-n-grams encoding
- Encode data using term frequency–inverse document frequency (TF-IDF) scores (a sketch contrasting these encodings follows this list)
- Explore and analyze data
- Create a Naive Bayes model for sentiment analysis
- Stem words and remove stopwords for machine learning
- Filter words based on frequency for classification
- Train classification models on n-grams
- Train models on TF-IDF encodings
- Summarize the key concepts covered in this course
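To make the encoding objectives above concrete, here is a small comparison of the three representations using scikit-learn. The two-document corpus is invented for illustration and is not taken from the course materials.

```python
# Contrast one-hot, count vector, and TF-IDF encodings on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the food was good",
    "the food was bad, really bad",
]

# One-hot encoding: 1 if a vocabulary word appears in the document, else 0
one_hot = CountVectorizer(binary=True)
print(one_hot.fit_transform(corpus).toarray())

# Count vector encoding: raw frequency of each vocabulary word per document
counts = CountVectorizer()
print(counts.fit_transform(corpus).toarray())
print(counts.get_feature_names_out())

# TF-IDF encoding: frequencies reweighted by how rare a word is across documents
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(corpus).toarray().round(2))
```

Passing binary=True is one way to use the CountVectorizer object for one-hot (presence/absence) encoding, which matches the objective listed above.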
IN THIS COURSE
- 2m 8s | In this video, we will discover the key concepts covered in this course.
- 7m 26s | After completing this video, you will be able to understand one-hot encoding representations.
- 10m 5s | Find out how to perform one-hot encoding on text.
- 7m 6s | Discover how to use the CountVectorizer object for one-hot encoding.
- 5m 47s | Upon completion of this video, you will be able to outline how to encode text based on frequencies.
- 4m 56s | In this video, you will learn how to encode text as count vectors.
- 12m 2s | In this video, we will explore bag-of-words and bag-of-n-grams encoding.
- 11m 37s | In this video, find out how to encode data using term frequency–inverse document frequency (TF-IDF) scores.
- 11m 28s | Learn how to explore and analyze data.
- 10m | During this video, you will discover how to create a Naive Bayes model for sentiment analysis.
- 8m 40s | Discover how to stem words and remove stopwords for machine learning (a combined preprocessing sketch follows this list).
- 8m 38s | Find out how to filter words based on frequency for classification.
- 10m 14s | Learn how to train classification models on n-grams.
- 7m 16s | In this video, you will learn how to train models on TF-IDF encodings.
- 2m 54s | In this video, we will summarize the key concepts covered in this course.
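As noted in the stemming video entry, the later videos improve the baseline model by stemming words, removing stopwords, filtering words by frequency, and adding n-grams. The sketch below shows one plausible way to combine those steps in a single CountVectorizer; the custom tokenizer, toy corpus, and the ngram_range and min_df values are assumptions for illustration, not the course's exact code.

```python
# Sketch: stemming + stopword removal + n-grams + frequency filtering.
# The tokenizer, corpus, and thresholds are illustrative assumptions.
import re

from nltk.stem import PorterStemmer  # pip install nltk
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

stemmer = PorterStemmer()

def stem_tokenizer(doc):
    # Lowercase, split into word tokens, drop English stopwords, then stem
    tokens = re.findall(r"[a-z']+", doc.lower())
    return [stemmer.stem(t) for t in tokens if t not in ENGLISH_STOP_WORDS]

vectorizer = CountVectorizer(
    tokenizer=stem_tokenizer,  # custom tokenizer handles stopwords and stemming
    token_pattern=None,        # not used when a tokenizer is supplied
    ngram_range=(1, 2),        # bag of unigrams and bigrams
    min_df=2,                  # keep only terms appearing in at least 2 documents
)

docs = [
    "The movies were surprisingly good movies",
    "Good acting, but the movie dragged",
    "Dragging plot and bad acting",
]
X = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())  # stemmed terms shared by 2+ documents
print(X.toarray())
```

Swapping a vectorizer like this into the count-vector pipeline sketched earlier and retraining the Naive Bayes model is the kind of iteration the later videos walk through.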
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.
Digital badges are yours to keep, forever.