Fundamentals of NLP: Preprocessing Text Using NLTK & SpaCy

Natural Language Processing    |    Intermediate
  • 13 videos | 1h 56m 47s
  • Includes Assessment
  • Earns a Badge
Rating 5.0 of 1 user
Tokenization, stemming, and lemmatization are essential natural language processing (NLP) tasks. Tokenization breaks text into smaller units called tokens, usually words or sentences, so that the text can be analyzed. Stemming reduces words to a common base form by stripping affixes, typically suffixes, which keeps the representation simple but can produce forms that are not real words. In contrast, lemmatization takes grammatical context into account to map each word to its base, or dictionary, form.

You will begin this course by tokenizing text using the Natural Language Toolkit (NLTK) and SpaCy. You will then remove stopwords, common words such as "a" and "the" that add little meaning to text. Next, you'll explore the WordNet lexical database, which contains information about the semantic relationships between words. You'll use synsets to view similar words and explore hypernyms, hyponyms, meronyms, and holonyms. Finally, you'll compare stemming and lemmatization, exploring both processes with NLTK and performing lemmatization with SpaCy.
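The ideas in the description above can be sketched in a few lines of plain Python. This is a minimal illustration only, not the NLTK or SpaCy API: the stopword set, the lemma lookup table, and every function name here are hypothetical stand-ins chosen for the example, and real libraries use far larger word lists and more careful rules.

```python
import re

# Hypothetical, hand-picked stopword set; real libraries ship much larger lists.
STOPWORDS = {"a", "an", "the", "is", "are", "on", "in", "of", "and"}

# Hypothetical lookup table standing in for a real lemmatizer's dictionary.
LEMMAS = {"mice": "mouse", "were": "be", "running": "run", "studies": "study"}


def sentence_tokenize(text):
    """Split text into sentences at whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stopwords(tokens):
    """Keep only the tokens that are not in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """Strip one common suffix; deliberately crude, so it can yield non-words."""
    for suffix in ("ing", "ies", "es", "s", "ed"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def naive_lemmatize(token):
    """Map a token to its dictionary form if the lookup table knows it."""
    return LEMMAS.get(token, token)


text = "The mice were running in the garden. A student studies the results."
sentences = sentence_tokenize(text)
tokens = remove_stopwords(word_tokenize(text))
stems = [naive_stem(t) for t in tokens]
lemmas = [naive_lemmatize(t) for t in tokens]

print(sentences)
print(tokens)
print(stems)   # 'running' becomes 'runn', 'studies' becomes 'stud'
print(lemmas)  # 'running' becomes 'run', 'mice' becomes 'mouse'
```

The contrast in the last two lines is the point of the sketch: suffix stripping is cheap but blind to grammar, while a dictionary-based lookup returns real words at the cost of needing linguistic data.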

WHAT YOU WILL LEARN

  • Discover the key concepts covered in this course
  • Perform tokenization with NLTK
  • Perform tokenization with SpaCy
  • Remove stopwords using NLTK
  • Remove stopwords using SpaCy
  • Explore WordNet synsets
  • Compute similarity of words
  • Explore types of words in WordNet
  • Perform stemming with NLTK
  • Perform lemmatization with NLTK
  • Perform lemmatization with SpaCy
  • Perform parts-of-speech (POS) tagging and named entity recognition (NER)
  • Summarize the key concepts covered in this course
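
For the WordNet objectives in the list above, the following sketch may help build intuition for hypernyms (more general terms), hyponyms (more specific terms), and path-based word similarity. It assumes a tiny hand-written taxonomy rather than the real WordNet database, and all names in it are hypothetical. The score used, 1 / (1 + path length), is a common way to define similarity over a taxonomy.

```python
# Hypothetical miniature taxonomy: each word maps to its hypernym.
HYPERNYM = {
    "dog": "canine",
    "cat": "feline",
    "canine": "carnivore",
    "feline": "carnivore",
    "carnivore": "mammal",
    "mammal": "animal",
}


def hypernym_chain(word):
    """Return the increasingly general terms above a word, nearest first."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def hyponyms(word):
    """Return the direct, more specific terms below a word."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == word)


def path_similarity(a, b):
    """Score 1 / (1 + edges on the shortest path between a and b)."""
    up_a = [a] + hypernym_chain(a)
    up_b = [b] + hypernym_chain(b)
    # In a tree, the first shared ancestor reached from a gives the shortest path.
    for dist_a, node in enumerate(up_a):
        if node in up_b:
            return 1 / (1 + dist_a + up_b.index(node))
    return None  # no common ancestor in this toy taxonomy


print(hypernym_chain("dog"))
print(hyponyms("carnivore"))
print(path_similarity("dog", "cat"))
```

Here "dog" and "cat" meet at "carnivore", two steps up from each, so the path has four edges. Meronyms and holonyms (part and whole relations) could be modeled the same way with a second mapping.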

IN THIS COURSE

  • 2m 16s
    In this video, we will discover the key concepts covered in this course. FREE ACCESS
  • 12m 49s
    Find out how to perform tokenization with NLTK. FREE ACCESS
  • 3. Implementing Word and Sentence Tokenization Using SpaCy
    12m 56s
    In this video, you will learn how to perform tokenization with SpaCy.
  • 4. Performing Stop Word Removal Using NLTK
    10m 45s
    During this video, you will discover how to remove stopwords using NLTK.
  • 5. Performing Stopword Removal Using SpaCy
    5m 37s
    Learn how to remove stopwords using SpaCy.
  • 6. Understanding WordNet Synsets
    5m 55s
    In this video, we will explore WordNet synsets.
  • 7. Computing Word Similarity Using WordNet
    10m 18s
    Discover how to compute similarity of words.
  • 8. Understanding Hypernyms, Hyponyms, Antonyms, Meronyms, and Holonyms
    12m 26s
    In this video, find out how to explore types of words in WordNet.
  • 9. Performing Stemming Using NLTK
    9m 51s
    Learn how to perform stemming with NLTK.
  • 10. Performing Lemmatization Using NLTK
    8m 21s
    In this video, you will discover how to perform lemmatization with NLTK.
  • 11. Performing Lemmatization Using SpaCy
    12m 18s
    Find out how to perform lemmatization with SpaCy.
  • 12. Performing Parts of Speech Tagging and Named Entity Recognition
    9m 53s
    Discover how to perform parts-of-speech (POS) tagging and named entity recognition (NER).
  • 13. Course Summary
    3m 23s
    In this video, we will summarize the key concepts covered in this course.

EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE

Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses. Badges can be shared on any social network or business platform.

Digital badges are yours to keep, forever.
