NLP with LLMs: Working with Tokenizers in Hugging Face
Large Language Models (LLMs) | Intermediate
- 15 videos | 2h 18m 25s
- Includes Assessment
- Earns a Badge
Hugging Face, a leading company in the field of artificial intelligence (AI), offers a comprehensive platform that enables developers and researchers to build, train, and deploy state-of-the-art machine learning (ML) models, with a strong emphasis on open collaboration and community-driven development. In this course, you will discover the extensive libraries and tools Hugging Face offers, including the Transformers library, which provides access to a vast array of pre-trained models and datasets. Next, you will set up your working environment in Google Colab. You will also explore the critical components of the text preprocessing pipeline: normalizers and pre-tokenizers. Finally, you will master various tokenization techniques, including byte pair encoding (BPE), WordPiece, and Unigram tokenization, which are essential for working with transformer models. Through hands-on exercises, you will build and train BPE and WordPiece tokenizers, configuring normalizers and pre-tokenizers to fine-tune these tokenization methods.
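As a taste of the workflow the course builds toward, here is a minimal sketch of loading a pre-trained tokenizer with the Transformers library; the checkpoint name bert-base-uncased and the sample sentence are illustrative choices, not taken from the course itself.

```python
# Minimal sketch (not course code): load a pre-trained tokenizer from the
# Hugging Face Hub and inspect how it splits text for a transformer model.
from transformers import AutoTokenizer

# "bert-base-uncased" is only an example checkpoint; it ships a WordPiece tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Tokenizers turn raw text into model-ready input IDs."
encoding = tokenizer(text)

print(tokenizer.tokenize(text))   # subword tokens (word-internal pieces carry a "##" prefix)
print(encoding["input_ids"])      # integer IDs the transformer model actually consumes
```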
WHAT YOU WILL LEARN
- Discover the key concepts covered in this course
- Provide an overview of the Hugging Face platform
- Outline how tokenization works for transformer models
- Work with the Hugging Face platform
- Set up a Colab notebook
- Explore normalization and pre-tokenization (see the sketch after this list)
- Perform byte pair encoding (BPE) and WordPiece tokenization
- Set up a BPE tokenizer
- Implement BPE tokenization
- Set up a WordPiece tokenizer
- Implement WordPiece tokenization
- Train a BPE tokenizer
- Perform normalization and pre-tokenization with WordPiece
- Train a WordPiece tokenizer
- Summarize the key concepts covered in this course
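The normalization and pre-tokenization objectives above correspond to the normalizers and pre_tokenizers modules of the Hugging Face tokenizers library. The following is a hedged sketch of inspecting them interactively; the specific components (NFD, Lowercase, StripAccents, Whitespace) are examples, not necessarily the exact configuration used in the course.

```python
# Illustrative sketch: inspecting normalization and pre-tokenization with
# the Hugging Face `tokenizers` library (component choices are examples).
from tokenizers import normalizers, pre_tokenizers

normalizer = normalizers.Sequence(
    [normalizers.NFD(), normalizers.Lowercase(), normalizers.StripAccents()]
)
print(normalizer.normalize_str("Héllo, Hów are Ü?"))
# -> "hello, how are u?"

pre_tokenizer = pre_tokenizers.Whitespace()
print(pre_tokenizer.pre_tokenize_str("Hello, world!"))
# -> [('Hello', (0, 5)), (',', (5, 6)), ('world', (7, 12)), ('!', (12, 13))]
```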
IN THIS COURSE
- 2m 10s | In this video, we will discover the key concepts covered in this course. FREE ACCESS
- 12m 5s | After completing this video, you will be able to provide an overview of the Hugging Face platform. FREE ACCESS
- 11m 20s | Upon completion of this video, you will be able to outline how tokenization works for transformer models. FREE ACCESS
- 11m 34s | Discover how to work with the Hugging Face platform. FREE ACCESS
- 5m 38s | Learn how to set up a Colab notebook. FREE ACCESS
- 8m 2s | In this video, we will explore normalization and pre-tokenization. FREE ACCESS
- 10m 18s | During this video, you will learn how to perform byte pair encoding (BPE) and WordPiece tokenization. FREE ACCESS
- 12m 2s | Find out how to set up a BPE tokenizer. FREE ACCESS
- 11m 58s | In this video, discover how to implement BPE tokenization. FREE ACCESS
- 12m 12s | Learn how to set up a WordPiece tokenizer. FREE ACCESS
- 9m 36s | Discover how to implement WordPiece tokenization. FREE ACCESS
- 11m 22s | In this video, find out how to train a BPE tokenizer (a training sketch follows this list). FREE ACCESS
- 7m 30s | In this video, you will learn how to perform normalization and pre-tokenization with WordPiece. FREE ACCESS
- 9m 31s | Discover how to train a WordPiece tokenizer. FREE ACCESS
- 3m 7s | In this video, we will summarize the key concepts covered in this course. FREE ACCESS
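For orientation before starting the course, here is a condensed sketch of training BPE and WordPiece tokenizers from scratch with the tokenizers library, in the spirit of the training videos above; the corpus file name, vocabulary size, and special tokens are placeholders rather than the course's actual values.

```python
# Hedged sketch (not the course's exact code) of training tokenizers from
# scratch with the Hugging Face `tokenizers` library. The corpus file name,
# vocabulary size, and special tokens below are placeholders.
from tokenizers import Tokenizer, models, normalizers, pre_tokenizers, trainers

def build_bpe_tokenizer(files):
    # Byte pair encoding: start from characters and iteratively merge frequent pairs.
    tok = Tokenizer(models.BPE(unk_token="[UNK]"))
    tok.normalizer = normalizers.Sequence([normalizers.NFD(), normalizers.Lowercase()])
    tok.pre_tokenizer = pre_tokenizers.Whitespace()
    trainer = trainers.BpeTrainer(vocab_size=8000, special_tokens=["[UNK]", "[PAD]"])
    tok.train(files, trainer)
    return tok

def build_wordpiece_tokenizer(files):
    # WordPiece: BERT-style subwords, with "##" marking word-internal pieces.
    tok = Tokenizer(models.WordPiece(unk_token="[UNK]"))
    tok.normalizer = normalizers.BertNormalizer(lowercase=True)
    tok.pre_tokenizer = pre_tokenizers.BertPreTokenizer()
    trainer = trainers.WordPieceTrainer(
        vocab_size=8000, special_tokens=["[UNK]", "[PAD]", "[CLS]", "[SEP]", "[MASK]"]
    )
    tok.train(files, trainer)
    return tok

if __name__ == "__main__":
    corpus = ["corpus.txt"]  # placeholder training file
    bpe = build_bpe_tokenizer(corpus)
    wp = build_wordpiece_tokenizer(corpus)
    print(bpe.encode("Training a tokenizer from scratch.").tokens)
    print(wp.encode("Training a tokenizer from scratch.").tokens)
```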
EARN A DIGITAL BADGE WHEN YOU COMPLETE THIS COURSE
Skillsoft gives you the opportunity to earn a digital badge upon successful completion of some of our courses, which can be shared on any social network or business platform.
Digital badges are yours to keep, forever.