Building Features from Text Data in Microsoft Azure
This course covers building text features for machine learning on Azure Machine Learning Service virtual machines, including natural language processing techniques such as tokenization, stopword removal, and feature vectorization.
What you'll learn
Using text data to make decisions is key to creating text features for machine learning models. In this course, Building Features from Text Data in Microsoft Azure, you'll gain the ability to structure text data in several forms that machine learning models can consume, using Microsoft Azure Machine Learning Service virtual machines. First, you'll discover how to prepare text data with natural language processing, and how to leverage several natural language processing techniques, such as document tokenization, stopword removal, frequency filtering, stemming and lemmatization, parts-of-speech tagging, and n-gram identification. Then, you'll explore documents as text features, where you'll learn to represent documents as feature vectors using techniques including one-hot and count vector encodings, frequency-based encodings, word embeddings, hashing, and locality-sensitive hashing. Finally, you'll delve into using BERT to generate word embeddings. By the end of this course, you'll have the skills and knowledge to use textual data and Microsoft Azure in conceptually sound ways to create text features for machine learning models.
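As a taste of what the course covers, the core preprocessing and encoding steps can be sketched in plain Python. This is a minimal illustration only, not the course's AMLS notebooks; the toy corpus and stopword list here are invented for the example:

```python
import re
from collections import Counter

corpus = [
    "Azure makes machine learning easy",
    "Machine learning models need numeric features",
]
STOPWORDS = {"the", "a", "an", "makes", "need"}  # toy stopword list for illustration

def tokenize(doc):
    # Lowercase and split on non-word characters: a simple word tokenizer
    return [t for t in re.split(r"\W+", doc.lower()) if t]

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

# Build a shared vocabulary across the corpus, then encode each
# document as a count vector (the bag-of-words representation)
tokenized = [remove_stopwords(tokenize(d)) for d in corpus]
vocab = sorted({t for doc in tokenized for t in doc})

def count_vector(tokens):
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

vectors = [count_vector(doc) for doc in tokenized]
```

In the course itself these steps are done with NLTK tokenizers and library vectorizers rather than hand-rolled helpers, but the shape of the pipeline — tokenize, clean, then map to a fixed-length numeric vector — is the same.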
Table of contents
- Module Overview 2m
- Prerequisites 3m
- Demo: Configure AMLS 4m
- Preprocessing and NLP 2m
- Tokenization and Cleaning 4m
- Demo: Sentence and Word Tokenization 4m
- Demo: NLTK Tokenizers 3m
- Demo: Token Cleaning 2m
- Stopword Removal 1m
- Demo: Stopword Removal 4m
- Frequency Filtering 1m
- Demo: Frequency Filtering 5m
- Stemming 2m
- Demo: Stemming 3m
- Parts-of-speech Tagging 1m
- Demo: Parts-of-speech 2m
- Lemmatization 2m
- Demo: Lemmatization 2m
- N-grams 2m
- Demo: N-grams 3m
- Module Summary 2m
- Module Overview 2m
- Encoding Text as Numbers 4m
- One-hot and Count Vector Encoding 5m
- Demo: Bag-of-words 4m
- Demo: Bag-of-n-grams 2m
- TF-IDF Encoding 2m
- Demo: TF-IDF Encoding 4m
- Word Embeddings 3m
- Demo: Word Embeddings Using Word2Vec 4m
- Feature Hashing 4m
- Demo: The Hashing Trick 5m
- Locality-sensitive Hashing 3m
- Demo: Locality-sensitive Hashing 5m
- BERT 4m
- Demo: Word Embeddings with BERT on AMLS 7m
- Summary 2m