Building Features from Text Data
This course covers extracting information from text documents and building classification models on them, using natural language processing techniques such as feature vectorization, locality-sensitive hashing, stopword removal, and lemmatization.
What you'll learn
From chatbots to machine-generated literature, some of the hottest applications of ML and AI these days are for data in textual form.
In this course, Building Features from Text Data, you will gain the ability to structure textual data in a manner ideal for use in ML models.
First, you will learn how to represent documents as feature vectors using one-hot encoding, frequency-based, and prediction-based techniques. You will see how to improve these representations based on the meaning, or semantics, of the document.
Next, you will discover how to leverage various language modeling features such as stopword removal, frequency filtering, stemming and lemmatization, and part-of-speech tagging.
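To make these pre-processing steps concrete, here is a deliberately tiny, pure-Python illustration of stopword removal and suffix-stripping stemming. The stopword list and suffix rules are toy assumptions; a real pipeline would use a library such as NLTK (e.g. its `PorterStemmer` and `WordNetLemmatizer`).

```python
# Illustrative stopword removal and crude suffix-stripping "stemming".
# The stopword set and suffix rules below are toy assumptions, not NLTK's.
STOPWORDS = {"the", "a", "an", "is", "are", "on", "of", "and"}

def remove_stopwords(tokens):
    """Drop high-frequency function words that carry little meaning."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

def crude_stem(token):
    """Strip a few common suffixes so inflected forms collapse together."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

tokens = "the dogs are chasing the cat".split()
filtered = remove_stopwords(tokens)          # drops "the", "are"
stemmed = [crude_stem(t) for t in filtered]
print(stemmed)
```

Note that a stemmer happily produces non-words ("chasing" becomes "chas"); lemmatization instead maps tokens to dictionary forms, at the cost of needing vocabulary and part-of-speech information.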
Finally, you will see how locality-sensitive hashing can be used to reduce the dimensionality of documents while still keeping similar documents close together.
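One common form of locality-sensitive hashing is sign random projection: each document vector is reduced to a short bit signature, and vectors that point in similar directions agree on most bits with high probability. The sketch below is a minimal pure-Python illustration; the dimensions and hand-built term-frequency vectors are assumptions for the example, not the course's data.

```python
# A minimal sketch of locality-sensitive hashing via random hyperplanes
# (sign random projection). The dimensions and toy vectors are illustrative.
import random

random.seed(0)

DIM = 8      # original feature dimension (e.g. vocabulary size)
N_BITS = 4   # signature length after hashing

# One random hyperplane (a vector of Gaussian weights) per signature bit.
hyperplanes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_BITS)]

def signature(vec):
    """Bit i is 1 if vec lies on the positive side of hyperplane i."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )

doc_a = [3, 1, 0, 2, 0, 0, 1, 0]   # toy term-frequency vectors
doc_b = [2, 1, 0, 3, 0, 0, 1, 0]   # nearly identical to doc_a
doc_c = [0, 0, 4, 0, 3, 2, 0, 1]   # very different from both

print(signature(doc_a), signature(doc_b), signature(doc_c))
```

Because the signature depends only on which side of each hyperplane a vector falls, it is invariant to document length (positive scaling), and the probability that two vectors agree on a bit grows with the cosine similarity between them.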
You will round out the course by implementing a classification model on text documents using many of these modeling abstractions.
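The shape of such an end-to-end pipeline can be sketched as follows: stopword removal and count vectorization feeding a Naive Bayes classifier. This uses scikit-learn (consistent with the vectorizer and Naive Bayes modules in the table of contents), with a tiny made-up sentiment dataset; it is a sketch of the pattern, not the course's own code.

```python
# A sketch of a text-classification pipeline: stopword removal +
# count vectorization + Naive Bayes. The training data is made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = [
    "great movie loved the acting",
    "wonderful film superb cast",
    "terrible plot boring scenes",
    "awful movie waste of time",
]
train_labels = ["pos", "pos", "neg", "neg"]

# stop_words="english" drops common function words before counting.
model = make_pipeline(
    CountVectorizer(stop_words="english"),
    MultinomialNB(),
)
model.fit(train_docs, train_labels)

print(model.predict(["superb acting great film"]))
```

Swapping `CountVectorizer` for `TfidfVectorizer` or `HashingVectorizer` changes only the feature-building stage, which is what makes the vectorizer abstraction convenient to experiment with.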
When you’re finished with this course, you will have the skills and knowledge to work with documents and textual data in conceptually and practically sound ways, and to represent such data for use in machine learning models.
Table of contents
- Version Check 0m
- Module Overview 1m
- Prerequisites and Course Outline 1m
- One-hot Encoding 4m
- Count Vectors 3m
- Tf-Idf Vectors 3m
- Co-occurrence Vectors 5m
- Word Embeddings 5m
- Installing Packages and Setting Up the Environment 3m
- Sentence and Word Tokenization 5m
- Plotting Word Frequency Distributions 4m
- Module Summary 1m
- Module Overview 1m
- Naive Bayes for Classification 3m
- Classification Using the Hashing Vectorizer 8m
- Pre-process Text Using a Stemmer, Build Features Using the Hashing Vectorizer 3m
- Building Features Using the Count Vectorizer 2m
- Pre-processing with Stopword Removal, Building Features Using Count Vectorizer 2m
- Pre-processing with Stopword Removal, Frequency Filtering, Building Features Using Count Vectorizer 3m
- Building Features Using the Tf-Idf Vectorizer 2m
- Building Features Using Bag-of-n-grams Model 2m
- Summary and Further Study 2m