Mining Data from Text
This course covers three topics: building text and document feature vectors that can be passed into machine learning models; topic modeling using Latent Semantic Analysis, Latent Dirichlet Allocation, and Non-negative Matrix Factorization; and keyword extraction using RAKE.
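RAKE (Rapid Automatic Keyword Extraction) splits text into candidate phrases at stop words and punctuation. It scores each word as its degree (co-occurrences inside candidate phrases) divided by its frequency, then scores each phrase as the sum of its word scores. The Python below is a minimal from-scratch sketch of that scoring idea, not the course's own implementation. The short `STOP_WORDS` list, the tokenizing regular expressions, and the helper names `candidate_phrases` and `rake` are illustrative assumptions.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (assumption); practical RAKE setups use far longer lists.
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
              "of", "on", "or", "the", "to", "using", "with"}


def candidate_phrases(text, stop_words=STOP_WORDS):
    """Split text into candidate phrases at punctuation and stop words."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        # Words are runs of letters/digits, optionally joined by hyphens.
        for word in re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", fragment):
            if word in stop_words:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text, top_n=5, stop_words=STOP_WORDS):
    """Return up to top_n (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text, stop_words)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree: co-occurrences inside candidate phrases, counting the word itself.
            degree[word] += len(phrase)
    word_score = {word: degree[word] / freq[word] for word in freq}

    scores = {}
    for phrase in phrases:
        scores[" ".join(phrase)] = sum(word_score[word] for word in phrase)
    # sorted() is stable, so tied phrases keep their order of first appearance.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

On the sentence "Topic modeling using latent semantic analysis, latent Dirichlet allocation, and non-negative matrix factorization.", this sketch ranks the three method names (score 9.0 each) above "topic modeling" (score 4.0). The word "latent" occurs in two three-word phrases, so its degree is 6 and its frequency is 2.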
What you'll learn
A large part of the appeal of deep learning models is their ability to work with unstructured data types such as text, images, and video. However, such models are only as good as the feature vectors that they operate on.
In this course, Mining Data from Text, you will gain the ability to build highly optimized and efficient feature vectors from textual and document data. First, you will learn how to represent documents as numeric data, using simple numeric identifiers for individual words as well as more elegant methods such as term frequency and inverse document frequency (TF-IDF). Next, you will discover how to perform topic modeling using techniques such as latent semantic analysis, latent Dirichlet allocation, and non-negative matrix factorization. Finally, you will explore how to implement keyword extraction using a popular algorithm, RAKE. When you’re finished with this course, you will have the skills and knowledge to build efficient, optimized feature vectors from a large document corpus and to use those feature vectors in building powerful machine learning models.
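The first step, turning documents into numbers, can be sketched in pure Python. The sketch assumes a naive lowercase whitespace tokenizer. Each distinct word gets a simple integer identifier (its position in the vocabulary), and each document becomes a vector of TF-IDF weights. Term frequency is computed as count divided by document length, and idf as log(N / df). These formulas are one common variant chosen for illustration, and the function names are hypothetical. Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed idf and normalize vectors by default, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def build_vocabulary(tokenized_docs):
    # Give each distinct word a simple integer identifier, in order of first appearance.
    vocab = {}
    for tokens in tokenized_docs:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def tfidf_vectors(docs):
    tokenized = [tokenize(doc) for doc in docs]
    vocab = build_vocabulary(tokenized)
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        vec = [0.0] * len(vocab)
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)            # term frequency within this document
            idf = math.log(n_docs / df[term])   # rarer terms get larger weights
            vec[vocab[term]] = tf * idf
        vectors.append(vec)
    return vocab, vectors
```

For the toy corpus `["the cat sat", "the dog sat", "the dog barked"]`, "the" appears in every document, so its idf (and therefore its weight) is 0. "cat" appears only in the first document and gets weight ln(3)/3 there.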
Table of contents
- Version Check 0m
- Module Overview 1m
- Prerequisites and Course Outline 1m
- Mining Data from Text 2m
- Numeric Representations of Text: One-Hot Encoding 4m
- Numeric Representations of Text: Frequency-Based Encodings 5m
- Numeric Representations of Text: Prediction-Based Embeddings 3m
- Feature Hashing 3m
- Bag of Words: Bag of N-Grams 2m
- Install and Setup 2m
- Frequency-Based Representation Using Bag of Words and Bag of N-Grams Models 7m
- Representing Documents Using TF-IDF Scores and Feature Hashes 6m
- Module Summary 2m
- Module Overview 1m
- Latent Dirichlet Allocation: Topic Modeling with the Newspaper Headlines Dataset 6m
- Visualizing Topic Assignments Using Manifold Learning to Reduce Dimensions 5m
- Latent Dirichlet Allocation: Topic Modeling with the DBPedia Dataset 7m
- Visualizing Topics Using Manifold Learning to Reduce Dimensions 7m
- Interactive Topic Model Visualization Using pyLDAvis 3m
- Non-negative Matrix Factorization: Topic Modeling with the DBPedia Dataset 4m
- Interactive Topic Visualization Using Bokeh 4m
- Latent Semantic Indexing: Preprocessing Text 5m
- Concept Modeling Using LSI 5m
- Module Summary 1m
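The feature hashing and bag-of-n-grams clips in the first module can be illustrated with a short sketch. It maps each unigram and bigram of a document directly into a fixed-length vector by hashing, so no vocabulary needs to be stored. The function names and the 16-bucket default are assumptions for illustration. So is the use of MD5, chosen because Python's built-in `hash()` is randomized per process for strings. scikit-learn's `HashingVectorizer`, for example, uses MurmurHash3 instead.

```python
import hashlib


def ngrams(tokens, n):
    # All contiguous runs of n tokens, joined with spaces.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def stable_hash(feature):
    # MD5 gives a hash that is identical across runs and machines.
    return int(hashlib.md5(feature.encode("utf-8")).hexdigest(), 16)


def hashed_vector(text, n_features=16, ngram_range=(1, 2)):
    tokens = text.lower().split()
    vec = [0] * n_features
    for n in range(ngram_range[0], ngram_range[1] + 1):
        for gram in ngrams(tokens, n):
            h = stable_hash(gram)
            index = h % n_features          # bucket chosen from the low bits
            # Another bit of the hash picks a sign, so colliding features
            # tend to cancel rather than accumulate.
            sign = 1 if ((h >> 64) & 1) == 0 else -1
            vec[index] += sign
    return vec
```

The vector length stays fixed at `n_features` no matter how large the corpus vocabulary grows. The trade-off is that unrelated n-grams can collide in the same bucket, and the mapping cannot be inverted to recover the original words.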