Building Features from Text Data

This course covers extracting information from text documents and building classification models on them, using natural language processing techniques including feature vectorization, locality-sensitive hashing, stopword removal, and lemmatization.

by Janani Ravi

What you'll learn

From chatbots to machine-generated literature, some of the hottest applications of ML and AI these days are for data in textual form.

In this course, Building Features from Text Data, you will gain the ability to structure textual data in a manner ideal for use in ML models.

First, you will learn how to represent documents as feature vectors using one-hot encoding, frequency-based, and prediction-based techniques. You will see how to improve these representations based on the meaning, or semantics, of the document.
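As a rough illustration of the one-hot and frequency-based representations mentioned above, here is a minimal sketch in plain Python. The toy corpus, the whitespace tokenizer, and the function names are assumptions made for this example and are not taken from the course; the TF-IDF weighting shown is one common unsmoothed variant. Prediction-based techniques (word embeddings) are not covered by this sketch.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

def tokenize(text):
    # Simplistic tokenizer: lowercase, split on whitespace.
    return text.lower().split()

tokenized = [tokenize(d) for d in docs]

# The vocabulary fixes the meaning of each vector position.
vocab = sorted({tok for doc in tokenized for tok in doc})

def one_hot_vector(tokens):
    # 1 if the term occurs in the document at all, else 0.
    present = set(tokens)
    return [1 if term in present else 0 for term in vocab]

def count_vector(tokens):
    # Raw term frequency per vocabulary entry.
    counts = Counter(tokens)
    return [counts[term] for term in vocab]

def tfidf_vector(tokens):
    # Term frequency scaled by inverse document frequency (unsmoothed).
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vector = []
    for term in vocab:
        tf = counts[term] / len(tokens)
        df = sum(1 for doc in tokenized if term in doc)
        vector.append(tf * math.log(n_docs / df))
    return vector

for name, fn in [("one-hot", one_hot_vector), ("count", count_vector)]:
    print(name, fn(tokenized[0]))
```

In this sketch a term such as "the", which appears in most documents, receives a lower TF-IDF weight than a rarer term such as "cat", even though it occurs more often in the first document.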

Next, you will discover how to leverage various language modeling features such as stopword removal, frequency filtering, stemming and lemmatization, and parts-of-speech tagging.
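The preprocessing steps listed above can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the tiny stopword set, the `naive_stem` suffix stripper (a crude stand-in, not the Porter stemmer or a real lemmatizer), and the `min_df` threshold. Lemmatization and parts-of-speech tagging generally need a dedicated NLP library and are left out.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "and", "is", "are", "on", "in", "of", "to"}

def remove_stopwords(tokens):
    # Drop very common words that carry little topical signal.
    return [t for t in tokens if t not in STOPWORDS]

def naive_stem(token):
    # Crude suffix stripping; keeps at least three characters of stem.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def filter_by_frequency(docs_tokens, min_df=2):
    # Keep only terms that appear in at least min_df documents.
    df = Counter(t for doc in docs_tokens for t in set(doc))
    return [[t for t in doc if df[t] >= min_df] for doc in docs_tokens]

tokens = "the cats are running on the mat".split()
cleaned = [naive_stem(t) for t in remove_stopwords(tokens)]
print(cleaned)
```

Note that the naive stemmer produces non-words such as "runn" for "running"; real stemmers and lemmatizers handle such cases with more careful rules or dictionaries.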

Finally, you will see how locality-sensitive hashing can be used to reduce the dimensionality of documents while still keeping similar documents close together.
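One way to picture locality-sensitive hashing is the random-hyperplane scheme for cosine similarity, sketched below in plain Python. The source does not say which LSH family the course uses, so this choice is an assumption, as are the function names, the 16-bit signature length, and the example vectors. Each document vector is reduced to a short bit signature, and vectors pointing in similar directions tend to agree on most bits.

```python
import random

def make_hyperplanes(n_bits, dim, seed=0):
    # One random Gaussian normal vector per signature bit.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vector, hyperplanes):
    # Each bit records which side of a hyperplane the vector falls on.
    bits = []
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)

def hamming(sig_a, sig_b):
    # Number of differing bits; smaller means more similar direction.
    return sum(a != b for a, b in zip(sig_a, sig_b))

planes = make_hyperplanes(n_bits=16, dim=4)
doc_a = [1.0, 2.0, 0.0, 3.0]      # hypothetical term-count vector
doc_b = [2.0, 4.0, 0.0, 6.0]      # same direction, twice the length
doc_c = [-1.0, -2.0, 0.0, -3.0]   # opposite direction

print(hamming(lsh_signature(doc_a, planes), lsh_signature(doc_b, planes)))
print(hamming(lsh_signature(doc_a, planes), lsh_signature(doc_c, planes)))
```

Here the signature length (16) replaces the original dimensionality, which for real text is the vocabulary size. Documents whose signatures collide, or nearly collide, become candidates for a more exact similarity check.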

You will round out the course by implementing a classification model on text documents using many of these modeling abstractions.
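To make the idea of a text classification model concrete, the following is a small multinomial Naive Bayes classifier over word counts, written in plain Python. The source does not name the classifier or dataset used in the course, so the algorithm, the class name `NaiveBayesTextClassifier`, and the four training sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for text, label in zip(docs, labels):
            self.term_counts[label].update(text.lower().split())
        self.vocab = {t for c in self.term_counts.values() for t in c}
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            counts = self.term_counts[label]
            total = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_label / n_docs)
            for t in tokens:
                score += math.log((counts[t] + self.alpha) / total)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy training set.
train_docs = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "boring plot awful acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(model.predict("loved the great acting"))
print(model.predict("awful boring movie"))
```

In practice the raw whitespace tokens would usually be replaced by features built with the techniques described earlier, such as stopword-filtered, stemmed, or hashed representations.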

When you're finished with this course, you will have the skills and knowledge to work with documents and textual data in conceptually and practically sound ways, and to represent such data for use in machine learning models.

Course Info

Rating: (25 reviews)
Level: Advanced
Last updated: Jun 28, 2019
Duration: 2h 36m 23s

About the author

Janani Ravi

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

Table of contents

Course Overview 1m 49s

Representing Text as Features for Machine Learning 37m 24s

Building Feature Vector Representations of Text 27m 42s

Simplifying Text Processing Using Natural Language Processing 33m 54s

Reducing Dimensions in Text Using Hashing 27m 39s

Applying Text Feature Extraction Techniques to Machine Learning 27m 54s
