Building Features from Nominal Data
This course covers various techniques for encoding categorical data, starting with the familiar forms of one-hot and label encoding, before moving to contrast coding schemes such as simple coding, Helmert coding, and orthogonal polynomial coding.
What you'll learn
The quality of the preprocessing applied to numeric data is an important determinant of the results of machine learning models built from that data. In this course, Building Features from Nominal Data, you will gain the ability to encode categorical data in ways that increase the statistical power of models. First, you will learn about the different types of continuous and categorical data, including the differences between ratio and interval scale data, and between nominal and ordinal data. Next, you will discover how to encode categorical data using one-hot and label encoding, and how to avoid the dummy variable trap in linear regression. Finally, you will explore how to implement different forms of contrast coding, such as simple, Helmert, and orthogonal polynomial coding, so that regression results closely mirror the hypotheses that you wish to test. When you're finished with this course, you will have the skills and knowledge of encoding categorical data needed to increase the statistical power of linear regression models that include such data.
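As a taste of the one-hot encoding and dummy-trap material the course covers, here is a minimal sketch using pandas' pd.get_dummies() (one of the tools demonstrated in the course); the color column is a hypothetical example:

```python
import pandas as pd

# Hypothetical nominal data; the course uses its own datasets.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot encode the column. drop_first=True drops one redundant
# indicator column, which avoids the dummy variable trap when the
# result feeds into a linear regression with an intercept.
encoded = pd.get_dummies(df["color"], prefix="color", drop_first=True)
print(encoded.columns.tolist())  # two indicator columns for three categories
```

With three categories, two indicators are enough: a row with zeros in both columns unambiguously belongs to the dropped category.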
Table of contents
- Version Check 0m
- Module Overview 1m
- Prerequisites and Course Outline 2m
- Continuous and Categorical Data 4m
- Numeric Data 5m
- Categorical Data 4m
- Label Encoding and One-hot Encoding 4m
- Choosing between Label Encoding and One-hot Encoding 4m
- Types of Classification Tasks 5m
- One-hot Encoding with Known and Unknown Categories 5m
- One-hot Encoding on a Pandas DataFrame Column 2m
- One-hot Encoding Using pd.get_dummies() 1m
- Label Encoding to Convert Categorical Data to Ordinal 6m
- Label Binarizer to Perform One vs. Rest Encoding of Targets 4m
- Multilabel Binarizer for Encoding Multilabel Targets 2m
- Module Summary 1m
- Module Overview 2m
- The Dummy Trap 5m
- Avoiding the Dummy Trap 5m
- Dummy Coding to Overcome Limitations of One-hot Encoding 7m
- Regression Analysis with Dummy or Treatment Coding 6m
- Dummy Coding Using Patsy 6m
- Perform Regression Analysis Using Machine Learning on Dummy Coded Categories 4m
- Performing Linear Regression Using Machine Learning with One-hot Encoded Categories 3m
- Module Summary 1m
- Module Overview 1m
- Dummy Coding vs. Contrast Coding 4m
- Exploring Contrast Coding Techniques 4m
- Regression Analysis Using Simple Effect Coding 6m
- Performing Linear Regression Using Machine Learning with Simple Effect Coding 6m
- Regression Using Backward Difference Encoding 7m
- Regression Using Helmert Encoding 8m
- Generating Equally Spaced Categories to Perform Orthogonal Polynomial Encoding 5m
- Performing Regression Analysis Using Orthogonal Polynomial Encoding 2m
- Module Summary 1m
- Module Overview 2m
- Bucketing Continuous Data 3m
- Bucketing Continuous Data Using Pandas 3m
- Categorizing Continuous Data Using the KBinsDiscretizer 6m
- Hashing 3m
- Feature Hashing with Dictionaries, Tuples, and Text Data 3m
- Building a Simple Regression Model Using Hashed Categorical Values 3m
- Summary and Further Study 1m
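As a preview of the contrast coding covered in the third module, here is a minimal sketch of generating a Helmert contrast matrix with Patsy (the library used in the course's coding demos); the four numeric levels are a hypothetical example:

```python
from patsy.contrasts import Helmert

# Four hypothetical category levels. In Helmert coding, each contrast
# column compares one level against the levels that precede it, so a
# regression coefficient directly tests that comparison.
levels = [1, 2, 3, 4]
contrast = Helmert().code_without_intercept(levels)
print(contrast.matrix)  # a 4x3 contrast matrix; each column sums to zero
```

Because the columns sum to zero, the intercept estimates the grand mean rather than a reference category's mean, which is what distinguishes contrast coding from the dummy (treatment) coding covered in the second module.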