Reducing Dimensions in Data with scikit-learn
This course covers a wide range of the important dimensionality reduction and feature selection techniques available in scikit-learn, helping model builders optimize performance by reducing overfitting, save on model training time and cost, and better visualize the results of machine learning models.
What you'll learn
Dimensionality Reduction is a powerful and versatile machine learning technique that can be used to improve the performance of virtually every ML model. Using dimensionality reduction, you can significantly speed up model training and validation, saving both time and money, as well as greatly reduce the risk of overfitting.
In this course, Reducing Dimensions in Data with scikit-learn, you will gain the ability to design and implement an exhaustive array of feature selection and dimensionality reduction techniques in scikit-learn.
First, you will learn the importance of dimensionality reduction, and understand the pitfalls of working with data of excessively high-dimensionality, often referred to as the curse of dimensionality.
Next, you will discover how to implement feature selection techniques to decide which subset of the existing features to use, while losing as little information from the original, full dataset as possible.
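As a taste of this approach, here is a minimal sketch of univariate feature selection in scikit-learn, using `SelectKBest` with the `f_regression` scoring function (one of the techniques demoed later in the course); the dataset here is synthetic and chosen purely for illustration:

```python
# Univariate feature selection: keep the k features with the strongest
# individual linear relationship to the target.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: 20 features, only 5 of which are informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       random_state=0)

# Score each feature with a univariate linear regression test, keep the top 5
selector = SelectKBest(score_func=f_regression, k=5)
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)         # (200, 5)
print(selector.get_support())  # boolean mask marking the selected columns
```

Note that univariate tests score each feature in isolation, so they can miss features that are only useful in combination; the course also covers mutual-information-based selection, which captures non-linear relationships.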
You will then learn important techniques for reducing dimensionality in linear data. Such techniques, notably Principal Components Analysis and Linear Discriminant Analysis, seek to re-orient the original data along new, optimized axes. The choice of these axes is driven by numeric procedures such as eigenvalue decomposition and singular value decomposition.
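To make the idea of re-orienting data along new axes concrete, here is a minimal sketch of Principal Components Analysis on the Iris dataset (the same dataset used in a later demo); the choice of two components is illustrative:

```python
# PCA: project 4-dimensional Iris measurements onto the 2 orthogonal axes
# that capture the most variance in the data.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # fraction of variance along each new axis
```

The `explained_variance_ratio_` attribute tells you how much information each principal component retains, which is how you decide how many components are enough.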
You will then move on to manifold data, which is non-linear and often takes the form of Swiss rolls and S-curves. Such data presents an illusion of complexity, but is actually easily simplified by unrolling the manifold. Finally, you will explore how to implement a wide variety of manifold learning techniques, including multi-dimensional scaling (MDS), Isomap, and t-distributed Stochastic Neighbor Embedding (t-SNE). You will round out the course by comparing the results of these manifold unrolling techniques on different datasets, including images of faces and handwritten digits.
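The idea of "unrolling" a manifold can be sketched in a few lines with scikit-learn's Isomap estimator on a generated S-curve (the same shape used in the demos); the neighborhood size of 10 is an illustrative choice:

```python
# Manifold learning: an S-curve is a 2-D sheet bent into 3-D space.
# Isomap recovers the flat 2-D parameterization by preserving geodesic
# (along-the-surface) distances between neighboring points.
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

# 1000 points lying on an S-shaped 2-D surface embedded in 3-D
X, color = make_s_curve(n_samples=1000, random_state=0)

embedding = Isomap(n_neighbors=10, n_components=2)
X_unrolled = embedding.fit_transform(X)

print(X.shape)           # (1000, 3)
print(X_unrolled.shape)  # (1000, 2)
```

Despite living in three dimensions, each point on the S-curve really only has two degrees of freedom, which is why the unrolled representation loses essentially no structure.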
When you’re finished with this course, you will have the skills and knowledge of Dimensionality Reduction needed to design and implement ways to mitigate the curse of dimensionality in scikit-learn.
Table of contents
- Version Check 0m
- Module Overview 1m
- Prerequisites and Course Outline 2m
- The Curse of Dimensionality 6m
- Overfitted Models and Data Sparsity 4m
- Exploring Techniques for Reducing Dimensions 4m
- Demo: Exploring the Classification Dataset 6m
- Demo: Performing Classification with All Features 3m
- Demo: Exploring the Regression Dataset 5m
- Demo: Performing Kitchen Sink Regression Using ML and Non-ML Techniques 3m
- Feature Selection and Dictionary Learning 5m
- Demo: Using Univariate Linear Regression Tests to Select Features 6m
- Demo: Defining Helper Functions to Build and Train Multiple Models with Different Training Features 4m
- Demo: Finding the Best Value of K 4m
- Demo: Using Mutual Information to Select Features 3m
- Demo: Dictionary Learning to Find Sparse Representations of Data 8m
- Summary 1m
- Module Overview 1m
- The Intuition Behind Principal Components Analysis 7m
- Demo: Implementing Principal Component Analysis 7m
- Demo: Building Regression Models with Principal Components 3m
- Factor Analysis Using Singular Value Decomposition 2m
- Demo: Implementing Factor Analysis 7m
- Linear Discriminant Analysis for Dimensionality Reduction 3m
- Demo: Observing Class Separation Boundaries on the Iris Dataset 6m
- Demo: Linear Discriminant Analysis for Classification 4m
- Summary 2m
- Module Overview 1m
- The Manifold Hypothesis and Manifold Learning 6m
- Demo: Generate S-curve Manifold and Set Up Helper Functions 6m
- Demo: Metric and Non-metric Multi-dimensional Scaling 3m
- Demo: Manifold Learning Using Spectral Embedding, t-SNE, and Isomap 4m
- Demo: Manifold Learning with Locally Linear Embedding 2m
- Demo: Preparing Images to Apply Manifold Learning for Dimensionality Reduction 4m
- Demo: Manifold Learning with Handwritten Digits 4m
- Demo: Preparing the Olivetti Faces Dataset for Manifold Learning 4m
- Demo: Manifold Learning on Olivetti Faces Dataset 3m
- Summary and Further Study 2m