Building Features from Numeric Data
This course exhaustively covers data preprocessing techniques and transforms available in scikit-learn, allowing the construction of highly optimized features that are scaled, normalized and transformed in mathematically sound ways to fully harness the power of machine learning techniques.
What you'll learn
The quality of the preprocessing applied to numeric data is an important determinant of the results of machine learning models built on that data. With smart, optimized data preprocessing, you can significantly speed up model training and validation, saving both time and money, and greatly improve predictive performance.
In this course, Building Features from Numeric Data, you will gain the ability to design and implement effective, mathematically sound data preprocessing pipelines.
First, you will learn the importance of normalization, standardization and scaling, and understand the intuition and mechanics of tweaking the central tendency as well as dispersion of a data feature.
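As a rough illustration of what adjusting central tendency and dispersion means, the sketch below standardizes a single feature with scikit-learn's StandardScaler and compares the result with the same calculation done by hand. The toy values in X are hypothetical and are not taken from the course material.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature: one column, five samples.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# StandardScaler subtracts the column mean and divides by the
# column standard deviation (population form, ddof=0).
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# The same calculation done by hand with NumPy.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.ravel())
print(np.allclose(X_std, manual))
```

After the transform the feature has a mean of zero and unit variance, so features that were measured on very different scales become directly comparable.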
Next, you will discover how to identify and deal with outliers and possibly anomalous data. You will then learn important techniques for scaling and normalization, notably normalization using the L1 norm, L2 norm, and max norm, which transform feature vectors to have uniform magnitude. These techniques are widely used in ML model building, for instance in computing the cosine similarity of document vectors and in transforming images before techniques such as convolutional neural networks are applied to them.
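The three norms can be sketched with scikit-learn's normalize function, which rescales each row (sample) independently. The two-row matrix below is a made-up example; the last lines show why L2 normalization simplifies cosine similarity, since the similarity of two unit-length vectors is just their dot product.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Hypothetical feature vectors, one per row.
X = np.array([[3.0, 4.0],
              [1.0, 1.0]])

X_l1 = normalize(X, norm="l1")    # absolute values in each row sum to 1
X_l2 = normalize(X, norm="l2")    # each row has Euclidean length 1
X_max = normalize(X, norm="max")  # largest absolute value in each row is 1

# Cosine similarity of the two original rows, computed directly...
cos_direct = (X[0] @ X[1]) / (np.linalg.norm(X[0]) * np.linalg.norm(X[1]))
# ...and as a plain dot product of the L2-normalized rows.
cos_normalized = X_l2[0] @ X_l2[1]

print(X_l1, X_l2, X_max, sep="\n")
print(np.isclose(cos_direct, cos_normalized))
```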
You will then move from normalization and standardization to scaling and transforming data. Such transformations include quantization as well as the construction of custom transformers for bespoke use cases. Finally, you will explore how to implement log and power transformations. You will round out the course by comparing the results of three important transformations (the Yeo-Johnson transform, the Box-Cox transform, and the quantile transformation) in converting data with non-normal characteristics, such as chi-squared or lognormal data, into the familiar bell curve shape that many models work best with.
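A comparison of this kind might look like the following sketch, which is an assumption about the general approach rather than the course's own demo code. It draws synthetic lognormal data (seed and sample size chosen arbitrarily), applies scikit-learn's PowerTransformer and QuantileTransformer, and uses sample skewness as a simple measure of how far each result is from a symmetric bell shape.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer, QuantileTransformer

# Synthetic, strongly right-skewed data (assumption: lognormal, 1000 samples).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

transformers = {
    # Yeo-Johnson accepts zero and negative values as well as positive ones.
    "yeo-johnson": PowerTransformer(method="yeo-johnson"),
    # Box-Cox requires strictly positive input, which lognormal data satisfies.
    "box-cox": PowerTransformer(method="box-cox"),
    # Maps empirical quantiles onto those of a standard normal distribution.
    "quantile": QuantileTransformer(
        output_distribution="normal", n_quantiles=1000, random_state=0
    ),
}

# Skewness near zero indicates a roughly symmetric distribution.
skewness = {"raw": float(skew(X[:, 0]))}
for name, transformer in transformers.items():
    X_t = transformer.fit_transform(X)
    skewness[name] = float(skew(X_t[:, 0]))

for name, value in skewness.items():
    print(f"{name:12s} skewness = {value:.3f}")
```

Skewness is only one diagnostic; a histogram or a Q-Q plot of each transformed feature gives a fuller picture of how close the result is to normal.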
When you’re finished with this course, you will have the skills and knowledge of data preprocessing and transformation needed to get the best out of your machine learning models.
Table of contents
- Version Check 0m
- Module Overview 1m
- Prerequisites and Course Outline 2m
- Scaling and Standardization 4m
- Mean, Variance, and Standard Deviation 4m
- Understanding Variance 4m
- Demo: Calculating Mean, Variance, and Standard Deviation 7m
- Demo: Box Plot Visualization and Data Standardization 7m
- Standard Scaler 4m
- Demo: Standardize Data Using the Scale Function 5m
- Demo: Standardize Data Using the StandardScaler Estimator and Apply Bessel's Correction 4m
- Robust Scaler 4m
- Demo: Scaling Data Using the Robust Scaler 7m
- Summary 1m
- Module Overview 1m
- What Is Normalization? 2m
- Normalization and Cosine Similarity 8m
- Demo: Cosine Similarity and the L2 Norm 7m
- Demo: Normalizing Data to Simplify Cosine Similarity Calculations 4m
- Demo: K-means Clustering with Cosine Similarity 4m
- L1, L2 and Max Norms 3m
- Demo: Normalization Using L1, L2 and Max Norms 5m
- Summary 1m
- Module Overview 1m
- Converting Continuous Data to Categorical 3m
- Demo: Convert Numeric Data to Binary Categories Using a Binarizer 5m
- Demo: Using the KBinsDiscretizer to Categorize Numeric Values 6m
- Demo: Using Bin Values to Flag Outliers 3m
- Scaling Data 2m
- Demo: Scaling with the MaxAbsScaler 2m
- Demo: Scaling with the MinMaxScaler 3m
- Custom Transformations 1m
- Demo: Performing Custom Transforms Using the FunctionTransformer 3m
- Generating Polynomial Features 2m
- Demo: Using Polynomial Features to Transform Data 6m
- Transforming Features to Gaussian-like Distributions Using Power Transformers 1m
- Demo: Working with Chi-Squared Distributed Input Features 5m
- Demo: Applying Power Transformers to Get Normal Distributions 4m
- Transforming Data to Normal or Uniform Distributions Using Quantile Transformers 1m
- Demo: Transforming to a Normal Distribution Using the QuantileTransformer 4m
- Summary and Further Study 2m