Preparing Data for Modeling with scikit-learn
This course covers important data pre-processing steps, including standardization, normalization, novelty and outlier detection, and the pre-processing of image and text data, as well as explicit kernel approximations such as the RBF and Nystroem methods.
What you'll learn
Even as the number of machine learning frameworks and libraries grows by the day, scikit-learn retains its popularity with ease. Scikit-learn makes the common machine learning use-cases - clustering, classification, dimensionality reduction, and regression - incredibly easy. In this course, Preparing Data for Modeling with scikit-learn, you will gain the ability to appropriately pre-process data, identify outliers, and apply kernel approximations. First, you will learn how pre-processing techniques such as standardization and scaling help improve the efficacy of ML algorithms. Next, you will discover how novelty and outlier detection is implemented in scikit-learn. Then, you will understand the typical set of steps needed to work with both text and image data in scikit-learn. Finally, you will round out your knowledge by applying implicit and explicit kernel transformations to transform data into higher dimensions. When you’re finished with this course, you will have the skills and knowledge to identify the correct data pre-processing technique for your use-case and to detect outliers using theoretically robust techniques.
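To give a flavor of the first module's material, here is a minimal sketch of standardization and robust scaling with scikit-learn's StandardScaler and RobustScaler. The toy array and values below are illustrative assumptions rather than the course's own exercises; similar sketches for the outlier detection, text, image, and kernel approximation topics follow the table of contents.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

# Toy feature matrix: two numeric columns on very different scales (illustrative).
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 250.0],
              [4.0, 9000.0]])  # the last row contains an extreme value in column 2

# StandardScaler: subtracts the column mean and divides by the standard deviation,
# giving each column zero mean and unit variance.
X_standard = StandardScaler().fit_transform(X)

# RobustScaler: centers on the median and scales by the interquartile range,
# so a single extreme value distorts the result far less.
X_robust = RobustScaler().fit_transform(X)

print(X_standard)
print(X_robust)
```

The module's normalization and quantile transformation clips presumably follow the same fit/transform pattern with sklearn.preprocessing.Normalizer and QuantileTransformer.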
Table of contents
- Version Check 0m
- Module Overview 1m
- Prerequisites and Course Outline 2m
- Scaling and Standardization 5m
- Normalization 3m
- Transforming Data to Gaussian Distributions 2m
- Calculating and Visualizing Summary Statistics 5m
- Using the Standard Scaler for Standardizing Numeric Features 6m
- Using the Robust Scaler to Scale Numeric Features 4m
- Normalization and Cosine Similarity 6m
- Transforming Bimodally Distributed Data to a Normal Distribution Using a Quantile Transformer 5m
- Reducing Dimensionality Using Factor Analysis 6m
- Module Summary 1m
- Module Overview 1m
- Outliers and Novelties 3m
- Detecting and Coping with Outlier Data 4m
- Local Outlier Factor 3m
- Elliptic Envelope 3m
- Isolation Forest 4m
- Outlier Detection Using Local Outlier Factor 7m
- Outlier Detection Using Isolation Forest 5m
- Outlier Detection Using Elliptic Envelope 3m
- Novelty Detection Using Local Outlier Factor 5m
- Using the Predict, Score Samples, and Decision Function 3m
- Outlier Detection Using the Head Brain Dataset 4m
- Module Summary 1m
- Module Overview 1m
- Representing Text Data in Numeric Form 5m
- Bag-of-words and Bag-of-n-grams Models 3m
- Vectorize Text Using the Bag-of-words Model 5m
- Vectorize Text Using the Bag-of-n-grams Model 3m
- Vectorize Text Using Tf-Idf Scores 3m
- Hashing for Dimensionality Reduction 3m
- Reducing Dimensions Using the Hashing Vectorizer 3m
- Performing Feature Extraction on a Python Dictionary 2m
- Module Summary 1m
- Module Overview 1m
- Representing Images as Matrices 3m
- Feature Extraction from Images 6m
- Extracting Patches from Image Data 4m
- Using Dictionary Learning to Denoise and Reconstruct Images 7m
- Clustering Image Data Using a Pixel Connectivity Graph 7m
- Clustering Images Using a Gradient Connectivity Graph 6m
- Module Summary 1m
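The outlier and novelty detection module works with Local Outlier Factor, Isolation Forest, and Elliptic Envelope. A minimal sketch of how these scikit-learn estimators are typically applied is shown below; the synthetic data and contamination values are assumptions made for illustration, not taken from the course demos.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.ensemble import IsolationForest
from sklearn.covariance import EllipticEnvelope

rng = np.random.RandomState(0)
X = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X[:5] += 6.0  # shift a few points far from the bulk to act as outliers

# Each estimator labels inliers as +1 and outliers as -1.
lof_labels = LocalOutlierFactor(n_neighbors=20).fit_predict(X)
iso_labels = IsolationForest(contamination=0.05, random_state=0).fit_predict(X)
env_labels = EllipticEnvelope(contamination=0.05).fit_predict(X)

# Novelty detection: fit Local Outlier Factor on clean data only,
# then score previously unseen points.
lof_novelty = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X[5:])
print(lof_novelty.predict(X[:5]))            # -1 for points that look novel
print(lof_novelty.decision_function(X[:5]))  # negative values are more anomalous
print(lof_novelty.score_samples(X[:5]))      # raw anomaly scores
```

With novelty=True, Local Outlier Factor exposes predict, score_samples, and decision_function, which are presumably the methods covered in the "Using the Predict, Score Samples, and Decision Function" clip.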
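The text module covers bag-of-words, bag-of-n-grams, TF-IDF, hashing, and feature extraction from Python dictionaries. A minimal sketch with scikit-learn's vectorizers follows; the tiny corpus and parameter choices are illustrative assumptions.

```python
from sklearn.feature_extraction.text import (
    CountVectorizer, TfidfVectorizer, HashingVectorizer)
from sklearn.feature_extraction import DictVectorizer

corpus = ["the quick brown fox", "the lazy dog", "the quick dog"]

# Bag-of-words: one column per unique token, values are raw counts.
bow = CountVectorizer().fit_transform(corpus)

# Bag-of-n-grams: unigrams and bigrams become columns.
ngrams = CountVectorizer(ngram_range=(1, 2)).fit_transform(corpus)

# TF-IDF: down-weights tokens that appear in most documents.
tfidf = TfidfVectorizer().fit_transform(corpus)

# Hashing: maps tokens to a fixed number of columns to bound memory use.
hashed = HashingVectorizer(n_features=2 ** 8).fit_transform(corpus)

# Feature extraction from Python dictionaries: categorical keys are one-hot encoded.
records = [{"city": "Paris", "temp": 21.0}, {"city": "Oslo", "temp": 12.0}]
dict_features = DictVectorizer(sparse=False).fit_transform(records)

print(bow.shape, ngrams.shape, tfidf.shape, hashed.shape, dict_features.shape)
```

HashingVectorizer keeps the feature space at a fixed size at the cost of not being able to map columns back to the original tokens.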
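The image module represents images as matrices and extracts patches from them for tasks such as dictionary learning. The sketch below uses a sample image bundled with scikit-learn and extract_patches_2d; the patch size and patch counts are assumptions for illustration.

```python
from sklearn.datasets import load_sample_image
from sklearn.feature_extraction.image import (
    extract_patches_2d, reconstruct_from_patches_2d)

# A sample RGB image bundled with scikit-learn (loading it requires Pillow),
# stored as a (height, width, 3) array.
china = load_sample_image("china.jpg")
print(china.shape)

# Sample 500 random 16x16 patches, e.g. as input to dictionary learning.
patches = extract_patches_2d(china, (16, 16), max_patches=500, random_state=0)
print(patches.shape)  # (500, 16, 16, 3)

# When every overlapping patch is kept, the image can be rebuilt by averaging
# the overlapping regions.
crop = china[:64, :64]
all_patches = extract_patches_2d(crop, (16, 16))
rebuilt = reconstruct_from_patches_2d(all_patches, crop.shape)
print(rebuilt.shape)  # (64, 64, 3)
```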
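The course description also mentions explicit kernel approximations using the RBF and Nystroem methods. Below is a minimal sketch of scikit-learn's Nystroem and RBFSampler transformers feeding a linear model; the gamma, n_components, and synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem, RBFSampler
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # a target a plain linear model fits poorly

# Nystroem: approximates the RBF kernel using a subset of the training samples.
nystroem_features = Nystroem(kernel="rbf", gamma=0.5,
                             n_components=100, random_state=0).fit_transform(X)

# RBFSampler: approximates the RBF kernel with random Fourier features.
rff_features = RBFSampler(gamma=0.5, n_components=100,
                          random_state=0).fit_transform(X)

# Typical use: feed the explicit feature map into a linear model.
clf = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.5, n_components=100, random_state=0),
    SGDClassifier(random_state=0))
clf.fit(X, y)
print(clf.score(X, y))
```

The explicit feature map makes data that is only separable with a kernel usable by fast linear learners, which is the trade-off the final module explores.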