Scaling scikit-learn Solutions
This course covers the important considerations for improving the prediction latency and throughput of scikit-learn models: specific feature representation and partial learning techniques, as well as implementations of incremental learning, out-of-core learning, and multicore parallelism.
What you'll learn
Even as the number of machine learning frameworks and libraries grows rapidly, scikit-learn retains its popularity with ease. scikit-learn makes the common use cases in machine learning - clustering, classification, dimensionality reduction, and regression - incredibly easy.
In this course, Scaling scikit-learn Solutions, you will gain the ability to leverage out-of-core learning and multicore parallelism in scikit-learn.
First, you will learn considerations that affect latency and throughput in prediction, including the number of features, feature complexity, and model complexity.
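To make these latency factors concrete, here is a minimal sketch comparing atomic (one sample per call) and bulk (many samples per call) prediction latency; the dataset sizes and the choice of `SGDClassifier` are illustrative, not from the course:

```python
# Sketch: measuring atomic vs. bulk per-prediction latency.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, n_features=50, random_state=0)
model = SGDClassifier(random_state=0).fit(X, y)

# Atomic latency: one predict() call per sample, paying validation
# overhead on every call.
start = time.perf_counter()
for row in X[:100]:
    model.predict(row.reshape(1, -1))
atomic = (time.perf_counter() - start) / 100

# Bulk latency: the same 100 samples in a single vectorized call.
start = time.perf_counter()
model.predict(X[:100])
bulk = (time.perf_counter() - start) / 100

print(f"atomic: {atomic:.6f} s/prediction, bulk: {bulk:.6f} s/prediction")
```

Per-prediction cost is typically far lower in bulk mode, because the per-call Python and input-validation overhead is amortized across the batch.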
Next, you will discover how smart choices in feature representation and in how you model sparse data can improve the scalability of your models. You will then understand what incremental learning is, and how to use scikit-learn estimators that support this key enabler of out-of-core learning.
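A minimal sketch of both ideas together - a sparse input representation and incremental learning via `partial_fit` - might look like the following; the synthetic dataset and batch size are assumptions for illustration (real text features, for example, are naturally sparse):

```python
# Sketch: out-of-core learning with partial_fit on sparse mini-batches.
import numpy as np
from scipy import sparse
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=5000, n_features=100, random_state=0)
X = sparse.csr_matrix(X)   # CSR representation; illustrative here,
                           # since this synthetic data is actually dense
classes = np.unique(y)     # partial_fit must see all classes up front

model = SGDClassifier(random_state=0)
for start in range(0, X.shape[0], 500):   # stream the data in mini-batches
    batch = slice(start, start + 500)
    model.partial_fit(X[batch], y[batch], classes=classes)

accuracy = model.score(X, y)
```

Because each call to `partial_fit` sees only one mini-batch, the full dataset never has to fit in memory at once - the key enabler of out-of-core learning.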
Finally, you will round out your knowledge by parallelizing key tasks such as cross-validation, hyperparameter tuning, and ensemble learning.
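As a hedged sketch of that parallelism, scikit-learn exposes joblib-backed workers through the `n_jobs` parameter on estimators and model-selection utilities; the parameter grid below is an arbitrary example:

```python
# Sketch: parallelizing cross-validation and grid search with n_jobs,
# which dispatches work to joblib workers under the hood.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Run the 5 cross-validation folds concurrently (n_jobs=-1 uses all cores).
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, n_jobs=-1
)

# Grid search fans out each (parameter, fold) combination the same way.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=5,
    n_jobs=-1,
)
grid.fit(X, y)
print(grid.best_params_)
```

Since folds and parameter combinations are independent, they parallelize cleanly, which is why cross-validation and hyperparameter tuning are the most common first targets for multicore speedups.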
When you’re finished with this course, you will have the skills and knowledge to identify key techniques for making your model scalable and to implement them appropriately for your use case.
Table of contents
- Version Check 0m
- Module Overview 1m
- Prerequisites and Course Outline 2m
- Dimensions of Scaling 2m
- Measuring Performance in Scaling 6m
- Influence of Number of Features 5m
- Influence of Feature Extraction Techniques 5m
- Influence of Feature Representation 3m
- Demo: Helper Functions to Generate Datasets and Train Models 5m
- Demo: Measuring Training Latencies for Different Models 4m
- Module Summary 1m
- Module Overview 1m
- Demo: Measuring Bulk and Atomic Prediction Latencies for Different Models 7m
- Demo: Influence of Number of Features on Bulk Prediction Latency 5m
- Optimizations to Improve Prediction Latency 7m
- Optimizations to Improve Prediction Throughput 2m
- Demo: Observing the Influence of Model Complexity 8m
- Demo: Using Optimized Libraries and Reducing Validation Overhead 3m
- Demo: Training Models Using Dense and Sparse Input Representation 6m
- Demo: Prediction with Sparse Data and Memory Profiling 6m
- Module Summary 1m
- Module Overview 1m
- Streaming Data 4m
- Incremental Learning for Large Datasets 7m
- Demo: Preparing Text Data for Out-of-Core Learning 6m
- Demo: Using Partial Fit to Perform Out-of-Core Learning 5m
- Demo: Visualizing Latencies and Accuracies 5m
- Demo: Using the Passive Aggressive, Perceptron, and BernoulliNB Classifiers 4m
- Module Summary 1m
- Module Overview 1m
- Parallelizing Computation Using Joblib 5m
- Demo: Introducing Joblib 4m
- Demo: Running Concurrent Workers Using Joblib 5m
- Demo: Cross Validation Using Concurrent Workers 4m
- Demo: Integrating Joblib with Dask ML 3m
- Demo: Grid Search with Concurrent Workers 3m
- Demo: Preparing Data for Multi-label Classification 8m
- Demo: Performing Multi-label Classification 4m
- Module Summary 1m