Production Machine Learning Systems
This course covers how to implement the various flavors of production ML systems: static, dynamic, and continuous training; static and dynamic inference; and batch and online processing. You explore TensorFlow abstraction levels, the options for distributed training, and how to write distributed training models with custom estimators.
What you'll learn
This is the second course in the Advanced Machine Learning on Google Cloud series. After completing this course, enroll in the Image Understanding with TensorFlow on Google Cloud course.
Table of contents
- Architecting ML systems 2m
- Data extraction, analysis, and preparation 5m
- Model training, evaluation, and validation 2m
- Trained model, prediction service, and performance monitoring 3m
- Training design decisions 5m
- Serving design decisions 6m
- Designing from scratch 3m
- Using Vertex AI 9m
- Lab introduction: Structured data prediction 0m
- Lab: Structured data prediction using Vertex AI Platform 0m
- Readings: Architecting production ML systems 0m
- Introduction 3m
- Adapting to data 3m
- Changing distributions 4m
- Lab: Adapting to data 2m
- Right and wrong decisions 4m
- System failure 2m
- Concept drift 9m
- Actions to mitigate concept drift 3m
- TensorFlow Data Validation 4m
- Components of TensorFlow Data Validation 5m
- Lab Introduction: Introduction to TensorFlow Data Validation 0m
- Lab: Introduction to TensorFlow Data Validation 0m
- Lab Introduction: Advanced Visualizations with TensorFlow Data Validation 1m
- Lab: Advanced Visualizations with TensorFlow Data Validation 0m
- Mitigating training-serving skew through design 2m
- Lab: Vertex AI: Training and Serving a Custom Model 0m
- Diagnosing a production model 4m
- Readings: Designing adaptable ML systems 0m
- Introduction 1m
- Training 6m
- Predictions 3m
- Why distributed training is needed 3m
- Distributed training architectures 8m
- TensorFlow distributed training strategies 1m
- Mirrored strategy 3m
- Multi-worker mirrored strategy 4m
- TPU strategy 2m
- Parameter server strategy 2m
- Lab Introduction: Distributed Training with Keras 0m
- Lab: Distributed Training with Keras 0m
- Training on large datasets with tf.data API 5m
- Lab Introduction: TPU-speed Data Pipelines 0m
- Lab: TPU-speed Data Pipelines 0m
- Inference 4m
- Readings: Designing high-performance ML systems 0m