Conceptualizing the Processing Model for the GCP Dataflow Service
Dataflow represents a fundamentally different approach to Big Data processing from computing engines such as Spark. Dataflow is serverless and fully managed, and it runs pipelines built with the Apache Beam APIs.
What you'll learn
Dataflow allows developers to process and transform data using easy, intuitive APIs. Dataflow is built on the Apache Beam programming model, which unifies batch and stream processing of data. In this course, Conceptualizing the Processing Model for the GCP Dataflow Service, you will explore the full potential of Cloud Dataflow and its innovative programming model.
First, you will work with an example Apache Beam pipeline that performs stream processing operations and see how it can be executed on the Cloud Dataflow runner.
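To give a flavor of what such a pipeline looks like, here is a minimal sketch of a streaming word count submitted to the Dataflow runner. It assumes the apache-beam[gcp] package is installed; the project, region, bucket, and topic names are placeholders, not values from the course.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

# Placeholder values -- substitute your own project, bucket, and topics.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-gcp-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    streaming=True,  # mark the job as unbounded so Dataflow runs it in streaming mode
)

with beam.Pipeline(options=options) as p:
    (p
     | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
           topic="projects/my-gcp-project/topics/words")
     | "Decode" >> beam.Map(lambda msg: msg.decode("utf-8"))
     | "Window" >> beam.WindowInto(FixedWindows(60))  # 60-second fixed windows
     | "PairWithOne" >> beam.Map(lambda word: (word, 1))
     | "CountPerWindow" >> beam.CombinePerKey(sum)
     | "Format" >> beam.MapTuple(lambda word, n: f"{word}: {n}".encode("utf-8"))
     | "WriteToPubSub" >> beam.io.WriteToPubSub(
           topic="projects/my-gcp-project/topics/word-counts"))
```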
Next, you will understand the basic optimizations that Dataflow applies to your execution graph, such as fusion (merging adjacent steps so intermediate results never need to be materialized) and combine optimizations (lifting combiners so data is partially aggregated on each worker before the shuffle).
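As a taste of the combine optimization, the following sketch (runnable locally, since it uses no GCP resources) contrasts an aggregation Dataflow can lift with an equivalent one it cannot; the pipeline and its data are illustrative, not taken from the course.

```python
import apache_beam as beam

with beam.Pipeline() as p:
    pairs = p | beam.Create([("a", 1), ("b", 1), ("a", 1)])

    # CombinePerKey exposes an associative, commutative combine function,
    # so Dataflow can lift it: each worker pre-sums its share of the data
    # and only partial sums cross the shuffle boundary.
    lifted = pairs | "CombinePerKey" >> beam.CombinePerKey(sum)

    # GroupByKey followed by a Map hides the aggregation inside opaque user
    # code, so every raw element must be shuffled before it is summed.
    grouped = (pairs
               | "GroupByKey" >> beam.GroupByKey()
               | "SumValues" >> beam.MapTuple(lambda k, vs: (k, sum(vs))))
```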
Finally, you will explore running Dataflow pipelines without writing any code at all by using the built-in templates. You will also see how to create a custom template to execute your own processing jobs.
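For illustration, one of Google's built-in classic templates can be launched programmatically through the Dataflow REST API. The sketch below assumes the google-api-python-client package and application default credentials; the project and bucket names are placeholders.

```python
from googleapiclient.discovery import build

PROJECT = "my-gcp-project"   # placeholder project ID
BUCKET = "gs://my-bucket"    # placeholder Cloud Storage bucket

# Build a client for the Dataflow REST API (v1b3); application default
# credentials are picked up automatically.
dataflow = build("dataflow", "v1b3")

request = dataflow.projects().templates().launch(
    projectId=PROJECT,
    # One of Google's built-in classic templates, hosted in a public bucket.
    gcsPath="gs://dataflow-templates/latest/Word_Count",
    body={
        "jobName": "wordcount-from-template",
        "parameters": {
            "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
            "output": f"{BUCKET}/wordcount/results",
        },
    },
)
response = request.execute()
print(response["job"]["id"])  # ID of the newly launched Dataflow job
```

The same launch can also be performed from the Cloud Console or the gcloud CLI, which is how the course's template demos approach it.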
When you are finished with this course, you will have the skills and knowledge to design Dataflow pipelines using the Apache Beam SDKs, integrate these pipelines with other Google services, and run them on the Google Cloud Platform.
Table of contents
- Version Check 0m
- Prerequisites and Course Outline 3m
- Overview of Apache Beam 4m
- Introducing Cloud Dataflow 5m
- Executing Pipelines on Dataflow 6m
- Demo: Enabling APIs 3m
- Demo: Setting up a Service Account 5m
- Demo: Sample Word Count Application 7m
- Demo: Executing the Word Count Application on the Beam Runner 2m
- Demo: Creating Cloud Storage Buckets 3m
- Demo: Implementing a Beam Pipeline to Run on Dataflow 4m
- Demo: Running a Beam Pipeline on Cloud Dataflow 4m
- Demo: Custom Pipeline Options 4m
- Dataflow Pricing 4m
- Monitoring Jobs 4m
- Demo: Implementing a Pipeline with a Side Input 7m
- Demo: Running the Code and Exploring the Job Graph 5m
- Demo: Exploring Job Metrics 3m
- Demo: Autoscaling 4m
- Demo: Enabling the Streaming Engine 2m
- Demo: Using the Command-line Interface to Monitor Jobs 4m
- Demo: Logging Messages in Dataflow 4m
- Demo: Tracking Dataflow Metrics with the Metrics Explorer 4m
- Demo: Configuring Alerts 4m
- Structuring User Code 3m
- Demo: Writing Pipeline Results to Pub/Sub 7m
- Demo: Viewing Pipeline Results in Pub/Sub 2m
- Demo: Writing Pipeline Results to BigQuery 5m
- Demo: Viewing Pipeline Results in BigQuery 2m
- Demo: Performing Join Operations 7m
- Demo: Errors and Retries in Dataflow 6m
- Fusion and Combine Optimizations 6m
- Autoscaling and Dynamic Work Rebalancing 3m
- Demo: Reading Streaming Data from Pub/Sub 8m
- Demo: Writing Streaming Data to BigQuery 7m