Handling Streaming Data with GCP Dataflow
Dataflow is a serverless, fully managed service on the Google Cloud Platform for batch and stream processing.
What you'll learn
Dataflow allows developers to process and transform data using simple, intuitive APIs. Dataflow is built on the Apache Beam programming model and unifies batch and stream processing. In this course, Handling Streaming Data with GCP Dataflow, you will discover that GCP provides a wide range of connectors to integrate the Dataflow service with other GCP services, such as the Pub/Sub messaging service and the BigQuery data warehouse.
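To give a flavor of the Beam Java API the course works with (the demos build an Apache Maven project), here is a minimal sketch of a batch pipeline that counts words in a Cloud Storage file. The gs:// paths are illustrative placeholders, not files from the course.

```java
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class WordCountSketch {
  public static void main(String[] args) {
    // Standard Beam options; pass --runner=DataflowRunner plus project and
    // region flags to execute on Dataflow instead of locally.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    p.apply("ReadLines", TextIO.read().from("gs://my-bucket/input.txt"))  // placeholder path
     .apply("SplitWords", FlatMapElements.into(TypeDescriptors.strings())
         .via(line -> Arrays.asList(line.split("\\s+"))))
     .apply("CountWords", Count.perElement())
     .apply("FormatResults", MapElements.into(TypeDescriptors.strings())
         .via(kv -> kv.getKey() + ": " + kv.getValue()))
     .apply("WriteCounts", TextIO.write().to("gs://my-bucket/counts"));   // placeholder path

    p.run().waitUntilFinish();
  }
}
```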
First, you will see how you can integrate your Dataflow pipelines with other GCP services, using them as a source of streaming data or as a sink for your final results.
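To make that source/sink integration concrete, the following is a rough sketch of a streaming pipeline that reads messages from Pub/Sub and appends them to a BigQuery table. The topic and table names, and the single-column schema implied by the TableRow, are assumptions for illustration only.

```java
import com.google.api.services.bigquery.model.TableRow;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;

public class PubSubToBigQuerySketch {
  public static void main(String[] args) {
    StreamingOptions options =
        PipelineOptionsFactory.fromArgs(args).as(StreamingOptions.class);
    options.setStreaming(true); // Pub/Sub is unbounded, so run in streaming mode
    Pipeline p = Pipeline.create(options);

    p.apply("ReadMessages",
            // Placeholder topic; reading from a topic creates a temporary subscription.
            PubsubIO.readStrings().fromTopic("projects/my-project/topics/tweets"))
     .apply("ToTableRow", MapElements.into(TypeDescriptor.of(TableRow.class))
         .via(msg -> new TableRow().set("message", msg)))
     .setCoder(TableRowJsonCoder.of())
     .apply("WriteToBigQuery", BigQueryIO.writeTableRows()
         .to("my-project:my_dataset.tweets") // placeholder table; assumed to already exist
         .withCreateDisposition(CreateDisposition.CREATE_NEVER)
         .withWriteDisposition(WriteDisposition.WRITE_APPEND));

    p.run(); // streaming jobs run until cancelled
  }
}
```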
Next, you will stream live Twitter feeds to the Pub/Sub messaging service and implement a pipeline to read and process these Twitter messages. Finally, you will implement pipelines with side inputs, as well as branching pipelines that write your final results to multiple sinks. When you are finished with this course, you will have the skills and knowledge to design complex Dataflow pipelines, integrate these pipelines with other Google services, and test and run these pipelines on the Google Cloud Platform.
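As a taste of the side-input and branching patterns covered in the demos, here is an illustrative sketch in which one branch filters messages against a small side-input list while a second branch independently transforms the same input. The banned-words example is invented, not a course exercise.

```java
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.beam.sdk.values.TypeDescriptors;

public class BranchingSideInputSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Side input: a small collection materialized as a view that every
    // worker can read alongside the main input.
    PCollectionView<List<String>> bannedWords =
        p.apply("BannedWords", Create.of("spam", "scam"))
         .apply("AsView", View.asList());

    PCollection<String> messages =
        p.apply("Messages", Create.of("hello world", "free spam offer"));

    // Branch 1: filter messages against the side input.
    PCollection<String> clean = messages.apply("DropBanned",
        ParDo.of(new DoFn<String, String>() {
          @ProcessElement
          public void process(ProcessContext c) {
            List<String> banned = c.sideInput(bannedWords);
            if (banned.stream().noneMatch(c.element()::contains)) {
              c.output(c.element());
            }
          }
        }).withSideInputs(bannedWords));

    // Branch 2: the same PCollection feeds an independent transform, so the
    // pipeline graph forks; each branch could end in its own sink.
    PCollection<Integer> lengths = messages.apply("MessageLengths",
        MapElements.into(TypeDescriptors.integers()).via(String::length));

    p.run().waitUntilFinish();
  }
}
```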
Table of contents
- Version Check 0m
- Prerequisites and Course Outline 4m
- Demo: Enabling APIs on the GCP 3m
- Demo: Creating a Service Account 5m
- Demo: Creating an Apache Maven Project 5m
- Demo: Uploading Data to a Cloud Storage Bucket 3m
- Demo: Implementing a Dataflow Pipeline for Batch Data 5m
- Demo: Executing a Dataflow Pipeline 5m
- Demo: Viewing Final Results 1m
- Demo: Custom Pipeline Options 6m
- Demo: Implementing a Pipeline to Read from Pub/Sub 6m
- Demo: Executing a Streaming Dataflow Pipeline 5m
- Demo: Debugging Slow Pipelines 6m
- Quick Overview of Messaging in Pub/Sub 4m
- Demo: Pipeline to Read from Pub/Sub and Write to BigQuery 5m
- Demo: Writing Streaming Results to BigQuery 4m
- Demo: Creating and Accessing Twitter API Keys 5m
- Demo: Connecting to the Twitter API and Publishing Tweets to Pub/Sub 6m
- Demo: Extracting Hashtags from Tweets 4m
- Demo: Viewing Extracted Hashtags Using a Pub/Sub Subscription 3m
- Demo: Executing Pipelines with Side Inputs 7m
- Demo: Implementing a Branching Pipeline 6m
- Demo: Viewing Results in Pub/Sub and BigQuery 3m
- Demo: Updating a Running Pipeline 5m
- Quick Overview of Windowing and Triggers 7m
- Demo: Extracting Event Time 6m
- Demo: Reading Out-of-order Data from Pub/Sub 2m
- Demo: Associating Event Time Using Pub/Sub Message Attributes 5m
- Demo: Performing Sliding Window Operations on Input Streams 6m
- Demo: Performing Session Window Operations on Input Streams 4m
- Demo: Configuring Triggers for Late Arriving Data 6m
- Demo: Configuring Triggers to Get Early Results from a Long Window 5m
- Demo: Configuring Data-Driven Triggers 4m
- Demo: Implementing a Pipeline to Write Data to a Parquet File 5m
- Demo: Viewing Results Written to a Parquet File 4m
- Demo: Implementing Join Operations in a Dataflow Pipeline 9m
- Demo: Viewing Joined Results in a Cloud Storage Bucket 4m
- Demo: Implementing Counter Metrics 3m
- Demo: Visualizing Counter Metrics Using Cloud Monitoring 5m
- Demo: Unit Testing Dataflow Pipelines 6m
- Demo: End-to-End Pipeline Testing 3m
- Summary and Further Study 2m