- A Cloud Guru
Setting Up a Data Streaming Pipeline with Dataflow
One of the primary benefits of Dataflow is that it can handle both streaming and batch data processing in a serverless, fast, and cost-effective manner. In this hands-on lab, you'll set up the necessary infrastructure, including a Cloud Storage bucket, a Pub/Sub topic, and a BigQuery dataset, to run a Dataflow template on real-time streaming data from New York City's ever-busy taxi service.
Table of Contents
- Challenge: Enable the Necessary APIs
  Enable the Dataflow, Google Cloud Storage JSON, BigQuery, Cloud Pub/Sub, and Cloud Resource Manager APIs, either through the console or the Cloud Shell.
- Challenge: Create a Storage Bucket
  Create a Cloud Storage bucket to hold the temporary Dataflow data.
- Challenge: Create a Dataset and Table
  Create a BigQuery dataset and table with the proper schema to hold the incoming streaming data.
- Challenge: Run a Pub/Sub to BigQuery Dataflow Job
  Use the Pub/Sub to BigQuery Dataflow template to process the streaming data.
- Challenge: Query the Resulting Dataset
  Write and run queries against the resulting data in BigQuery.
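The challenges above can all be completed from the Cloud Shell. For the first step, a sketch of enabling every required API in a single command (the project ID is a placeholder; use the one provided by the lab):

```shell
# Point gcloud at the lab project (placeholder ID).
gcloud config set project my-dataflow-lab

# Enable all five required APIs at once.
gcloud services enable \
    dataflow.googleapis.com \
    storage-api.googleapis.com \
    bigquery.googleapis.com \
    pubsub.googleapis.com \
    cloudresourcemanager.googleapis.com
```

`storage-api.googleapis.com` is the service name for the Google Cloud Storage JSON API.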
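Creating the temporary bucket might look like the following; the bucket name and region are placeholders, and bucket names must be globally unique:

```shell
# Create a regional bucket for Dataflow's temporary files
# (placeholder name and region).
gsutil mb -l us-central1 gs://my-dataflow-lab-temp
```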
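One way to create the dataset and table from the Cloud Shell is with the `bq` tool. The dataset and table names here are placeholders, and the schema shown follows the public NYC taxi ride feed, so verify the field list against the lab's instructions:

```shell
# Create the dataset (placeholder name).
bq mk taxirides

# Create the table with a schema matching the streamed messages
# (field names assumed from the public taxi ride feed).
bq mk --table taxirides.realtime \
    ride_id:string,point_idx:integer,latitude:float,longitude:float,timestamp:timestamp,meter_reading:float,meter_increment:float,ride_status:string,passenger_count:integer
```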
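Launching the template job could then look like this. The job name, region, bucket, and project ID are placeholders; the Google-provided Pub/Sub to BigQuery template and the public taxi topic are assumed here, so substitute whatever the lab specifies:

```shell
# Run the Google-provided Pub/Sub to BigQuery template,
# reading the public taxi topic and writing to the table above.
gcloud dataflow jobs run taxi-stream \
    --gcs-location gs://dataflow-templates/latest/PubSub_to_BigQuery \
    --region us-central1 \
    --staging-location gs://my-dataflow-lab-temp/tmp \
    --parameters \
inputTopic=projects/pubsub-public-data/topics/taxirides-realtime,\
outputTableSpec=my-dataflow-lab:taxirides.realtime
```

Once the job reaches the Running state, rows should begin appearing in the output table within a few minutes.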
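Finally, a sample query against the populated table, using the placeholder names and assumed fields from above:

```shell
# Count rides by status across everything streamed so far.
bq query --use_legacy_sql=false \
'SELECT ride_status, COUNT(*) AS rides
 FROM `taxirides.realtime`
 GROUP BY ride_status'
```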