Architecting Data Warehousing Solutions Using Google BigQuery
BigQuery is Google Cloud Platform’s fully managed cloud data warehouse. In this course, you’ll learn how you can work with BigQuery on huge datasets with little to no administrative overhead.
What you'll learn
Organizations store massive amounts of data collated from a wide variety of sources. BigQuery supports fast querying at petabyte scale, with serverless functionality and autoscaling. BigQuery also supports streaming data, works with visualization tools, and interacts seamlessly with Python scripts running from Datalab notebooks.
In this course, Architecting Data Warehousing Solutions Using Google BigQuery, you’ll learn how you can work with BigQuery on huge datasets with little to no administrative overhead related to cluster and node provisioning.
First, you'll start off with an overview of the suite of storage products on Google Cloud and the unique position that BigQuery holds. You’ll see how BigQuery compares with Cloud SQL, Bigtable, and Datastore on GCP, and how it differs from Amazon Redshift, the data warehouse on AWS.
Next, you’ll create datasets in BigQuery, which are the equivalent of databases in RDBMSes, and create tables within datasets where the actual data is stored. You’ll work with BigQuery using the web console as well as the command line. You’ll load data into BigQuery tables in the CSV, JSON, and Avro formats and see how you can execute and manage jobs.
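To give a feel for what a load job involves, here is a minimal sketch of the settings a CSV load from Cloud Storage needs. The dictionary loosely mirrors BigQuery's REST load-job configuration; the bucket, dataset, table, and column names are hypothetical. With the `google-cloud-bigquery` client library, the same settings map onto a `LoadJobConfig` passed to `client.load_table_from_uri(...)`.

```python
# Sketch of a BigQuery CSV load-job configuration (hypothetical names).
# Keys loosely follow the REST API's configuration.load section.
load_job = {
    "sourceUris": ["gs://my-bucket/sales.csv"],  # hypothetical GCS object
    "destinationTable": {"datasetId": "sales_ds", "tableId": "orders"},
    "sourceFormat": "CSV",        # also supported: NEWLINE_DELIMITED_JSON, AVRO
    "skipLeadingRows": 1,         # skip the CSV header row
    "schema": {"fields": [
        {"name": "order_id", "type": "INTEGER"},
        {"name": "amount",   "type": "FLOAT"},
    ]},
}
```

The same load can be expressed on the command line with the `bq load` tool, which the course covers alongside the web console.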
Finally, you'll wrap up by exploring advanced analytical queries which use nested and repeated fields. You’ll run aggregate operations on your data and use advanced windowing functions as well. You’ll programmatically access BigQuery using client libraries in Python and visualize your data using Data Studio.
By the end of this course, you'll be comfortable working with huge datasets stored in BigQuery, executing analytical queries, performing analysis, and building charts and graphs for your reports.
Table of contents
- Module Overview 2m
- Prerequisites and Course Outline 3m
- Transactional and Analytical Processing 5m
- Introducing BigQuery 5m
- Choosing BigQuery 6m
- BigQuery Pricing 5m
- Logging into the GCP and Enabling APIs 2m
- Beta UI and Classic UI 2m
- Public Datasets 4m
- Executing Queries 4m
- Working with the bq Command on the Terminal 4m
- Module Overview 1m
- Uploading Data to GCS Buckets 3m
- Importing Data from CSV Files on Cloud Storage 4m
- Configuring Additional Settings While Importing JSON Files 3m
- Creating External Tables with Data in GCS Buckets 4m
- Partitioning in BigQuery 4m
- Creating and Querying Ingestion Time Partitioned Tables 6m
- Creating Column Based Partitioned Tables Using the Command Line 4m
- Module Overview 1m
- Normalized Storage in a Traditional Database 3m
- Denormalized Storage, Nested, and Repeated Fields 4m
- The UNNEST, ARRAY_AGG, and the STRUCT Operators 3m
- Working with Nested Fields 3m
- Populating Data into a Table with Nested Fields Using STRUCT 4m
- Working with Repeated Fields 3m
- Populating Tables with Repeated Fields Using ARRAY_AGG 3m
- Using Nested and Repeated Fields Together 3m
- Using UNNEST to Query Repeated Fields 3m
- Aggregations 4m
- Subqueries 3m
- Windowing Operations 4m
- Performing Window Operations Using "Partition By" and "Order By" 4m
- Windowing Operations Using a Window Range 3m
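The windowing modules above cover operations such as running totals computed over a partition. As a sketch of the semantics, the following pure-Python snippet emulates what `SUM(sales) OVER (PARTITION BY dept ORDER BY day)` would produce (the table and column names are hypothetical):

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"dept": "a", "day": 1, "sales": 10},
    {"dept": "a", "day": 2, "sales": 5},
    {"dept": "b", "day": 1, "sales": 7},
]

# Emulate: SUM(sales) OVER (PARTITION BY dept ORDER BY day)
rows.sort(key=itemgetter("dept", "day"))   # PARTITION BY dept, ORDER BY day
result = []
for _, group in groupby(rows, key=itemgetter("dept")):
    running = 0                            # the window restarts per partition
    for row in group:
        running += row["sales"]
        result.append({**row, "running_sales": running})
```

Unlike a GROUP BY aggregation, a window function keeps every input row and attaches the aggregate alongside it, which is why `result` has the same number of rows as the input.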