Getting Started with Delta Lake on Databricks
This course will teach you how to create Delta tables, ingest data into them, and work with Delta Lake, an open-source storage layer that brings reliability to data stored in data lakes. Delta Lake offers ACID transactions and unified batch and stream processing.
What you'll learn
The Databricks Data Lakehouse architecture is an innovative paradigm that combines the flexibility and low-cost storage of data lakes with the features and capabilities of a data warehouse. The lakehouse architecture achieves this by using a metadata, indexing, and caching layer on top of data lake storage. That layer is Delta Lake, an open-source storage layer that lies at the heart of Databricks’ lakehouse architecture.
In this course, Getting Started with Delta Lake on Databricks, you will learn exactly how Delta Lake supports transactions on cloud storage. First, you will learn the basic elements of Delta Lake, namely Delta files, Delta tables, the DeltaLog, and Delta optimizations.
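To give a feel for these building blocks, here is a minimal sketch of creating and reading a Delta table with PySpark. On Databricks a `spark` session is already defined; the local setup below assumes the open-source delta-spark package, and the `/tmp/delta/events` path is purely illustrative.

```python
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

# On Databricks a `spark` session is predefined; this local setup
# assumes the open-source delta-spark package (pip install delta-spark).
builder = (
    SparkSession.builder.appName("delta-basics")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Write a DataFrame as Delta files; the path is illustrative.
df = spark.range(0, 5).withColumnRenamed("id", "event_id")
df.write.format("delta").mode("overwrite").save("/tmp/delta/events")

# Each Delta table keeps its transaction log (the DeltaLog) in the
# _delta_log/ directory alongside the Parquet data files.
spark.read.format("delta").load("/tmp/delta/events").show()
```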
Next, you will discover how to use different optimizations to get better performance from the queries you run on Delta tables. Here you will explore Delta caching, data skipping, and file layout optimizations such as partitioning, bin-packing, and Z-order clustering.
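The sketch below shows roughly what these optimizations look like in practice. It reuses the `spark` session from the previous sketch; the table path and the `country` and `customer_id` columns are hypothetical, and `OPTIMIZE`/`ZORDER BY` assume a Databricks cluster (or Delta Lake 2.0+, where they are also available).

```python
# Reuses the `spark` session from the sketch above; the table path and
# the country / customer_id columns are hypothetical.
sales = spark.createDataFrame(
    [(1, "US", 101), (2, "DE", 102), (3, "US", 103)],
    ["sale_id", "country", "customer_id"],
)

# Delta caching (Databricks only): enable the cluster's disk cache so
# repeated reads are served from fast local storage.
spark.conf.set("spark.databricks.io.cache.enabled", "true")

# Partitioning: lay out files by a low-cardinality column at write time.
(sales.write.format("delta")
      .partitionBy("country")
      .mode("overwrite")
      .save("/tmp/delta/sales"))

# Bin-packing: OPTIMIZE compacts many small files into fewer large ones.
spark.sql("OPTIMIZE delta.`/tmp/delta/sales`")

# Z-order clustering: co-locate related values in the same files so
# data skipping can prune more files for filters on that column.
spark.sql("OPTIMIZE delta.`/tmp/delta/sales` ZORDER BY (customer_id)")
```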
Finally, you will explore how to ingest data from external sources into Delta tables using batch and streaming ingestion. You will use the COPY INTO command for batch ingestion and Databricks Auto Loader for streaming ingestion.
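Both approaches look roughly like the sketch below. COPY INTO and Auto Loader are Databricks features; the `sales_bronze` table (assumed to already exist), the mount-point paths, and the CSV format are illustrative assumptions.

```python
# Batch ingestion: COPY INTO is a Databricks SQL command and is
# idempotent, so re-running it skips files that were already loaded.
# Assumes the sales_bronze Delta table already exists; all paths are
# illustrative.
spark.sql("""
    COPY INTO sales_bronze
    FROM '/mnt/raw/sales'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
""")

# Streaming ingestion: Auto Loader (the cloudFiles source) incrementally
# picks up new files as they land in the source directory.
stream = (spark.readStream.format("cloudFiles")
          .option("cloudFiles.format", "csv")
          .option("cloudFiles.schemaLocation", "/mnt/schemas/sales")
          .option("header", "true")
          .load("/mnt/raw/sales"))

# The checkpoint location lets the stream restart where it left off.
(stream.writeStream.format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/sales")
       .start("/mnt/delta/sales_bronze"))
```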
When you are finished with this course, you will have the skills and knowledge to create Delta tables, ingest data into them, and run optimized queries to extract insights.
Table of contents
- Prerequisites and Course Outline 2m
- Quick Overview of Databricks 4m
- The Databricks Data Lakehouse Architecture 6m
- Delta Lakes 4m
- Delta Tables 4m
- Delta Tables and Transactions 7m
- Demo: Launching the Databricks Workspace and Creating the Apache Spark Cluster 5m
- Demo: Enabling DBFS and Uploading Data 2m
- Demo: Creating a Delta Table Using the UI 5m
- Demo: Reading from and Writing to Delta Tables Using Apache Spark 4m
- Demo: Exploring the Structure of Delta Tables 5m
- Demo: Transactions and Commits in Delta Tables 7m
- Demo: Time Travel in Delta Tables 4m
- Demo: Cleaning up Old Versions Using VACUUM 4m
- Delta Lakes and Delta Engine 3m
- Delta Optimizations: Caching and Data Skipping 5m
- Demo: Enabling the Delta Cache 4m
- Demo: Caching Results and Accessing Cached Results 5m
- Demo: Retrieving Subsets of Cached Data 5m
- Demo: Disabling the Delta Cache 2m
- Delta Optimizations: File Layout Optimizations 7m
- Demo: Running Queries on the Original Non-optimized Table 4m
- Demo: Partitioning Delta Tables 6m
- Demo: Compaction or Bin-packing 4m
- Demo: Z-ordering 5m
- COPY INTO vs. Auto Loader 4m
- Auto Loader 4m
- Demo: Creating the Delta Table and Uploading Files to DBFS 3m
- Demo: Batch Loading Data Using COPY INTO 3m
- Demo: Performing Batch Loading Using Scheduled Jobs 4m
- Demo: Creating an AWS User and S3 Bucket 5m
- Demo: Using Auto Loader to Ingest Data from a Streaming Source 5m
- Demo: Loading Streaming Data into a Delta Table 5m
- Summary and Further Study 1m