Getting Started with Delta Lake on Databricks
This course will teach you how to create Delta tables, ingest data into them, and work with Delta Lake, an open-source storage layer that brings reliability to data stored in data lakes. Delta Lake offers ACID transactions and unified batch and stream processing.
What you'll learn
The Databricks Data Lakehouse architecture is an innovative paradigm that combines the flexibility and low-cost storage of data lakes with the features and capabilities of a data warehouse. The lakehouse architecture achieves this by using a metadata, indexing, and caching layer on top of data lake storage. That layer is Delta Lake, an open-source storage layer that lies at the heart of Databricks’ lakehouse architecture.
In this course, Getting Started with Delta Lake on Databricks, you will learn exactly how Delta Lake supports transactions on cloud storage. First, you will learn the basic elements of Delta Lake, namely Delta files, Delta tables, the DeltaLog, and Delta optimizations.
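To give a feel for these building blocks, here is a minimal sketch of creating and reading a Delta table with PySpark. On Databricks a `spark` session is already defined; the local setup below assumes the open-source delta-spark package, and the `/tmp/delta/events` path is purely illustrative.

```python
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

# On Databricks a `spark` session is predefined; this local setup
# assumes the open-source delta-spark package (pip install delta-spark).
builder = (
    SparkSession.builder.appName("delta-basics")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Write a DataFrame as Delta files; the path is illustrative.
df = spark.range(0, 5).withColumnRenamed("id", "event_id")
df.write.format("delta").mode("overwrite").save("/tmp/delta/events")

# Each Delta table keeps its transaction log (the DeltaLog) in the
# _delta_log/ directory alongside the Parquet data files.
spark.read.format("delta").load("/tmp/delta/events").show()
```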
Next, you will discover how to use different optimizations to get better performance from the queries you run on Delta tables. Here you will explore Delta caching, data skipping, and file layout optimizations such as partitioning, bin-packing, and Z-order clustering.
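The sketch below shows roughly what these optimizations look like in practice. It reuses the `spark` session from the previous sketch; the table path and the `country` and `customer_id` columns are hypothetical, and `OPTIMIZE`/`ZORDER BY` assume a Databricks cluster (or Delta Lake 2.0+, where they are also available).

```python
# Reuses the `spark` session from the sketch above; the table path and
# the country / customer_id columns are hypothetical.
sales = spark.createDataFrame(
    [(1, "US", 101), (2, "DE", 102), (3, "US", 103)],
    ["sale_id", "country", "customer_id"],
)

# Delta caching (Databricks only): enable the cluster's disk cache so
# repeated reads are served from fast local storage.
spark.conf.set("spark.databricks.io.cache.enabled", "true")

# Partitioning: lay out files by a low-cardinality column at write time.
(sales.write.format("delta")
      .partitionBy("country")
      .mode("overwrite")
      .save("/tmp/delta/sales"))

# Bin-packing: OPTIMIZE compacts many small files into fewer large ones.
spark.sql("OPTIMIZE delta.`/tmp/delta/sales`")

# Z-order clustering: co-locate related values in the same files so
# data skipping can prune more files for filters on that column.
spark.sql("OPTIMIZE delta.`/tmp/delta/sales` ZORDER BY (customer_id)")
```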
Finally, you will explore how to ingest data from external sources into Delta tables using batch and streaming ingestion. You will use the COPY INTO command for batch ingestion and Databricks Auto Loader for streaming ingestion.
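Both approaches look roughly like the sketch below. COPY INTO and Auto Loader are Databricks features; the `sales_bronze` table (assumed to already exist), the mount-point paths, and the CSV format are illustrative assumptions.

```python
# Batch ingestion: COPY INTO is a Databricks SQL command and is
# idempotent, so re-running it skips files that were already loaded.
# Assumes the sales_bronze Delta table already exists; all paths are
# illustrative.
spark.sql("""
    COPY INTO sales_bronze
    FROM '/mnt/raw/sales'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
""")

# Streaming ingestion: Auto Loader (the cloudFiles source) incrementally
# picks up new files as they land in the source directory.
stream = (spark.readStream.format("cloudFiles")
          .option("cloudFiles.format", "csv")
          .option("cloudFiles.schemaLocation", "/mnt/schemas/sales")
          .option("header", "true")
          .load("/mnt/raw/sales"))

# The checkpoint location lets the stream restart where it left off.
(stream.writeStream.format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/sales")
       .start("/mnt/delta/sales_bronze"))
```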
When you are finished with this course, you will have the skills and knowledge to create Delta tables, ingest data into them, and run optimized queries to extract insights.
Table of contents
- Prerequisites and Course Outline 2m
- Quick Overview of Databricks 4m
- The Databricks Data Lakehouse Architecture 6m
- Delta Lakes 4m
- Delta Tables 4m
- Delta Tables and Transactions 7m
- Demo: Launching the Databricks Workspace and Creating the Apache Spark Cluster 5m
- Demo: Enabling DBFS and Uploading Data 2m
- Demo: Creating a Delta Table Using the UI 5m
- Demo: Reading from and Writing to Delta Tables Using Apache Spark 4m
- Demo: Exploring the Structure of Delta Tables 5m
- Demo: Transactions and Commits in Delta Tables 7m
- Demo: Time Travel in Delta Tables 4m
- Demo: Cleaning up Old Versions Using VACUUM 4m
- Delta Lakes and Delta Engine 3m
- Delta Optimizations: Caching and Data Skipping 5m
- Demo: Enabling the Delta Cache 4m
- Demo: Caching Results and Accessing Cached Results 5m
- Demo: Retrieving Subsets of Cached Data 5m
- Demo: Disabling the Delta Cache 2m
- Delta Optimizations: File Layout Optimizations 7m
- Demo: Running Queries on the Original Non-optimized Table 4m
- Demo: Partitioning Delta Tables 6m
- Demo: Compaction or Bin-packing 4m
- Demo: Z-ordering 5m
- COPY INTO vs. Auto Loader 4m
- Auto Loader 4m
- Demo: Creating the Delta Table and Uploading Files to DBFS 3m
- Demo: Batch Loading Data Using COPY INTO 3m
- Demo: Performing Batch Loading Using Scheduled Jobs 4m
- Demo: Creating an AWS User and S3 Bucket 5m
- Demo: Using Auto Loader to Ingest Data from a Streaming Source 5m
- Demo: Loading Streaming Data into a Delta Table 5m
- Summary and Further Study 1m