Data Management Tools on Databricks
This course will teach you some of the fundamental techniques to store, manage, and process data using the Databricks platform.
What you'll learn
Data is at the heart of Databricks, and managing it well is a crucial skill for any user of the platform.
In this course, Data Management Tools on Databricks, you’ll learn to load, configure, and access data using the UI, the dbutils library, and a Spark application.
First, you'll explore the Databricks File System (DBFS), how it is implemented as a layer above object storage, and how it can be accessed using the Databricks web UI and the Databricks API. You'll also work with the dbutils library, from file system operations to setting up widgets in a notebook.
Next, you'll delve into the management of structured data in Databricks by creating and then using managed (Delta) tables and external tables, seeing the features available for each, how they are similar, and where they differ.
Finally, you'll turn your attention to consuming and analyzing data from a Spark application built using a notebook, and get a glimpse of the metrics and graphs available for tracking executions and resources within Databricks.
When you're finished with this course, you'll have the data management and processing skills needed to store and access data securely and efficiently on Databricks.
Table of contents
- Course Prerequisites and Outline 2m
- The Databricks File System 6m
- Demo: Setting up a Databricks Workspace 5m
- An Introduction to dbutils 2m
- Demo: Uploading a File to DBFS 5m
- Demo: Creating a Notebook 5m
- Demo: Exploring the dbutils Library 4m
- Demo: Performing File System Operations with dbutils 7m
- Demo: Using the Widgets API in dbutils 7m