Implement a Data Ingestion Solution Using AWS Glue
In this lab, you'll practice ingesting semi-structured JSON sample data into a normalized AWS Glue Data Catalog from a source S3 data store. When you're finished, you'll have configured an AWS Glue crawler to scan S3 for new data on a recurring 12-hour schedule.
Challenge
Obtain Source Data Files
From the official AWS Samples GitHub repository, download the JSON source data files that will be uploaded to S3 and ingested by AWS Glue.
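The lab doesn't name the exact repository or files, so the repo URL and file names below are placeholders; substitute the ones from your lab instructions. This sketch builds raw-download URLs and fetches each file locally:

```python
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical repo and file names -- replace with the ones your lab specifies.
REPO_BASE = "https://raw.githubusercontent.com/aws-samples/example-data/main"
FILE_NAMES = ["customers.json", "orders.json"]

def build_urls(base: str, names: list[str]) -> list[str]:
    """Build a raw-download URL for each source file."""
    return [f"{base.rstrip('/')}/{name}" for name in names]

def download_all(urls: list[str], dest_dir: str = "source-data") -> list[Path]:
    """Download each file into dest_dir and return the local paths."""
    dest = Path(dest_dir)
    dest.mkdir(exist_ok=True)
    paths = []
    for url in urls:
        target = dest / url.rsplit("/", 1)[-1]
        urlretrieve(url, target)  # plain HTTP GET; public repos need no auth
        paths.append(target)
    return paths
```

You can also download the files through the GitHub web UI; the script is just a repeatable alternative.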
Challenge
Create an S3 Bucket
Provision an S3 bucket to serve as the source data store that AWS Glue will crawl to populate its Data Catalog.
Challenge
Upload Source Data Files to S3
Load the JSON source data files into the S3 data store.
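One way to script the upload, assuming the files were downloaded to a local `source-data` directory and should land under a `raw/` prefix (both names are assumptions, not from the lab). The key-mapping helper is pure; the boto3 upload itself needs credentials and is shown in comments:

```python
from pathlib import Path

def s3_keys_for(local_dir: str, prefix: str = "raw/") -> dict[str, str]:
    """Map each local .json file to the S3 object key it will be uploaded under."""
    return {str(p): f"{prefix}{p.name}"
            for p in sorted(Path(local_dir).glob("*.json"))}

# Live upload with boto3 (requires credentials; bucket name is hypothetical):
# import boto3
# s3 = boto3.client("s3")
# for local_path, key in s3_keys_for("source-data").items():
#     s3.upload_file(local_path, "my-glue-lab-bucket", key)
```

Keeping all source files under one prefix matters later: the crawler will point at that prefix.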
Challenge
Create the Crawler
Provision a crawler in AWS Glue to populate the AWS Glue Data Catalog every 12 hours with tables from the source S3 data store.
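The crawler can also be created through the Glue API. This sketch builds the `create_crawler` parameters; the schedule is an AWS cron expression that fires at minute 0 of every 12th hour, i.e. twice a day. The role ARN, database name, and S3 path are hypothetical:

```python
def crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Build kwargs for glue.create_crawler.

    "cron(0 */12 * * ? *)" is AWS's 6-field cron syntax:
    minute 0, every 12th hour, every day.
    """
    return {
        "Name": name,
        "Role": role_arn,          # must be able to read the source bucket
        "DatabaseName": database,  # catalog database the tables land in
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "Schedule": "cron(0 */12 * * ? *)",
    }

# Live usage with boto3 (requires credentials; all names are placeholders):
# import boto3
# glue = boto3.client("glue")
# glue.create_crawler(**crawler_request(
#     "lab-crawler", "arn:aws:iam::123456789012:role/GlueLabRole",
#     "lab_db", "s3://my-glue-lab-bucket/raw/"))
```

In the console, the same 12-hour cadence is available under the crawler's schedule settings as a custom cron expression.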
Challenge
Manually Run the Crawler
Run the crawler manually to verify that it works as expected and populates the AWS Glue Data Catalog with tables from the source S3 data store.
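Starting a crawler is asynchronous: it moves through RUNNING and STOPPING before returning to READY. A polling sketch like the one below makes the manual run scriptable. The `glue` argument is anything with `start_crawler`/`get_crawler` methods, in practice a boto3 Glue client (not created here because it needs credentials):

```python
import time

def run_crawler_and_wait(glue, name: str,
                         poll_seconds: float = 15,
                         timeout: float = 1800) -> str:
    """Start the crawler and poll until it returns to the READY state.

    `glue` is a boto3 Glue client in real use (or a stub in tests).
    """
    glue.start_crawler(Name=name)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if state == "READY":  # RUNNING -> STOPPING -> READY when finished
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"Crawler {name} did not finish within {timeout}s")
```

After the run completes, confirm the expected tables appear in the catalog database (console, or `glue.get_tables(DatabaseName=...)`).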
Provided environment for hands-on practice
We will provide the credentials and environment necessary for you to practice right within your browser.
Guided walkthrough
Follow along with the author’s guided walkthrough and build something new in your provided environment!
Recommended prerequisites
- Amazon S3
- AWS Glue