
AWS Glue

Course Summary

This course provides a high-level understanding of AWS Glue: its components, capabilities, targeted use cases, and the development processes involved in using the service. Participants cover metadata management, ETL job development, streaming data processing, and advanced data integration techniques, from storing and discovering metadata schemas to orchestrating ETL workflows and ensuring data quality.

Prerequisites

  • Basic understanding of AWS
  • Understanding of ETL (extract, transform, load) processes and data management principles
  • Experience with SQL
  • Basic experience with shell scripting and command-line tools
  • Basic knowledge of Python programming
  • Understanding of data streaming
  • Awareness of data quality concepts
     
Purpose: Gain an understanding of AWS Glue's capabilities and the development processes required to utilize the service.
Audience: Engineers looking to visually create, run, and monitor ETL workflows.
Role: Data Engineers | Software Developers | Data Analysts
Skill level: Intermediate
Style: Lectures | Hands-on Activities | Labs | Use Cases
Duration: 2 days
Related technologies: Python | PySpark | PyTorch | SQL

 

Course objectives
  • Store and discover metadata schemas using Glue Data Catalog databases and tables, and Glue Crawlers.
  • Use PySpark and Glue-flavored PySpark to develop Glue ETL code and incorporate third-party Python libraries.
  • Develop, package, and deploy regular Glue ETL jobs.
  • Create and deploy Glue streaming ETL jobs to process data from Amazon Kinesis Data Streams.
  • Use Glue Studio to create customized ETL and streaming ETL jobs.
  • Monitor and troubleshoot AWS Glue ETL jobs effectively.
  • Orchestrate AWS Glue jobs using Glue Workflows and AWS Step Functions.
  • Build CI/CD pipelines with AWS CodePipeline for AWS Glue.
  • Implement incremental data loading using AWS Glue job bookmarks.
  • Ensure data quality using AWS Glue.

What you'll learn:

Metadata Management with Glue Data Catalog

  • Introduction to Glue Data Catalog databases and tables
  • Discovering metadata schemas using Glue Crawlers
  • Practical use cases and best practices

Introduction to Apache Spark and PySpark for Glue

  • Overview of Apache Spark and its architecture
  • Using PySpark and Glue-flavored PySpark for ETL development
  • Integrating third-party Python libraries in Glue

Developing and Deploying Glue ETL Jobs

  • Writing and testing Glue ETL scripts
  • Packaging and deploying Glue ETL jobs
  • Best practices for efficient ETL job management

Streaming Data Processing with Glue

  • Introduction to Glue streaming ETL
  • Developing streaming ETL code in a notebook environment
  • Deploying Glue streaming jobs to process data from Amazon Kinesis Data Streams

ETL Job Creation with Glue Studio

  • Overview of Glue Studio and its features
  • Creating customized transformations with SQL and custom scripts
  • Developing ETL and streaming ETL jobs using Glue Studio

Monitoring and Troubleshooting Glue ETL Jobs

  • Monitoring Glue ETL job performance and logs
  • Troubleshooting common issues in Glue ETL jobs
  • Tools and techniques for effective job management

Orchestrating Glue Jobs with Workflows and Step Functions

  • Introduction to Glue Workflows
  • Creating and managing workflows with Glue Workflows and AWS Step Functions
  • Amazon Managed Workflows for Apache Airflow (MWAA)
  • Real-world examples and best practices

Building CI/CD Pipelines for AWS Glue

  • Overview of CI/CD principles and AWS CodePipeline
  • Setting up a CI/CD pipeline for Glue ETL jobs
  • Integrating automated testing and deployment

Incremental Data Loading with Glue Job Bookmarks

  • Introduction to Glue job bookmarks
  • Configuring and using job bookmarks for incremental data loads
  • Practical use cases and implementation strategies

Ensuring Data Quality with AWS Glue

  • Overview of data quality concepts
  • Implementing data quality checks within ETL pipelines
  • Using Glue Data Quality features for continuous monitoring and improvement
