Featured resource
pluralsight tech forecast
2025 Tech Forecast

Which technologies will dominate in 2025? And what skills do you need to keep up?

Check it out
Hamburger Icon
  • Course
    • Libraries: If you want this course, consider one of these libraries.
    • Data

Performance Optimization in Apache Spark

Optimize Apache Spark workflows with advanced techniques. Learn partitioning, caching, join strategies, and adaptive query execution (AQE) to handle large datasets efficiently and improve query performance for real-world big data scenarios.

Pinal Dave - Pluralsight course - Performance Optimization in Apache Spark
by Pinal Dave

What you'll learn

Efficient performance optimization is critical for scaling Apache Spark workflows effectively.

In this course, Performance Optimization in Apache Spark, you’ll gain the ability to optimize Spark applications for handling large-scale data processing challenges.

First, you’ll explore partitioning strategies to distribute workloads efficiently and reduce data shuffling while learning techniques like wide and narrow transformations.

Next, you’ll discover how caching and persistence can improve iterative processing, along with effective join strategies such as broadcast joins and bucketing to enhance performance in large datasets.

Finally, you’ll learn to leverage adaptive query execution (AQE) features, including dynamic partition coalescing, dynamic join selection, and handling data skew to optimize complex queries seamlessly.

When you’re finished with this course, you’ll have the skills and knowledge of Apache Spark needed to create efficient, scalable workflows for real-world big data challenges.

Table of contents

About the author

Pinal Dave - Pluralsight course - Performance Optimization in Apache Spark
Pinal Dave

Pinal Dave is a Pluralsight Developer Evangelist.

More Courses by Pinal