Transform Data Using Apache Spark on Amazon EMR
In this lab, you'll practice transforming and massaging data with Apache Spark on an Amazon EMR cluster, then retrieve the transformed data as output.
Challenge
Configure a Subnet for the EMR Cluster
You'll configure a subnet for the EMR cluster in the same Availability Zone as the EC2 instance.
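As a rough illustration of the Availability Zone requirement, the sketch below picks a subnet that sits in the same zone as an instance. It assumes subnet records shaped like the entries returned by the EC2 `describe_subnets` API (`SubnetId`, `AvailabilityZone`). The helper name, subnet IDs, and zone names are hypothetical sample values, not part of the lab.

```python
from typing import Dict, List, Optional


def pick_subnet_in_zone(subnets: List[Dict[str, str]], availability_zone: str) -> Optional[str]:
    """Return the ID of the first subnet in the given Availability Zone, or None."""
    for subnet in subnets:
        if subnet["AvailabilityZone"] == availability_zone:
            return subnet["SubnetId"]
    return None


# Hypothetical sample data; real values would come from your own AWS account.
sample_subnets = [
    {"SubnetId": "subnet-aaaa1111", "AvailabilityZone": "us-east-1a"},
    {"SubnetId": "subnet-bbbb2222", "AvailabilityZone": "us-east-1b"},
]
instance_zone = "us-east-1b"  # assumed zone of the EC2 instance

chosen = pick_subnet_in_zone(sample_subnets, instance_zone)
print(chosen)
```

If no subnet matches, the helper returns `None`, which is the case where a new subnet would need to be created in that zone.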
Challenge
Configure an Amazon EMR Cluster to Run Spark Jobs
You'll create an Amazon EMR cluster to run Spark jobs for data transformation and pre-processing.
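One possible shape for a Spark-enabled cluster definition is sketched below as a plain dictionary whose keys follow the parameters of the boto3 EMR `run_job_flow` call. The cluster name, release label, instance type, node count, and subnet ID are assumptions chosen for illustration, and the role names are the EMR default roles, which may differ in the lab environment.

```python
from typing import Any, Dict


def build_cluster_request(
    name: str,
    subnet_id: str,
    release_label: str = "emr-6.15.0",  # assumed release; use the one the lab specifies
    instance_type: str = "m5.xlarge",   # assumed instance type
    core_count: int = 2,                # assumed number of core nodes
) -> Dict[str, Any]:
    """Build keyword arguments for an EMR cluster with Spark installed."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            # Place the cluster in the subnet configured for it.
            "Ec2SubnetId": subnet_id,
            # Keep the cluster running so Spark steps can be added later.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


cluster_request = build_cluster_request("spark-transform-lab", "subnet-bbbb2222")
print(cluster_request["Applications"])
```

With credentials configured, a dictionary like this could be passed as `boto3.client("emr").run_job_flow(**cluster_request)`. The same settings can also be entered through the EMR console.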
Challenge
Run Spark Jobs for Data Transformation
You'll run Spark jobs for data transformation and pre-processing on your EMR cluster.
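A Spark job is commonly submitted to EMR as a step that calls `spark-submit` through `command-runner.jar`. The sketch below builds such a step definition in the format accepted by the boto3 EMR `add_job_flow_steps` call. The bucket name, script path, and the convention of passing input and output locations as script arguments are hypothetical placeholders.

```python
from typing import Any, Dict


def build_spark_step(script_uri: str, input_uri: str, output_uri: str) -> Dict[str, Any]:
    """Build an EMR step that runs a PySpark script with spark-submit."""
    return {
        "Name": "Transform data with Spark",
        # Leave the cluster running if the step fails, so it can be inspected.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,
                # Assumed convention: the script reads these two arguments.
                input_uri,
                output_uri,
            ],
        },
    }


# Hypothetical S3 locations used only for illustration.
step = build_spark_step(
    "s3://example-bucket/scripts/transform.py",
    "s3://example-bucket/raw/",
    "s3://example-bucket/transformed/",
)
print(step["HadoopJarStep"]["Args"])
```

Given a running cluster ID, the step could be submitted with `boto3.client("emr").add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])`, where `cluster_id` is the ID of your own cluster.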
Provided environment for hands-on practice
We will provide the credentials and environment necessary for you to practice right within your browser.
Guided walkthrough
Follow along with the author’s guided walkthrough and build something new in your provided environment!
Did you know?
On average, you retain 75% more of your learning if you get time for practice.
Recommended prerequisites
- AWS CLI
- Amazon EC2
- Amazon S3 buckets
- Apache Spark
- Git commands