Beginning Data Exploration and Analysis with Apache Spark

by Swetha Kolalapudi

Data preparation is often said to make up as much as 80% of a data scientist's job. This course is all about data preparation, i.e. cleaning, transforming, and summarizing data using Spark.

What you'll learn

Data preparation is a staple task for any data professional, whether you just want to explore data or develop sophisticated machine learning models. Spark is an engine that helps you do this intuitively, using functional constructs that shield the user from the messiness of working with large datasets. In this course, Beginning Data Exploration and Analysis with Apache Spark, you'll go through exploratory data analysis and data munging with Spark, step by step. First, you'll explore RDDs and the functional constructs that make processing in Spark intuitive. Next, you'll discover how to transform and clean unstructured data. Finally, you'll learn how to summarize data along dimensions and how to model relationships to build co-occurrence networks. By the end of this course, you'll be able to use Spark to transform data in whatever way you need.
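To give a flavor of the functional style described above, here is a minimal sketch in plain Python rather than Spark itself. It cleans a few raw lines, summarizes counts per item, and builds co-occurrence edge weights. The sample data and the helper names `clean` and `reduce_by_key` are hypothetical and not taken from the course. `reduce_by_key` is only a stand-in for the kind of per-key aggregation that Spark's RDD API provides.

```python
from itertools import combinations

# Hypothetical sample records: each raw line lists items that appear together.
lines = [
    "Spark, Python , SQL",
    "spark, scala",
    "python, sql",
]

def clean(line):
    """Split a raw line into a sorted list of distinct, trimmed, lowercase tokens."""
    return sorted({tok.strip().lower() for tok in line.split(",") if tok.strip()})

def reduce_by_key(pairs, func):
    """Plain-Python stand-in for a reduce-by-key step: combine all values sharing a key."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return out

# Map step: clean every record.
records = list(map(clean, lines))

# Summarize along one dimension: emit (token, 1) pairs, then add them up per token.
token_counts = reduce_by_key(
    ((tok, 1) for rec in records for tok in rec),
    lambda a, b: a + b,
)

# Co-occurrence network: emit ((a, b), 1) for each pair in a record, then add up per edge.
edge_weights = reduce_by_key(
    ((pair, 1) for rec in records for pair in combinations(rec, 2)),
    lambda a, b: a + b,
)
```

Each edge in `edge_weights` is a pair of items that appeared in the same record, weighted by how often they appeared together. In Spark the same shape of pipeline would typically run over a distributed dataset instead of an in-memory list.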

About the author

Swetha loves playing with data and crunching numbers to get cool insights. She is an alumna of top schools like IIT Madras and IIM Ahmedabad. She was the first member of Flipkart’s elite Analytics team and was instrumental in scaling it to 100+ employees. Swetha has always had an entrepreneurial bent and a love for teaching. She now has the chance to do both as the co-founder of Loonycorn, a content studio focused on providing high-quality content for technical skill development.
