Cleaning Data with Pandas
Learn to clean and manipulate data using the Pandas library in Python. Cover common issues like missing values and irrelevant features, use correlation analysis, encode categorical features, and prepare data for machine learning models.
What you'll learn
In the real world, data rarely arrives in neat tables that can be fed directly into a machine learning model or used for analysis. It is often messy, has missing values, and carries other issues that you need to resolve before you can draw meaningful conclusions from it.
In this course, Cleaning Data with Pandas, you will learn how to use the Pandas library in Python to clean and manipulate data.
First, you will learn what data cleaning is and why it matters for data analysis. Then, you will solve the most common issues plaguing datasets: missing values, irrelevant features, and duplicate values.
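As a rough illustration of those three issues, here is a minimal sketch using a made-up toy dataset. The column names (`customer_id`, `age`, `notes`) and the choice of median imputation are assumptions for this example, not material taken from the course.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one exact duplicate row, one missing age,
# and a free-text column we assume is irrelevant to the analysis.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "age": [34.0, 45.0, 45.0, np.nan, 28.0],
    "notes": ["a", "b", "b", "c", "d"],
})

# 1. Remove exact duplicate rows.
cleaned = df.drop_duplicates()

# 2. Drop a feature assumed to be irrelevant.
cleaned = cleaned.drop(columns=["notes"])

# 3. Fill missing ages with the median of the observed ages
#    (one of several possible strategies).
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())

print(cleaned)
```

Median imputation is only one option; dropping incomplete rows with `dropna()` or filling with a domain-specific default may suit other datasets better.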
Next, you will see what correlation analysis is and how it helps in data cleaning.
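One common way correlation analysis supports cleaning is by flagging near-redundant numeric features. The sketch below assumes a toy dataset and an arbitrary 0.95 threshold; both are illustrative choices rather than the course's own.

```python
import numpy as np
import pandas as pd

# Hypothetical data: height_in is just height_cm in different units,
# so the two columns carry nearly the same information.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "weight_kg": [70.0, 55.0, 82.0, 60.0, 75.0],
})

# Absolute pairwise Pearson correlations.
corr = df.corr().abs()

# Keep only the strict upper triangle so each pair is checked once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.where(mask)

# Assumed threshold: drop the later column of any pair above 0.95.
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
reduced = df.drop(columns=to_drop)

print(to_drop)
print(reduced.columns.tolist())
```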
Finally, you will see how to encode categorical features and prepare your dataset to be fed into machine learning models.
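To give a sense of what encoding categorical features can look like, here is a small sketch with invented columns (`color`, `shirt_size`, `price`). It assumes that `shirt_size` has a natural order and `color` does not; a real dataset may call for different choices.

```python
import pandas as pd

# Hypothetical data with one unordered and one ordered categorical column.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "shirt_size": ["S", "M", "L", "M"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# Ordinal encoding: map ordered categories to integers (assumed ordering).
size_order = {"S": 0, "M": 1, "L": 2}
df["shirt_size"] = df["shirt_size"].map(size_order)

# One-hot encoding: expand the unordered column into indicator columns.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

print(encoded)
```

The result contains only numeric columns, which is the form most machine learning libraries expect as input.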
When you’re finished with this course, you will have the skills and knowledge you need to effectively clean and manipulate data using Pandas.
Table of contents
- Course and Module Introduction 3m
- What Is Data Cleaning and Why Is It Important? 3m
- The Data Cleaning Process 2m
- Demo: Introduction to the Problem and Dataset 3m
- Demo: Setting up Your Environment 3m
- Demo: Importing the Dataset and Basic Exploration 6m
- What Is Missing Data? 3m
- Demo: Dealing with Missing Data 7m
- Demo: Dealing with Illogical, Duplicate, and Datetime Values 6m
- Module Summary 1m