
Summarize Data in R Using dplyr
This lab teaches you the basics of summarizing a dataset in R. R is a programming language designed for statistical computing and data analysis, so you can run calculations and create visualizations from a dataset quickly. By the end you will be able to summarize and report on a dataset accurately in just a few lines of code.

Table of Contents
-
Challenge
Calculating Measures of Central Tendency
In this lab you will learn how to summarize data in the R programming language using the dplyr library as well as some basic R functions. In this first challenge you will learn how to import data and analyze it to get measures of central tendency. A measure of central tendency describes the central or typical value of a dataset.
Importing Data in R
To begin, you first need to know how to import data in R. Follow the steps below to import the data and view it.
###### Calculating the mean in R
A common measure of central tendency is the mean, or average, of a dataset. It is the sum of the values divided by the number of values, and it is one of the most widely used statistics in analysis, so this is what you will calculate next.
###### Calculating the median in R
Another common measure of central tendency is the median, the middle value of a sorted dataset: half of the values lie above it and half lie below it. Because extreme values affect it less than they affect the mean, the median is often seen as a better representation of a typical data point.
###### Grouping Data in R
When performing data analysis, you sometimes want metrics for groups within the data rather than only overall metrics. For example, in this case, where you have male and female students, you may want to group the data by gender or age rather than just getting the overall mean or median. You will learn how to do this in the next task.
###### Counting sample size
The last thing you will cover in this challenge is counting the sample size. When summarizing a dataset, it is good to know exactly how large it is, because statistics computed from a larger sample are generally more reliable than those computed from a smaller one. Here you will learn how to quickly perform this count in R.
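The tasks in this challenge can be sketched in a few lines. This is a minimal illustration, not the lab's own solution: the lab's data file and column names are not shown here, so the CSV contents and the `gender`, `age`, and `score` columns below are hypothetical stand-ins.

```r
library(dplyr)

# Hypothetical data: write a tiny CSV to a temporary file so that the
# import step can be run without the lab's real data file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "gender,age,score",
  "female,15,82",
  "male,16,74",
  "female,16,91",
  "male,15,68",
  "female,17,77",
  "male,17,95"
), csv_path)

# Import the data and view the first rows.
students <- read.csv(csv_path)
head(students)

# Overall mean and median of one numeric column.
overall <- students %>%
  summarise(
    mean_score = mean(score),
    median_score = median(score)
  )

# The same statistics per group, plus the sample size of each group.
by_gender <- students %>%
  group_by(gender) %>%
  summarise(
    mean_score = mean(score),
    median_score = median(score),
    n = n()
  )

# Total sample size of the whole dataset.
total_n <- nrow(students)
```

If a column contains missing values, `mean()` and `median()` return `NA` unless you pass `na.rm = TRUE`.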
-
Challenge
Calculating Measures of Dispersion
Calculating measures of dispersion
Measures of dispersion are meant to quantify how spread out or scattered data points are around a central value like the mean or median. This provides insight into the data's distribution.
Calculating Standard Deviation
One of the most common measures of dispersion is the standard deviation, which measures roughly how far a typical data point lies from the mean. This is what you will calculate next.
###### Calculating Variance in a Dataset
Next you will learn how to calculate the variance, which is the average squared deviation of the data from the mean. The standard deviation is the square root of the variance.
###### Calculating the Range in R
The last measure of dispersion you will learn to calculate is the range, which is the difference between the largest and smallest numbers in the dataset. This gives you an overall idea of the total spread of the data.
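The three measures of dispersion can be computed in a single `summarise()` call. The sketch below uses a hypothetical `score` column, since the lab's actual dataset is not shown here.

```r
library(dplyr)

# Hypothetical data standing in for the lab's dataset.
students <- data.frame(score = c(82, 74, 91, 68, 77, 95))

dispersion <- students %>%
  summarise(
    sd_score = sd(score),                  # sample standard deviation
    var_score = var(score),                # sample variance
    range_score = max(score) - min(score)  # largest minus smallest value
  )

dispersion
```

Note that `sd()` and `var()` compute the sample versions, which divide by n - 1 rather than n. Base R's `range()` returns the minimum and maximum as a two-element vector, so the range as a single number is `max(x) - min(x)` or, equivalently, `diff(range(x))`.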
-
Challenge
Calculating the Five Number Summary
Calculating the five number summary
In this challenge you will learn how to calculate the five-number summary of a dataset. The five-number summary is a quick and efficient way of summarizing a dataset by showing its extremes and where most of the data points fall. Here's what the five-number summary consists of:
- The minimum: This is the smallest value in your dataset.
- The first quartile: This is denoted Q1, and 25% of your data falls below it.
- The median: This is the midpoint of the data. 50% of your data falls below the median.
- The third quartile: This is denoted Q3, and 75% of your data falls below it.
- The maximum: This is the largest value in your dataset.
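As a rough sketch of the five values listed above, the block below computes each one with `summarise()`. The `score` column is a hypothetical stand-in for the lab's data.

```r
library(dplyr)

# Hypothetical data standing in for the lab's dataset.
students <- data.frame(score = c(82, 74, 91, 68, 77, 95))

five_num <- students %>%
  summarise(
    minimum = min(score),
    q1 = quantile(score, 0.25, names = FALSE),
    median = median(score),
    q3 = quantile(score, 0.75, names = FALSE),
    maximum = max(score)
  )

five_num
```

`quantile()` uses its default interpolation method (type 7) here. Base R's `fivenum()` and `summary()` also report these statistics, but `fivenum()` uses Tukey's hinges, so its quartiles can differ slightly from `quantile()` on small datasets.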
You will calculate this in the next task.
###### Creating Nested Summaries
So far you have created fairly simple one-level summaries. In this next task you will learn how to create nested summaries that summarize data by multiple groups and variables at the same time for multi-level analysis.
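One way to build such a nested summary is to pass more than one column to `group_by()`. The sketch below assumes hypothetical `gender`, `age`, and `score` columns and dplyr 1.0.0 or later (for the `.groups` argument).

```r
library(dplyr)

# Hypothetical data standing in for the lab's dataset.
students <- data.frame(
  gender = c("female", "female", "male", "male",
             "female", "female", "male", "male"),
  age    = c(15, 15, 15, 15, 16, 16, 16, 16),
  score  = c(80, 90, 70, 74, 85, 95, 60, 80)
)

# Group by two variables at once and compute several statistics per group.
nested <- students %>%
  group_by(gender, age) %>%
  summarise(
    mean_score = mean(score),
    sd_score = sd(score),
    n = n(),
    .groups = "drop"  # return an ungrouped result
  )

nested
```

The result has one row per combination of `gender` and `age` that appears in the data.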
-
Challenge
Creating Data Visualizations for a Dataset
In this final challenge you will learn how to create data visualizations, which represent data graphically so that people can understand certain aspects of it at a glance. You will first create a bar graph in R, then a box plot, and lastly a scatterplot.
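The three chart types can be sketched as follows. The plotting library the lab uses is not stated here, so this example assumes base R graphics; the `gender`, `age`, and `score` columns are hypothetical.

```r
library(dplyr)

# Hypothetical data standing in for the lab's dataset.
students <- data.frame(
  gender = c("female", "female", "male", "male",
             "female", "female", "male", "male"),
  age    = c(14, 15, 16, 17, 14, 15, 16, 17),
  score  = c(80, 90, 70, 74, 85, 95, 60, 80)
)

# Bar graph: number of students in each gender group.
gender_counts <- students %>% count(gender)
bar_mids <- barplot(
  gender_counts$n,
  names.arg = gender_counts$gender,
  main = "Students per gender",
  ylab = "Count"
)

# Box plot: distribution of scores within each gender group.
box_stats <- boxplot(
  score ~ gender,
  data = students,
  main = "Score by gender",
  ylab = "Score"
)

# Scatterplot: relationship between two numeric variables.
plot(
  students$age, students$score,
  main = "Score versus age",
  xlab = "Age", ylab = "Score"
)
```

Each box in the box plot draws the five-number summary of its group, so it pairs naturally with the summaries from the previous challenge.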