Summarize Data in R Using dplyr

This lab teaches the basics of summarizing a dataset in R. R is a programming language designed for statistical computing and data analysis, and it lets you run calculations and create visualizations from a dataset in a few lines of code. By the end of the lab you will be able to summarize and report on datasets accurately.

Path Info

Level: Beginner
Duration: 44m
Published: Apr 21, 2025

Table of Contents

  1. Challenge

    Calculating Measures of Central Tendency

    In this lab you will learn how to summarize data in R using the dplyr library along with some base R functions. In this first challenge you will import data and calculate measures of central tendency, which describe the central or typical value of a dataset.

    Importing Data in R

    To begin, you need to know how to import data into R. Follow the steps below to import the data and view it.

    Calculating the mean in R

    A common measure of central tendency is the mean, or average, of a dataset: the sum of the values divided by the number of values. It is one of the most widely used statistics in analysis, so it is what you will calculate next.

    Calculating the median in R

    Another common measure of central tendency is the median, the middle value of a sorted dataset: half of the values fall below it and half fall above it. Because it is less affected by extreme values, the median is often seen as a better representation of a typical data point than the mean.

    Grouping Data in R

    When performing data analysis, you sometimes want to group the data rather than compute only overall metrics. For example, in a dataset of male and female students you may want the mean or median per gender or per age rather than for the whole dataset. You will learn how to do this in the next task.

    Counting sample size

    The last thing you will cover in this challenge is counting the sample size. When summarizing a dataset it is good to know exactly how many observations it contains, because estimates from a larger sample are generally more reliable than those from a smaller one. Here you will learn how to perform this count quickly in R.
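The tasks in this challenge could be sketched in R roughly as follows. This is a minimal illustration, not the lab's own solution: the `students` data frame, its column names (`gender`, `age`, `score`) and the temporary CSV file are all hypothetical stand-ins for the dataset the lab provides.

```r
library(dplyr)

# Hypothetical student data standing in for the lab's dataset (assumption).
students <- data.frame(
  gender = c("female", "male", "female", "male", "female", "male"),
  age    = c(18, 19, 18, 20, 19, 20),
  score  = c(82, 75, 91, 68, 77, 88)
)

# Write the data to a temporary CSV so the import step has a file to read.
csv_path <- tempfile(fileext = ".csv")
write.csv(students, csv_path, row.names = FALSE)

# Import the data and view the first rows.
students <- read.csv(csv_path)
head(students)

# Overall mean and median of one numeric column.
overall <- students %>%
  summarise(
    mean_score   = mean(score),
    median_score = median(score)
  )

# The same statistics per gender, plus the sample size of each group.
by_gender <- students %>%
  group_by(gender) %>%
  summarise(
    n            = n(),
    mean_score   = mean(score),
    median_score = median(score)
  )

# Sample size of the whole dataset.
total_n <- nrow(students)

print(overall)
print(by_gender)
print(total_n)
```

Here `group_by()` only marks the groups; the actual calculation happens in `summarise()`, which returns one row per group. `n()` counts the rows in the current group, while `nrow()` counts the rows of the whole data frame.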

  2. Challenge

    Calculating Measures of Dispersion

    Calculating measures of dispersion

    Measures of dispersion are meant to quantify how spread out or scattered data points are around a central value like the mean or median. This provides insight into the data's distribution.

    Calculating Standard Deviation

    One of the most common measures of dispersion is the standard deviation, which measures roughly how far a typical data point lies from the mean. This is what you will calculate next.

    Calculating Variance in a Dataset

    Next you will learn how to calculate the variance, which measures the overall spread of the data as the average squared deviation from the mean. The standard deviation is its square root.

    Calculating the Range in R

    The last measure of dispersion you will learn to calculate is the range, which looks at the difference between the largest and smallest numbers in the dataset. This gives you an overall idea of the total spread of the data.
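The three measures of dispersion described in this challenge could be computed along these lines. This is a sketch under assumptions: the `scores` data frame and its `score` column are hypothetical, not the lab's data.

```r
library(dplyr)

# Hypothetical scores used only for illustration (assumption).
scores <- data.frame(score = c(82, 75, 91, 68, 77, 88))

dispersion <- scores %>%
  summarise(
    sd_score    = sd(score),                # sample standard deviation
    var_score   = var(score),               # sample variance
    range_score = max(score) - min(score)   # largest value minus smallest value
  )

print(dispersion)
```

Note that `sd()` and `var()` in R compute the sample versions, dividing by n - 1 rather than n. The base function `range()` returns the minimum and maximum as a pair, so the width of the range is taken here as `max(score) - min(score)`.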

  3. Challenge

    Calculating the Five Number Summary

    Calculating the five number summary

    In this challenge you will learn how to calculate the five number summary of a dataset. The five number summary is a quick and efficient way of summarizing a dataset by showing where the majority of data points fall. It consists of:

    • The minimum: This is the smallest value in your dataset.
    • The first quartile: This is denoted Q1. 25% of your data falls below the first quartile.
    • The median: This is the midway point of the data. 50% of your data falls below the median.
    • The third quartile: This is denoted Q3. 75% of your data falls below the third quartile.
    • The maximum: This is the largest value in your dataset.
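The five values listed above could be computed with dplyr as in the following sketch. The `scores` data frame is hypothetical, and the use of `quantile()` for the quartiles is one reasonable choice rather than necessarily the method used in the lab.

```r
library(dplyr)

# Hypothetical scores used only for illustration (assumption).
scores <- data.frame(score = c(82, 75, 91, 68, 77, 88))

five_num <- scores %>%
  summarise(
    minimum = min(score),
    q1      = quantile(score, 0.25),  # first quartile
    med     = median(score),          # second quartile
    q3      = quantile(score, 0.75),  # third quartile
    maximum = max(score)
  )

print(five_num)
```

Quartiles can be defined in several ways. `quantile()` uses its default interpolation method (type 7), while base R's `fivenum()` uses Tukey's hinges, so the two can give slightly different Q1 and Q3 values on small datasets.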

    You will calculate this in the next task.

    Creating Nested Summaries

    So far you have created fairly simple one-level summaries. In this next task you will learn how to create nested summaries, which summarize data by multiple groups and variables at the same time for multi-level analysis.
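A nested summary of this kind might look like the sketch below, which groups by two variables and summarizes two columns at once. The data frame and the column names (`gender`, `age`, `score`, `hours`) are hypothetical, and `across()` is one way to do this in dplyr 1.0.0 or later, not necessarily the approach taken in the lab.

```r
library(dplyr)

# Hypothetical student data used only for illustration (assumption).
students <- data.frame(
  gender = c("female", "male", "female", "male", "female", "male"),
  age    = c(18, 19, 18, 20, 19, 20),
  score  = c(82, 75, 91, 68, 77, 88),
  hours  = c(10, 6, 12, 5, 8, 11)
)

nested <- students %>%
  group_by(gender, age) %>%                 # one group per gender/age combination
  summarise(
    n = n(),                                # sample size of each group
    across(c(score, hours),                 # apply each function to both columns
           list(mean = mean, median = median)),
    .groups = "drop"                        # return an ungrouped result
  )

print(nested)
```

`across()` names the new columns by combining the column and function names, for example `score_mean` and `hours_median`. The result has one row per combination of `gender` and `age` that actually occurs in the data.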

  4. Challenge

    Creating Data Visualizations for a Dataset

    In this final challenge you will learn how to create data visualizations that can represent data in a graphical way. Data visualizations allow people to understand certain aspects of your data at a glance. To begin you will learn how to create a bar graph in R. Next you will create a boxplot diagram. Lastly, you will create a scatterplot.
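The three chart types mentioned here could be drawn as in the sketch below. It uses base R graphics as an assumption, since the plotting library used in the lab is not stated, and the `students` data frame is again hypothetical.

```r
library(dplyr)

# Hypothetical student data used only for illustration (assumption).
students <- data.frame(
  gender = c("female", "male", "female", "male", "female", "male"),
  score  = c(82, 75, 91, 68, 77, 88),
  hours  = c(10, 6, 12, 5, 8, 11)
)

# Summarize first, then plot the summary as bars.
mean_by_gender <- students %>%
  group_by(gender) %>%
  summarise(mean_score = mean(score))

# Send the plots to a temporary PDF so the script also runs without a display.
# In an interactive session you can drop pdf() and dev.off() to draw on screen.
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)

# Bar graph: mean score per gender.
barplot(mean_by_gender$mean_score,
        names.arg = mean_by_gender$gender,
        main = "Mean score by gender", ylab = "Mean score")

# Boxplot: distribution of scores within each gender.
boxplot(score ~ gender, data = students,
        main = "Score distribution by gender", ylab = "Score")

# Scatterplot: relationship between two numeric variables.
plot(students$hours, students$score,
     xlab = "Hours studied", ylab = "Score",
     main = "Score versus hours studied")

invisible(dev.off())
```

Each plotting call adds a new page to the PDF at `plot_path`. A boxplot is a natural companion to the five number summary, since it draws the median, the quartiles and the extremes of each group.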

Shimon Brathwaite is a cybersecurity professional with seven years of experience in Incident Response, Vulnerability Management, Identity and Access Management and Consulting.
