Working with Statistics Function in Excel
Sep 14, 2020 • 11 Minute Read
Introduction
In this guide, you will learn about various statistical functions that are available in Excel 2019. By the end of this guide, you'll be familiar with some important statistical functions with examples. Later, you can explore the rest of the functions using the Excel provided documentation.
The functions which you are going to learn in this guide are:
- AVERAGE
- AVERAGEIF
- MEDIAN
- PERMUT
- COUNT
- COUNTIF
- FREQUENCY
- STDEV, and VAR
- PERCENTILE.INC, QUARTILE.INC, and RANK.EQ
The AVERAGE Function
To start with, you have the most commonly used function, the AVERAGE. It gives you an average of n numbers by summing them up and dividing by the sum of the total numbers. Many beginners believe that they can only get the average of two numbers but this is not the case. Here's the syntax for the function:
=AVERAGE(n1, n2, n3 ... nn)
To elaborate this function with an example, consider the given tabular data which consists of three students' marks data. You need to find the average marks:
A | B | C | D |
---|---|---|---|
Subject 1 | Subject 2 | Subject 3 | Result |
26 | 38 | 78 | ? |
65 | 78 | 11 | ? |
12 | 88 | 50 | ? |
To find the average marks for each student, you need to deploy the given formula in D1 and then stretch it all the way down for rest of the two, =AVERAGE(A1, B1, C1). This results in the following answer:
A | B | C | D |
---|---|---|---|
Subject 1 | Subject 2 | Subject 3 | Result |
26 | 38 | 78 | 47.33 |
65 | 78 | 11 | 51.33 |
12 | 88 | 50 | 50 |
The AVERAGEIF Function
Now, in the above example, consider that you only need to average those subject marks where the student has scored more than 50. Such a result can't be calculated using the AVERAGE function, therefore, you need a new function, the AVERAGEIF function. The syntax is given below:
=AVERAGEIF(cell_range, condition, [opt_avg_range])
Form a tabular data for a student whose marks, in four out of five subjects, are known. You need to find the average marks, considering only those subjects where she has scored more than 50.
A | B |
---|---|
Subjects | Marks |
Subject 1 | 26 |
Subject 2 | 65 |
Subject 3 | 12 |
Subject 4 | |
Subject 5 | 78 |
Now, in any new cell, you can write the given formula to arrive at the result:
=AVERAGEIF(B1:B5, ">50")
This gives you the result as 71.5 by avoiding the marks for the subjects 1, 3, and 4.
The MEDIAN Function
At the back-end, the AVERAGE function uses the mean concept in calculation. This has a disadvantage if their is an anomaly in the data. For example, consider a case where three doctors report their number of patients analyzed in five consecutive days, as shown:
A | B | C | D | E |
---|---|---|---|---|
Day 1 | Day 2 | Day 3 | Day 4 | Day 5 |
8 | 12 | 16 | 10 | 87 |
12 | 15 | 10 | 11 | 9 |
8 | 4 | 1 | 0 | 2 |
If you try to average out the number of patients then the anomaly (87) for the first doctor can bring skewness in the result. Hence, in such scenarios, you should prefer to use median; the syntax in Excel is shown below:
=MEDIAN(n1, n2, n3 ... nn)
To apply the median on the table, you can proceed as =MEDIAN(A1, B1, C1, D1, E1) which results in the answer as 12. If you compare this answer with the =AVERAGE(A1, B1, C1, D1, E1) answer, 26.6 = 27 makes more sense.
The PERMUT Function
Permutation is a widely used concept in the realm of statistics. You can perform the permutation on a given data using the given syntax in Excel:
=PERMUT(n, n_chosen)
To understand what permutation is and how you can implement it in Excel, consider a case of three athletes of which only two need to be selected for a marathon. Permutation helps in answering this question without counting all the possibilities by hand. If you represent this case in a tabular format, it will look like this:
A | B |
---|---|
Number of athletes | To be chosen |
3 | 2 |
You can implement the PERMUT function as shown:
=PERMUT(A1, B1)
This gives the answer 6, stating that there are six possibilities for choosing two athletes from a group of three.
The COUNTBLANK Function
The COUNTBLANK function returns "the number of blank cells in a given specified range of cells," according to Excel's documentation. This can be a very useful function when you need to know how many values are missing in data.
The syntax for the COUNTBLANK function is given below:
=COUNTBLANK(range_of_cells)
To implement it in Excel, consider the given tabular data:
A |
---|
Values |
9 |
5 |
8 |
7 |
So, if you implement =COUNTBLANK(A1:A7), it counts the number of blank values. The result of the formula is 3. Three represents the total of the one blank value between 9 and 5 along with two blank values between 5 and 8.
The COUNTIF Function
Assume you need to calculate the number of people whose marks are above a cut-off decided at value 88. To do this, you can use the COUNTIF function whose syntax is given below:
=COUNTIF(range_of_cells)
To illustrate the scenario, consider the given tabular data:
A |
---|
Marks |
50 |
90 |
23 |
65 |
98 |
55 |
On the given tabular data, implement the following formula:
=COUNTIF(A1:A7, ">88")
This results in the value 2 for the values 90 and 98. Note that the function has also ignored the blank cell.
The FREQUENCY Function
Consider a column with a large number of values ranging from 1-1000 and you need to count how many values fall in the bins 0-270, 271-350, 351-888, and 888-1000. If you have a small number of values available then this can be done manually. However, for a large set of numbers, it is suggested to use the FREQUENCY function. Its syntax is given below:
=FREQUENCY(value_range, bin_range)
Consider this tabular data to implement the function:
A | B |
---|---|
Values | Bins |
255 | 270 |
615 | 350 |
478 | 888 |
24 | 1000 |
11 | |
998 | |
548 | |
250 | |
341 | |
880 | |
1222 | |
650 | |
555 |
To implement the FREQUENCY function, you first need to learn the array function. The given FREQUENCY function doesn't result in only one item, rather it results in one plus the number of bin values. In our table, we have four values available in Bins, so the output of the FREQUENCY function will have five values which form an array. Therefore, first, you need to select five vertical cells and write the given formula. The formula should be written in the first cell:
=FREQUENCY(A1:A13, B1:B4)
Once you write the formula, press Ctrl + Shift + Enter. This will result in the following answer and will also write the above formula within the curly braces, {}, of all the cells .
A | B | C |
---|---|---|
Values | Bins | Frequency |
255 | 270 | 4 |
615 | 350 | 1 |
478 | 888 | 6 |
24 | 1000 | 1 |
11 | 1 | |
998 | ||
548 | ||
250 | ||
341 | ||
880 | ||
1222 | ||
650 | ||
555 |
You can refer to this short YouTube video to learn how to use the array function.
The STDEV and VAR Functions
So far, you have learned about the functions that either belong to the measure of central tendency or are commonly used. In this section, you will explore two statistical functions that belong to the dispersion class i.e., the variance and standard deviation of a sample collected from a population data.
Their respective syntax is presented below:
=STDEV(data_array)
=VAR(data_array)
Consider a sample of five values taken from a population data as shown:
A |
---|
50 |
98 |
12 |
233 |
1 |
To find the variance and standard deviation in this data, you can implement the functions as shown:
=VAR(A1:A5)
=STDEV(A1:A5)
This results in the values are Variance = 8872.7 and Standard Deviation = 94.19.
The PERCENTILE.INC, QUARTILE.INC and RANK.EQ Functions
The PERCENTILE.INC function returns "the K'th percentile of values in a provided range, where K lies in the range 0 - 1 (both inclusive)," according to Excel's documentation.
The QUARTILE.INC function returns "the specified quartile of a set range of provided numbers, based on percentile value 0 - 1 (both inclusive)," according to Excel's documentation.
Here's the list of available percentiles that you can attain through the QUARTILE.INC function:
Argument | Result |
---|---|
0 | Minimum value |
1 | First quartile (25th percentile) |
2 | Second quartile (50th percentile). Also, the median. |
3 | Third quartile (75th percentile) |
4 | Maximum quartile |
The RANK.EQ function returns the mode of a list of provided numbers. FYI, mode means the number with the most frequency. Also, if more than one value has the same rank, the top rank of that set is returned.
Here're the syntax of these functions:
=PERCENTILE.INC(data, percentile)
=QUARTILE.INC(data, quartile)
=RANK.EQ(number_whose_rank_is_to_be_calculated, reference, [order])
Consider the below tabular data to implement these functions:
A |
---|
5 |
5 |
8 |
9 |
7 |
1 |
11 |
4 |
2 |
5 |
To find the 80th percentile of the data, use the following:
=PERCENTILE.INC(A1:A10, 0.8)
This gives the value as 8.2.
To find the 50th percentile using the QUARTILE.INC function, pass the value 2 to the quartile argument:
=QUARTILE.INC(A1:A10, 2)
That gives the value as 5.
To find the rank of number 7, proceed with the following formula:
=RANK.EQ(A5, A1:A10)
That gives you the rank of number 7 as 4.
Conclusion
In this guide, you've learned various important statistical functions that are commonly used in Excel. So, go ahead and start exploring them in your own scenarios.