Vectors and Factors in R
Apr 15, 2020 • 7 Minute Read
Introduction
In this guide, we're going to talk about vectors and factors. In short, a vector is an ordered collection of atomic values of the same type, and a factor is a special kind of vector that stores categorical data as integer codes with a set of labels called levels. These two features allow us to understand the most basic data structures in R and start a journey of statistical analysis. First we'll clarify each concept, then we'll look at a demonstration of each of them.
Vectors
Vectors are the most basic data objects in R. Even a single value is a vector of length one. There are six atomic types, and every element of an atomic vector has the same type.
Atomic types:
- Character
- Logical
- Integer
- Double
- Complex
- Raw
Let's create a small script to demonstrate each of these.
print("welcome")
print(3.14)
print(100L)
print(FALSE)
print(10+3i)
print(charToRaw('atomic raw'))
Executing them will result in the following output.
[1] "welcome"
[1] 3.14
[1] 100
[1] FALSE
[1] 10+3i
[1] 61 74 6f 6d 69 63 20 72 61 77
The first line prints an atomic character vector, which may be familiar to you from other programming languages as a string or character sequence. The second is the atomic double type, and the third is the atomic integer type; the L suffix is what makes 100 an integer rather than a double. The fourth is the atomic logical type, which can be either TRUE or FALSE, and the fifth is the atomic complex type. The last line uses the charToRaw() function to convert an atomic character value to an atomic raw vector. The output is the hexadecimal byte representation of the character sequence.
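To confirm which atomic type a value has, the typeof() function can be used. The following is a minimal sketch; the list name examples and its element names are chosen here for illustration only.

```r
# One value of each atomic type, stored in a named list (names are illustrative).
examples <- list(
  character = "welcome",
  double    = 3.14,
  integer   = 100L,
  logical   = FALSE,
  complex   = 10+3i,
  raw       = charToRaw("atomic raw")
)

# typeof() reports the internal storage type of each value.
types <- vapply(examples, typeof, character(1))
print(types)
```

Each reported type should match the name given to the element in the list.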
Integer and double atomic vectors allow you to create sequences.
Suppose you need a sequence of double values for a task. If you are fine with increments of 1, the colon operator does the job.
v <- 0.3:10.3
print(v)
The output should look like this.
[1]  0.3  1.3  2.3  3.3  4.3  5.3  6.3  7.3  8.3  9.3 10.3
If you need to change the increments to a custom value, the seq() function is there to help you.
v <- seq(0,10,by = 0.5)
print(v)
The output shows an increment of 0.5 in this case.
 [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0
[16]  7.5  8.0  8.5  9.0  9.5 10.0
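As a small aside, seq() can also work out the increment for you when you give it the number of values you want through its length.out argument. The sketch below also checks the type produced by the colon operator when both endpoints are whole numbers. The variable names w and i are illustrative.

```r
# Five evenly spaced values from 0 to 1; seq() computes the step itself.
w <- seq(0, 1, length.out = 5)
print(w)  # 0.00 0.25 0.50 0.75 1.00

# With whole-number endpoints, the colon operator yields an integer vector.
i <- 1:5
print(typeof(i))  # "integer"
```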
Vectors behave very similarly to arrays. You can access a subset of the vector or grab individual elements by their index. Keep in mind that indexing starts from 1! Suppose you have an atomic vector of characters that represent IT equipment, and you need to grab the first two. You can do that the following way.
t <- c("Server","Switch","Router","Firewall","Monitor")
u <- t[c(1,2)]
print(u)
The output should look like the following.
[1] "Server" "Switch"
You can use negative indexes as well, but they do not work the way they do in many other programming languages. In R, a negative index excludes the element at that position rather than counting from the end. For example, t[-2] returns every element except the second one.
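The following sketch shows what a negative index does in R and how to reach the element before the last one by computing its position. The variable name equipment is used here for illustration.

```r
equipment <- c("Server", "Switch", "Router", "Firewall", "Monitor")

# A negative index drops that position instead of counting from the end.
without_second <- equipment[-2]
print(without_second)  # "Server" "Router" "Firewall" "Monitor"

# To get the element before the last one, compute its position explicitly.
second_to_last <- equipment[length(equipment) - 1]
print(second_to_last)  # "Firewall"
```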
If you have vectors of the same length, you can combine them with the add, subtract, multiply, and divide operators. Each operation is applied element by element, which can be handy when simulating or demonstrating matrix operations.
Suppose you have two double vectors with three values each.
v2 <- c(1.1,2.2,3.3)
v1 <- c(4.4,5.5,6.6)
Perform the following operations in order.
v1 + v2
v1 - v2
v1 * v2
v1 / v2
You should get the following result.
[1] 5.5 7.7 9.9
[1] 3.3 3.3 3.3
[1] 4.84 12.10 21.78
[1] 4.0 2.5 2.0
A concept called vector recycling comes into play when you perform an arithmetic operation on two vectors of different lengths. The elements of the shorter vector are repeated until they match the length of the longer one. Keep in mind that this works cleanly only if the length of the longer vector is a multiple of the length of the shorter one; otherwise R still returns a result but issues a warning.
For example:
v1 <- c(1,2,3,4,5,6)
v2 <- c(7,8)
v1 * v2
Output:
[1]  7 16 21 32 35 48
R treats v2 as if it were 7, 8, 7, 8, 7, 8.
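To see what happens when the longer length is not a multiple of the shorter one, here is a minimal sketch. The variable names long_v and short_v are illustrative; the warning is captured with tryCatch() so that it can be inspected.

```r
long_v  <- c(1, 2, 3, 4, 5)
short_v <- c(10, 20)

# short_v is recycled as 10, 20, 10, 20, 10, so a result is still produced.
result <- suppressWarnings(long_v * short_v)
print(result)  # 10 40 30 80 50

# R also raises a warning because 5 is not a multiple of 2; capture its text.
msg <- tryCatch(long_v * short_v, warning = function(w) conditionMessage(w))
print(msg)
```

The operation completes, but the warning is a signal that the lengths probably do not match what you intended.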
Last but not least, when you are working with vectors, you should remember the sort() function. It takes an atomic vector and sorts the elements in either decreasing order or increasing order as per your function call.
# sort increasing order
v1 <- sort(c(4,2,3,1,9,8,6))
# sort decreasing order
v1 <- sort(c(4,2,3,1,9,8,6), decreasing = TRUE)
The decreasing argument of the sort function is FALSE by default.
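As a quick check of the two calls above, the sketch below stores each result under a separate, illustrative name and shows that sort() returns a new vector rather than changing its input.

```r
values <- c(4, 2, 3, 1, 9, 8, 6)

ascending  <- sort(values)                     # default: increasing order
descending <- sort(values, decreasing = TRUE)  # explicit decreasing order

print(ascending)   # 1 2 3 4 6 8 9
print(descending)  # 9 8 6 4 3 2 1

# The original vector is left untouched.
print(values)      # 4 2 3 1 9 8 6
```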
Factors
Factors enjoy widespread popularity in statistical modeling and analysis. Conceptually, factors are variables that can take on a limited number of different values, which is why they are also referred to as categorical variables. Internally, a factor is stored as a vector of integer codes together with a corresponding set of character values that are used to display it. To create a factor, use the factor() function. The only input argument you need to specify is a vector of values, most often of character or integer type, and the function returns a factor. The distinct values are called the levels of the factor, and the nlevels() function returns how many of them there are.
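The integer storage behind a factor can be inspected directly. The sketch below uses a small hypothetical factor named sizes, which is not part of the guide's running example.

```r
# A hypothetical factor with three distinct values.
sizes <- factor(c("small", "large", "small", "medium"))

# The levels are the distinct values, sorted alphabetically by default.
size_levels <- levels(sizes)
print(size_levels)  # "large" "medium" "small"

# nlevels() counts the levels.
print(nlevels(sizes))  # 3

# as.integer() exposes the codes; each code is a position in the levels.
codes <- as.integer(sizes)
print(codes)  # 3 1 3 2
```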
Let's take an example vector that holds atomic characters and convert it to a factor. The vector holds different types of drinks.
drinks <- factor(c("beer", "wine", "rum", "whiskey","cocktail","whiskey","rum"))
print(drinks)
The output should look like this.
[1] beer     wine     rum      whiskey  cocktail whiskey  rum
Levels: beer cocktail rum whiskey wine
Notice that the factor keeps its elements in their original order. The levels, however, are the distinct values sorted alphabetically, and the levels() function returns them.
levels(drinks)
This returns the following result.
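[1] "beer"     "cocktail" "rum"      "whiskey"  "wine"
Note the double quotes around the items: levels() returns a character vector, whereas printing the factor itself shows the values without quotes.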
You are able to access elements of a factor by their indexes, which start from 1!
In order to access the third element, you would use this code.
drinks[3]
You can also access subsections of a factor. Suppose you need the first two elements.
drinks[c(1,2)]
You are also able to modify elements of a factor, but be aware that you cannot modify elements outside their levels.
For example, this will work.
drinks[1] <- "wine"
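This will not work as intended. R issues a warning and stores NA in that position, because "Coca Cola" is not one of the levels.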
drinks[1] <- "Coca Cola"
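To see what R actually does with an unknown level, here is a minimal sketch on a separate, smaller factor. The name drinks_demo is illustrative, so the drinks factor from the guide is left untouched. The warning is suppressed only so the result can be inspected quietly.

```r
drinks_demo <- factor(c("beer", "wine", "rum"))

# Assigning a value that is not a level triggers a warning and stores NA.
suppressWarnings(drinks_demo[1] <- "Coca Cola")

print(is.na(drinks_demo[1]))  # TRUE
print(levels(drinks_demo))    # "beer" "rum" "wine"
```

The levels are unchanged, and the first element is now missing.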
In order to overcome this problem, a new level needs to be introduced.
levels(drinks) <- c(levels(drinks), "Coca Cola")
drinks[1] <- "Coca Cola"
Printing drinks now gives the following.
[1] Coca Cola wine      rum       whiskey   cocktail  whiskey   rum
Levels: beer cocktail rum whiskey wine Coca Cola
Conclusion
In this guide, we built up the knowledge to effectively use vectors and factors. We looked at the difference between these concepts and learned how they build upon each other to facilitate statistical analysis. I hope this guide has been informative to you and I would like to thank you for reading it!