Implementing Marketing Analytics in R: Part 2
Feb 15, 2020 • 16 Minute Read
Introduction
Marketing and customer-related decisions are a top priority for every business. With the help of statistical modeling and analytics, it's possible to support decision makers in making strategic decisions based on data rather than instinct alone. This integration of statistical modeling with marketing strategy is often referred to as marketing analytics.
In this series of two guides, we explore techniques for implementing marketing analytics in R.
In the previous guide, Part 1, we covered:
- Customer Churn Prediction
- RFM Analysis
In this guide, Part 2, we will cover:
- Segmentation (Clustering)
- Sales Forecasting
- Other Areas
Let's start by loading the required libraries.
library(plyr)
library(readr)
library(dplyr)
library(caret)
library(ggplot2)
library(repr)
Segmentation
Segmentation is the marketing strategy of dividing customers into groups in order to build and deploy effective marketing plans. The basis for segmentation can be demographic, geographic, or any other attribute. The statistical technique of dividing customers into homogeneous segments is called clustering. Let's start by loading the data.
Data
For this clustering exercise, we'll use a fictitious dataset of retail store customers containing 150 observations and 6 variables, as described below:
- mean_HH_income: The mean household income of the customer.
- ABV: The average bill value per purchase by the customer.
- No_bills_month: The average number of bills per month from the customer.
- satisfaction_score: The customer's satisfaction score. The higher the score, the more satisfied the customer is with the store.
- family_size: The number of members in the customer's family.
- Gender: Whether the customer is female (1) or male (0).
Let's load the data and look at the structure.
df_seg = read_csv("segdata.csv")
glimpse(df_seg)
Output:
Observations: 150
Variables: 6
$ mean_HH_income <int> 88006, 88005, 88004, 88003, 88002, 88001, 88000, 64...
$ ABV <int> 355, 354, 353, 352, 351, 350, 349, 348, 347, 346, 3...
$ No_bills_month <int> 41, 21, 21, 53, 21, 44, 49, 22, 10, 58, 25, 52, 41,...
$ satisfaction_score <int> 1, 3, 4, 4, 2, 1, 3, 4, 4, 3, 5, 1, 5, 3, 3, 5, 4, ...
$ family_size <int> 4, 5, 3, 7, 6, 9, 6, 5, 8, 10, 3, 3, 3, 8, 7, 3, 3,...
$ Gender <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
The next step is to look at the summary of the numerical variables.
summary(df_seg)
Output:
mean_HH_income ABV No_bills_month satisfaction_score
Min. :45000 Min. : 26.00 Min. : 5.00 Min. :1.000
1st Qu.:52349 1st Qu.: 62.25 1st Qu.:16.00 1st Qu.:2.000
Median :64277 Median :286.50 Median :26.50 Median :3.000
Mean :59873 Mean :215.39 Mean :30.20 Mean :2.873
3rd Qu.:64314 3rd Qu.:323.75 3rd Qu.:44.75 3rd Qu.:4.000
Max. :88006 Max. :355.00 Max. :60.00 Max. :5.000
family_size Gender
Min. : 1.000 Min. :0.0000
1st Qu.: 3.000 1st Qu.:0.0000
Median : 6.000 Median :0.0000
Mean : 5.513 Mean :0.3333
3rd Qu.: 8.000 3rd Qu.:1.0000
Max. :10.000 Max. :1.0000
From the output above, we can see that the variables are on very different scales, which would distort the distance calculations that clustering relies on. To prevent this, we'll normalize the data using the preProcess function from the caret package.
The first line of code below learns the pre-processing parameters from the data, while the second line applies them to perform the normalization. By default, preProcess centers and scales each variable, so if we look at the summary of the segNorm data, we'll see that all the variables now have a mean of zero and unit standard deviation.
p1_object = preProcess(df_seg)        # learn centering and scaling parameters
segNorm = predict(p1_object, df_seg)  # apply them to normalize the data
summary(segNorm)
Output:
mean_HH_income ABV No_bills_month satisfaction_score
Min. :-1.6354 Min. :-1.5124 Min. :-1.4713 Min. :-1.31892
1st Qu.:-0.8274 1st Qu.:-1.2229 1st Qu.:-0.8291 1st Qu.:-0.61487
Median : 0.4842 Median : 0.5679 Median :-0.2160 Median : 0.08918
Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.00000
3rd Qu.: 0.4883 3rd Qu.: 0.8654 3rd Qu.: 0.8495 3rd Qu.: 0.79323
Max. : 3.0935 Max. : 1.1149 Max. : 1.7399 Max. : 1.49728
family_size Gender
Min. :-1.5167 Min. :-0.7047
1st Qu.:-0.8446 1st Qu.:-0.7047
Median : 0.1635 Median :-0.7047
Mean : 0.0000 Mean : 0.0000
3rd Qu.: 0.8356 3rd Qu.: 1.4095
Max. : 1.5077 Max. : 1.4095
After normalizing the data, the next step is to run the clustering algorithm. We'll use the popular k-means algorithm because it is efficient and interpretable. We'll create five clusters, as specified by the argument centers=5 in the first line of code below. (Note that k-means starts from random centroids, so cluster numbering and sizes can vary between runs unless you fix the random seed with set.seed().) The second line displays the number of observations in each cluster.
kmeans_Clust = kmeans(segNorm, centers=5, iter.max=1000)
table(kmeans_Clust$cluster)
Output:
1 2 3 4 5
31 44 13 21 41
From the output above, we can see that the smallest cluster, 3, has 13 observations, while the largest cluster, 2, has 44 observations. To understand what characterizes each cluster, we can profile them with the commands below.
The first line of code below adds the cluster assignments as a new variable in the original, un-normalized data, df_seg. The second line splits the data by cluster and uses lapply() to compute and print the column means of each cluster.
df_seg$kmeansseg = kmeans_Clust$cluster
lapply(split(df_seg, df_seg$kmeansseg), colMeans)
Output:
$`1`
mean_HH_income ABV No_bills_month satisfaction_score
51832.129032 46.645161 22.709677 2.258065
family_size Gender kmeansseg
4.354839 1.000000 1.000000
$`2`
mean_HH_income ABV No_bills_month satisfaction_score
67507.954545 311.727273 36.818182 1.909091
family_size Gender kmeansseg
5.204545 0.000000 2.000000
$`3`
mean_HH_income ABV No_bills_month satisfaction_score
51437.230769 129.076923 28.692308 2.153846
family_size Gender kmeansseg
7.384615 0.000000 3.000000
$`4`
mean_HH_income ABV No_bills_month satisfaction_score
5.182395e+04 1.424762e+02 4.519048e+01 3.619048e+00
family_size Gender kmeansseg
6.142857e+00 9.047619e-01 4.000000e+00
$`5`
mean_HH_income ABV No_bills_month satisfaction_score
64556.170732 304.292683 21.560976 4.219512
family_size Gender kmeansseg
5.804878 0.000000 5.000000
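We fixed the number of clusters at five up front. A common way to sanity-check that choice is the elbow method: run k-means for a range of values of k and plot the total within-cluster sum of squares, looking for the point where the curve flattens out. Below is a minimal sketch of that approach; the range of 1 to 10 clusters and nstart = 25 are illustrative choices, not part of the original analysis.
# Elbow method: total within-cluster sum of squares for k = 1 to 10
set.seed(100)  # k-means uses random starting centroids
wss <- sapply(1:10, function(k) {
  kmeans(segNorm, centers = k, iter.max = 1000, nstart = 25)$tot.withinss
})
plot(1:10, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
An elbow near k = 5 would support the choice made above.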
Sales Forecasting
Forecasting needs little introduction: it is a core activity for marketing teams, most commonly used to predict sales. Let's start by loading the data.
Data
For this forecasting exercise, we'll use fictitious monthly sales data containing 81 observations and 3 variables. The task is to build a model that forecasts sales for the next 12 months.
library(forecast)
library(fpp2)
library(TTR)
library(dplyr)
dat <- read_csv("salesforecasting.csv")
glimpse(dat)
Output:
Observations: 81
Variables: 3
$ Date <chr> "01-Mar-13", "01-Apr-13", "01-May-13", "01-Jun-13", "01-Jul-13",...
$ Sales <int> 192, 198, 197, 213, 202, 186, 215, 213, 213, 220, 210, 219, 233,...
$ Class <chr> "Train", "Train", "Train", "Train", "Train", "Train", "Train", "...
The next step is to create training and test sets for model validation. The first two lines of code below do this using the Class variable, which already labels each row as Train or Test. The third line prints the number of rows in the training and test data. Note that the test set has 12 rows because we will be building the model to forecast 12 months ahead.
dat_train = subset(dat, Class == 'Train')
dat_test = subset(dat, Class == 'Test')
nrow(dat_train); nrow(dat_test)
Output:
[1] 69
[1] 12
Preparing the Time Series Object
To run forecasting models in R, we need to convert the data into a time series object, which is done in the first line of code below. The start and end arguments specify the times of the first and last observations, respectively, while the frequency argument specifies the number of observations per unit of time (12 for monthly data).
We will also create a utility function for calculating the mean absolute percentage error (MAPE), which will be used to evaluate the performance of the forecasting models. The lower the MAPE value, the better the forecasting model. This is done in the remaining lines of code below.
dat_ts <- ts(dat_train[, 2], start = c(2013, 3), end = c(2018, 11), frequency = 12)
# Utility function for computing MAPE
mape <- function(actual, pred){
  mape <- mean(abs((actual - pred)/actual))*100
  return(mape)
}
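As a quick sanity check of the helper, with made-up numbers (not from the dataset): actual values of 100 and 200 against predictions of 110 and 190 give absolute percentage errors of 10 and 5 percent, so the MAPE is 7.5.
mape(c(100, 200), c(110, 190))
# [1] 7.5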
With the data and the mape() function prepared, we can move on to the forecasting technique used in this case.
Holt's Trend Method
We'll use Holt's trend method, a classical statistical algorithm that accounts for the trend component while generating forecasts. The method involves two smoothing equations, one for the level and one for the trend component.
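In the standard textbook notation, the level and trend are updated as
l[t] = alpha * y[t] + (1 - alpha) * (l[t-1] + b[t-1])
b[t] = beta * (l[t] - l[t-1]) + (1 - beta) * b[t-1]
and the h-step-ahead forecast is l[t] + h * b[t]. The alpha and beta here correspond (up to the package's internal parameterization) to the smoothing parameters reported in the model summary below.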
The first line of code below fits the Holt's trend model and stores it in the object holt_model, with h = 12 requesting forecasts for the next 12 months. The second line prints the model summary along with those forecasts.
holt_model <- holt(dat_ts, h = 12)
summary(holt_model)
Output:
Forecast method: Holt's method
Model Information:
Holt's method
Call:
holt(y = dat_ts, h = 12)
Smoothing parameters:
alpha = 0.8342
beta = 1e-04
Initial states:
l = 193.7099
b = 3.4287
sigma: 16.2391
AIC AICc BIC
682.6969 683.6492 693.8674
Error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set -0.8870433 15.76137 11.8633 -0.4614957 4.084311 0.2366019 0.004106845
Forecasts:
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Dec 2018 382.4585 361.6473 403.2698 350.6305 414.2866
Jan 2019 385.8799 358.7765 412.9833 344.4288 427.3309
Feb 2019 389.3012 357.1116 421.4908 340.0715 438.5309
Mar 2019 392.7225 356.1461 429.2989 336.7838 448.6612
Apr 2019 396.1438 355.6521 436.6355 334.2171 458.0706
May 2019 399.5651 355.5036 443.6267 332.1789 466.9514
Jun 2019 402.9865 355.6225 450.3504 330.5496 475.4233
Jul 2019 406.4078 355.9563 456.8593 329.2489 483.5667
Aug 2019 409.8291 356.4676 463.1906 328.2197 491.4385
Sep 2019 413.2504 357.1288 469.3721 327.4198 499.0811
Oct 2019 416.6717 357.9187 475.4248 326.8168 506.5267
Nov 2019 420.0931 358.8209 481.3652 326.3854 513.8008
The output above shows that the MAPE on the training data is 4.1 percent. Let's now evaluate the model's performance on the test data, which is done in the lines of code below. The MAPE on the test data comes out to 2.7 percent, even better than the training result.
df_holt = as.data.frame(holt_model)
dat_test$holt = df_holt$`Point Forecast`
mape(dat_test$Sales, dat_test$holt)
Output:
[1] 2.735082
Overall Forecasting for Next 12 Months
The accuracy we achieved on the holdout period is respectable, so we can now generate the forward-looking forecast. The code below rebuilds the time series object over the full data (through November 2019), fits an ARIMA model selected automatically by auto.arima() as an alternative to Holt's method, and forecasts the next 12 months.
df_forecast <- ts(dat[, 2], start = c(2013, 3), end = c(2019, 11), frequency = 12)
arima_model_ahead <- auto.arima(df_forecast)
forecast_12 = forecast::forecast(arima_model_ahead, h=12)
forecast_12
Output:
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Dec 2019 438.1473 419.0833 457.2113 408.9915 467.3031
Jan 2020 441.9584 417.3683 466.5485 404.3510 479.5658
Feb 2020 446.0598 416.9752 475.1444 401.5787 490.5409
Mar 2020 442.3232 409.3512 475.2952 391.8968 492.7496
Apr 2020 443.8119 407.3648 480.2591 388.0708 499.5530
May 2020 445.3007 405.6821 484.9193 384.7092 505.8921
Jun 2020 449.9827 407.4283 492.5370 384.9014 515.0640
Jul 2020 459.3094 414.0091 504.6097 390.0286 528.5902
Aug 2020 463.9914 416.1024 511.8804 390.7515 537.2313
Sep 2020 462.8675 412.5227 513.2122 385.8718 539.8631
Oct 2020 462.3241 409.6379 515.0103 381.7476 542.9007
Nov 2020 463.5226 408.5947 518.4505 379.5176 547.5275
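To visualize the forecast along with its 80 and 95 percent prediction intervals, the forecast package provides an autoplot() method for forecast objects:
autoplot(forecast_12)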
Other Areas
Marketing analytics is a vast field, and it is not possible to cover all of its concepts and techniques in one guide, or for that matter even one book. We have covered the most widely used techniques above; some other popular ones are listed below.
- Customer Lifetime Value (CLV): Model each customer's lifetime value to decide which customers are most valuable to the business (see the sketch after this list).
- Natural Language Processing (NLP): NLP is a domain in itself and is increasingly used for sentiment analysis, topic modeling, and text classification.
- Dimensionality Reduction: This technique is often applied to survey and CRM data, which can be extensive.
- Affinity Analysis: Also known as market basket analysis, this technique identifies which items have a higher likelihood of being bought together.
- Marketing Campaign Optimization: Model the effectiveness of a marketing campaign and optimize your marketing budget accordingly.
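As a flavor of how approachable some of these are, here is a minimal CLV sketch using the classic margin-retention formula, CLV = m * r / (1 + d - r), where m is the annual margin per customer, r is the retention rate, and d is the annual discount rate. The input values below are illustrative assumptions, not taken from any dataset in this guide.
# Simple contractual CLV: margin m, retention rate r, discount rate d
# All input values are illustrative assumptions
clv <- function(m, r, d) m * r / (1 + d - r)
clv(m = 120, r = 0.80, d = 0.10)
# [1] 320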
Conclusion
In this series of guides, you learned about the most popular use cases of marketing analytics, along with their implementation in R.
To learn more about data science with R, please explore the other guides on this topic.