Convolutional Neural Network in Keras
Let's learn how to build CNNs using the Keras library for solving problems in image recognition, object detection, and other computer vision applications.
Nov 12, 2019 • 8 Minute Read
Introduction
Convolutional neural networks (CNNs) are similar to ordinary neural networks in that both are made up of neurons with learnable weights and biases. The main difference is that CNNs make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. These properties make the forward propagation step much more efficient and reduce the number of parameters needed in the network. This makes CNNs the best choice for solving problems related to image recognition, object detection, and other computer vision applications.
In this guide, you will learn how to build CNNs using the keras library. Let's start by loading the required libraries and packages.
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.utils import to_categorical
from keras.datasets import mnist
import matplotlib.pyplot as plt
Data
We will use the popular MNIST dataset in this guide. Each image in the dataset is 28x28 pixels and contains a centered, grayscale digit. The model will take an image as input and output one of the ten possible digits (0 through 9). There are 70,000 images in the dataset, of which 60,000 will be used for training the model and the remaining 10,000 for validating it.
The first line of code below loads the MNIST dataset and creates the training and test arrays. The second line checks the shape of the second image in the training set, which is 28x28 pixels as expected. The last two lines display that image.
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train[1].shape

# display the image; cmap='gray' renders the grayscale values correctly
plt.imshow(X_train[1], cmap='gray')
plt.show()
Output:
(28, 28)
Data Preparation
CNNs work well on images because neighboring pixels are strongly related. However, before training the algorithm, we need to prepare the data. The first step is to reshape the inputs, X_train and X_test, as done in the first two lines of code below. The reshape function takes the new shape: the number of images (X_train.shape[0]), the dimensions of each image (28x28), and the number of channels (1, because the images are grayscale).
The next step is to normalize the inputs, which makes the network easier to train. The third and fourth lines of code scale the pixel values from the [0, 255] range to [0, 1].
Finally, we one-hot encode the target variable, which is done in the fifth and sixth lines of code below. The last two lines print the shapes of the training and test sets and the number of classes in the target variable.
# Lines 1 and 2: reshape to (samples, height, width, channels)
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1)).astype('float32')
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1)).astype('float32')

# Lines 3 and 4: scale pixel values from [0, 255] to [0, 1]
X_train = X_train / 255
X_test = X_test / 255

# Lines 5 and 6: one-hot encode the target variable
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
num_classes = y_test.shape[1]
print(X_train.shape); print(X_test.shape); print(num_classes)
Output:
(60000, 28, 28, 1)
(10000, 28, 28, 1)
10
Building the Model
We will create a function that builds and compiles the CNN model, defined as cnn_model below. The first line of code in the function calls the Sequential constructor, because the model we are building is a linear stack of layers. From the second line onwards, we use the add() method to add layers to the model.
The first layer is a Conv2D layer that processes the input images, represented as two-dimensional matrices. This layer has 32 filters, each with a 5x5 kernel, and uses the relu (rectified linear unit) activation function. ReLU is the most widely used activation function in deep neural networks because it is nonlinear and outputs zero for negative inputs, so it does not activate all the neurons at the same time. In simple terms, only some neurons produce a nonzero output for a given input, which makes the network sparse and efficient.
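To make the sparsity point concrete, here is a minimal numpy sketch (separate from the model code) of what ReLU does to a vector of activations:
import numpy as np

# ReLU keeps positive values unchanged and zeroes out the rest, so for
# any given input only a subset of neurons produce a nonzero output.
x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(np.maximum(0, x))  # [0.  0.  0.  1.5 3. ]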
The next step is to add a pooling layer, MaxPooling2D, followed by a regularization layer called Dropout. Between the dropout and the dense layers, there is the Flatten layer, which converts the 2D matrix data to a vector. This in turn allows the output to be processed by standard, fully connected layers.
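To see how these layers transform the data, here is an optional sketch that stacks just the Conv2D, MaxPooling2D, and Flatten layers and prints each output shape (the exact layer names may vary between Keras versions):
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten

demo = Sequential()
demo.add(Conv2D(32, (5, 5), input_shape=(28, 28, 1), activation='relu'))
demo.add(MaxPooling2D(pool_size=(2, 2)))
demo.add(Flatten())

for layer in demo.layers:
    print(layer.name, layer.output_shape)

# conv2d        (None, 24, 24, 32)   <- 5x5 kernel on a 28x28 image: 28 - 5 + 1 = 24
# max_pooling2d (None, 12, 12, 32)   <- 2x2 pooling halves each dimension: 24 / 2 = 12
# flatten       (None, 4608)         <- 12 * 12 * 32 = 4608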
The next step is to add the fully connected dense layer with 128 neurons and the rectifier activation function. We then add the output layer, which has 10 neurons, one per class, and a softmax activation function that converts the layer's raw outputs into probability-like predictions for each class.
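As a quick illustration of how softmax produces these probability-like outputs, the following numpy sketch (with made-up scores for three classes) converts raw scores into probabilities that sum to one:
import numpy as np

# Softmax exponentiates each score and normalizes by the total,
# yielding a probability distribution over the classes.
scores = np.array([2.0, 1.0, 0.1])
probs = np.exp(scores) / np.sum(np.exp(scores))
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0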
The final step is to compile the model, which takes three parameters: optimizer, loss, and metrics. The optimizer controls how the weights are updated; we will use the adam optimizer, whose main advantage is that it adapts the learning rate during training, saving us the task of tuning it by hand as we would with plain gradient descent. We will use the categorical_crossentropy loss function, which is the common choice for multi-class classification problems; the lower the score, the better the model. The evaluation metric we will use to validate the model performance on the test data is accuracy. The higher the accuracy score, the better the model performance.
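If you do want control over the learning rate, Keras also accepts an optimizer object in place of the string 'adam'. As a sketch, the compile call in the cnn_model() function below could equivalently be written as follows (0.001 is Adam's default learning rate, made explicit here only for illustration):
from keras.optimizers import Adam

# equivalent to optimizer='adam', with the default learning rate made explicit
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.001), metrics=['accuracy'])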
The function below creates and compiles the CNN model as discussed above.
def cnn_model():
    # create the model: conv -> pool -> dropout -> flatten -> dense -> softmax output
    model = Sequential()
    model.add(Conv2D(32, (5, 5), input_shape=(28, 28, 1), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # compile the model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
model = cnn_model()
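With the model compiled, an optional sanity check is to print the architecture before training:
model.summary()  # lists each layer with its output shape and parameter count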
Fitting and Evaluating the Model
The first line of code below fits the model on the training data. The epochs argument sets the number of complete passes over the training set; we use 5 epochs and a batch size of 150. The second line uses model.evaluate() to evaluate the model on the test data, while the third line prints the resulting classification error.
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5, batch_size=150)
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))
Output:
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 53s 891us/step - loss: 0.2253 - acc: 0.9347 - val_loss: 0.0688 - val_acc: 0.9798
Epoch 2/5
60000/60000 [==============================] - 55s 909us/step - loss: 0.0687 - acc: 0.9790 - val_loss: 0.0459 - val_acc: 0.9846
Epoch 3/5
60000/60000 [==============================] - 56s 931us/step - loss: 0.0491 - acc: 0.9852 - val_loss: 0.0417 - val_acc: 0.9860
Epoch 4/5
60000/60000 [==============================] - 56s 927us/step - loss: 0.0401 - acc: 0.9876 - val_loss: 0.0412 - val_acc: 0.9867
Epoch 5/5
60000/60000 [==============================] - 58s 961us/step - loss: 0.0336 - acc: 0.9896 - val_loss: 0.0386 - val_acc: 0.9873
CNN Error: 1.27%
The above output shows that, with only five epochs, we achieved an accuracy of 98.73 percent on the validation data, which is very good performance.
Conclusion
In this guide, you have learned how to build a simple convolutional neural network using the keras deep learning library. You also learned about the different parameters that can be tuned depending on the problem statement and the data.
To learn more about building deep learning algorithms using the keras library, please refer to the following guides: