A Simple Deep Learning Model to Add Two Numbers
By using Data Tensors to get the data in proper shape and developing a neural network using Keras, we can see how traditional computational problems are handled in the Deep Learning world.
Apr 11, 2019 • 11 Minute Read
Introduction
Artificial Neural Networks, or Deep Learning Models, are among the most powerful predictive tools that machine learning has to offer. They are ideally suited to perceptual problems: the category of problems that humans find easy to make sense of, understand, and solve, but that are hard for traditional computer programs. For example, a computer can multiply 3453454345 by 94834368345 in a matter of microseconds, while distinguishing a cat from a dog in a picture is a non-trivial problem that, until recently, was a big challenge for the computing world.
With the advent of Deep Learning, there have been huge successes for these kinds of perceptual problems. In this guide, for the sake of simplicity and ease of understanding, we will treat simple arithmetic addition as if it were a perceptual problem: we train a model on example sums and then use the trained model to predict values.
In this guide, we are going to use the Keras library, which is available as part of the TensorFlow library.
Data Tensors
Getting the data into the proper shape is perhaps the most important aspect of any machine learning model, and that holds true here as well. The program below (data_creation.py) creates the training and test sets for the Addition problem.
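import numpy as np

# Training set: pairs [i, i] for i = 1, 3, 5, ..., 9999 and their sums
train_data = np.array([[1.0,1.0]])
train_targets = np.array([2.0])
for i in range(3,10000,2):
    train_data = np.append(train_data,[[i,i]],axis=0)
    train_targets = np.append(train_targets,[i+i])
# Print after the loop so that the full training array is shown
print(train_data)

# Test set: [2, 2] followed by pairs [i, i] for i = 4, 8, 12, ..., 7996
test_data = np.array([[2.0,2.0]])
test_targets = np.array([4.0])
for i in range(4,8000,4):
    test_data = np.append(test_data,[[i,i]],axis=0)
    test_targets = np.append(test_targets,[i+i])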
Let's analyze the above program:
import numpy as np
train_data = np.array([[1.0,1.0]])
train_targets = np.array([2.0])
In the above three lines, we import the NumPy library and create the train_data and train_targets data sets. train_data is the array that holds the pairs of numbers to be added, while train_targets is the vector that holds their sums. train_data is initialized with the pair (1.0, 1.0). Because this is a deliberately simple program, both numbers in a pair are always the same: throughout the training and test sets, each number (i) is added to itself.
for i in range(3,10000,2):
    train_data = np.append(train_data,[[i,i]],axis=0)
    train_targets = np.append(train_targets,[i+i])
The above lines extend the train_data array and the train_targets vector by looping over the counter (i), which starts at 3 and increases in steps of 2 up to, but not including, 10000, so the last value is 9999. This is what train_data looks like:
Output
[1.000e+00 1.000e+00]
[3.000e+00 3.000e+00]
[5.000e+00 5.000e+00]
...
[9.995e+03 9.995e+03]
[9.997e+03 9.997e+03]
[9.999e+03 9.999e+03]
train_targets:
Output
2.0000e+00 6.0000e+00 1.0000e+01 ... 1.9990e+04 1.9994e+04 1.9998e+04
test_data and test_targets are created in a similar fashion, with one difference: the counter starts at 4 and goes up to, but not including, 8000 in steps of 4.
test_data = np.array([[2.0,2.0]])
test_targets = np.array([4.0])
for i in range(4,8000,4):
    test_data = np.append(test_data,[[i,i]],axis=0)
    test_targets = np.append(test_targets,[i+i])
test_data:
Output
[2.000e+00 2.000e+00]
[4.000e+00 4.000e+00]
[8.000e+00 8.000e+00]
...
[7.988e+03 7.988e+03]
[7.992e+03 7.992e+03]
[7.996e+03 7.996e+03]
test_targets:
Output
4.0000e+00 8.0000e+00 1.6000e+01 ... 1.5976e+04 1.5984e+04 1.5992e+04
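Calling np.append inside a loop copies the whole array on every iteration. As a minimal sketch, assuming the same value ranges as data_creation.py, the same arrays could be built in one step with np.arange. The helper name make_pairs is hypothetical and is not part of the original program.

```python
import numpy as np

def make_pairs(values):
    """Return (data, targets): each data row is [v, v], each target is v + v.

    make_pairs is a hypothetical helper used only for this illustration.
    """
    values = np.asarray(values, dtype=float)
    data = np.column_stack([values, values])   # shape (n, 2)
    targets = values + values                  # shape (n,)
    return data, targets

# Training values: 1, 3, 5, ..., 9999 (the stop value 10000 is excluded).
train_data, train_targets = make_pairs(np.arange(1, 10000, 2))

# Test values: 2 followed by 4, 8, 12, ..., 7996, matching the original loop.
test_values = np.concatenate(([2], np.arange(4, 8000, 4)))
test_data, test_targets = make_pairs(test_values)

print(train_data.shape, train_targets.shape)  # (5000, 2) (5000,)
print(test_data.shape, test_targets.shape)    # (2000, 2) (2000,)
```

The shapes agree with the 5000 training samples and 2000 test samples reported in the training output later in this guide.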
Developing Neural Network for Addition Using Keras
Keras is a high-level API for building and training deep learning models. It does not implement the low-level tensor operations itself; instead, it runs on top of a backend library such as TensorFlow or Theano. The problem we are attempting to solve is a regression problem, where the output can take any value in a continuous range rather than one of a fixed set of values. The program below creates a Deep Learning model, trains it using the training set we created in the data_creation.py program, and then tests it using the test set created in the same program. Finally, the trained model is used to predict values.
import tensorflow as tf
from tensorflow import keras
import numpy as np
import data_creation as dc
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(2,)),
    keras.layers.Dense(20, activation=tf.nn.relu),
    keras.layers.Dense(20, activation=tf.nn.relu),
    keras.layers.Dense(1)
])
model.compile(optimizer='adam',
              loss='mse',
              metrics=['mae'])
model.fit(dc.train_data, dc.train_targets, epochs=10, batch_size=1)
test_loss, test_acc = model.evaluate(dc.test_data, dc.test_targets)
print('Test accuracy:', test_acc)
a= np.array([[2000,3000],[4,5]])
print(model.predict(a))
Let's analyze the above program by breaking it into small chunks:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import data_creation as dc
The above lines import the TensorFlow, Keras, and NumPy libraries into the program. The data_creation.py program that we created earlier is also imported, under the alias dc. All the training and test data sets we created can now be referenced through dc. For example, if the user needs the contents of train_data, all she has to do is use dc.train_data.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(2,)),
    keras.layers.Dense(20, activation=tf.nn.relu),
    keras.layers.Dense(20, activation=tf.nn.relu),
    keras.layers.Dense(1)
])
The above code creates the actual Deep Learning model. It defines the model as a stack of layers (keras.Sequential). The first layer, keras.layers.Flatten(input_shape=(2,)), flattens each input to a vector; since each input is already a vector of two numbers, it mainly serves to declare the input shape. The second and third layers consist of 20 nodes each and use the relu (rectified linear unit) activation function. Other activation functions, such as tanh or sigmoid, could also be used here. The fourth and last layer is the output layer. Since we expect only one output value (a predicted value, since this is a regression model), it has just one node (keras.layers.Dense(1)).
The architecture of the model depends, to a large extent, on the problem we are trying to solve. The model we have created above will not work very well for classification problems, such as image classification.
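To make the layer stack more concrete, here is a rough NumPy sketch of the computation that three Dense layers of these sizes perform on a batch of inputs. It is an illustration only: the weights are random rather than trained, the initialisation is an assumption (Keras uses its own initialisers), and the function names dense, relu, and forward are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """Rectified linear unit: negative values become 0."""
    return np.maximum(z, 0.0)

def dense(x, weights, bias, activation=None):
    """One fully connected layer: x @ weights + bias, then an optional activation."""
    z = x @ weights + bias
    return activation(z) if activation is not None else z

# Parameters with the same shapes as the three Dense layers of the model.
# Random normal weights and zero biases are assumptions for this sketch.
shapes = [(2, 20), (20, 20), (20, 1)]
params = [(rng.normal(size=s), np.zeros(s[1])) for s in shapes]

def forward(x):
    h = dense(x, *params[0], activation=relu)   # Dense(20, relu)
    h = dense(h, *params[1], activation=relu)   # Dense(20, relu)
    return dense(h, *params[2])                 # Dense(1), linear output

batch = np.array([[2000.0, 3000.0], [4.0, 5.0]])
output = forward(batch)
n_params = sum(w.size + b.size for w, b in params)

print(output.shape)  # (2, 1): one predicted value per input pair
print(n_params)      # 501 = (2*20 + 20) + (20*20 + 20) + (20*1 + 1)
```

Training adjusts these 501 numbers so that the single output value approaches the sum of the two inputs.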
model.compile(optimizer='adam',
              loss='mse',
              metrics=['mae'])
The above code compiles the network. The optimizer we are using is adam, which adapts the learning rate of each parameter using running averages of the gradient and of its square. The momentum-like behavior this gives can help training move past flat regions and shallow local minima, although it does not guarantee it. The loss function is mse (mean squared error), the average of the squared differences between the predicted values and the actual values. We also monitor another metric, mae (mean absolute error), the average of the absolute differences.
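The difference between the two measures can be sketched in a few lines of plain Python. The target and prediction values below are made up for illustration and are not taken from the trained model; the function names are likewise only illustrative.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences (the loss used for training)."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def mean_absolute_error(targets, predictions):
    """Average of the absolute differences (the metric being monitored)."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)

# Made-up values for illustration only.
targets = [10.0, 20.0, 30.0]
predictions = [11.0, 18.0, 30.0]

mse = mean_squared_error(targets, predictions)
mae = mean_absolute_error(targets, predictions)
print(mse)  # (1 + 4 + 0) / 3
print(mae)  # (1 + 2 + 0) / 3 = 1.0
```

Because the differences are squared, mse penalizes a few large errors more heavily than mae does, while mae is expressed in the same units as the sums being predicted.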
model.fit(dc.train_data, dc.train_targets, epochs=10, batch_size=1)
This is where the actual training of the network happens. The training set is fed to the network 10 times (epochs), and because batch_size=1, the weights are updated after every single training example. The number of epochs needs to be selected carefully: too few epochs may leave the network under-trained, while too many may lead to overfitting, wherein the network works well on the training data but not on the test data set.
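The batch_size argument determines how many weight updates happen per epoch. As a small sketch of that arithmetic (the helper weight_updates is hypothetical and not part of Keras), compare batch_size=1 with 32, which is the default Keras uses when no batch size is given:

```python
import math

def weight_updates(n_samples, batch_size, epochs):
    """Return (updates per epoch, total updates) for mini-batch training."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

# 5000 training pairs, as produced by data_creation.py.
print(weight_updates(5000, batch_size=1, epochs=10))   # (5000, 50000)
print(weight_updates(5000, batch_size=32, epochs=10))  # (157, 1570)
```

A batch size of 1 gives many updates per epoch but makes each epoch slower, so it is one of the parameters worth experimenting with.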
test_loss, test_acc = model.evaluate(dc.test_data, dc.test_targets)
print('Test accuracy:', test_acc)
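The above code evaluates the trained model on the test data set and prints the result. Note that, although the variable is named test_acc and the label reads "Test accuracy", the value is the mean absolute error (the metric passed to compile), so lower is better.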
a= np.array([[2000,3000],[4,5]])
print(model.predict(a))
Once the model has been trained and tested, we can use it to predict values for new inputs. In this case, we supply two pairs of values, (2000,3000) and (4,5), and print the output from the model.
Output
Epoch 1/10
5000/5000 [==============================] - 5s 997us/sample - loss: 1896071.4827 - mean_absolute_error: 219.0276
Epoch 2/10
5000/5000 [==============================] - 5s 956us/sample - loss: 492.9092 - mean_absolute_error: 3.8202
Epoch 3/10
5000/5000 [==============================] - 5s 1ms/sample - loss: 999.7580 - mean_absolute_error: 7.1740
Epoch 4/10
5000/5000 [==============================] - 5s 1ms/sample - loss: 731.0374 - mean_absolute_error: 6.0325
Epoch 5/10
5000/5000 [==============================] - 5s 935us/sample - loss: 648.6434 - mean_absolute_error: 7.5037
Epoch 6/10
5000/5000 [==============================] - 5s 942us/sample - loss: 603.1096 - mean_absolute_error: 7.7574
Epoch 7/10
5000/5000 [==============================] - 5s 1ms/sample - loss: 596.2445 - mean_absolute_error: 5.1727
Epoch 8/10
5000/5000 [==============================] - 5s 924us/sample - loss: 685.5327 - mean_absolute_error: 4.9312
Epoch 9/10
5000/5000 [==============================] - 5s 931us/sample - loss: 1895.0845 - mean_absolute_error: 5.7679
Epoch 10/10
5000/5000 [==============================] - 5s 996us/sample - loss: 365.9733 - mean_absolute_error: 2.7120
2000/2000 [==============================] - 0s 42us/sample - loss: 5.8080 - mean_absolute_error: 2.0810
Test accuracy: 2.0810156
[[5095.9385 ]
[ 9.108022]]
As can be seen, the value predicted for the input pair (2000,3000) is 5095.9385 and for the input pair (4,5) it is 9.108022, against true sums of 5000 and 9. The results may be improved by changing the number of epochs, adding layers, or increasing the number of nodes in a layer.
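One possible contributor to these errors is the training data itself: every training pair contains the same number twice, so the training set never shows what should happen when the two inputs differ. The sketch below illustrates the point with a plain linear model, prediction = x @ w, instead of the neural network (a simplifying assumption). Several different weight vectors fit the training pairs perfectly, yet they disagree on (2000,3000).

```python
import numpy as np

# Same training pairs as data_creation.py: [1, 1], [3, 3], ..., [9999, 9999].
values = np.arange(1, 10000, 2, dtype=float)
train_data = np.column_stack([values, values])
train_targets = values + values

# Three candidate linear models. Only the first one is true addition.
candidates = {
    "a + b": np.array([1.0, 1.0]),
    "2 * a": np.array([2.0, 0.0]),
    "2 * b": np.array([0.0, 2.0]),
}

query = np.array([2000.0, 3000.0])
results = {}
for name, w in candidates.items():
    # Largest absolute error over the whole training set for this model.
    train_error = np.abs(train_data @ w - train_targets).max()
    results[name] = (train_error, query @ w)
    print(name, train_error, query @ w)
```

All three candidates have zero error on the training set, but they predict 5000, 4000, and 6000 for the query. Training on pairs whose two numbers differ, for example randomly chosen values, would likely remove this ambiguity.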
Conclusion
Note that the example we have used, the addition of two numbers, is for illustration only and is not the best use of neural networks. However, it is easy to understand, and it helps develop an intuition for neural networks and for how a traditional computational problem, such as Addition, can be handled in the Deep Learning world. Deep learning is a black-box approach to problem-solving, and the results change with the parameters chosen, such as the number of epochs, layers, and nodes. The user needs to become familiar with these parameters and experiment with them to develop an intuitive understanding of which settings work for the problem at hand. The more a user practices deep learning and works with different problems, the better the understanding she is going to develop.