Implement Hyperparameter Tuning for TensorFlow 2.0
Jul 31, 2020 • 10 Minute Read
Introduction
Remember how you used to tune a radio, adjusting the frequency for better sound quality and less background noise?
Similarly, in machine learning (ML), you can improve the accuracy of a model (learning algorithm) by tuning hyperparameters, such as the learning rate. Hyperparameters are the parameters whose values are tuned to obtain optimal performance for a model.
Hyperparameter tuning is also known as hyperparameter optimization. Many practitioners still rely on exhaustive manual search, which is computationally expensive and hard to keep track of. TensorFlow 2.0 introduced the TensorBoard HParams dashboard to save time and provide better visualization directly in the notebook.
Model optimization is a continuous process: you train a model, evaluate it, tune the hyperparameters, and train again.
This guide will use the built-in MNIST dataset, which can be loaded directly from the Keras datasets API. But before jumping into the implementation, let's get familiar with some terms.
What is a Hyperparameter?
In neural network (NN) design, hyperparameters govern how the model learns the weights that capture patterns in images, text, or speech. Their values are set before training begins and do not change during training.
You can tune values for the following hyperparameters (the sketch after this list shows where each one plugs into a Keras model):
- Number of units (nodes) in a dense layer.
- Learning rate. This controls how quickly the model adapts to the problem by setting the step size taken toward the loss minimum at each iteration. The range is between 0.0 and 1.0.
- Dropout rate. This gives the probability that a given node in the layer is dropped during training.
- Optimizer. To reduce loss and converge faster, an optimizer adjusts the weights (and, in some cases, the effective learning rate) of a NN. Adam, SGD, RMSprop, and Nadam are some of the most commonly used optimizers.
- L2 regularization. This penalizes large weights, pushing the model toward weights of small magnitude and a non-sparse solution. The L2 penalty is the sum of the squares of all feature weights, and lambda is the hyperparameter tuned to strike a balance between simplicity and training-data fit. This can improve your NN performance by reducing overfitting.
- Epochs. This defines the number of times the learning algorithm works through the entire training set. For example, MNIST has 60,000 training images, so one epoch means one full pass over all 60,000 images.
- Activation functions. These introduce non-linearity into the output of the neurons. Common examples include ReLU, sigmoid, tanh, and softmax.
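To see where each of these hyperparameters lives in practice, here is a minimal Keras sketch. The specific values (128 units, a 0.2 dropout rate, and so on) are illustrative assumptions, not tuned results.
import tensorflow as tf

# Illustrative hyperparameter values (assumptions, not recommendations)
units = 128            # number of units in the dense layer
dropout_rate = 0.2     # probability that a node is dropped during training
learning_rate = 0.001  # step size taken toward the loss minimum
l2_lambda = 0.01       # L2 regularization strength (lambda)

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(units,
                        activation='relu',  # activation function
                        kernel_regularizer=tf.keras.regularizers.l2(l2_lambda)),
  tf.keras.layers.Dropout(dropout_rate),
  tf.keras.layers.Dense(10, activation='softmax'),
])

model.compile(
  optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
  loss='sparse_categorical_crossentropy',
  metrics=['accuracy'],
)
The epochs hyperparameter is then passed to model.fit(), for example model.fit(x_train, y_train, epochs=5).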
TensorBoard HParams Dashboard
Often in TensorFlow, while training a model, you only see screen outputs displaying performance metrics, which makes it hard to track how the model arrives at its results. To make TF programs easier to understand, optimize, and debug, TensorFlow ships with TensorBoard.
TensorBoard helps you visualize TF graphs, plot quantitative metrics, and more. This guide focuses on hyperparameter values using the HParams dashboard. The following steps will help you use the HParams dashboard to optimize a set of hyperparameters:
- Experiment setup and HParams summary
- Adapt TensorFlow runs to log hyperparameters and metrics
- Start runs and log them all under one parent directory
- Visualize the results in TensorBoard's HParams dashboard
Code Implementation
Prerequisites
Start by installing TF 2.0 and loading the TensorBoard notebook extension:
# Install TensorFlow 2.x if it is not already available in your environment
!pip install -q tensorflow
%load_ext tensorboard
Clear any logs from previous runs:
!rm -rf ./logs/
Import TensorFlow and the TensorBoard HParams plugin:
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp
Download the MNIST dataset and scale it:
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
1. Experiment Setup and HParams Experiment Summary
Experiment with four hyperparameters in the model:
- Number of units in the first dense layer
- Dropout rate in dropout layer
- Optimizer
- L2 Regularizer
HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([256, 512]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.5, 0.6))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam','sgd','rmsprop']))
HP_L2 = hp.HParam('l2 regularizer', hp.RealInterval(0.001, 0.01))
METRIC_ACCURACY = 'accuracy'
with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
  hp.hparams_config(
    hparams=[HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER, HP_L2],
    metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')],
  )
2. Adapt TensorFlow Runs to Log Hyperparameters and Metrics
The model contains two dense layers with a dropout layer between them. The hyperparameters are not hardcoded; instead, they are read from the hparams dictionary, so the same training code can be reused for every run.
def train_test_model(hparams):
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    # Use the tuned L2 value from hparams rather than a hardcoded constant
    tf.keras.layers.Dense(hparams[HP_NUM_UNITS],
                          kernel_regularizer=tf.keras.regularizers.l2(hparams[HP_L2]),
                          activation=tf.nn.relu),
    tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax),
  ])
  model.compile(
    optimizer=hparams[HP_OPTIMIZER],
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
  )
  model.fit(x_train, y_train, epochs=2)
  _, accuracy = model.evaluate(x_test, y_test)
  return accuracy
For each run, log an HParams summary with the hyperparameters and final accuracy:
def run(run_dir, hparams):
  with tf.summary.create_file_writer(run_dir).as_default():
    hp.hparams(hparams)  # record the values used in this trial
    accuracy = train_test_model(hparams)
    tf.summary.scalar(METRIC_ACCURACY, accuracy, step=2)
3. Start Runs and Log Them All Under One Parent Directory
You can now run multiple experiments, training each one with a different combination of hyperparameters. This simple grid search iterates over the discrete values and the endpoints of the real-valued intervals.
session_num = 0
for num_units in HP_NUM_UNITS.domain.values:
  for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
    for l2 in (HP_L2.domain.min_value, HP_L2.domain.max_value):
      for optimizer in HP_OPTIMIZER.domain.values:
        hparams = {
          HP_NUM_UNITS: num_units,
          HP_DROPOUT: dropout_rate,
          HP_L2: l2,
          HP_OPTIMIZER: optimizer,
        }
        run_name = "run-%d" % session_num
        print('--- Starting trial: %s' % run_name)
        print({h.name: hparams[h] for h in hparams})
        run('logs/hparam_tuning/' + run_name, hparams)
        session_num += 1
4. Visualize the Results in TensorBoard's HParams Dashboard
Open the HParams Dashboard. Once TensorBoard starts, click HParams at the top.
%tensorboard --logdir logs/hparam_tuning
Table View
Table View lists each session, the hyperparameter values used, and the resulting performance metrics. The checkboxes on the left let you limit which hyperparameters and metrics are displayed.
Parallel Coordinate View
This view displays each run as a color-coded line that passes through an axis for each hyperparameter and ends at the accuracy metric, making it easier to see which combinations of hyperparameters matter most. Hovering over any axis highlights the runs that pass through it, and you can reorder the axes by dragging them.
Scatter Plot View
This view helps you identify correlations between each hyperparameter and the metrics. Click or hover over a session group to highlight the session across plots.
Conclusion
Sorting by accuracy in descending order shows that the best model in this run has 512 units, a dropout rate of 0.5, the Adam optimizer, and an L2 regularization rate of 0.01, reaching an accuracy of 95.710%. The model can be optimized further, and you can log additional performance metrics for better visualization and understanding.
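As a quick sanity check, you could retrain and log one more run with that best combination by reusing the run() helper from step 2. This sketch simply hard-codes the winning values read from the dashboard:
best_hparams = {
  HP_NUM_UNITS: 512,
  HP_DROPOUT: 0.5,
  HP_L2: 0.01,
  HP_OPTIMIZER: 'adam',
}
run('logs/hparam_tuning/run-best', best_hparams)  # logs this run alongside the others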
This guide gave a brief introduction to TensorBoard. TensorBoard's HParams dashboard provides powerful visualizations to help you understand which hyperparameters can be further fine-tuned to make your NN model more accurate and reliable.
You can explore other TensorBoard features, such as graphs and the embedding projector, in the official TensorBoard documentation.
I hope you enjoyed learning. If you have any queries, feel free to contact me at CodeAlphabet.