
Implement Hyperparameter Tuning for TensorFlow 2.0

Jul 31, 2020 • 10 Minute Read

Introduction

Remember how you used to tune a radio, adjusting the dial for better sound quality and less background noise?

Similarly, in machine learning (ML), you can improve the accuracy of a model (learning algorithm) by tuning hyperparameters, such as the learning rate. Hyperparameters are the parameters whose values are tuned to obtain optimal performance for a model.
Hyperparameter tuning is also known as hyperparameter optimization. Most programmers rely on exhaustive manual search, which is computationally expensive and tedious. TensorFlow 2.0 introduced the TensorBoard HParams dashboard to save time and visualize the results right in the notebook.

Model optimization is a continuous process: you train, evaluate, tune, and repeat until the model performs well enough.

This guide will use the built-in MNIST dataset, which can easily be loaded from the Keras API. But before jumping into the implementation, let's get familiar with some terms.

What is a Hyperparameter?

In neural network (NN) design, hyperparameters govern how the model learns the node weights that let it recognize patterns in images, text, or speech. Their values are set before training begins and do not change while the model trains.

You can tune values for the following hyperparameters (a short sketch after this list shows where each one appears in a Keras model):

  1. Number of units (nodes) in the dense layer.

  2. Learning rate. This controls how quickly the model adapts to the problem: at each iteration, it determines the step size taken towards a minimum of the loss function. Its value typically lies between 0.0 and 1.0.

  3. Dropout rate. A dropout layer randomly ignores nodes during training; the dropout rate is the probability that a given node in the layer is dropped.

  4. Optimizer. To reduce loss and converge faster, an optimizer updates the weights of a NN, and adaptive optimizers also adjust the effective learning rate. Adam, SGD, RMSprop, and Nadam are some of the most commonly used optimizers.

  5. L2 regularization. This pushes the model towards weights of small magnitude, giving a non-sparse solution. The L2 penalty is the sum of the squares of all the feature weights, scaled by a factor lambda; lambda is the hyperparameter tuned to strike the balance between simplicity and training-data fit. This can improve your NN performance by reducing overfitting.

  6. Epochs. One epoch is one complete pass of the learning algorithm through the entire training set. For example, MNIST has 60,000 training images, so one epoch means going through all 60,000 images once.

  7. Activation functions. These introduce non-linearity into the output of the neurons. Common examples are ReLU, sigmoid, tanh, and softmax.
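To make these concrete, here is a minimal sketch (an addition for illustration, not the guide's tuning code) showing where each of the hyperparameters above plugs into a simple Keras model. The specific values are arbitrary placeholders:

    import tensorflow as tf

    # Load and scale MNIST so the sketch is self-contained.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(
            512,                                                # 1. units in the dense layer
            activation='relu',                                  # 7. activation function
            kernel_regularizer=tf.keras.regularizers.l2(0.01),  # 5. L2 regularization (lambda)
        ),
        tf.keras.layers.Dropout(0.5),                           # 3. dropout rate
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # 4. optimizer, 2. learning rate
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'],
    )
    model.fit(x_train, y_train, epochs=5)                       # 6. number of epochs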

TensorBoard HParams Dashboard

Often in TensorFlow, while training a model, you only have screen outputs displaying performance metrics, which makes it hard to track how the model is actually doing. To make TF programs easier to understand, optimize, and debug, TensorFlow provides TensorBoard.

TensorBoard helps you visualize TF graphs, plot quantitative metrics, and more. This guide focuses on hyperparameter values, using the HParams dashboard. The following steps will help you optimize a set of hyperparameters with the HParams dashboard tools:

  1. Experiment setup and HParams summary
  2. Adapt TensorFlow runs to log hyperparameters and metrics
  3. Start runs and log them all under one parent directory
  4. Visualize the results in TensorBoard's HParams dashboard

Code Implementation

Prerequisites

Start by installing TF 2.0 and loading the TensorBoard notebook extension:

    !pip install -q tensorflow
    %load_ext tensorboard

Clear any logs from previous runs:

    !rm -rf ./logs/

Import TensorFlow and the TensorBoard HParams plugin:

    import tensorflow as tf
    from tensorboard.plugins.hparams import api as hp

Download the MNIST dataset and scale it:

    mnist = tf.keras.datasets.mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixel values to [0, 1]
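As a quick sanity check (an addition, not part of the original guide), you can print the array shapes to confirm what was loaded:

    print(x_train.shape)  # (60000, 28, 28): 60,000 training images of 28x28 pixels
    print(x_test.shape)   # (10000, 28, 28): 10,000 test images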
    

1. Experiment Setup and HParams Experiment Summary

Experiment with four hyperparameters in the model:

  1. Number of units in the first dense layer
  2. Dropout rate in dropout layer
  3. Optimizer
  4. L2 regularizer

    HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([256, 512]))
    HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.5, 0.6))
    HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd', 'rmsprop']))
    HP_L2 = hp.HParam('l2 regularizer', hp.RealInterval(0.001, 0.01))

    METRIC_ACCURACY = 'accuracy'

    with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
      hp.hparams_config(
        hparams=[HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER, HP_L2],
        metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')],
      )
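This small grid can be enumerated exhaustively, as done later in this guide. For larger search spaces, the hparams domains also provide a sample_uniform() method for random sampling instead; the snippet below is just a sketch of that alternative and is not used in the rest of this guide:

    # Illustrative only: draw one random configuration from the domains above.
    hparams = {h: h.domain.sample_uniform()
               for h in [HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER, HP_L2]}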
    

2. Adapt TensorFlow Runs to Log Hyperparameters and Metrics

The model contains two dense layers with a dropout layer between them. The training code looks like standard Keras code, except that none of the hyperparameters are hardcoded: their values all come from the hparams dictionary.

    def train_test_model(hparams):
      model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(hparams[HP_NUM_UNITS],
                              kernel_regularizer=tf.keras.regularizers.l2(hparams[HP_L2]),
                              activation=tf.nn.relu),
        tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
      ])
      model.compile(
          optimizer=hparams[HP_OPTIMIZER],
          loss='sparse_categorical_crossentropy',
          metrics=['accuracy'],
      )

      model.fit(x_train, y_train, epochs=2)  # just 2 epochs to keep each trial fast
      _, accuracy = model.evaluate(x_test, y_test)
      return accuracy
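Before launching the full grid, you can smoke-test the function with a single fixed configuration (an extra step for illustration, not part of the original guide):

    hparams = {
        HP_NUM_UNITS: 256,
        HP_DROPOUT: 0.5,
        HP_L2: 0.001,
        HP_OPTIMIZER: 'adam',
    }
    print('accuracy:', train_test_model(hparams))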
    

For each run, log an HParams summary with the hyperparameters and final accuracy:

    def run(run_dir, hparams):
      with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)  # record the values used in this trial
        accuracy = train_test_model(hparams)
        tf.summary.scalar(METRIC_ACCURACY, accuracy, step=2)  # log the final accuracy for this run

3. Start Runs and Log them All Under One Parent Directory

You can now try multiple experiments, training each one with a different set of hyperparameters. The grid below covers 2 unit counts × 2 dropout rates × 2 L2 values × 3 optimizers, so 24 runs are logged in total.

    session_num = 0

    for num_units in HP_NUM_UNITS.domain.values:
      for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
        for l2 in (HP_L2.domain.min_value, HP_L2.domain.max_value):
          for optimizer in HP_OPTIMIZER.domain.values:
            hparams = {
                HP_NUM_UNITS: num_units,
                HP_DROPOUT: dropout_rate,
                HP_L2: l2,
                HP_OPTIMIZER: optimizer,
            }
            run_name = "run-%d" % session_num
            print('--- Starting trial: %s' % run_name)
            print({h.name: hparams[h] for h in hparams})
            run('logs/hparam_tuning/' + run_name, hparams)
            session_num += 1

4. Visualize the Results in TensorBoard's HParams Dashboard

Start TensorBoard in the notebook. Once it loads, click HParams at the top to open the dashboard.

    %tensorboard --logdir logs/hparam_tuning
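If you are working outside a notebook, you can instead launch TensorBoard from a terminal and open the printed URL in a browser:

    tensorboard --logdir logs/hparam_tuning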
    

Table View

Table View lists the session names along with each run's hyperparameter values and performance metrics. The checkboxes let you filter which hyperparameters and metrics are displayed.

Parallel Coordinate View

This view displays each run as a color-coded line passing through one axis per hyperparameter, with the metric axis at the end showing the accuracy, which makes it easy to see which hyperparameters matter most. Placing the mouse pointer on any axis highlights the runs passing through it, and you can reorder the axes by dragging them.

Scatter Plot View

This view helps identify correlations between each hyperparameter and the metrics. Click or hover over a session group to highlight it across all plots.

Conclusion

Sorting the runs by accuracy in descending order shows that the most optimized model has 512 units, a dropout rate of 0.5, the Adam optimizer, and an L2 regularization rate of 0.01, reaching an accuracy of 95.710%. The model can be optimized further, and you can log additional performance metrics for better visualization and understanding.

This guide gave a brief introduction to TensorBoard. TensorBoard's HParams dashboard provides amazing visualization to help you understand which hyperparameter can be further fine-tuned to make your NN model more accurate and reliable.

You can explore other TensorBoard features, such as graphs and the embedding projector, in the official TensorBoard documentation.

I hope you enjoyed learning. If you have any queries, feel free to contact me at CodeAlphabet.