Tuning Hyperparameters with Reproducible Experiments

Editor’s note: Milecia McGregor is a speaker for ODSC West 2021. Be sure to check out her talk, “Tuning Hyperparameters with Reproducible Experiments,” there!

When you’re starting to build a new machine learning model and you’re deciding on the model architecture, there are a number of issues that arise. You have to monitor code changes you make, note any differences in the data you’ve used for training, and keep up with hyperparameter value updates.

Tracking all of these changes matters because it lets you reproduce your experiments and know exactly which changes produced your best model. You can return to any point in the experimentation process and see what led to those results.

In this post, we’re going to go through an example of hyperparameter tuning with reproducibility using DVC. You can add this to any existing project you’re working on or start from a fresh project.

Background on Hyperparameters

Before jumping straight into training and experiments, let’s briefly go over some background on hyperparameters. Hyperparameters are the values that define your model. This includes things like the number of layers in a neural network or the learning rate for gradient descent.

Hyperparameters differ from model parameters because they aren’t learned during training. Instead, they’re set beforehand and used to create the model we train. Optimizing them means training models with different hyperparameter values and comparing how accurate the results are. By iterating through those values and seeing how they affect our accuracy, we can find the best model.
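To make the distinction concrete, here is a minimal sketch in plain Python. It is not taken from the example project: the function name fit_line and the toy data are hypothetical. The learning rate and the number of steps are hyperparameters chosen before training, while the weight w is a model parameter learned from the data.

```python
def fit_line(xs, ys, learning_rate, n_steps):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # model parameter: learned during training
    n = len(xs)
    for _ in range(n_steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


# Toy data (hypothetical) generated from y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# Same data and code, different hyperparameter values, different models.
for lr in (0.001, 0.05):
    w = fit_line(xs, ys, learning_rate=lr, n_steps=50)
    print(f"learning_rate={lr}: learned w={w:.3f}")
```

With the larger learning rate the learned weight ends up very close to 2, while the smaller one has not converged after 50 steps. Hyperparameter tuning explores this kind of difference.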

That’s why we do hyperparameter tuning. We’ll walk through code examples for two common methods: grid search and random search.

Tuning Hyperparameters with DVC

Let’s start by talking about DVC a bit, because it’s the tool we’ll use to add reproducibility to our tuning process by tracking changes in our data, code, and hyperparameters. With DVC, we can automate parts of the tuning process and find and restore any really good models that emerge.

A few things DVC makes easier to do:

  • Letting you make changes without worrying about finding them later
  • Onboarding other engineers to a project
  • Sharing experiments with other engineers on different machines

For tuning hyperparameters, this means you can play with their values without losing track of which changes made the best model and also have other engineers take a look. We’ll do an example of this with grid search in DVC first.


Working with a DVC project

We’re going to be working with an existing NLP project. You can get the code we’re working with in this repo. It already has DVC set up, but you can check out the Get Started docs if you want to know how the DVC pipeline was created.

First, create a virtual environment with a command similar to this, then activate it (for example, with source .venv/bin/activate on macOS or Linux).

python -m venv .venv

After you’ve cloned the repo, install all of the dependencies with this command.

pip install -r requirements.txt

You should be able to open your terminal and run an experiment with the following command.

dvc exp run

This will trigger the training process to run and it will record the ROC-AUC of your model. You can check out the results of your experiment with the following command.

dvc exp show --no-timestamp --include-params train.n_est,train.min_split

We’re adding a few options here to make the table view clearer. We aren’t showing timestamps and we’re only looking at two hyperparameter values. You can run dvc exp show without the options to see the entire table.

This will produce a table similar to this.

[Image: table of experiment results from dvc exp show]

Start tuning with grid search

Now that you’ve seen how to run an experiment, we’re going to write a small script to automate grid search for us using DVC. Using grid search in hyperparameter tuning means you have an exhaustive list of hyperparameter values you want to cycle through. Grid search will cover every combination of those hyperparameter values.

We’ll do this by queueing experiments. The queue is how DVC lets us define experiments that won’t be run until later. That way we can cycle through multiple hyperparameters quickly instead of manually updating a config file with new hyperparameter values for each experiment run. The command syntax for queueing an experiment looks like this:

dvc exp run --queue --set-param train.min_split=8

In the example queue above, we’re updating the train.min_split value that’s inside of the params.yaml file. This file holds all of the hyperparameter values and is where DVC looks to determine if any values have changed. With the command above, we’re automatically updating that value in the params.yaml using a queued experiment.
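As a rough mental model of what an override like train.min_split=8 does, the sketch below applies a dotted key to a nested dictionary. This is only an illustration, not DVC’s implementation: the helper set_param and the starting values are hypothetical stand-ins for the contents of params.yaml.

```python
def set_param(params, dotted_key, value):
    """Set a nested value such as 'train.min_split' in a params dictionary."""
    *parents, leaf = dotted_key.split(".")
    node = params
    for key in parents:
        # Walk down one level per section name, creating it if missing.
        node = node.setdefault(key, {})
    node[leaf] = value
    return params


# Hypothetical contents of params.yaml, loaded into a dictionary.
params = {"train": {"n_est": 250, "min_split": 16}}

set_param(params, "train.min_split", 8)
print(params)
```

Only the targeted value changes. The other hyperparameters keep whatever values the file already had.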

Now we can make the script. You can add a new file to the src directory called grid_search.py. Inside of the file, add the following code.

import itertools
import subprocess

# Automated grid search experiments
n_est_values = [250, 300, 350, 400, 450, 500]
min_split_values = [8, 16, 32, 64, 128, 256]

# Iterate over all combinations of hyperparameter values.
for n_est, min_split in itertools.product(n_est_values, min_split_values):
    # Execute "dvc exp run --queue --set-param train.n_est=<n_est> --set-param train.min_split=<min_split>".
    subprocess.run(["dvc", "exp", "run", "--queue",
                    "--set-param", f"train.n_est={n_est}",
                    "--set-param", f"train.min_split={min_split}"])

This is a simple grid search. We have two hyperparameters we want to tune: n_est and min_split. Each one gets a list with a few values in it to mimic the exhaustive search a grid search can handle. Then we loop through every combination of those values and create a queued experiment for each one using subprocess.
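If you want to check what will be queued before touching DVC, one option is to build the argument lists first and inspect them. The sketch below is a variation on the script above, not part of the original project. The helper name build_queue_commands is hypothetical, and it only constructs the commands without running them.

```python
import itertools


def build_queue_commands(grid):
    """Return one 'dvc exp run --queue' argument list per grid combination."""
    names = list(grid)
    commands = []
    for values in itertools.product(*(grid[name] for name in names)):
        cmd = ["dvc", "exp", "run", "--queue"]
        for name, value in zip(names, values):
            cmd += ["--set-param", f"{name}={value}"]
        commands.append(cmd)
    return commands


grid = {
    "train.n_est": [250, 300, 350, 400, 450, 500],
    "train.min_split": [8, 16, 32, 64, 128, 256],
}

commands = build_queue_commands(grid)
print(len(commands))          # 6 values x 6 values = 36 combinations
print(" ".join(commands[0]))  # first command that would be queued
```

Each list can then be passed to subprocess.run(cmd, check=True), which raises an error if a queue command fails instead of silently continuing.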

You can run this script now and generate your queue with this command.

python src/grid_search.py

You’ll see some outputs in the terminal telling you that your experiments have been queued. Then you can run them all with the following command.

dvc exp run --run-all

This will run every experiment that has been queued. Once all of those have run, take a look at your metrics for each experiment.

dvc exp show --include-params=train.min_split,train.n_est --no-timestamp

Your table should look similar to this when you run the command above. We’ve included the --include-params and --no-timestamp options to give us a table that’s easier to read.

[Image: table of grid search experiment results from dvc exp show]

Now you can see how your precision changed with each hyperparameter value update. This is a quick implementation of grid search in DVC. You could read the hyperparameter values from a different file or data source or make this tuning script as fancy as you like. The main thing you need is the dvc exp run --queue --set-param <param>=<value> command to execute when you add new values.

Random search

Another commonly used method for tuning hyperparameters is random search. This takes random values for hyperparameters and builds the model with them. It usually takes less time than an exhaustive grid search and it can perform better if run for a similar amount of time as a grid search.

We’re going to add an example of random search in a new file called random_search.py similar to the file we created for grid search. This will add queued experiments with the randomly selected hyperparameter values. Add the following code to random_search.py.

import subprocess
import random

# Automated random search experiments
num_exps = 10

for _ in range(num_exps):
    params = {
        "rand_n_est_value": random.randint(250, 500),
        "rand_min_split_value": random.choice([8, 16, 32, 64, 128, 256]),
    }
    subprocess.run(["dvc", "exp", "run", "--queue",
                    "--set-param", f"train.n_est={params['rand_n_est_value']}",
                    "--set-param", f"train.min_split={params['rand_min_split_value']}"])

This search could be far more complex with Bayesian optimization to handle the hyperparameter value selections, but we’re keeping it super simple by choosing random numbers to focus on reproducibility. This will generate ten experiments with random values for each hyperparameter.
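Since the focus here is reproducibility, it can also help to make the random draws themselves repeatable. The sketch below is an assumption layered on top of the script above, not part of the original project. The helper name sample_params and the seed value are hypothetical, and it only generates the values without calling DVC.

```python
import random


def sample_params(num_exps, seed):
    """Draw hyperparameter values from a seeded generator."""
    rng = random.Random(seed)  # private generator: same seed, same draws
    samples = []
    for _ in range(num_exps):
        samples.append({
            "train.n_est": rng.randint(250, 500),
            "train.min_split": rng.choice([8, 16, 32, 64, 128, 256]),
        })
    return samples


first = sample_params(10, seed=42)
second = sample_params(10, seed=42)
print(first == second)  # the two runs draw identical values
```

Recording the seed alongside the script means someone else can regenerate the same set of queued experiments later.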

Queue the experiments by running the script, as you did for grid search. Then run them with dvc exp run --run-all and take a look at the results with dvc exp show --include-params=train.min_split,train.n_est --no-timestamp. Your table should look something like this.

[Image: table of random search experiment results from dvc exp show]

This shows the difference between the randomly selected values and the values from grid search. You might find a better value with random search because it samples across the whole range of values, which might hit the optimum faster than stepping through a fixed grid.


With the comparison between grid search and random search, you can see how reproducibility can help you find the best model for your project. You’ll be able to see all of the hyperparameter changes and code changes that created each model. This gives you the ability to fine-tune your model because you can go to any experiment and resume training with different values, code, or data.

Editor’s note – more information on Milecia’s upcoming ODSC West 2021 session, “Tuning Hyperparameters with Reproducible Experiments”: In this workshop, you will learn how you can use the open-source tool, DVC, to increase reproducibility for two methods of tuning hyperparameters: grid search and random search. We’ll go through a live demo of setting up and running grid search and random search experiments. By the end of the workshop, you’ll know how to add reproducibility to your existing projects.

About the author/ODSC West 2021 speaker on tuning hyperparameters:

Milecia McGregor is a senior software engineer, international tech speaker, and mad scientist who works with hardware and software. She will try to make anything with JavaScript first. In her free time, she enjoys learning random things, like how to ride a unicycle, and playing with her dog.
