Article on Azure ML by Bethany Jepchumba and Josh Ndemenge of Microsoft
In this article, I will cover how you can train a model using Notebooks in Azure Machine Learning Studio. To get the data, you will need to follow the instructions in the article: Create a Data Solution on Azure Synapse Analytics with Snapshot Serengeti – Part 1 – Microsoft Community Hub, where you will load data into Azure Data Lake via Azure Synapse.
By the end of this article, you will know how to use the pretrained PyTorch DenseNet-201 model to classify animals into 48 distinct categories. Let us get started!
Prerequisites:
- An Azure subscription
- Basic Python knowledge
Azure ML (Machine Learning) Workspace
Azure ML is a platform for all your machine learning and deep learning needs. Using Azure ML, you can train your model in three ways:
- Automated ML: This is where you upload your data and have the workspace train models on your behalf. When uploading your data, you specify the machine learning task type and the training and test data before training. Once training is done, you will receive several models with their performance outlined. It comes in handy when you want to save time and quickly determine which model best fits your dataset.
- Designer: This is where you use a drag-and-drop interface to prepare your data and train your model. In Designer, you do not need to write a single line of code.
- Notebooks: In notebooks, you write your code in either Python or R and run your experiments. Azure ML offers a wide range of compute options, so you can train on large datasets efficiently.
- First, before we start training, you will need to upload your data. Go to Data in the left-hand navigation and create a new datastore.
- To access the data, you can specify either Azure Blob Storage or Azure Data Lake Storage Gen2.
- Next, enter ‘snapshot_serengeti’ as the datastore name and provide your credentials to gain access.
- Lastly, upload the data from your Azure subscription.
Once the datastore is created, go ahead and create a dataset from it. Name your dataset ‘<a unique name>’, select File as the dataset type, and set the path to images/. Once done, create your dataset.
Load the data and train your model
Before you begin training your model, you will need to add a new compute. Select a memory-optimized Standard_DS12_v2 compute and give it a unique name.
Once you have created your compute, head over to Azure ML Notebooks to write the code.
Step 1: Installing the necessary libraries
Once in the notebook, the first thing is to ensure that we have the necessary libraries installed and imported. We will be using the Azure Identity and Azure AI ML packages (azure-identity and azure-ai-ml) in our notebook, and PyTorch in the training script.
Step 2: Connect to your workspace
In this step, you will connect to your Azure ML workspace, which provides a central place to work with all the resources you create using Azure ML. To connect, we will use DefaultAzureCredential to sign in and authenticate. You will need the code below to get a handle to the workspace:
Step 3: Define your compute target and environment
We earlier created a compute resource on which our code runs, and we now need to reference it so it can be used to run our machine learning jobs. In the notebook, we simply assign the compute name to a variable: compute_target = “<compute name>”
Next, we will add our environment. An environment encapsulates the dependencies needed to run our script. You can create your own or use a pre-existing curated environment such as: azureml://registries/azureml/environments/AzureML-ACPT-pytorch-1.11-py38-cuda11.3-gpu/versions/11
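In the notebook, both values are just strings assigned to variables (the compute name is the placeholder from above):

```python
# The compute created earlier; "<compute name>" is a placeholder.
compute_target = "<compute name>"

# A curated Azure ML environment with PyTorch and CUDA preinstalled.
curated_env = (
    "azureml://registries/azureml/environments/"
    "AzureML-ACPT-pytorch-1.11-py38-cuda11.3-gpu/versions/11"
)
```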
Step 4: Configure your training script
In Azure ML, training scripts define the steps for training an ML model. The script outlines how you get and load your data, transform the data, load the pretrained DenseNet 201 model, define the loss function, train the model, and finally save the model. I created the training script, train.py, using GitHub Copilot and Visual Studio Code; you can watch the full video below:
Step 5: Building the training job
We now have all the items required to run our job. We will create a command, a resource that specifies our training details. Here, we add our inputs, compute, environment, and script, and give the job an experiment name. We will use the variables we have already created and link our script. Once done, we create and submit our job. The code will be as follows:
Step 6: Hyperparameter tuning and defining the termination policy
Before we run the job we have just built, we will see whether we can improve the model's performance through hyperparameter tuning. To tune our model, we will use Azure ML's sweep capabilities.
We will also define an early termination policy, BanditPolicy, to terminate poorly performing runs early; the policy is applied every epoch. Lastly, we will submit the same job as before, wrapped in a sweep job that searches over the training job's hyperparameters.
Next steps: model deployment and testing
In the next article, we will talk about how to find the best model based on various metrics, test its performance on new data, and prepare the model for deployment. In the meantime, you can refer to the resource below for more information:
– Train deep learning PyTorch models – Azure Machine Learning
– Train with machine learning datasets – Azure Machine Learning
– You can find the code at: BethanyJep/SerengetiLabML
– Learn Azure Synapse Analytics
– Learning Paths