Image Augmentation for Convolutional Neural Networks
Posted by Nathaniel Jermain, June 26, 2019
Limited data is a major obstacle in applying deep learning models like convolutional neural networks. Imbalanced classes are often an additional hindrance: while there may be sufficient data for some classes, equally important but undersampled classes will suffer from poor class-specific accuracy. This phenomenon is intuitive. If the model learns from only a few examples of a given class, it is less likely to predict that class in validation and test applications. There are many ways to address complications associated with limited data in machine learning. Image augmentation is one useful technique in building convolutional neural networks that can increase the size of the training set without acquiring new images.
The idea is simple: duplicate images with some kind of variation so the model can learn from more examples. Ideally, we augment the image in a way that preserves the features key to making predictions but rearranges the pixels enough to add some noise. Augmentation will be counterproductive if it produces images very dissimilar to what the model will be tested on, so this process must be executed with care.
For those using Keras, there is a handy group of arguments in “ImageDataGenerator” that allows for image augmentation on the fly. The disadvantage of using Keras’ functions is that users cannot specify exactly which classes to augment, and the augmentation options are limited. We’ll touch on both approaches, starting with manual image augmentation.
First, import dependencies associated with reading and processing images in Python.
Say we have a limited number of golden retriever images for our network classifying dog breeds. First, it’s necessary to read in one of the golden retriever images and resize it to dimensions appropriate for the convolutional neural network (we’ll say 400 by 400). It’s worthwhile checking to make sure the reduction in size didn’t distort the image too much (a little is OK).
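As a minimal sketch of the read-and-resize step, the snippet below uses Pillow and NumPy. These libraries are an assumption, since the article's original code is not shown here. A synthetic image stands in for the photo so the snippet runs without a file on disk, and the filename in the comment is hypothetical.

```python
import numpy as np
from PIL import Image

# Stand-in for a real photo so the snippet runs without a file on disk.
# With real data you would use something like
# Image.open("golden_retriever.jpg"), where the filename is hypothetical.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(600, 800, 3), dtype=np.uint8)
original = Image.fromarray(pixels)

# Resize to the 400 x 400 input size assumed in the text.
# Pillow's resize takes (width, height) and does not preserve aspect ratio,
# which is why a non-square photo can look slightly squashed afterwards.
resized = original.convert("RGB").resize((400, 400))

# Convert to a NumPy array laid out as (height, width, channels).
image = np.asarray(resized)
print(image.shape, image.dtype)
```

Displaying `resized` next to `original` (for example with `resized.show()`) is a quick way to check that the distortion is acceptable.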
By flipping the image horizontally or vertically, we can completely rearrange the pixels while preserving the features. This can be achieved in NumPy.
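One way to do this, sketched on a tiny stand-in array so the effect is easy to see, is with NumPy's `fliplr` and `flipud`. The array layout (height, width, channels) is an assumption about how the image was loaded.

```python
import numpy as np

# Small stand-in image: 2 rows, 3 columns, 1 channel (height, width, channels).
image = np.arange(6).reshape(2, 3, 1)

horizontal = np.fliplr(image)  # mirror left-to-right (reverses the width axis)
vertical = np.flipud(image)    # mirror top-to-bottom (reverses the height axis)

print(image[..., 0])       # [[0 1 2], [3 4 5]]
print(horizontal[..., 0])  # [[2 1 0], [5 4 3]]
print(vertical[..., 0])    # [[3 4 5], [0 1 2]]
```

The same two calls work unchanged on a full-size 400 by 400 colour image.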
For the case of predicting dog breeds, vertical flips may be counterproductive because the model is unlikely to be tested on images of upside-down dogs. Dogs may, however, appear at a variety of angles, so we can rotate each image by a random angle. The image will be distorted a bit, but that is OK.
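A random rotation could be sketched as follows, here using `scipy.ndimage.rotate`. The choice of SciPy, the angle range of plus or minus 25 degrees, and the edge-fill mode are all assumptions made for illustration, not settings taken from the article.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(42)

# Stand-in for the resized 400 x 400 RGB image.
image = rng.integers(0, 256, size=(400, 400, 3), dtype=np.uint8)

# Draw a random angle in degrees; the +/-25 degree range is an arbitrary choice.
angle = rng.uniform(-25, 25)

# reshape=False keeps the 400 x 400 output size. Areas that rotate in from
# outside the frame are filled from the nearest edge pixels (mode="nearest").
# order=1 (linear interpolation) avoids overshoot on uint8 data.
rotated = ndimage.rotate(image, angle, reshape=False, order=1, mode="nearest")

print(round(angle, 1), rotated.shape)
```

Keeping the angle range modest is one way to stay close to what the model is likely to see at test time.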
The above rotation may not seem like much of a change, but imagine how the computer perceives images; most pixel values for the rotated image are now different values from those in the original image.
It can also be useful to add noise to the image in the form of erroneous pixel values randomly dispersed throughout the image. “random_noise”, as provided by scikit-image, allows users to assign “0” or “1” to pixel values at random, which is often called salt-and-pepper noise.
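To make the idea concrete, here is a plain NumPy sketch of salt-and-pepper noise. It is not the library function mentioned above: `salt_and_pepper` is a hypothetical helper, the 5% noise level is arbitrary, and it corrupts whole pixels (all channels together), which may differ from how a library implementation chooses values.

```python
import numpy as np

def salt_and_pepper(image, amount=0.05, rng=None):
    """Return a copy of a float image in [0, 1] in which roughly a fraction
    `amount` of pixel locations is set to 0 (pepper) or 1 (salt)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.copy()
    # Choose which pixel locations to corrupt (same location in every channel).
    corrupt = rng.random(image.shape[:2]) < amount
    # For each location, pick salt (1.0) or pepper (0.0) with equal odds.
    salt = rng.random(image.shape[:2]) < 0.5
    noisy[corrupt & salt] = 1.0
    noisy[corrupt & ~salt] = 0.0
    return noisy

rng = np.random.default_rng(0)
image = np.full((400, 400, 3), 0.5)  # flat gray stand-in image
noisy = salt_and_pepper(image, amount=0.05, rng=rng)
```

The input is left untouched, so the clean and noisy versions can both be added to the training set.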
Classic augmentation techniques like flips and rotations can be applied to each image in the training set without manually processing each image. “ImageDataGenerator” draws batches of images from the directory and applies transformations such as “vertical_flip”, “horizontal_flip” or “rotation_range”. If the classes are balanced, this is a good option. The Keras documentation covers “ImageDataGenerator” in more detail.
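To illustrate what "on the fly" means, the sketch below is a library-free stand-in rather than Keras code. `augmented_batches` is a hypothetical helper that draws a random batch and flips each image left-to-right with probability 0.5, roughly the behaviour that a “horizontal_flip” setting requests from a generator.

```python
import numpy as np

def augmented_batches(images, labels, batch_size, rng):
    """Yield (batch_images, batch_labels) forever, flipping each image
    left-to-right with probability 0.5. The stored images are never modified."""
    n = len(images)
    while True:
        idx = rng.choice(n, size=batch_size, replace=False)
        batch = images[idx]                    # fancy indexing returns a copy
        flip = rng.random(batch_size) < 0.5    # which images in the batch to flip
        batch[flip] = batch[flip][:, :, ::-1]  # reverse the width axis
        yield batch, labels[idx]

rng = np.random.default_rng(0)
images = rng.random((10, 8, 8, 3))  # stand-in data: (N, height, width, channels)
labels = np.arange(10)

batch_images, batch_labels = next(augmented_batches(images, labels, 4, rng))
print(batch_images.shape, batch_labels)
```

Because the transformation is applied at batch time, no augmented copies are written to disk, and each epoch can see a different variant of the same image.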
These augmentations can be combined to make many variants of the original image. How much will each augmentation technique improve prediction accuracy? It is impossible to tell without trying a variety of them. It is often a good idea to engineer different training sets with varying augmentation techniques and see which one improves performance the most. It is important to note that validation and test sets should not be augmented; the point of the validation and test sets is to evaluate the model’s performance in a realistic application, and duplicated images may artificially boost performance measures.
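One way to respect that rule is to split the data first and augment only the training portion. The sketch below also augments just the undersampled class, which is the control that manual augmentation offers. The dataset, the split sizes, and the choice of a horizontal flip are all stand-ins chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in dataset: 20 tiny images; label 1 is the undersampled class.
images = rng.random((20, 8, 8, 3))
labels = np.array([0] * 16 + [1] * 4)

# 1. Split first, so held-out images are never augmented or duplicated.
order = rng.permutation(len(images))
test_idx, train_idx = order[:5], order[5:]
x_train, y_train = images[train_idx], labels[train_idx]
x_test, y_test = images[test_idx], labels[test_idx]

# 2. Augment only the undersampled class in the training set
#    (a horizontal flip here; a rotation or added noise would work the same way).
minority = x_train[y_train == 1]
flipped = minority[:, :, ::-1]  # reverse the width axis of each image

# 3. Append the augmented copies to the training set only.
x_train_aug = np.concatenate([x_train, flipped])
y_train_aug = np.concatenate([y_train, np.ones(len(flipped), dtype=y_train.dtype)])

print(len(x_train), "->", len(x_train_aug), "training images;", len(x_test), "test images")
```

Building several such training sets, each with a different transform in step 2, gives a direct way to compare which augmentation helps most.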