[Related Article: Generating Neural Networks to Detect Alzheimer’s]
Let’s see a bit of theory about deep learning and CNNs.
Deep learning is a machine learning technique built on the principle of the organization and functioning of biological neural networks. The concept arose from an attempt by Warren McCulloch and Walter Pitts in 1943 to simulate the processes occurring in the brain.
Neural networks consist of individual units called neurons. Neurons are arranged in groups called layers (see figure below). Neurons in each layer are connected to neurons of the next layer. Data flows from the input layer to the output layer along these connections. Each individual node performs a simple mathematical calculation and then transmits its result to all the nodes it is connected to.
The latest wave of neural networks came with the increase in computing power and the accumulation of experience. Advancements in Computer Vision with Deep Learning have been constructed and perfected over time, primarily around one particular algorithm: the Convolutional Neural Network.
Convolutional Neural Networks (CNNs)
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image and be able to differentiate one from the other. The pre-processing required in a ConvNet is much lower as compared to other classification algorithms. While in primitive methods filters are hand-engineered, with enough training, ConvNets have the ability to learn these filters/characteristics.
The architecture of a ConvNet is analogous to that of the connectivity pattern of Neurons in the Human Brain and was inspired by the organization of the Visual Cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the Receptive Field. A collection of such fields overlap to cover the entire visual area.
How do computers see?
For a computer, an image is just an array of values. Typically it’s a 3-dimensional (RGB) matrix of pixel values.
For example, a 4×4 RGB abstract image representation would look like this.
Each pixel has a specific value of red, green, and blue that represents its color.
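To make this concrete, here is a minimal sketch (using numpy) of what such a 4×4 RGB image looks like in memory; the pixel values are made up for illustration:

```python
import numpy as np

# A hypothetical 4x4 RGB image: shape (height, width, channels)
image = np.zeros((4, 4, 3), dtype=np.uint8)

# Set the top-left pixel to pure red: (R, G, B) = (255, 0, 0)
image[0, 0] = [255, 0, 0]

print(image.shape)  # (4, 4, 3)
print(image[0, 0])  # [255   0   0]
```

Every pixel is just three numbers in the range 0 to 255, and the whole image is one 3-dimensional array.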
A CNN processes images using matrices of weights called filters (features) that detect specific attributes such as vertical edges, horizontal edges, etc. As the image progresses through the layers, the filters recognize increasingly complex attributes. The ultimate goal of a CNN is to detect what is going on in the scene.
Let’s go step by step and analyze each layer in the Convolutional Neural Network.
Input- A Matrix of pixel values in the shape of [width, height, channels].
Convolution- The purpose of this layer is to produce a feature map. Usually, we start with a low number of filters for low-level feature detection. The deeper we go into the CNN, the more filters (usually also smaller ones) we use to detect high-level features.
Pooling Layer- Similar to the Convolutional Layer, the Pooling layer is responsible for reducing the spatial size of the Convolved Feature. This decreases the computational power required to process the data through dimensionality reduction. Furthermore, it is useful for extracting dominant features that are rotationally and positionally invariant, thus keeping the training of the model effective.
There are two types of Pooling: Max Pooling and Average Pooling. Max Pooling returns the maximum value from the portion of the image covered by the Kernel. On the other hand, Average Pooling returns the average of all the values from the portion of the image covered by the Kernel.
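The two pooling variants can be sketched with plain numpy on a toy 4×4 feature map (the values and the 2×2 kernel size are illustrative):

```python
import numpy as np

# A toy 4x4 single-channel feature map
fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 2],
               [7, 2, 1, 0],
               [3, 4, 8, 9]], dtype=float)

def pool(x, size=2, mode="max"):
    """Non-overlapping pooling with a size x size kernel."""
    h, w = x.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h, size):
        for j in range(0, w, size):
            window = x[i:i + size, j:j + size]
            # Max pooling keeps the largest value in the window;
            # average pooling keeps the mean of the window.
            out[i // size, j // size] = window.max() if mode == "max" else window.mean()
    return out

print(pool(fm, mode="max"))  # [[6. 5.] [7. 9.]]
print(pool(fm, mode="avg"))  # [[3.5 2.5] [4.  4.5]]
```

Either way, the 4×4 input shrinks to a 2×2 output, which is exactly the dimensionality reduction described above.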
Activation- Without going into further details, we will use ReLU activation function that returns 0 for every negative value in the input image while it returns the same value for every positive value.
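ReLU is simple enough to write in one line; a quick numpy sketch:

```python
import numpy as np

def relu(x):
    # ReLU: negative values become 0, positive values pass through unchanged
    return np.maximum(0, x)

print(relu(np.array([-3.0, -0.5, 0.0, 2.0, 7.0])))  # [0. 0. 0. 2. 7.]
```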
Fully Connected Layer (FC Layer)- In a fully connected layer, we flatten the output of the last convolution layer and connect every node of the current layer to every node of the next layer. Neurons in a fully connected layer have full connections to all activations in the previous layer, as in regular Neural Networks, and work in a similar way.
The last layer of our CNN will compute the class probability scores, resulting in a volume of size [1 * 1 * number of classes].
There are various CNN architectures that have been key in building the algorithms that power AI today and will power it in the foreseeable future. Some of them are listed below:
Here, we are going to use the VGG16 model. Its key characteristics are:
1. This network contains a total of 16 layers in which weights and bias parameters are learned.
2. A total of 13 convolutional layers are stacked one after the other, followed by 3 dense layers for classification.
3. The number of filters in the convolution layers follows an increasing pattern (similar to the encoder architecture of an autoencoder).
4. The informative features are obtained by max-pooling layers applied at different steps in the architecture.
5. The dense layers comprise 4096, 4096, and 1000 nodes, respectively.
6. The cons of this architecture are that it is slow to train and the resulting model is very large.
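Characteristics 1, 2, and 5 can be tallied with a quick sanity check; the block layout below is the standard VGG16 configuration (filters per block, convolutions per block):

```python
# VGG16 layer plan: (filters, number of conv layers) per block, plus dense sizes
conv_blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]
dense_units = [4096, 4096, 1000]

n_conv = sum(n for _, n in conv_blocks)   # 13 convolutional layers
n_dense = len(dense_units)                # 3 dense layers

print(n_conv, n_dense, n_conv + n_dense)  # 13 3 16
```

The 16 in "VGG16" counts exactly these weight-bearing layers (the max-pooling layers between blocks learn no parameters, so they are not counted).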
Image Classification — Is it a male or a female?
The ultimate goal of this project is to create a system that can detect males and females. While our goal is very specific, ImageClassifier can detect anything that is tangible with an adequate dataset.
Image Classifier is implemented in Python Jupyter Notebook that is available below.
It’s also Google Colaboratory compatible! Just run this notebook in Colab’s workspace, set GOOGLE_COLAB = True and mount your dataset. Don’t forget to enable the free GPU acceleration!
Dataset: Gender and Age Prediction
Description: Binary classification (Males and Females)
Training: 110 images
Testing: 6 images
Now let’s start with our modeling process.
Step 1: Creating a new Notebook
Click on the link below to visit Colab, then click File > New Python 3 Notebook.
Step 2: The second step is to import dependencies/libraries we are going to use in this demo:
import numpy, matplotlib, and Keras as given in the snippet below:
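The import cell might look something like this. Note that the exact imports are an assumption, since the original snippet is not shown here; this sketch uses the Keras API bundled with TensorFlow, while the original notebook may have used standalone Keras:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Keras layers, model container, and callbacks used later in the tutorial
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
```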
Step 3: Preparing Data
3.1 Preparing label data using results.csv file
a. Loading labels for each image
b. Separating labels for males, which are encoded as 0
c. Splitting male_data into test and train
d. Separating labels for females, which are encoded as 1
e. Splitting female_data into test and train
f. Sample Image
g. Combining test_male_data and test_female_data and creating a final test_data DataFrame
h. Filtering train_data from labels by dropping test_data
i. Let’s see the count of the number of male and female images
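The label-preparation steps above might be sketched roughly like this with pandas. The column names ("image", "gender") and split sizes are assumptions, since the actual layout of results.csv is not shown here:

```python
import pandas as pd

# Hypothetical results.csv layout: one row per image, gender encoded 0/1
labels = pd.DataFrame({
    "image": [f"img_{i}.jpg" for i in range(10)],
    "gender": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],  # 0 = male, 1 = female
})

# b./d. Separate male (0) and female (1) rows
male_data = labels[labels["gender"] == 0]
female_data = labels[labels["gender"] == 1]

# c./e. Hold out a few rows of each class for testing
test_male_data = male_data.sample(n=2, random_state=42)
test_female_data = female_data.sample(n=2, random_state=42)

# g. Combine the held-out rows into one test DataFrame
test_data = pd.concat([test_male_data, test_female_data])

# h. Everything not in test_data becomes training data
train_data = labels.drop(test_data.index)

# i. Class counts in the training set
print(train_data["gender"].value_counts())
```

Dropping the test rows by index guarantees that no image appears in both splits.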
3.2 Preparing Image Files
In this step, we resize the images to 64×64 so the model runs efficiently. We also split male and female images for exploratory analysis.
a. Storing the path of each image files in a list
b. Processing images and converting them to numpy array form
c. Displaying sample images
A quick side-by-side comparison of a female and a male image.
d. Splitting paths of male and female images
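The resize-and-convert step might be sketched like this with Pillow; a synthetic in-memory image stands in for a real file so the snippet is self-contained:

```python
import numpy as np
from PIL import Image

# Synthetic 128x128 RGB "photo" (stands in for an image loaded from disk)
img = Image.fromarray(np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8))

# Resize to 64x64 as in the article, then convert to a float array in [0, 1]
resized = img.resize((64, 64))
arr = np.asarray(resized, dtype=np.float32) / 255.0

print(arr.shape)  # (64, 64, 3)
```

Scaling pixel values to [0, 1] is a common normalization step before feeding images to a network, though the article's exact preprocessing is not shown.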
Step 4. Creating a VGG16 model for training on male and female data
There are no silver bullets in terms of the CNN architecture design. The best way to find a model that’s appropriate for a specific case is to start with some basic design and iteratively improve it.
Let’s begin with a very simple and minimalistic model of the VGG-16, with a few notable changes.
- The number of convolution filters is cut in half, and the fully connected (dense) layers are scaled down.
- Optimizer changed to …
- Output layer activation set to sigmoid for binary cross-entropy.
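A minimal sketch of such a scaled-down VGG-style model in Keras might look like the following. The layer sizes are illustrative, and the "adam" optimizer is an assumption, since the article's exact choice is not shown:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# VGG-style stacks of 3x3 convolutions with filters cut down (32/64/128),
# smaller dense layers, and a single sigmoid output for binary classification.
model = Sequential([
    Conv2D(32, (3, 3), activation="relu", padding="same", input_shape=(64, 64, 3)),
    Conv2D(32, (3, 3), activation="relu", padding="same"),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu", padding="same"),
    Conv2D(64, (3, 3), activation="relu", padding="same"),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation="relu", padding="same"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(256, activation="relu"),
    Dense(1, activation="sigmoid"),  # probability of class 1 (female)
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

With a sigmoid output of size 1, binary_crossentropy is the matching loss for the male/female task.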
Step 5. Setting up several parameters like loss history, early stopping, etc
Here, I’m using Keras’s early stopping callback to end training when the validation loss stops improving, otherwise, the model will overfit. I will also be tracking the loss history on each epoch to visualize the overfitting trend.
Step 6. Training Model
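A sketch of the training call, including the early-stopping callback described above. A tiny stand-in model and random data keep the snippet self-contained; the hyperparameters (patience, epochs, batch size, validation split) are illustrative, not the article's actual values:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.callbacks import EarlyStopping

# Tiny stand-in model, just to show the shape of the fit call
model = Sequential([
    Flatten(input_shape=(64, 64, 3)),
    Dense(8, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random stand-in data in place of the prepared image arrays
x_train = np.random.rand(20, 64, 64, 3).astype("float32")
y_train = np.random.randint(0, 2, size=(20,))

# Stop when validation loss stops improving; keep the best weights seen
early_stopping = EarlyStopping(monitor="val_loss", patience=3,
                               restore_best_weights=True)

history = model.fit(
    x_train, y_train,
    validation_split=0.2,   # hold out part of the training data for validation
    epochs=5,
    batch_size=4,
    callbacks=[early_stopping],
    verbose=0,
)
print(sorted(history.history.keys()))
```

The returned History object records per-epoch loss and val_loss, which is what we plot later to inspect overfitting.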
Step 7. Predicting Output
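Prediction might be sketched like this; a small untrained stand-in model and random inputs keep it self-contained (in the notebook you would reuse the trained model and real test images):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# Stand-in model with the same input/output shape as the trained one
model = Sequential([Flatten(input_shape=(64, 64, 3)),
                    Dense(1, activation="sigmoid")])

x_test = np.random.rand(3, 64, 64, 3).astype("float32")

probs = model.predict(x_test, verbose=0)      # sigmoid probabilities in [0, 1]
preds = (probs > 0.5).astype(int).ravel()     # threshold: 0 = male, 1 = female

print(probs.shape, preds)
```

Because the output layer is a single sigmoid unit, each prediction is one probability, and thresholding at 0.5 turns it into a class label.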
Step 8. Plotting Training and Validation Loss
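The loss curves might be plotted like this with matplotlib; the per-epoch numbers below are made up for illustration (in the notebook they come from history.history):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Hypothetical per-epoch losses standing in for history.history values
train_loss = [0.69, 0.55, 0.43, 0.36, 0.31]
val_loss = [0.70, 0.60, 0.52, 0.50, 0.51]

plt.plot(train_loss, label="training loss")
plt.plot(val_loss, label="validation loss")
plt.xlabel("epoch")
plt.ylabel("binary cross-entropy loss")
plt.legend()
plt.savefig("loss_curve.png")
```

When the validation curve flattens or rises while the training curve keeps falling, the model is starting to overfit, which is exactly the point where early stopping kicks in.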
Step 9. Let’s see how well our model performed
As we can see above, our model learned to successfully identify both males and females. Its predictions are very confident in the majority of cases and are less confident only for anything unusual. This is expected behavior, because our model has been trained on fewer such uncommon examples.
[Related Article: Image Augmentation for Convolutional Neural Networks]
We’ve learned the underlying concepts of how computers see by implementing a simple yet very powerful image classification system. The possibilities for improvement in this field, and in Computer Vision in general, are boundless, and I encourage you to dive in and implement such solutions on your own.
In results.csv, there is one more ‘Age’ column, which we dropped at the start. Check how many different categories are in ‘Age’ and build a CNN model for predicting age group using the same dataset. Also, try improving the current model and share your results/notebook links in the comments.
Originally Posted Here