The shared weight design is exactly why convolutional nets (Convnets for short) are good at detecting the same features in different parts of an image. When I mentioned shared weights in the last article, I literally meant that for each matrix, the same group of weights are used for each local receptive field. This means all the neurons in each following layer use the same weight values.
To demonstrate this visually, I have assigned each weight a color. Only four weights are used in the following network, but multiple times:
This design means the second layer can only detect one feature of an image, but it can detect this feature in any part of the image. Of course, we need it to detect many more features than one. From now on, I am going to refer to this second layer as a feature map as it uses shared weights to observe the same feature in all different parts of the original image.
In order to detect more features, we need to add more feature maps. That’s why convnets are considered to have 3 dimensions:
Now, each layer is made up of multiple feature maps, which each have their own shared group of weights. In real examples, convnets can use many feature maps, especially on the later layers where more specific detail is observed.
In traditional neural networks, nothing is simplified. Once an image is fed forward, it will attempt to observe everything. It is more complex for a machine to observe the image on the top because it has many unnecessary details. Downscaling an image makes it dramatically easier and faster for a neural network to interpret data.
The process of simplifying a feature map is called pooling, but doing this too much can result in removing details that do matter.
Pooling is integrated into a convolutional net as a layer, usually just after a convolution layer (described earlier). When applied, a layer using pooling will produce feature maps, where each neuron is a summary of a local receptive field on the previous layer. The most common way of doing this, known as max-pooling, is just choosing the neuron in the local receptive field that has the highest activated value. Here is an example:
Although this is the most common method, you can instead use the following equation:
This is known as L2 pooling, and essentially square roots the sum of each neuron in the local receptive field. It is the same idea as max-pooling, in that it attempts to summarize a local receptive field.
All in all, there are three main designs that separate convolutional nets from traditional ones. There are fields of neurons instead of individual ones, which let us observe grouped features. There are shared weights, and those allow the same features to be observed in different areas of an image. Finally, we can simplify data to prevent unnecessary observations.