Why are Convnets Often Better Than the Rest? Part III

This is the third part in a series on Convnets. Read the first part here and the second part here.

Shared weights

The shared weight design is exactly why convolutional nets (Convnets for short) are good at detecting the same feature in different parts of an image. When I mentioned shared weights in the last article, I literally meant that the same group of weights is used for every local receptive field. This means all the neurons in the following layer use the same weight values.

To demonstrate this visually, I have assigned each weight a color. Only four weights are used in the following network, but each is applied multiple times:

This design means the second layer can only detect one feature of an image, but it can detect that feature in any part of the image. Of course, we need it to detect more than one feature. From now on, I am going to refer to this second layer as a feature map, as it uses shared weights to look for the same feature in all the different parts of the original image.
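To make this concrete, here is a minimal NumPy sketch of sliding one shared group of weights over an image. The kernel values and the image are illustrative, not taken from the article's figure; the point is that because the same weights are reused at every position, the same pattern produces the same activation wherever it appears:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide ONE shared kernel over every local receptive field
    (valid padding, stride 1). The same weights are reused at
    every position in the image."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y+kh, x:x+kw] * kernel)
    return out

# A hypothetical kernel that responds to a vertical edge
kernel = np.array([[1.0, -1.0],
                   [1.0, -1.0]])

# The same vertical-edge pattern appears in two different places
image = np.zeros((5, 5))
image[0:2, 0] = 1.0   # pattern in the top-left
image[3:5, 3] = 1.0   # same pattern near the bottom-right

fmap = convolve2d(image, kernel)
# Shared weights mean both occurrences produce the same
# activation in the feature map:
print(fmap[0, 0], fmap[3, 3])  # 2.0 2.0
```

The network never needs to relearn the feature for each location; one set of four weights covers the whole image.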

In order to detect more features, we need to add more feature maps. That’s why convnets are considered to have 3 dimensions:

Now, each layer is made up of multiple feature maps, each of which has its own shared group of weights. In real examples, convnets can use many feature maps, especially in the later layers where more specific detail is observed.
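A short sketch of this third dimension, assuming random illustrative weights and an MNIST-sized input: applying several kernels, each with its own shared weights, stacks one feature map per kernel into a 3-dimensional output.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_layer(image, kernels):
    """Apply several shared-weight kernels; each kernel yields its
    own feature map, so the layer output is 3-dimensional."""
    kh, kw = kernels.shape[1:]
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    maps = np.zeros((len(kernels), oh, ow))
    for k, kern in enumerate(kernels):
        for y in range(oh):
            for x in range(ow):
                maps[k, y, x] = np.sum(image[y:y+kh, x:x+kw] * kern)
    return maps

image = rng.random((28, 28))              # e.g. an MNIST-sized input
kernels = rng.standard_normal((8, 5, 5))  # 8 feature maps, 5x5 fields

out = conv_layer(image, kernels)
print(out.shape)  # (8, 24, 24): depth = number of feature maps
```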

Pooling

In traditional neural networks, nothing is simplified: once an image is fed forward, the network attempts to observe everything. A full-resolution image is more complex for a machine to process because it contains many unnecessary details. Downscaling an image makes it dramatically easier and faster for a neural network to interpret.

The process of simplifying a feature map is called pooling, but pooling too aggressively can remove details that do matter.

Pooling is integrated into a convolutional net as a layer, usually placed just after a convolution layer (described earlier). A pooling layer produces condensed feature maps, where each neuron summarizes a local receptive field on the previous layer. The most common approach, known as max-pooling, simply takes the neuron in the local receptive field with the highest activation. Here is an example:
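A minimal max-pooling sketch in NumPy, with illustrative values: each 2x2 region of the feature map is replaced by its single largest activation, halving the width and height.

```python
import numpy as np

def max_pool(fmap, size=2):
    """Summarize each size x size receptive field by its
    largest activation, shrinking the feature map."""
    h, w = fmap.shape
    out = np.zeros((h // size, w // size))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = fmap[y*size:(y+1)*size,
                             x*size:(x+1)*size].max()
    return out

fmap = np.array([[1., 3., 2., 0.],
                 [5., 2., 1., 4.],
                 [0., 1., 7., 2.],
                 [3., 2., 1., 1.]])

pooled = max_pool(fmap)
print(pooled)
# [[5. 4.]
#  [3. 7.]]
```

The 4x4 map becomes 2x2, keeping only the strongest response from each region.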

Although max-pooling is the most common method, you can instead use L2 pooling, where each output neuron is the square root of the sum of the squared activations in its local receptive field:

output = sqrt(a1² + a2² + … + an²)

It is the same idea as max-pooling, in that it summarizes a local receptive field.
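The same sketch with L2 pooling swapped in, again with illustrative values: instead of keeping the maximum, each region is summarized by the square root of the sum of its squared activations.

```python
import numpy as np

def l2_pool(fmap, size=2):
    """Summarize each receptive field by the square root of the
    sum of its squared activations (L2 pooling)."""
    h, w = fmap.shape
    out = np.zeros((h // size, w // size))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = fmap[y*size:(y+1)*size, x*size:(x+1)*size]
            out[y, x] = np.sqrt(np.sum(patch ** 2))
    return out

fmap = np.array([[3., 4., 0., 0.],
                 [0., 0., 0., 2.],
                 [1., 0., 0., 0.],
                 [0., 0., 0., 0.]])

pooled = l2_pool(fmap)
print(pooled)
# [[5. 2.]
#  [1. 0.]]
```

Unlike max-pooling, every activation in the region contributes to the summary, not just the largest one.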

 

All in all, three main designs separate convolutional nets from traditional ones. Local receptive fields connect neurons to small regions of the input rather than to every pixel, which lets us observe grouped features. Shared weights allow the same feature to be detected in different areas of an image. Finally, pooling simplifies the data to prevent unnecessary observations.

Caspar Wylie, ODSC


My name is Caspar Wylie, and I have been passionately programming computers for as long as I can remember. I am currently 17, and taught myself to write code with initial help from an employee at Google in Mountain View, California, who truly motivated me. I program every day and am always putting new ideas into perspective. I try to keep a good balance between jobs and personal projects in order to advance my research and understanding. My interest in computers started with very basic electronic engineering when I was only 6, before I moved on to software development at about the age of 8. Since then, I have experimented with many different areas of computing, from web security to computer vision.
