In my other articles, I have discussed the many different neural network hyperparameters that contribute to optimal performance. While hyperparameters are crucial for training successful algorithms, the importance of neural network bias values should not be forgotten either. In this article I’ll delve into the what, why, and how of neural network bias values.
The Requirements of Neural Network Bias Values in Algorithms
Neural networks are designed to handle a huge range of input values, often with completely different distributions and scales. A generalized version of a neural network might not always be calibrated correctly for this large range.
Consider a single neuron, which takes every neuron value from the previous layer, multiplies each by its respective weight, and sums the results to produce a new value. We need to consider whether this value lands in a useful part of the activation function. For example:
n5 = A((n1 * w1) + (n2 * w2) + (n3 * w3) + (n4 * w4))
n6 = A((n1 * w5) + (n2 * w6) + (n3 * w7) + (n4 * w8))
n7 = A((n1 * w9) + (n2 * w10) + (n3 * w11) + (n4 * w12))
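The feedforward step above can be sketched in plain Python. The input values and weights here are purely illustrative (the article does not specify any), and sigmoid stands in for A():

```python
import math

def sigmoid(x):
    # A(): squashes any real-valued sum into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical previous-layer activations n1..n4 and the weights feeding n5
n = [0.5, 0.1, 0.9, 0.3]
w = [0.4, -0.2, 0.7, 0.1]

# n5 = A((n1 * w1) + (n2 * w2) + (n3 * w3) + (n4 * w4))
n5 = sigmoid(sum(ni * wi for ni, wi in zip(n, w)))
print(n5)  # the squashed activation, somewhere in (0, 1)
```

The same weighted-sum-then-activate pattern repeats for n6 and n7, each with its own row of weights.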
Where A() is an activation function, n5, n6 and n7 may receive weighted sums that are naturally biased, producing activations that sit too far to the left or right. Without a bias neuron, an activation function (such as sigmoid) remains neutral, centered on zero.
When an activation function is neutral, the summed value from the feedforward algorithm is squashed in direct proportion to its magnitude. Directly proportional results can be problematic because the resulting values may not actually correlate correctly with the problem in question. As such, we must counteract them with a corresponding bias neuron.
Implementation of Bias Values in Neural Network Algorithms
This bias neuron should calibrate the activation function in such a way that it starts to produce the desired results. Referring back to our feedforward example, the equations should now look like this:
n5 = A((n1 * w1) + (n2 * w2) + (n3 * w3) + (n4 * w4) + (1 * b1))
n6 = A((n1 * w5) + (n2 * w6) + (n3 * w7) + (n4 * w8) + (1 * b2))
n7 = A((n1 * w9) + (n2 * w10) + (n3 * w11) + (n4 * w12) + (1 * b3))
Where b is the bias weight value in the algorithm. You will notice the bias works much like a normal neuron, except that its input is a constant value of (usually) 1. The different bias weight values can now affect the activation function directly and calibrate the outputs.
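Extending the earlier sketch, the constant-1 bias input simply adds one more term to the weighted sum. The values below are again hypothetical, chosen only to show the bias shifting the activation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # Weighted sum of the previous layer, plus a constant-1 input
    # multiplied by the bias weight: (1 * b)
    total = sum(i * w for i, w in zip(inputs, weights)) + 1 * bias
    return sigmoid(total)

# Hypothetical activations and weights for illustration
n = [0.5, 0.1, 0.9, 0.3]
w = [0.4, -0.2, 0.7, 0.1]

# The same weighted sum, shifted by different bias weights
print(neuron(n, w, 0.0))   # no shift
print(neuron(n, w, -2.0))  # activation pushed toward 0
print(neuron(n, w, 2.0))   # activation pushed toward 1
```

A negative bias weight slides the sigmoid's active region to the right (inputs must be larger to activate), a positive one slides it left.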
The bias is essentially the c in the famous y = mx + c equation: a constant that shifts the intercept of the curve. This gives a single perceptron much greater flexibility. If a neuron had two binary inputs, both of which were 0, the activated output (assuming sigmoid) would have to be 0.5, because S(0) = 0.5 and the weight values are irrelevant when every input is zero. However, with a constant 1 multiplied by its bias weight always being fed into A(), the input to A() need not be 0, so the neuron can still produce a meaningful output.
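The zero-input case can be checked directly. The weights below are arbitrary and the bias weight is a hypothetical learned value; the point is only that without a bias the output is pinned at 0.5:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Two binary inputs, both 0: the weights cannot matter
weights = [3.7, -1.2]                 # arbitrary, multiplied by 0 anyway
z = 0 * weights[0] + 0 * weights[1]   # always 0
print(sigmoid(z))                     # always 0.5, whatever the weights

# With a bias weight, the output is no longer pinned to 0.5
bias = -1.5                           # hypothetical learned bias weight
print(sigmoid(z + 1 * bias))          # sigmoid(-1.5), well below 0.5
```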
To conclude, neural network bias values are important because they shift activation functions left or right to counteract results that would otherwise fail to reflect the actual problem at hand. This enables the neural network to model the problem correctly and produce results humans can agree with. Sometimes we need to push a neural network in the right direction, and that is exactly what neural network bias values are for.