Building Neural Networks with Perceptron, One Year Later — Part III

This is the third part in a three-part series. The first part can be read here and the second part here.

Inside Perceptron

Each neuron in a neural network will, at some point, hold a value. Each weight (the links between neurons) also holds a value; the user initially sets all weights to random decimals within a specified range.

Assume we design a structure of 784, 80, 10.

[Image: diagram of the 784–80–10 network structure]
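As a sketch of what that setup might look like in NumPy (not Perceptron's actual initialisation code — the range and random generator here are assumptions):

```python
import numpy as np

structure = [784, 80, 10]  # layer sizes: input, hidden, output

# One weight matrix per pair of adjacent layers, filled with
# random decimals in a specified range (here -0.5 to 0.5).
rng = np.random.default_rng(0)
all_weights = [rng.uniform(-0.5, 0.5, size=(structure[i + 1], structure[i]))
               for i in range(len(structure) - 1)]

print([w.shape for w in all_weights])  # [(80, 784), (10, 80)]
```

Each matrix has one row per neuron on the later layer and one column per neuron on the earlier layer, so a single matrix product can feed a whole layer forward.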

The Feed Forward Algorithm

The first layer of neurons represents the input values. Each neuron on the following layer (the hidden layer) then takes a weighted sum over the previous layer: every previous-layer value is multiplied by the weight connecting it to the relevant hidden neuron, and the products are added together. Afterward, it is important that we activate this summed value.

[Image: the feed forward step between two layers]

For example, assume the i values are set (known inputs):

n1 = (w1 · i1) + (w2 · i2) + (w3 · i3)

Perceptron activates this value to normalise it. Many different functions can do this, such as a simple step threshold, the hyperbolic tangent (tanh), or the sigmoid function. In most cases, the sigmoid function S(t) suits. It looks like this:

[Image: plot of the sigmoid curve]

The formula is:

S(t) = 1 / (1 + e^(−t))   where t = n1

This calculation results in the final value of n1, the first neuron on the second layer. It is important to normalise these summed values because they often vary dramatically; the sigmoid function maps them all to values between 0 and 1.
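A minimal sigmoid in NumPy makes the squashing effect easy to see (a sketch, not the library's own implementation):

```python
import numpy as np

def sigmoid(t):
    # S(t) = 1 / (1 + e^(-t))
    return 1.0 / (1.0 + np.exp(-t))

# Summed neuron values can vary dramatically before activation...
sums = np.array([-20.0, -1.0, 0.0, 1.0, 20.0])
# ...but every activated value lands strictly between 0 and 1.
print(sigmoid(sums))
```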

Finally, for this example:

n2 = S((w4 · i1) + (w5 · i2) + (w6 · i3))

The more generalised algorithm for the forward feed could be written as the following (for each layer l):

n_l = S( Σ (w_l · i_{l−1}) )

Where n is a neuron on layer l, w is a weight value on layer l, and i is a value on layer l−1. Bear in mind these variables are not named to conform exactly to the earlier example; the generalised variables may refer to a matrix of the relevant values.
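In NumPy terms, the generalised step above is just a matrix product followed by activation. The sizes below are made up for illustration:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(1)
i_prev = rng.random(3)               # values on layer l-1 (like i1, i2, i3)
w = rng.uniform(-0.5, 0.5, (2, 3))   # weights into layer l (rows feed n1, n2)

# Each row of w holds the weights feeding one neuron on layer l,
# so the dot product computes every weighted sum at once.
n = sigmoid(np.dot(w, i_prev))
print(n.shape)  # (2,)
```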

Perceptron Feed Forward Code

First we set up the first neuron layer with the values from the dataset.

def feed_forward(self, matrix):
    self.populate_input_layer(matrix)

We then multiply the neurons by the weights and find the sum.

    for after_input_layer in range(1, len(self.nn_neurons)):
        hidden_neuron_sums = np.dot(
            np.asarray(self.all_weights[after_input_layer-1]),
            self.nn_neurons[after_input_layer-1])

If we have set bias values, we multiply them by their weights too, and add them to our final neuron sums.

        if len(self.biases_weights[after_input_layer-1]) != 0:
            bias_vals = (self.biases_for_non_input_layers[after_input_layer-1]
                         * self.biases_weights[after_input_layer-1])

We add any bias values to the neuron sums and, finally, activate them using the sigmoid function.

            hidden_neuron_sums += bias_vals
        self.nn_neurons[after_input_layer] = self.activate_threshold(
            hidden_neuron_sums, "sigmoid")

Once we feed each layer forward, the outputs are sets of activated values between 0 and 1, like those of the previous layers. It is important to recognise that the feed forward alone does no learning at all: at first, these outputs will seem random and meaningless. It is the back propagation that handles learning, by correcting the mistakes the feed forward makes.

The Back Propagation Algorithm

The feed forward could be seen as guessing. Meanwhile, the back propagation educates that guess based on its margin of error. Over time, the guessing will become extremely accurate.

Back propagation finds the margin of error between the outputs and the target value. Using these values, it calculates how each weight contributed to the error and adjusts them accordingly.

The algorithm therefore has to work backward, from the output to all the individual weights.

We can use partial derivatives to calculate the output error with respect to each weight. First, we use the squared cost function to find the error:

E = 1/2 (nL − t)²

The target/desired values will be a binary vector for classification. In the handwriting recognition demo, if the network output values were

n = [0.2524, 0.1363, 0.2876, 0.00356, 0.043216, 0.26622, 0.00013, 0.12455, 0.01113]

for each output neuron, then it would predict an 8. However, if we assume the correct answer is 3, then the target vector would be

t = [0, 0, 0, 1, 0, 0, 0, 0, 0]

Therefore, the error is the difference between each item in each vector.

∂E/∂nL = (nL − t)

therefore   ∂E/∂nL,3 = 0.00356 − 1

Perceptron then passes this error back, multiplying it by the derivative of the sigmoid function, S′. Thus at the output layer (the beginning of back propagation), δ is first defined as the following:

δL = ∂E/∂nL · S′(nL)
where   S′(nL) = nL(1 − nL)
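Put together, the output-layer δ can be sketched like this (the values are made up, not taken from the demo):

```python
import numpy as np

n_L = np.array([0.9, 0.00356, 0.1])  # activated output neurons (hypothetical)
t = np.array([0.0, 1.0, 0.0])        # one-hot target vector

dE_dn = n_L - t                  # error gradient at the outputs
s_prime = n_L * (1.0 - n_L)      # S'(n_L) = n_L(1 - n_L)
delta_L = dE_dn * s_prime        # the first delta of back propagation

print(delta_L)
```

Note the sign: the neuron that should have fired (index 1) gets a negative δ, which the update rule later turns into an increase of its incoming weights.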

Then we work backward through each layer. Each layer's δ represents the recursive accumulation of the changes so far that contributed to the error, from the perspective of each individual neuron. We must transpose the weight values so they fit the layer of neurons we are stepping back to.

∂E/∂nl = δl = T(wl+1) · δl+1 · nl(1 − nl)

Finally, this change can be traced back to an individual weight by multiplying it by the weight’s activated input neuron value.

∂E/∂w = ∂E/∂n · nl−1
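As a sketch with made-up shapes: a hidden layer's δ comes from the transposed weights of the layer after it, and each weight's gradient is then the product of that δ with the previous layer's activated values:

```python
import numpy as np

rng = np.random.default_rng(2)
delta_next = rng.random((10, 1))  # delta of layer l+1, as a column vector
w_next = rng.random((10, 80))     # weights between layer l and layer l+1
n_l = rng.random((80, 1))         # activated values on layer l
n_prev = rng.random((5, 1))       # activated values on layer l-1 (5 neurons, made up)

# delta_l = T(w_{l+1}) . delta_{l+1} * n_l(1 - n_l)
delta_l = np.dot(w_next.T, delta_next) * n_l * (1.0 - n_l)

# dE/dw = delta_l . T(n_{l-1}) -- one gradient entry per individual weight
dE_dw = np.dot(delta_l, n_prev.T)
print(dE_dw.shape)  # (80, 5)
```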

The change now needs to be used to adapt the weight value. Most simply, this can be done like so:

w = w − (η · ∂E/∂w)

The η represents the learning rate: a user-specified value, usually between 0.001 and 10, that determines the sensitivity of learning and how strongly the network responds to error. If the learning rate is too high, the weight changes become too dramatic in response to the error margin, which can produce an initial success before a complete overshoot. It is a bit like reading too much significance into what are actually small observations, resulting in disproportionate guesses.

Here is an idea of how the learning rate will affect the performance of a neural network:

[Image: error over time for high, low, and near-perfect learning rates]

If learning rate is too high, error soon increases due to overshoot in observations.

If learning rate is too low, error does decrease, but with little sensitivity.

If learning rate is near perfect, error decreases at its most efficient rate.
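The three regimes above can be reproduced on a toy problem. This sketch runs plain gradient descent on E(w) = w² (whose gradient is 2w) with three hypothetical learning rates:

```python
def descend(eta, steps=20, w=1.0):
    # Gradient descent on E(w) = w^2, whose gradient is dE/dw = 2w.
    for _ in range(steps):
        w = w - eta * (2 * w)
    return abs(w)

print(descend(1.1))    # too high: |w| grows each step (overshoot)
print(descend(0.001))  # too low: |w| shrinks, but barely
print(descend(0.3))    # near perfect: |w| collapses toward 0 quickly
```

Each step multiplies w by (1 − 2η), so any η above 1 flips the sign and grows the error, while a tiny η leaves it almost unchanged.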

Perceptron Back Propagation Code

We start by obtaining the first — or last, depending on how you see it — weight layer, and adding one dimension so it complies with the structure of the target vector.

for weight_layer_count in range(len(self.all_weights)-1, -1, -1):
    weight_neuron_vals = np.expand_dims(self.nn_neurons[weight_layer_count+1], axis=1)

We then step from the activated value back toward the summed (pre-activated) value by applying the derivative of the sigmoid function.

    target_vector = np.expand_dims(target_vector, axis=1)
    activated_to_sum_step = weight_neuron_vals * (1 - weight_neuron_vals)

If it is the output layer, we subtract the target vector to get the error margin; if not, we reuse the δ from the previously processed layer, transposing that layer's weights so the shapes fit. We now have a path from the output cost value back to the pre-activated sum values.

The next iteration takes it back further, in a recursive manner, through back_prop_cost_to_sum, which is δL from our expression earlier.

    if weight_layer_count == len(self.all_weights) - 1:
        back_prop_cost_to_sum = (weight_neuron_vals - target_vector) * activated_to_sum_step
    else:
        trans_prev_weights = np.asarray(self.all_weights[weight_layer_count+1]).transpose()
        back_prop_cost_to_sum = (np.dot(trans_prev_weights, back_prop_cost_to_sum)
                                 * activated_to_sum_step)

If biases are involved, then we back propagate them, too.

    if len(self.biases_weights[weight_layer_count]) != 0:
        current_bias_weight_vals = self.biases_weights[weight_layer_count]
        final_bias_change = self.learning_constant * back_prop_cost_to_sum.flatten()
        self.biases_weights[weight_layer_count] = current_bias_weight_vals - final_bias_change

Take the path from the summed value back to its original inputs for individual weight adjustments.

    input_neuron_vals = np.expand_dims(self.nn_neurons[weight_layer_count], axis=1)
    full_back_prop_sum_to_input = np.dot(back_prop_cost_to_sum, input_neuron_vals.transpose())

Finally, we update the weight values using the learning rate.

    current_weight_vals = self.all_weights[weight_layer_count]
    new_weight_vals = current_weight_vals - (self.learning_constant * full_back_prop_sum_to_input)
    self.all_weights[weight_layer_count] = new_weight_vals

This forward and back process happens for each row in a dataset. Once the weights have been optimised and the model becomes more successful, it is time for you to input values yourself. Watch the magic happen.

Conclusion

Perceptron is still very new, and I hope to improve the bits that need improvement over time. Please feel free to contact me with questions, ideas, or feedback.

Remember, Perceptron is for experimenting. Change code, add code, and tell me how it goes.

If you have just started to learn about neural nets, or already have a great understanding of them, I hope this has helped you understand how Perceptron and its code can complement your knowledge.

Caspar Wylie, ODSC


My name is Caspar Wylie, and I have been passionately computer programming for as long as I can remember. I am currently a teenager, 17, and have taught myself to write code with initial help from an employee at Google in Mountain View, California, who truly motivated me. I program every day and am always putting new ideas into perspective. I try to keep a good balance between jobs and personal projects in order to advance my research and understanding. My interest in computers started with very basic electronic engineering when I was only 6, before I moved on to software development at the age of about 8. Since then, I have experimented with many different areas of computing, from web security to computer vision.
