How do we train a single neuron? What does the learning methodology look like? What is the DELTA rule? How do we check training efficiency? What is the ADALINE model?
Philosophy of machine learning
The British philosopher John Locke (not to be confused with the character from the popular TV series “Lost”) said that every man is born as a “tabula rasa” (blank slate), without any built-in mental content, which means that every mind needs experience during life to gain knowledge. If we consider his claim, we will find truth in it. What is more, after 300 years we can easily draw an analogy to artificial neural networks and say that they operate under the same rules. Artificial neurons are not adapted to any task when they are created; they too have to be trained.
The main idea is to show them data sets repeatedly and make them more "sensitive" to the data we would classify as true. Interestingly, after training a neural network will also recognize data sets it has never seen before - but there is a condition: those data sets need to look similar to the training data. Sometimes this is helpful and sometimes it is not, so we will continue the learning process, using a single batch for many iterations, until we reach a satisfying efficiency level.
Learning in progress..
We know the main idea, but how does this “mechanism of learning” work? To be honest, it all comes down to tuning the synapse weights w, exactly as in the biological process. Every synapse is treated as an individual tuner for its input, which allows us to modulate the signals that flow through a single neuron.
What does the tuning look like? We are going to describe it, but first let's talk about the types of machine learning.
We distinguish the following types of training:
supervised learning - learning with a teacher (human) - it allows the neuron to classify very precisely. For this type of training we must have two pieces of information to show our network: we need to know what we would like to set as inputs, and what result we would like to receive as output. This expected specific result is called the “prediction” (also known as the “desired output”). Given this information, the neuron will find the dependencies between inputs and outputs and remember them for future classification.
unsupervised learning - learning without a teacher - this type of training uses so-called “cluster analysis”. It is slower than classical supervised learning (it needs more iterations), but its undeniable advantage is that there is no need to know what we should receive as output. After some time of training, without human interference, the neurons will "automatically" learn that some data are similar to each other and some are much different - and that becomes the basis for future classification.
In this article we consider only the supervised learning case. Of course, the most trivial method is to set the synapse weights manually and watch what we receive as output, but that is not exactly what we would like to deal with (because it is boring!). There is a method which is much better but still simple enough. Ladies and Gentlemen, I present to you the DELTA rule.
The DELTA rule for a single input and its synapse can be written as a short equation:

w_j(i+1) = w_j(i) + η(z − y)·x_j

w_j(i) - weight of synapse j at iteration i, a number
w_j(i+1) - weight of the same synapse in the next iteration
x_j - input, a number
z - desired output (prediction)
y - actual output
η - learning rate
j - input number (and weight number)
i - iteration number
Of course, we apply the rule to every synapse of the neuron.
For the first iteration we should set the weights to very small values, different from each other. This asymmetry does not matter for a single neuron, but it will when we create a more complex, multilayer network.
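The update rule above can be sketched in a few lines of Python. This is a minimal illustration; the input vector, desired output and learning rate below are made-up values, not taken from the article:

```python
import random

def delta_update(weights, x, z, eta):
    """One DELTA-rule step: w_j <- w_j + eta * (z - y) * x_j."""
    # Actual output y of a linear neuron: weighted sum of inputs.
    y = sum(w * xj for w, xj in zip(weights, x))
    error = z - y  # difference between desired and actual output
    return [w + eta * error * xj for w, xj in zip(weights, x)]

# Small, mutually different initial weights (the asymmetry matters
# later, when the neuron becomes part of a multilayer network).
random.seed(0)
weights = [random.uniform(-0.1, 0.1) for _ in range(3)]

x = [1.0, 0.5, -0.3]   # example input vector (arbitrary)
z = 1.0                # desired output
weights = delta_update(weights, x, z, eta=0.2)
```

Each call moves the actual output a little closer to the desired one; repeating the call over many iterations is what the article calls training.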
The equation contains the learning rate η, a value chosen by a human which tells the algorithm “how big a step the learning function should take at every weight update”. The learning process can be compared to tuning a radio from the early 90s. This old type of radio has a knob which we turn to find the frequency used by our favourite radio station. Now imagine that every turn of the knob is an iteration, and the angle of the turn is the learning rate (the learning step). If we change the knob position too abruptly, the result of the search may be really divergent from our expectations. The opposite situation is when we make steps that are too small and “shy” - then our tuning will take a long time, and it may never end.
We are in exactly the same situation if we set the learning rate too high or too low. That is why an optimal learning rate value is so important - it provides satisfying algorithm convergence in a reasonably short time.
The learning rate can be constant for the whole training process, but there are also more complex algorithms where it is gradually changed (so-called adaptive learning rate adjustment methods). In our case we will treat the learning rate as a small number close to 0 (e.g. 0.07, 0.2 or 0.4).
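To see the knob analogy in numbers, here is a small sketch (an assumed toy setup of my own: one input, one weight) that counts how many DELTA-rule iterations are needed for two different learning rates:

```python
def train_iterations(eta, x, z, tol=1e-3, max_iter=10_000):
    """Count DELTA-rule iterations until |z - y| < tol, single input."""
    w = 0.01  # small initial weight
    for i in range(max_iter):
        y = w * x                # actual output of the linear neuron
        if abs(z - y) < tol:
            return i             # converged: close enough to z
        w += eta * (z - y) * x   # DELTA-rule update
    return max_iter              # did not converge within the limit

x, z = 1.0, 1.0
slow = train_iterations(eta=0.05, x=x, z=z)  # small, "shy" steps
fast = train_iterations(eta=0.4, x=x, z=z)   # larger but still stable
```

A larger (but still stable) step reaches the target in far fewer turns of the "knob"; for this setup a step with eta·x² ≥ 2 would make the error grow instead of shrink.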
The DELTA rule is an iterative algorithm, which means that every step (iteration) updates the synapse weights, bringing the actual output y closer to the desired output z. In theory we would like to receive an actual output y equal to the desired output z. In mathematical terms this means that we would like to minimize the error function.
E = ½ Σ_j (z_j − y_j)² → min
E - error function
z_j - desired output for a single data set, a number
y_j - actual output for a single data set, a number
The error function is a kind of quality indicator. It determines how large the accumulated squared error between desired and actual outputs is, over all data sets, after training. Of course we want it to be as small as possible.
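The error function translates directly into code; a minimal sketch, with made-up desired and actual outputs for illustration:

```python
def error_function(z_list, y_list):
    """E = 1/2 * sum_j (z_j - y_j)^2 over all data sets."""
    return 0.5 * sum((z - y) ** 2 for z, y in zip(z_list, y_list))

desired = [1.0, 0.0, 1.0]  # z_j values (arbitrary example)
actual  = [0.8, 0.1, 0.9]  # y_j values produced by the neuron
E = error_function(desired, actual)
```

Tracking E after every epoch is the simplest way to check training efficiency: a well-chosen learning rate makes it fall steadily toward zero.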
The DELTA rule is a rather simple method, but it is the basis and starting point for more complex learning algorithms. It is worth noting that after applying this learning rule, our neuron evolves from a perceptron into the ADALINE model (ADAptive LINear Element). In the following articles I am going to show how this simple model can be implemented for hobby purposes (e.g. character recognition).
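As a self-contained illustration of the resulting ADALINE model, the sketch below trains a single linear neuron with the DELTA rule on the logical AND function. This toy data set, the extra bias input and the 0.5 prediction threshold are my own assumptions, not the character recognition task mentioned above:

```python
def train_adaline(data, eta=0.1, epochs=100):
    """Train a single linear neuron (ADALINE) with the DELTA rule.

    data: list of (inputs, desired_output) pairs; a constant 1.0 is
    appended to every input vector to act as a bias term.
    """
    n = len(data[0][0]) + 1                   # +1 for the bias input
    w = [0.01 * (j + 1) for j in range(n)]    # small, distinct weights
    for _ in range(epochs):
        for x, z in data:
            xb = list(x) + [1.0]              # input plus bias
            y = sum(wj * xj for wj, xj in zip(w, xb))  # linear output
            w = [wj + eta * (z - y) * xj for wj, xj in zip(w, xb)]
    return w

def predict(w, x):
    """Threshold the linear output to get a binary classification."""
    y = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if y >= 0.5 else 0

# Toy training set: the logical AND function.
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_adaline(and_data)
```

Note that, unlike the classical perceptron, ADALINE learns from the raw linear output y; the threshold is applied only when making predictions.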
Thanks for your attention!
I confirm that I have used Google advanced image search with usage rights: "free to use, share or modify, even commercially"