Supervised learning: backpropagation
What is Learning?
To gain knowledge, understanding or skill by study, instruction, or experience (WWWebster Dictionary)
Learning
In the context of a neural network:
"Learning is a process by
which the free parameters of a neural network are adapted through a continuing
process of stimulation by the environment in which the network is embedded. The
type of learning is determined by the manner in which the parameter changes
take place."
- Haykin, 1994, "Neural Networks: A Comprehensive Foundation", p. 50

can be viewed as:


This Hebbian network can be regarded as a subset of the IAC network.
It exhibits fast (1-shot) learning of relationships
There is a 1-way relationship between the units (not 2-way as in IAC)
Input is a distributed vector representing an individual as a unique concatenation of 19 features
Output is a local representation - 1 instance unit per person.
Feature Vectors
In this representation, the feature vector is a binary string
| Input                                  | Out   |
| Schl | Age | Gang | MS  | Occ | Nm    | Inst  |
| 100  | 001 | 10   | 100 | 100 | 10000 | 10000 |
| 001  | 010 | 01   | 001 | 001 | 01000 | 01000 |
| 010  | 100 | 10   | 100 | 010 | 00100 | 00100 |
| 100  | 010 | 10   | 100 | 100 | 00010 | 00010 |
| 100  | 100 | 10   | 010 | 001 | 00001 | 00001 |
Calculate a similarity matrix for the vectors:
|       | Art | Rick | Sam | Ralph | Lance |
| Art   |     |      |     |       |       |
| Rick  |     |      |     |       |       |
| Sam   |     |      |     |       |       |
| Ralph |     |      |     |       |       |
| Lance |     |      |     |       |       |
Correlation
1110001010
1110000010

1 -1  1 -1  1 -1
1  1 -1 -1  1  1

1100011111
0011100000
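The three pairs above can be read as one pair of similar vectors, one pair of unrelated vectors, and one pair of opposite vectors. As a rough check, the sketch below computes the Pearson correlation coefficient for each pair. The choice of Pearson correlation and the names `pearson` and `bits` are assumptions made for illustration; the notes themselves do not say which measure is intended.

```python
import math


def pearson(v, w):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(v)
    mean_v = sum(v) / n
    mean_w = sum(w) / n
    cov = sum((a - mean_v) * (b - mean_w) for a, b in zip(v, w))
    var_v = sum((a - mean_v) ** 2 for a in v)
    var_w = sum((b - mean_w) ** 2 for b in w)
    return cov / math.sqrt(var_v * var_w)


def bits(s):
    """Turn a string such as '1010' into a list of ints."""
    return [int(c) for c in s]


# The three pairs from the notes.
similar = pearson(bits("1110001010"), bits("1110000010"))
unrelated = pearson([1, -1, 1, -1, 1, -1], [1, 1, -1, -1, 1, 1])
opposite = pearson(bits("1100011111"), bits("0011100000"))

print(similar, unrelated, opposite)
```

Under this reading the first pair comes out strongly positive, the second at zero, and the third (each bit flipped) at -1.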
But the input vectors could consist of real numbers instead of 1s and 0s
e.g. Occupation
| Pusher | Bookie | Burglar |
| 1      | 0      | 0       |
| 0.6    | 0.3    | 0.2     |
How to generalize the similarity matrix?
Orthogonal Vectors
The dot product is the sum of the products of corresponding elements of two vectors.
E.g. v = (1, 1, -1, -1)
     w = (1, -1, 1, -1)
     v.w = (1) + (-1) + (-1) + (1) = 0
The dot product is a measure of the similarity of the two vectors.
If the dot product is 0, the vectors are said to be uncorrelated or orthogonal.
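One way to fill in a similarity matrix that also works for real-valued vectors is to use the dot product as the similarity score. The sketch below does this for the five 19-bit input vectors from the feature table. It assumes that the table rows are listed in the same order as the names in the similarity matrix (Art, Rick, Sam, Ralph, Lance); the helper names `to_vector`, `dot` and `similarity_matrix` are made up for this example.

```python
def to_vector(fields):
    """Concatenate space-separated bit fields into a list of ints."""
    return [int(b) for b in fields.replace(" ", "")]


def dot(v, w):
    """Sum of the products of corresponding elements."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(v, w))


# Input fields: Schl, Age, Gang, MS, Occ, Nm.
# ASSUMPTION: row order in the table matches this order of names.
PEOPLE = {
    "Art":   to_vector("100 001 10 100 100 10000"),
    "Rick":  to_vector("001 010 01 001 001 01000"),
    "Sam":   to_vector("010 100 10 100 010 00100"),
    "Ralph": to_vector("100 010 10 100 100 00010"),
    "Lance": to_vector("100 100 10 010 001 00001"),
}


def similarity_matrix(vectors):
    """Dot product of every pair of vectors, keyed by name."""
    return {a: {b: dot(vectors[a], vectors[b]) for b in vectors}
            for a in vectors}


SIM = similarity_matrix(PEOPLE)
for name, row in SIM.items():
    print(name, [row[other] for other in PEOPLE])

# The same function handles real-valued features, e.g. Occupation.
print(dot([1, 0, 0], [0.6, 0.3, 0.2]))
```

For binary vectors the dot product simply counts the features two people share, so each diagonal entry is 6 (one active bit per field).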

Difference Matrix
The vectors:
| 100 | 001 | 10 | 100 | 100 | 10000 | 10000 |
| 001 | 010 | 01 | 001 | 001 | 01000 | 01000 |
| 010 | 100 | 10 | 100 | 010 | 00100 | 00100 |
| 100 | 010 | 10 | 100 | 100 | 00010 | 00010 |
| 100 | 100 | 10 | 010 | 001 | 00001 | 00001 |
Difference Matrix:
|       | Art | Rick | Sam | Ralph | Lance |
| Art   |     |      |     |       |       |
| Rick  |     |      |     |       |       |
| Sam   |     |      |     |       |       |
| Ralph |     |      |     |       |       |
| Lance |     |      |     |       |       |
How to generalize the difference matrix?
Euclidean Distance
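The standard Euclidean distance between two vectors v and w is the square root of the sum of squared element-wise differences, sqrt(sum_i (v_i - w_i)^2). The sketch below assumes that the difference matrix for binary vectors counts the positions in which two vectors differ (the Hamming distance), and shows that the Euclidean distance generalizes this to real-valued vectors. The function names and the three sample vectors are chosen for illustration only.

```python
import math


def hamming(v, w):
    """Number of positions in which two equal-length vectors differ."""
    return sum(1 for a, b in zip(v, w) if a != b)


def euclidean(v, w):
    """Square root of the sum of squared element-wise differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))


def to_vector(fields):
    """Concatenate space-separated bit fields into a list of ints."""
    return [int(b) for b in fields.replace(" ", "")]


# First, second and fourth input rows of the table (assumed to be
# Art, Rick and Ralph, following the order of names in the matrix).
art = to_vector("100 001 10 100 100 10000")
rick = to_vector("001 010 01 001 001 01000")
ralph = to_vector("100 010 10 100 100 00010")

print(hamming(art, ralph), euclidean(art, ralph))
print(hamming(art, rick), euclidean(art, rick))

# Real-valued features work the same way, e.g. Occupation.
print(euclidean([1, 0, 0], [0.6, 0.3, 0.2]))
```

For vectors of 0s and 1s the squared Euclidean distance equals the Hamming distance, so the two measures rank pairs in the same order.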

The Hebb Rule
When two cells fire at the same time the strength of the connection between them should be increased
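In its simplest form:

$$\Delta w_{ij} = \epsilon \, a_i \, a_j$$

where $\epsilon$ is the learning rate and $a_i$, $a_j$ are the activations of units i and j:

"the change in the weight to unit i from unit j is equal to the learning rate, multiplied by the activation of unit i, multiplied by the activation of unit j"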
With linear units

The strength of the weights will be proportional to the activations of the two units.
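A single Hebbian update, as described in words above, can be sketched as follows. The layout `weights[i][j]` (weight to output unit i from input unit j) and the names `hebb_update` and `lrate` are assumptions for this example, not part of the original notes.

```python
def hebb_update(weights, a_out, a_in, lrate):
    """Return new weights after one Hebbian step.

    weights[i][j] is the weight to output unit i from input unit j.
    Each weight changes by lrate * a_out[i] * a_in[j].
    """
    return [
        [weights[i][j] + lrate * a_out[i] * a_in[j]
         for j in range(len(a_in))]
        for i in range(len(a_out))
    ]


# Three input units, two output units, all weights start at zero.
w0 = [[0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0]]
w1 = hebb_update(w0, a_out=[1, 0], a_in=[1, 0, 1], lrate=0.5)
print(w1)
```

Only the connections between units that are active together change; a weight stays at zero whenever either of its two units has zero activation.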
Properties of a Hebbian Synapse:
Neurobiological Considerations
A time-dependent, highly local and strongly interactive mechanism appears to be responsible for one form of long-term potentiation in the hippocampus, which plays a key role in certain aspects of learning and memory.
Hebbian learning appears to be biologically plausible (but it's not as simple as that!)
One Shot Learning



Superposition of Patterns
The set of weights after training with multiple patterns is simply the sum of the sets of weights resulting from training with each pattern.
So the output of the network depends on the patterns seen during training:
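The superposition property can be checked with a small sketch. It assumes linear output units (output i is the weighted sum of the inputs), one Hebbian pass per pattern, and a learning rate of 1/4 so that recall of a 4-element +1/-1 pattern is scaled to 1. The two input patterns are the orthogonal vectors v and w used earlier; the target outputs and all function names are made up for illustration.

```python
def hebb_train(weights, a_in, a_out, lrate):
    """One-shot Hebbian update: add lrate * a_out[i] * a_in[j]."""
    return [[weights[i][j] + lrate * a_out[i] * a_in[j]
             for j in range(len(a_in))]
            for i in range(len(a_out))]


def recall(weights, a_in):
    """Linear units: output i is the weighted sum of the inputs."""
    return [sum(w * a for w, a in zip(row, a_in)) for row in weights]


def zeros(n_out, n_in):
    return [[0.0] * n_in for _ in range(n_out)]


p1, t1 = [1, 1, -1, -1], [1, 0]
p2, t2 = [1, -1, 1, -1], [0, 1]
LRATE = 0.25  # assumed: 1 / (number of input units)

# Train on each pattern separately, starting from zero weights.
w_p1 = hebb_train(zeros(2, 4), p1, t1, LRATE)
w_p2 = hebb_train(zeros(2, 4), p2, t2, LRATE)

# Train on both patterns in sequence.
w_both = hebb_train(w_p1, p2, t2, LRATE)

# Superposition: the combined weights are the element-wise sum.
w_sum = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(w_p1, w_p2)]
print(w_both == w_sum)

# Because p1 and p2 are orthogonal, each is recalled without
# interference from the other.
print(recall(w_both, p1), recall(w_both, p2))

# An input that overlaps both stored patterns gets a blend of both.
print(recall(w_both, [1, 1, 1, -1]))
```

With orthogonal input patterns each stored association is recalled cleanly, while an input that overlaps both patterns produces a mixture of their outputs.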
Implications
Positive:
Negative:
The Delta Rule
"delta" (Δ) means "a small change or difference"
The delta rule aims to adjust the weights so that the difference between the actual output and the target output is minimized.

$$\Delta w_{ij} = \epsilon \, \delta_i \, a_j$$

where

$$\delta_i = t_i - a_i$$

The change in the weight to unit i from unit j is equal to the learning rate ($\epsilon$) multiplied by the difference between the target and actual outputs ($t_i - a_i$), multiplied by the activation of unit j ($a_j$).
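The delta rule update can be sketched as below. The sketch assumes linear output units and online (pattern-by-pattern) updates; the two training patterns, the learning rate of 0.1, the 200 epochs and all function names are illustrative choices rather than values from the notes.

```python
def output(weights, a_in):
    """Linear units: output i is the weighted sum of the inputs."""
    return [sum(w * a for w, a in zip(row, a_in)) for row in weights]


def delta_update(weights, a_in, target, lrate):
    """One delta-rule step: w_ij += lrate * (t_i - a_i) * a_j."""
    a_out = output(weights, a_in)
    return [[weights[i][j] + lrate * (target[i] - a_out[i]) * a_in[j]
             for j in range(len(a_in))]
            for i in range(len(target))]


def total_error(weights, patterns):
    """Sum of squared differences between targets and outputs."""
    return sum((t - o) ** 2
               for a_in, target in patterns
               for t, o in zip(target, output(weights, a_in)))


# Two overlapping (non-orthogonal) input patterns, one output unit.
PATTERNS = [([1, 0, 1], [1]),
            ([1, 1, 0], [0])]

weights = [[0.0, 0.0, 0.0]]
first_step = delta_update(weights, [1, 0, 1], [1], lrate=0.1)
error_before = total_error(weights, PATTERNS)

for epoch in range(200):
    for a_in, target in PATTERNS:
        weights = delta_update(weights, a_in, target, lrate=0.1)

error_after = total_error(weights, PATTERNS)
print(first_step, error_before, error_after)
```

Unlike the one-shot Hebbian update, the weight change here shrinks as the output approaches the target, so repeated presentations are needed, but overlapping input patterns can still be learned.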
Implications
Positive
Negative
The Lab
1. Hebbian Learning
2. Delta Learning