Neural Networks
- Processing Elements
- An array of inputs, with a weight associated with each input.
- An intermediate value computed from the input values and
weights.
The computation performed depends on the architecture of the
network, often the sum of products.
- An output which is some (activation) function of the
intermediate value.
Often a sigmoid, step, or sign function.
- Inputs come either from other PEs, or from an external source.
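- A minimal sketch of one PE in Python, assuming a sum-of-products
  intermediate value and a sigmoid activation (the function names
  are illustrative, not from any particular library):

      import math

      def sigmoid(x):
          # A common activation function: squashes x into (0, 1).
          return 1.0 / (1.0 + math.exp(-x))

      def processing_element(inputs, weights):
          # Intermediate value: the sum of products of inputs and weights.
          intermediate = sum(w * x for w, x in zip(weights, inputs))
          # Output: the activation function applied to the intermediate value.
          return sigmoid(intermediate)

      # Example: a PE with three inputs.
      print(processing_element([1.0, 0.5, -1.0], [0.2, -0.4, 0.1]))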
- Layers or slabs
- The PEs are arranged into layers, or slabs.
- The input layer accepts input from an external source
- The output layer produces output for external consumption
- Hidden layers lie between the input and output layers
- There may or may not be a spatial relationship between the
PEs in a layer.
- If each PE in a layer has an input from every PE in the
previous layer, then the network is fully feedforward connected.
- Connections between non-adjacent layers may be used.
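- As a sketch (Python, with hypothetical helper names), a fully
  feedforward connected layer can be held as one weight list per PE,
  with one weight for every PE in the previous layer:

      def layer_outputs(inputs, layer_weights, activation):
          # layer_weights[pe][i] is the weight on the connection from
          # PE i of the previous layer to PE pe of this layer; every
          # PE receives every input, so the layer is fully feedforward
          # connected.
          return [activation(sum(w * x for w, x in zip(pe_weights, inputs)))
                  for pe_weights in layer_weights]

      # Example: a 2-PE layer reading a 3-value input, step activation.
      step = lambda x: 1 if x >= 0 else 0
      print(layer_outputs([1, 0, 1], [[0.5, -0.5, 0.2], [-0.3, 0.8, 0.1]], step))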
- Learning
- Supervised learning architectures
- Learn functions from input to output, based on example
input and output vector pairs.
- Once trained can approximate the function for inputs not
previously encountered.
- Unsupervised learning architectures
- Learn to differentiate between different input vectors.
- Kohonen/Counter-Propagation Networks
- There is an input layer, a hidden layer called the Kohonen
layer, and an output layer called the Grossberg layer.
- The network is fully feedforward connected.
- Training takes place in two stages.
- Firstly the Kohonen layer is trained in an unsupervised
manner.
- This trains the PEs in the layer to differentiate
between different input vectors.
- The second phase trains the Grossberg layer in a supervised
manner.
- This trains the Grossberg layer to associate an output
vector with each recognised input vector.
- Once trained, the network will output an appropriate output
vector for any given input vector.
- Kohonen layer
- The PEs in the Kohonen layer may have a spatial
relationship, e.g. rectangular lattice, triangular
lattice, hexagonal lattice.
- Intermediate value =
      sqrt(SUM_i((weight[i] - input[i])^2))
  This is the Euclidean distance from the weight vector to
  the input vector.
- Activation function is
f(Intermediate) = 1 if Intermediate is minimum over PEs,
f(Intermediate) = 0 otherwise
This is a form of step function.
- The weights are initially set randomly.
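- A sketch of the Kohonen layer computation in Python, following
  the distance and step-activation definitions above (names are
  illustrative):

      import math

      def kohonen_distance(weights, inputs):
          # Intermediate value: Euclidean distance from the PE's
          # weight vector to the input vector.
          return math.sqrt(sum((w - x) ** 2 for w, x in zip(weights, inputs)))

      def kohonen_outputs(layer_weights, inputs):
          # Step activation: 1 for the PE with the minimum distance
          # (the "winner"), 0 for every other PE.
          distances = [kohonen_distance(w, inputs) for w in layer_weights]
          winner = distances.index(min(distances))
          return [1 if pe == winner else 0 for pe in range(len(layer_weights))]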
- Training the Kohonen Layer
- A sequence of typical input vectors is presented to the
  input layer, which distributes these values to the
  Kohonen layer PEs.
- Each Kohonen layer PE works out the intermediate value
(distance) from its weights to the input vector.
- The activation function determines which PE is closest
  to the input; that PE is declared the "winner".
- The winning PE (and those close by in the spatial
relationship if any) have their weights moved towards the
input vector:
weight[kohonen][winner][i] +=
    alpha * (input[i] - weight[kohonen][winner][i])
where alpha is the learning rate for winners.
- The other PEs (losers) have their weights moved towards
the input vector:
weight[kohonen][loser][i] +=
    beta * (input[i] - weight[kohonen][loser][i])
where beta is the learning rate for losers.
Beta is typically very small, often 0.
- After sufficient training, the PEs' weights will be
distributed about the input space, with higher density
in the areas of frequent input.
Each PE recognises a piece of the input space.
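- A sketch of this winner-take-all training loop, reusing
  kohonen_distance from the Kohonen layer sketch above; for
  simplicity it ignores any spatial neighbourhood, so only the
  single winner is moved with rate alpha:

      def train_kohonen(layer_weights, input_vectors, alpha=0.3,
                        beta=0.0, epochs=50):
          # Present typical input vectors repeatedly; move the
          # winner's weight vector towards each input by alpha and
          # the losers' by beta (typically very small, often 0).
          for _ in range(epochs):
              for inputs in input_vectors:
                  distances = [kohonen_distance(w, inputs)
                               for w in layer_weights]
                  winner = distances.index(min(distances))
                  for pe, weights in enumerate(layer_weights):
                      rate = alpha if pe == winner else beta
                      for i in range(len(weights)):
                          weights[i] += rate * (inputs[i] - weights[i])
          return layer_weights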
- The Grossberg layer
- Intermediate value = SUM_i(weight[i] * input[i]).
  This is the sum of products.
- Activation function is f(Intermediate) = Intermediate
- The weights are initially set randomly.
- Training the Grossberg layer
- A sequence of typical input vector and expected output
  vector pairs is used.
- The input values are distributed to the Kohonen layer as
  usual, and the winner calculated.
- The output values from the Kohonen layer are transmitted
to the Grossberg layer.
- The output from each Grossberg layer PE is therefore just
  its weight on the winning Kohonen PE, i.e. the single
  input with value 1.
- The weights in Grossberg layer PEs are then modified:
weight[grossberg][pe][i] +=
    input[i] * gamma * (expected_output[pe] - output[pe])
where gamma is the learning rate for the Grossberg layer,
and input[i] is the output of the ith Kohonen layer PE
(1 for the winner, 0 otherwise).
- After sufficient training, the vector made from the
  Grossberg layer PEs' outputs will approximate the expected
  output vector for the piece of input space recognised by
  the winning Kohonen layer PE.
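- A sketch of this training phase under the same assumptions,
  reusing kohonen_outputs from the Kohonen layer sketch above;
  examples is a list of (input vector, expected output vector)
  pairs:

      def train_grossberg(grossberg_weights, kohonen_weights,
                          examples, gamma=0.1):
          # grossberg_weights[pe][k] is the weight on Grossberg PE
          # pe's input from Kohonen PE k.
          for inputs, expected in examples:
              # Forward pass through the already-trained Kohonen layer.
              kohonen_out = kohonen_outputs(kohonen_weights, inputs)
              winner = kohonen_out.index(1)
              for pe, weights in enumerate(grossberg_weights):
                  # Sum of products with identity activation; since
                  # only the winner's input is 1, the output is just
                  # the weight on the winning Kohonen PE.
                  output = weights[winner]
                  # Only the weight on the value-1 input is adjusted.
                  weights[winner] += gamma * (expected[pe] - output)
          return grossberg_weights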
- Example application: Weather types and intelligent reactions
- Backpropagation
- There is an input layer, a hidden layer and an output layer.
- Intermediate value = SUM_i(weight[i] * input[i]).
  This is the sum of products.
- Activation function is the sigmoid:
f(x) = 1/(1 + exp(-x))
- The network may or may not be fully feedforward connected.
- The weights are initially set randomly.
- The network is trained in a supervised manner, to learn a
function from input vectors to output vectors (the output
vectors are formed from the outputs from the output layer PEs).
This is done by adjusting the weights in the network.
- Once trained, the network is used by presenting an input
  vector.
  The vector formed from the output layer PE outputs is the
  value of the learned function for that input.
- Training
- A sequence of typical input and output vector pairs is
  presented to the network, and the PE outputs calculated.
- The output layer weights are then updated:
delta[output][pe] =
    f'(Intermediate[output][pe]) *
    (expected_output[pe] - output[output][pe])
weight[output][pe][i] +=
    alpha * delta[output][pe] * input[output][pe][i]
where the first derivative of the sigmoid function is
used to train the network:
    f'(x) = f(x) * (1 - f(x)),
and alpha is the learning rate for the output layer.
- The hidden layer weights are then updated, by propagating
the errors from the output layer back to the hidden
layer:
delta[hidden][pe] =
    f'(Intermediate[hidden][pe]) *
    SUM_i(delta[output][i] * weight[output][i][pe])
weight[hidden][pe][i] +=
    beta * delta[hidden][pe] * input[hidden][pe][i]
where beta is the learning rate for the hidden layer.
A sketch of the full training loop follows this section.
- Example application: Stock market analysis
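- A sketch of the whole training procedure in Python, under the
  same assumptions as the earlier sketches (fully connected layers,
  sum-of-products intermediate values, sigmoid activations, no bias
  inputs); since f'(x) = f(x)*(1 - f(x)), the deltas can be computed
  directly from the PE outputs:

      import math

      def sigmoid(x):
          return 1.0 / (1.0 + math.exp(-x))

      def forward(layer_weights, inputs):
          # One layer's forward pass: sum of products, then sigmoid.
          return [sigmoid(sum(w * x for w, x in zip(pe_w, inputs)))
                  for pe_w in layer_weights]

      def train_backprop(hidden_w, output_w, examples, alpha=0.5,
                         beta=0.5, epochs=1000):
          # hidden_w[pe][i] and output_w[pe][h] are per-PE weight
          # lists; examples is a list of (input vector, expected
          # output vector) pairs.
          for _ in range(epochs):
              for inputs, expected in examples:
                  hidden_out = forward(hidden_w, inputs)
                  output_out = forward(output_w, hidden_out)
                  # Output layer deltas; f'(Intermediate) = o*(1 - o).
                  out_delta = [o * (1 - o) * (e - o)
                               for o, e in zip(output_out, expected)]
                  # Hidden layer deltas: output errors propagated back
                  # through the output layer weights.
                  hid_delta = [h * (1 - h) *
                               sum(out_delta[j] * output_w[j][pe]
                                   for j in range(len(output_w)))
                               for pe, h in enumerate(hidden_out)]
                  # Weight updates for both layers.
                  for pe, pe_w in enumerate(output_w):
                      for i in range(len(pe_w)):
                          pe_w[i] += alpha * out_delta[pe] * hidden_out[i]
                  for pe, pe_w in enumerate(hidden_w):
                      for i in range(len(pe_w)):
                          pe_w[i] += beta * hid_delta[pe] * inputs[i]
          return hidden_w, output_w

  Biases are omitted here to stay close to the notes; a real
  implementation would usually add a bias input fixed at 1 for
  each PE.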
Exam Style Questions
- Describe the structure and operation of a processing element.
- Explain what is meant by a neural network being "fully feedforward
connected".
- Explain the difference between supervised and unsupervised learning in
a neural network.
- Describe the architecture of a Kohonen/counter-propagation network.
What (in general terms) are the tasks of each of the layers?
- Give details of how the output and hidden layers of a backpropagation
  network are trained.
  After sufficient training, what has a backpropagation network learned?