A perceptron is a fundamental unit of a neural network: it takes weighted inputs, processes them, and produces a binary classification. It is a simplified model of a biological neuron, developed by Frank Rosenblatt in the late 1950s.
Here’s how a single perceptron works (a short code sketch follows the list):
- Inputs: Each perceptron receives multiple inputs, analogous to the dendrites in a biological neuron. These inputs can be initial data points, or they can be the outputs from other neurons in the network.
- Weights: Each input has an associated weight that reflects its importance. The weight amplifies or diminishes that input’s contribution.
- Summation: The weighted inputs are then added up, often together with a bias term, producing a single weighted sum.
- Activation: The summed value is then passed through an activation function, which transforms the input in some way. This is analogous to the action potential, or firing, of a biological neuron. The simplest activation function for a perceptron is a step function, which allows the perceptron to output a binary value (typically 1 or 0). If the summed value is above a certain threshold, the perceptron outputs a 1; otherwise it outputs a 0.
- Output: The output of the activation function is the output of the perceptron. This can either be a final output of the model or can serve as input to another perceptron.
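To make the steps above concrete, here is a minimal sketch in Python (using NumPy) of a single perceptron with a step activation. The inputs, weights, and bias are made-up values chosen purely for illustration:

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """Single perceptron: weighted sum followed by a step activation."""
    weighted_sum = np.dot(inputs, weights) + bias  # summation step
    return 1 if weighted_sum > 0 else 0            # step activation

# Hypothetical example: two inputs with hand-picked weights
x = np.array([1.0, 0.0])
w = np.array([0.6, 0.4])
b = -0.5
print(perceptron(x, w, b))  # -> 1, since 0.6 + 0.0 - 0.5 = 0.1 > 0
```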
In a neural network, perceptrons are arranged in layers, with the outputs of one layer serving as the inputs to the next layer. The first layer (closest to the initial data inputs) is called the input layer, the last layer (closest to the final output) is the output layer, and any layers in between are called hidden layers.
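As a rough sketch of how those layers compose, each layer can be written as a matrix of weights applied to the previous layer’s outputs. The sizes and values below are arbitrary placeholders, not a recommended architecture:

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)  # element-wise step activation

# Hypothetical network: 3 inputs, one hidden layer of 4 units, one output unit
x = np.array([0.5, -1.0, 2.0])            # input layer (the raw data)
W_hidden = np.random.randn(4, 3) * 0.1    # hidden layer weights (4 neurons x 3 inputs)
b_hidden = np.zeros(4)
W_out = np.random.randn(1, 4) * 0.1       # output layer weights
b_out = np.zeros(1)

hidden = step(W_hidden @ x + b_hidden)    # outputs of the hidden layer...
output = step(W_out @ hidden + b_out)     # ...become inputs to the output layer
print(output)
```

In practice the hidden layers use differentiable activations (such as sigmoid or ReLU) so that gradients can flow during training; the step function is kept here only to stay consistent with the perceptron description above.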
The weights in a perceptron network are initially set to random values, and then they are iteratively updated as the model learns from data. The model’s goal is to find the best set of weights to map the inputs to the correct outputs. This learning process involves a method called backpropagation, in combination with an optimization algorithm such as stochastic gradient descent.
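As a small illustration of iterative weight updates, here is the classic single-perceptron learning rule trained on an AND gate (a linearly separable toy problem). Note this is not backpropagation: a full multi-layer network would use backpropagation with an optimizer such as stochastic gradient descent, which this sketch does not implement.

```python
import numpy as np

# Toy dataset: the AND function (linearly separable)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=2)  # weights start as small random values
b = 0.0
lr = 0.1                           # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0
        error = target - pred
        w += lr * error * xi       # nudge weights toward the correct output
        b += lr * error

print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])  # expected: [0, 0, 0, 1]
```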
It’s important to note that while a single-layer perceptron can only learn linearly separable patterns, multi-layer perceptrons (also known as feedforward neural networks) with enough hidden neurons can approximate any continuous function on a bounded domain to arbitrary accuracy. This result is known as the universal approximation theorem.
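For a concrete (hand-wired, not learned) illustration of why extra layers matter: XOR is not linearly separable, so no single perceptron can compute it, but two hidden perceptrons (an OR and a NAND) feeding an AND output perceptron can. The weights below are chosen by hand purely for illustration:

```python
import numpy as np

def perceptron(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0

def xor(x1, x2):
    x = np.array([x1, x2])
    or_out = perceptron(x, np.array([1.0, 1.0]), -0.5)     # OR
    nand_out = perceptron(x, np.array([-1.0, -1.0]), 1.5)  # NAND
    # Second layer: AND of the two hidden outputs
    return perceptron(np.array([or_out, nand_out]), np.array([1.0, 1.0]), -1.5)

for p, q in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, q, xor(p, q))  # expected: 0, 1, 1, 0
```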
PS: You can find Rosenblatt’s original paper, “The perceptron: A probabilistic model for information storage and organization in the brain” (Rosenblatt, F., 1958), here. (Needs purchase or American Psychological Association-affiliated institution login creds.)