Contrastive Divergence (CD) is an algorithm used for training generative models, particularly in the context of Boltzmann Machines (BMs) and Restricted Boltzmann Machines (RBMs).
CD is a practical approximation to the computationally intensive Markov Chain Monte Carlo (MCMC) estimate of the log-likelihood gradient, making it feasible to train BMs in practice.
The primary objective of CD is to adjust the weights of the BM so as to maximize the likelihood the model assigns to the observed data. The training process involves two main steps: the positive phase and the negative phase.
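For a binary RBM this objective can be written compactly. Using common notation that is assumed here rather than taken from the text above (visible vector $v$, hidden vector $h$, weight matrix $W$, learning rate $\eta$), the exact log-likelihood gradient and its CD approximation are

$$
\frac{\partial \log p(v)}{\partial W_{ij}}
= \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}},
\qquad
\Delta W_{ij} \approx \eta \bigl( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}} \bigr),
$$

where $\langle \cdot \rangle_{\text{recon}}$ denotes statistics gathered from a short Gibbs chain (the "reconstruction") rather than from the fully converged model distribution. The positive and negative phases described next compute the two terms of this difference.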
In the positive phase, the visible neurons are clamped to a training sample and the hidden neurons are activated conditioned on it, with activations determined by the current weights and biases. This step captures the statistics of the observed data paired with its hidden representation.
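The following is a minimal sketch of the positive phase for a binary RBM. The layer sizes, parameter names (`W`, `b`, `c`), and helper functions are illustrative assumptions, not taken from the text above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy binary RBM: 6 visible units, 4 hidden units (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(6, 4))   # visible-to-hidden weights
b = np.zeros(6)                           # visible biases
c = np.zeros(4)                           # hidden biases

def positive_phase(v0):
    """Clamp the visible units to a training sample and infer the hidden units."""
    h_prob = sigmoid(v0 @ W + c)                              # p(h_j = 1 | v0)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
    pos_stats = np.outer(v0, h_prob)                          # <v_i h_j>_data
    return h_prob, h_sample, pos_stats

v0 = np.array([1, 0, 1, 1, 0, 0], dtype=float)  # one observed training vector
h_prob, h_sample, pos_stats = positive_phase(v0)
```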
The negative phase, also known as the sampling phase, is where the model generates its own samples. Starting from the hidden activations obtained in the positive phase, a Markov chain is run by alternately sampling the states of the visible and hidden neurons (Gibbs sampling). As the chain evolves, the neuron states are repeatedly updated according to the current weights, producing samples that reflect the model's own distribution.
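A sketch of the negative phase under the same assumed notation is shown below. The functions take the RBM parameters as arguments; the choice of using probabilities rather than binary samples for the final statistics is a common variance-reduction convention, not something stated in the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(h, W, b, c, rng):
    """One Gibbs transition: sample v ~ p(v | h), then h ~ p(h | v)."""
    v_prob = sigmoid(h @ W.T + b)
    v = (rng.random(v_prob.shape) < v_prob).astype(float)
    h_prob = sigmoid(v @ W + c)
    h_new = (rng.random(h_prob.shape) < h_prob).astype(float)
    return v, v_prob, h_prob, h_new

def negative_phase(h0, W, b, c, rng, k=1):
    """Run the chain for k Gibbs steps, starting from the data-driven hidden state."""
    h = h0
    for _ in range(k):
        v, v_prob, h_prob, h = gibbs_step(h, W, b, c, rng)
    neg_stats = np.outer(v_prob, h_prob)     # approximate <v_i h_j>_model
    return v_prob, h_prob, neg_stats
```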
The key idea behind CD is to avoid running the Markov chain to convergence: CD-k performs only k steps of Gibbs sampling (often just one), which dramatically reduces the computational cost and makes training efficient. By comparing the statistics of the observed data with those of the truncated-chain samples, CD obtains an approximate gradient of the log-likelihood and updates the weights accordingly.
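Putting the two phases together gives a single CD-k parameter update. The sketch below is a self-contained toy implementation under the same assumed notation; the learning rate, layer sizes, and the choice of k are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_update(v0, W, b, c, lr=0.1, k=1, rng=None):
    """One CD-k parameter update for a binary RBM; returns updated copies."""
    if rng is None:
        rng = np.random.default_rng()
    # Positive phase: hidden statistics driven by the clamped training sample.
    h0_prob = sigmoid(v0 @ W + c)
    h = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: k steps of Gibbs sampling starting from those hiddens.
    for _ in range(k):
        v_prob = sigmoid(h @ W.T + b)
        v = (rng.random(v_prob.shape) < v_prob).astype(float)
        h_prob = sigmoid(v @ W + c)
        h = (rng.random(h_prob.shape) < h_prob).astype(float)
    # Approximate gradient: data statistics minus (truncated) model statistics.
    W = W + lr * (np.outer(v0, h0_prob) - np.outer(v_prob, h_prob))
    b = b + lr * (v0 - v_prob)
    c = c + lr * (h0_prob - h_prob)
    return W, b, c

# Example usage on a toy 6-visible / 4-hidden RBM with a single training vector.
rng = np.random.default_rng(0)
W, b, c = rng.normal(scale=0.01, size=(6, 4)), np.zeros(6), np.zeros(4)
v0 = np.array([1, 0, 1, 1, 0, 0], dtype=float)
W, b, c = cd_update(v0, W, b, c, k=1, rng=rng)
```

In practice this update would be applied over many training vectors (or mini-batches) and repeated for many epochs, with k = 1 being the most common choice.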