Probability Basics, Discriminative vs Generative Models, and Autoencoders (including Variational Autoencoders)
This article introduces fundamental probability notation, explains the difference between discriminative and generative models, and provides a comprehensive overview of autoencoders and variational autoencoders, covering their architectures, loss functions, latent spaces, and practical applications in image manipulation.
Background Knowledge
Probability Notation
P(x) = probability that event x occurs.
Joint probability: P(x, y) or P(x ∩ y) = probability that both x and y occur.
Conditional probability: P(x | y) = probability that x occurs given that y has occurred.
It relates to joint probability via P(x | y) = P(x, y) / P(y), provided P(y) > 0.
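As a concrete check, P(x | y) = P(x, y) / P(y) can be verified on a toy joint distribution; the numbers below are purely illustrative:

```python
# Toy joint distribution over two binary events (illustrative numbers only).
# Keys are (rain, umbrella) outcomes.
joint = {
    (True, True): 0.25,
    (True, False): 0.05,
    (False, True): 0.10,
    (False, False): 0.60,
}

# Marginal P(umbrella=True): sum the joint over all outcomes where umbrella is True.
p_umbrella = sum(p for (rain, umb), p in joint.items() if umb)

# Conditional P(rain=True | umbrella=True) = P(rain, umbrella) / P(umbrella).
p_rain_given_umbrella = joint[(True, True)] / p_umbrella

print(round(p_umbrella, 4))             # 0.35
print(round(p_rain_given_umbrella, 4))  # 0.7143
```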
Probability Distribution
Describes the probabilities of all possible values of a random variable.
Represented as a function p(x).
Assigns a non‑negative value to each possible x, indicating its likelihood.
The sum (or integral for continuous variables) equals 1 (normalisation).
Example
An image shows the probability distribution of letters a‑z in English text.
The chart reflects the relative frequency of each letter in a corpus.
All probabilities sum to 1.
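Normalisation is easy to see in code: dividing any set of non-negative counts by their total yields a valid distribution. The letter counts below are approximate illustrative values, not measured corpus frequencies:

```python
# Illustrative (approximate) letter counts; normalising any non-negative
# counts produces probabilities that sum to 1 by construction.
counts = {"e": 1270, "t": 906, "a": 817, "o": 751, "z": 7}

total = sum(counts.values())
dist = {letter: c / total for letter, c in counts.items()}

# Normalisation check: the probabilities sum to 1.
print(abs(sum(dist.values()) - 1.0) < 1e-12)  # True
```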
Probability Density Function (PDF)
Describes the distribution of a continuous random variable.
Represented as a function p(x).
Assigns a non‑negative density to each possible x.
The total area under the curve (integral) equals 1; the probability of a range is the integral over that range.
Example
An image shows the PDF of weekly income in Australia.
The chart visualises the density of each income level and confirms normalisation.
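As a numerical illustration (a standard normal stands in for the income density), both normalisation and the probability-of-a-range property can be checked with simple integration:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, n=100_000):
    """Simple midpoint-rule numerical integration of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Total area under the curve is (numerically) 1.
total = integrate(normal_pdf, -10, 10)

# Probability of a range = integral over that range, e.g. P(-1 <= X <= 1).
p_range = integrate(normal_pdf, -1, 1)
print(round(total, 4), round(p_range, 4))  # 1.0 0.6827
```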
Discriminative vs Generative Models
Discriminative Models
Learn the relationship between input data and output labels, i.e., p(y|x).
Typical examples: logistic regression, support vector machines, neural networks.
Generative Models
Generative models can produce new samples from the learned distribution.
Conditional Generative Model
Input: a label (e.g., "frog"). Output: a probability distribution p(x|y) over images. Example: generate images that correspond to the given label.
Unconditional Generative Model
Output: a probability distribution p(x) over images; generates data without any conditioning label. Example: trained on a mixed animal dataset, it can produce a random animal image.
Bayes' Rule
P(y|x) = P(x|y) P(y) / P(x)
This rule links discriminative and generative models: combining a generative model p(x|y) with a prior p(y) yields the discriminative quantity p(y|x).
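As a sketch, Bayes' rule P(y|x) = P(x|y) P(y) / P(x) can be applied to toy class priors and likelihoods (numbers invented for illustration):

```python
# Bayes' rule: P(y|x) = P(x|y) * P(y) / P(x), with P(x) obtained by
# marginalising over the classes. Numbers below are illustrative.
priors = {"cat": 0.5, "dog": 0.5}          # p(y)
likelihoods = {"cat": 0.09, "dog": 0.01}   # p(x|y) for one particular image x

# Evidence: p(x) = sum over y of p(x|y) * p(y).
p_x = sum(likelihoods[y] * priors[y] for y in priors)

# Posterior p(y|x) for each class.
posterior = {y: likelihoods[y] * priors[y] / p_x for y in priors}
print(round(posterior["cat"], 2))  # 0.9
```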
Comparison
Discriminative Models
Goal: learn the conditional probability p(y|x). Directly model the decision boundary between classes.
Generative Models
Goal: learn the joint probability p(x, y). Capture how data are generated, enabling sampling and reverse inference of p(x|y).
What makes one set of pixels more probable than another? Typically, the frequency of that pixel pattern in the training data and the contextual structure of images.
Autoencoders
Unsupervised Learning
Learn a model from unlabeled data.
Goal: find a compact representation of the data, often with fewer parameters.
Applications:
Provide a simplified model for another ML system.
Dimensionality reduction.
Potentially improve generalisation.
Autoencoder Architecture
Basic architecture
Deep architecture
Neural network used for unsupervised learning.
Sometimes called “self‑supervised” learning.
Output reproduces the input (e.g., an image).
Hidden layer learns a low‑dimensional representation.
Structure
Encoder compresses the input into a hidden (bottleneck) layer.
Hidden layer has fewer neurons than the input layer.
Decoder reconstructs the input from the hidden representation.
Often the encoder and decoder share weights.
Goal: make the output as close as possible to the original input.
Hidden Layer
“Bottleneck” layer smaller than the input.
Represents the input with latent variables.
With a linear activation and one hidden layer, it learns PCA.
Why must the hidden layer be smaller than the input? A smaller hidden layer forces the network to capture the intrinsic structure of the data rather than merely copying it, enabling dimensionality reduction and more informative features.
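A minimal NumPy sketch of the PCA connection: for synthetic data that truly lies in a 2-D subspace, an encoder/decoder built from the top principal components (with tied weights) reconstructs the input through a 2-D bottleneck essentially exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 points in 5-D that actually lie in a 2-D subspace.
Z_true = rng.normal(size=(100, 2))
X = Z_true @ rng.normal(size=(2, 5))
X -= X.mean(axis=0)  # centre the data, as PCA assumes

# With linear activations and one hidden layer, the optimal encoder/decoder
# span the top principal components, so PCA gives the solution in closed form.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
W_enc = Vt[:2].T        # encoder: 5-D input -> 2-D bottleneck
W_dec = Vt[:2]          # decoder: 2-D bottleneck -> 5-D output (tied weights)

Z = X @ W_enc           # encode into the bottleneck
X_hat = Z @ W_dec       # decode back to the input space

mse = np.mean((X - X_hat) ** 2)
print(mse < 1e-12)  # True: rank-2 data is reconstructed near-exactly
```

In practice the encoder and decoder are trained by gradient descent on the reconstruction loss rather than solved in closed form, but for the linear case the two coincide.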
Output and Loss
The output has the same type as the input (e.g., an image) rather than a class label.
For binary images, use tanh or sigmoid activation.
For natural images, use linear activation.
Loss = difference between input and output (e.g., Mean Squared Error).
Latent Representation
The encoder maps the input to a low‑dimensional latent space that captures the main features and structure of the data.
Variational Autoencoder (VAE)
A probabilistic version of the autoencoder that learns a latent representation and can sample from the model to generate new images.
Assume images are generated from latent variables z following some distribution.
Typical prior p(z) is a standard normal distribution.
Probabilistic Encoder
Input: image x → outputs the mean μ_{z|x} and diagonal covariance Σ_{z|x}, defining a Gaussian q(z|x).
Probabilistic Decoder
Input: latent variable z → outputs the mean μ_{x|z} and diagonal covariance Σ_{x|z}, defining a Gaussian p(x|z).
Goal: maximise the likelihood p(x) using a variational lower bound.
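To draw z from the encoder's Gaussian while keeping the model trainable, VAEs conventionally use the reparameterisation trick; a minimal NumPy sketch (array sizes chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var, rng):
    """Reparameterisation trick: z = mu + sigma * eps, with eps ~ N(0, I).

    Writing the sample this way keeps the randomness in eps, so gradients
    can flow through mu and log_var during training.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu = np.array([0.0, 1.0])       # encoder mean for one input
log_var = np.array([0.0, 0.0])  # log-variance 0 => unit variance
z = sample_latent(mu, log_var, rng)
print(z.shape)  # (2,)
```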
VAE Architecture
Loss Function
Goal: maximise p(x). The loss is based on the variational lower bound and consists of two terms:
Reconstruction loss – encourages the decoder to produce outputs as similar as possible to the inputs.
Regularisation loss – encourages the latent distribution q(z|x) to be close to the prior p(z) (usually a standard normal) using the Kullback‑Leibler divergence.
Reconstruction Loss
Measures how well the decoder can reconstruct the input (e.g., log‑likelihood of x given the reconstruction).
Regularisation Loss
Measures the KL‑divergence between q(z|x) and the prior p(z), encouraging a meaningful latent space.
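For a diagonal Gaussian q(z|x) = N(μ, σ²) and a standard-normal prior, this KL divergence has a closed form, 0.5 Σ (μ² + σ² − log σ² − 1); a small NumPy sketch:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    KL = 0.5 * sum( mu^2 + sigma^2 - log(sigma^2) - 1 )
    """
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0)

# When q(z|x) already matches the prior, the penalty is zero.
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)))  # 0.0

# Any deviation from the standard normal is penalised.
print(kl_to_standard_normal(np.array([1.0, 0.0]), np.array([0.0, 0.0])))  # 0.5
```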
Latent Space Properties
Continuous: nearby points generate similar images.
Complete: every point corresponds to a valid image.
A standard normal distribution satisfies both properties, and a diagonal covariance ensures independence of latent variables.
Applications – Image Manipulation
By modifying the latent vector z of an image x, one can create variations of x; however, latent directions may not correspond to interpretable attributes without additional constraints.
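One common manipulation is linear interpolation between two latent vectors; a minimal sketch assuming a trained encoder has already produced the z vectors (decoding each intermediate z, not shown here, yields a smooth morph because the latent space is continuous):

```python
import numpy as np

def interpolate_latents(z1, z2, steps=5):
    """Linearly interpolate between two latent vectors.

    Each intermediate z would be passed to a trained decoder (omitted here)
    to produce an image partway between the two source images.
    """
    ts = np.linspace(0.0, 1.0, steps)
    return [(1.0 - t) * z1 + t * z2 for t in ts]

z_a = np.array([0.5, -1.0])   # hypothetical latent codes for two images
z_b = np.array([-0.5, 1.0])
path = interpolate_latents(z_a, z_b, steps=5)
# Endpoints match z_a and z_b; the midpoint is their mean.
print(path[0], path[2], path[-1])
```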
VAE Pros and Cons
Pros
Learns approximations of p(z|x) and p(x|z), enabling generation of new instances.
Provides a probabilistic framework for sampling.
Cons
Generated images are often blurry because the model averages over many possible outputs.
Summary
Discriminative models predict labels from images; generative models predict the probability distribution of images.
Autoencoders assume images can be generated from a low‑dimensional latent space.
Regular autoencoders learn a latent representation for reconstruction.
Variational autoencoders are a probabilistic version that allows sampling from the latent space.