Lecture 18 - Deep Generative Models

Introduction to Variational Autoencoders, Generative Adversarial Networks, and Diffusion Models.

Deep Generative Models

Recall Generative and Discriminative models

What are Deep Generative Models?

Early Forms of DGMs:

A sigmoid belief network is a probabilistic neural network that uses sigmoid activation functions to model conditional probabilities. It has directed edges, and its nodes take binary values.
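Concretely, each binary unit \( s_i \) turns on with a probability given by a sigmoid of its parents' states (standard notation for such networks, not taken from the slides):

\[p(s_i = 1 \mid \mathrm{pa}(s_i)) = \sigma\Big(\sum_{j \in \mathrm{pa}(i)} w_{ij}\, s_j + b_i\Big), \qquad \sigma(a) = \frac{1}{1 + e^{-a}}\]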

The Helmholtz machine has two networks: a bottom-up recognition network that takes inputs and produces distributions over the hidden layers, and a top-down generative network that generates values.

How are DGMs trained?

Deep generative models with latent variables are typically trained by maximizing a variational lower bound (the ELBO) on the data log-likelihood:

\[\log p(x) \geq \mathbb{E}_{q(z \mid x)}\left[ \log p(x \mid z) \right] - \mathrm{KL}\left( q(z \mid x) \,\|\, p(z) \right)\]
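This bound follows from Jensen's inequality applied to an importance-weighted form of the marginal likelihood:

\[\log p(x) = \log \mathbb{E}_{q(z \mid x)}\left[ \frac{p(x \mid z)\, p(z)}{q(z \mid x)} \right] \geq \mathbb{E}_{q(z \mid x)}\left[ \log \frac{p(x \mid z)\, p(z)}{q(z \mid x)} \right] = \mathbb{E}_{q(z \mid x)}\left[ \log p(x \mid z) \right] - \mathrm{KL}\left( q(z \mid x) \,\|\, p(z) \right)\]

The bound is tight exactly when \( q(z \mid x) \) equals the true posterior \( p(z \mid x) \).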

Variational Autoencoders (VAEs)

A VAE is variational inference plus an autoencoder.

Recall the ELBO from variational inference, where we let \( q(z \mid x) \) be some family of distributions that is easier to optimize:
\[\log p(x) \geq \mathbb{E}_{z \sim q(z \mid x)}[\log p(x, z)] + H(q)\]
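This is the same bound as before, just rearranged: expanding \( \log p(x, z) = \log p(x \mid z) + \log p(z) \) and folding \( \log p(z) \) into the entropy term gives

\[\mathbb{E}_{q(z \mid x)}[\log p(x, z)] + H(q) = \mathbb{E}_{q(z \mid x)}[\log p(x \mid z)] - \mathrm{KL}\left( q(z \mid x) \,\|\, p(z) \right)\]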

Also, recall the autoencoder (a minimal code sketch follows the list below):

  1. Use the encoder to compress the data into a lower-dimensional representation.
  2. Pass it through the latent space.
  3. Use the decoder to reconstruct the original input.
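As a minimal PyTorch-style sketch (layer sizes are illustrative choices, not from the lecture), a plain autoencoder is trained purely on reconstruction error:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A minimal (non-variational) autoencoder; layer sizes are illustrative.
class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # 1. Encoder: compress the input into a low-dimensional code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # 3. Decoder: reconstruct the input from the code.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)      # 2. the latent representation
        return self.decoder(z)

x = torch.randn(8, 784)                # dummy batch standing in for real data
model = Autoencoder()
loss = F.mse_loss(model(x), x)         # reconstruction objective
loss.backward()
```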

The idea here is simple: an autoencoder is not generative, but we can make it generative using variational inference. We use the inference model as the encoder, map the input into the latent space Z, and decode with the generative model.

Now, we want to estimate the true parameters θ of the generative model. The question is how to represent and train it:

  1. We can choose a simple prior p(z), such as a standard normal distribution.
  2. Then we can train the model by maximizing the likelihood of the training data: $p_{\theta}(x) = \int p(z)\, p_{\theta}(x \mid z)\, dz$. This integral is intractable in general, so in practice we maximize the ELBO from above instead (written out with explicit parameters below).
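With decoder parameters \( \theta \) and encoder parameters \( \phi \), the training objective is the ELBO:

\[\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_{\phi}(z \mid x)}\left[ \log p_{\theta}(x \mid z) \right] - \mathrm{KL}\left( q_{\phi}(z \mid x) \,\|\, p(z) \right) \leq \log p_{\theta}(x)\]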

Reparameterization Trick

To enable backpropagation through stochastic sampling, VAEs use the reparameterization trick:

\[z = \mu(x) + \sigma(x) \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)\]

This reformulation allows gradients to flow through \( \mu(x) \) and \( \sigma(x) \), making the sampling operation differentiable and trainable with gradient descent.
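A minimal PyTorch-style sketch of a VAE forward pass with the reparameterization trick and a (negative) ELBO loss, assuming a Gaussian encoder and a standard normal prior; layer sizes and the MSE reconstruction term are illustrative choices, not from the lecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal VAE sketch; architecture details are illustrative.
class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.enc = nn.Linear(input_dim, 256)
        self.mu = nn.Linear(256, latent_dim)        # mu(x)
        self.log_var = nn.Linear(256, latent_dim)   # log sigma^2(x), for numerical stability
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
        # so the sampling step is differentiable w.r.t. mu and sigma.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * log_var) * eps
        return self.dec(z), mu, log_var

def neg_elbo(x, x_recon, mu, log_var):
    # Reconstruction term plus closed-form KL(q(z|x) || N(0, I)).
    recon = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl

x = torch.randn(8, 784)
model = VAE()
loss = neg_elbo(x, *model(x))
loss.backward()   # gradients flow through mu(x) and sigma(x)
```

Because \( z \) is written as a deterministic function of \( \mu \), \( \sigma \), and an external noise variable \( \epsilon \), backpropagation treats the randomness as a constant input.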

Generating from a VAE

After training, you can generate new data as follows:

  1. Sample a latent vector \( z \sim p(z) \), often a standard Gaussian.
  2. Pass \( z \) through the decoder to get \( x' \sim p(x \mid z) \).

This enables the model to create novel data that resembles the training distribution.
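As a sketch, reusing the hypothetical `VAE` model defined in the reparameterization section above, generation is just prior sampling followed by a decoder pass:

```python
import torch  # continues the hypothetical VAE sketch above (latent_dim = 32)

with torch.no_grad():
    z = torch.randn(16, 32)    # 1. sample z ~ N(0, I) from the prior
    x_new = model.dec(z)       # 2. decode z into new samples x'
```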

GAN Architecture and Objective

GANs consist of two networks: a generator \( G \), which maps latent noise \( z \sim p(z) \) to samples, and a discriminator \( D \), which tries to tell real data from generated samples.

The generator learns to fool the discriminator; the discriminator learns to detect the fakes. Training is formulated as a minimax game:

\[\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]\]
This adversarial setup allows GANs to learn a rich, implicit distribution over data without explicitly modeling \( p(x \mid z) \).
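A minimal sketch of one training step under this objective (PyTorch-style; the architectures, optimizer settings, and the common non-saturating generator loss are illustrative choices, not from the lecture):

```python
import torch
import torch.nn as nn

# Toy generator and discriminator; sizes are illustrative.
latent_dim, data_dim = 64, 784
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

x_real = torch.randn(32, data_dim)           # stand-in for a real data batch

# --- Discriminator step: maximize log D(x) + log(1 - D(G(z))) ---
z = torch.randn(32, latent_dim)
x_fake = G(z).detach()                       # do not backprop into G here
d_loss = bce(D(x_real), torch.ones(32, 1)) + bce(D(x_fake), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# --- Generator step (non-saturating variant): maximize log D(G(z)) ---
z = torch.randn(32, latent_dim)
g_loss = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In practice the generator usually maximizes \( \log D(G(z)) \) rather than minimizing \( \log(1 - D(G(z))) \), since this gives stronger gradients early in training.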

Generative Adversarial Networks (GANs)

Key Points:

Slide 43: Training objective.
Slide 43: Outline of the generator-discriminator framework.
Slide 45: Example Results.

GANs and VAEs: A Unified View

Key Points:

Slide 47: GANs Revisited.

Variational EM vs GANs

Key Points:

Slide 48: GANs rewritten.
Slide 50: GANs vs VAEs.
Slide 51: Mode Covering vs Mode dropping (mode collapse).
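One standard way to summarize this contrast (my framing, not necessarily the slide's exact notation) is through the direction of the KL divergence: maximum-likelihood training (variational EM / VAEs) minimizes the forward KL and is mode-covering, while GAN-style training behaves more like the reverse direction and can drop modes (mode collapse):

\[\mathrm{KL}\left( p_{\text{data}} \,\|\, p_{\theta} \right) \;\text{(mode covering)} \qquad \text{vs.} \qquad \mathrm{KL}\left( p_{\theta} \,\|\, p_{\text{data}} \right) \;\text{(mode seeking / dropping)}\]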

Diffusion Models

Key Idea:

Slide 53: Diffusion Model: Backward Process.
Slide 54: Diffusion Model: Forward Process.
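For reference, in the standard DDPM-style formulation (my notation, not necessarily the slides'), the forward process gradually adds Gaussian noise according to a variance schedule \( \beta_t \), and the learned backward process denoises step by step:

\[q(x_t \mid x_{t-1}) = \mathcal{N}\left( x_t;\, \sqrt{1 - \beta_t}\, x_{t-1},\, \beta_t I \right), \qquad p_{\theta}(x_{t-1} \mid x_t) = \mathcal{N}\left( x_{t-1};\, \mu_{\theta}(x_t, t),\, \Sigma_{\theta}(x_t, t) \right)\]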