Denoising Diffusion Probabilistic Model

Posted on February 29, 2024   3 minute read

Introduction

Background

Deep generative models can generate high-quality image and audio samples. Well-known families include:

  • Generative Adversarial Networks
  • Autoregressive Models
  • Flows
  • Variational Autoencoders

(Open question for me: where do energy-based modeling and score matching fit into this picture?)

This post focuses on diffusion probabilistic models:

  1. A diffusion model is essentially a generative model that learns to create data similar to what it has been trained on. It does this through a specialized process involving a series of steps (a Markov chain) that gradually transforms random noise into structured data samples.

  2. Training Phase

    1. Initialization:

      • Start with a dataset of original samples (e.g., images).

      • Define the forward diffusion process as a sequence of steps that gradually add Gaussian noise to these samples, transforming them into pure noise. This process is a Markov chain because each step’s outcome depends only on the previous step.

    2. Forward Diffusion Process (Markov Chain):
      • Sequentially add Gaussian noise to each data sample over a predefined number of steps, resulting in a sequence of increasingly noisy versions of the original data. This sequence forms a Markov chain, ending with a sample that is essentially random noise (see the code sketch after this list, which covers both the forward and reverse chains).
    3. Learning the Reverse Process (VI and Markov Chain):

      • The model learns to invert the forward diffusion: the reverse process is another Markov chain in which each step attempts to remove the noise added by the corresponding forward step.

      • Variational Inference: The model parameters for the reverse process are learned by minimizing a variational bound. This involves comparing the noisy distribution at each reverse step with the corresponding forward-process distribution, effectively using VI to approximate the complex distribution of the reverse process.

      • Objective Function: The loss function ensures fidelity to the data and adherence to the correct noise levels at each step, as dictated by the forward process. In the variational bound this decomposes into a reconstruction term and per-step KL terms that push the learned reverse distribution to match the noise added by the forward process.

  3. Inference Phase

    1. Starting with Noise:
      • Begin with a sample of pure noise, resembling the final output of the forward diffusion process.
    2. Applying the Reverse Process (Markov Chain):

      • Sequentially apply the learned reverse process to the noise sample. At each step, the model uses the learned parameters to slightly denoise the sample, moving it closer to resembling original data.

      • This reverse application is a Markov chain because each denoising step depends only on the state from the previous step.

    3. Generation of New Samples:

      • Continue applying the reverse process until the noise is fully reversed, resulting in a new sample that resembles the training data in distribution.

      • The generated sample is a new, previously unseen data point synthesized by reversing the diffusion process.
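To make the training and inference phases above concrete, here is a minimal PyTorch sketch (not the authors' exact implementation). It assumes a linear beta schedule, image-shaped batches, and a hypothetical trained noise-prediction network `eps_model(x, t)` standing in for the U-Net used in practice; it mirrors the forward noising, a simplified noise-prediction loss, and the reverse sampling loop.

```python
# Minimal sketch of the two Markov chains described above. Illustrative only:
# `eps_model(x, t)` is a hypothetical stand-in for the trained noise-prediction network.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # linear noise schedule beta_1..beta_T
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)    # \bar{alpha}_t = prod_{s<=t} alpha_s

def forward_diffuse(x0, t):
    """Sample x_t ~ q(x_t | x_0): the forward Markov chain collapsed into one Gaussian."""
    noise = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)  # broadcast per-sample \bar{alpha}_t over (B, C, H, W)
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise, noise

def training_loss(eps_model, x0):
    """Simplified objective: predict the noise that was added at a uniformly random step."""
    t = torch.randint(0, T, (x0.shape[0],))
    x_t, noise = forward_diffuse(x0, t)
    return ((noise - eps_model(x_t, t)) ** 2).mean()

@torch.no_grad()
def reverse_sample(eps_model, shape):
    """Run the learned reverse Markov chain: start from pure noise and denoise step by step."""
    x = torch.randn(shape)                   # x_T ~ N(0, I)
    for t in reversed(range(T)):             # t = T-1 ... 0 indexes beta_1..beta_T
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        eps = eps_model(x, torch.full((shape[0],), t))
        # mean of p_theta(x_{t-1} | x_t) under the epsilon-parameterization,
        # then add sigma_t * z with sigma_t^2 = beta_t
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        x = x + betas[t].sqrt() * z
    return x
```

Training would loop `training_loss` over minibatches with a standard optimizer; once `eps_model` is trained, something like `reverse_sample(eps_model, (16, 3, 32, 32))` would produce a batch of new samples.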

Gap

No existing work had demonstrated that diffusion probabilistic models can generate high-quality samples.

Goal

This paper presents high-quality image synthesis results using diffusion probabilistic models.

The paper shows that a certain parameterization of diffusion models reveals an equivalence with two other techniques:

  • Denoising Score Matching: A technique used during training to learn the underlying data distribution by comparing original data samples with their noisy versions.
  • Annealed Langevin Dynamics: A sampling technique that introduces noise into the data sampling process and gradually reduces it to home in on high-probability areas of the data distribution.

This provides a theoretical foundation for why diffusion models work as well as they do.
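Roughly, under the epsilon-prediction parameterization the per-step training terms take the form of denoising score matching objectives over multiple noise levels, and each reverse step looks like a (scaled) Langevin update. A sketch of this relationship, written from memory of the paper's simplified objective, so treat the exact weighting as indicative only:

```latex
% Simplified training objective under the epsilon-parameterization,
% and the (scaled) score that the network implicitly estimates,
% with \bar\alpha_t := \prod_{s \le t} (1 - \beta_s):
L_{\mathrm{simple}}(\theta)
  = \mathbb{E}_{t,\, x_0,\, \epsilon}\!\left[
      \bigl\| \epsilon - \epsilon_\theta\bigl(\sqrt{\bar\alpha_t}\, x_0
        + \sqrt{1-\bar\alpha_t}\,\epsilon,\; t\bigr) \bigr\|^2
    \right],
\qquad
\epsilon_\theta(x_t, t) \;\approx\; -\sqrt{1-\bar\alpha_t}\; \nabla_{x_t} \log q(x_t).
```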

Background

Diffusion Models: latent variable models. The noisy intermediates x_1, ..., x_T play the role of latents.
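Concretely (quoting the paper's Eq. 1 from memory, so notation may differ slightly), the reverse process defines the joint, and the data likelihood marginalizes over the latents:

```latex
% Diffusion models as latent variable models: integrate out x_1, ..., x_T.
p_\theta(x_0) := \int p_\theta(x_{0:T})\, dx_{1:T},
\qquad
p_\theta(x_{0:T}) := p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t).
```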

During training, we minimize a variational upper bound on the negative log-likelihood (Eq. 3). Up to sign, this is the Evidence Lower Bound (ELBO) from variational inference; it is a tractable surrogate for the true negative log-likelihood we want to minimize.
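For reference, the bound referred to as Eq. 3 can be written as follows (reproduced from memory of the paper, so the notation may differ slightly):

```latex
% Variational bound on the negative log-likelihood (Eq. 3 in the paper):
\mathbb{E}\bigl[-\log p_\theta(x_0)\bigr]
  \;\le\; \mathbb{E}_q\!\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right]
  \;=\; \mathbb{E}_q\!\left[-\log p(x_T)
        - \sum_{t \ge 1} \log \frac{p_\theta(x_{t-1} \mid x_t)}{q(x_t \mid x_{t-1})}\right]
  \;=:\; L.
```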

Why the sqrt(1 - beta_t) factor in Eq. 2? It scales down the previous state so that, together with the added beta_t variance, the overall variance stays bounded (the forward process is variance-preserving). The beta_t * I covariance means the added noise is isotropic, i.e., independent across dimensions.
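Spelling this out (again from memory of the paper's Eq. 2): the forward kernel shrinks the previous state by exactly the amount of variance that the added noise puts back in:

```latex
% Forward process kernel (Eq. 2) and the variance bookkeeping behind the sqrt(1 - beta_t) factor:
q(x_t \mid x_{t-1}) := \mathcal{N}\!\bigl(x_t;\; \sqrt{1-\beta_t}\, x_{t-1},\; \beta_t \mathbf{I}\bigr),
\qquad
\operatorname{Var}(x_t) = (1-\beta_t)\,\operatorname{Var}(x_{t-1}) + \beta_t \mathbf{I}.
```

So if Var(x_{t-1}) = I, then Var(x_t) = I as well; and the diagonal beta_t * I covariance is exactly why the noise is independent across dimensions.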




