Sampling in Stable Diffusion
When using AUTOMATIC1111, you will see many sampling options: LMS, DDIM, Euler, and so on. Which one should you use? What is a sampler? How does it work?
Today we give an overview of samplers and how to use them to improve your output images.
What is Sampling?
Sampling is a specific technique used to create an image in Stable Diffusion.
To generate an image, Stable Diffusion first produces a completely random image in the latent space. The noise predictor then estimates the noise in the image, and the predicted noise is subtracted from it. This process is repeated multiple times, resulting in a clean, noise-free image. This denoising process is called sampling.
Because Stable Diffusion generates a new sample image at each step, the method used for denoising is called the sampler or sampling method.
The sampling process gradually outputs cleaner and clearer images.
Even though the framework remains the same, there are many different approaches to implementing the denoising process. It is sometimes a trade-off between speed and accuracy.
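The loop described above can be sketched in a few lines. This is only a toy illustration, not Stable Diffusion's actual code: the "image" is a single number, `predict_noise` is a hypothetical stand-in for the neural noise predictor (it is the ideal predictor when the data follow a Gaussian with mean `MU` and standard deviation `S`), and the list of noise levels is made up for the example.

```python
import random

MU, S = 0.5, 0.2  # toy "dataset": clean pixel values distributed as N(MU, S^2)

def predict_noise(x, sigma):
    """Hypothetical stand-in for the noise predictor (ideal for Gaussian data)."""
    return (x - MU) * sigma / (S**2 + sigma**2)

def sample(sigmas, seed=0):
    """Start from pure noise and repeatedly subtract part of the predicted noise."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0) * sigmas[0]        # completely random starting "image"
    history = [x]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        eps = predict_noise(x, sigma)          # estimate the noise in the image
        x = x - eps * (sigma - sigma_next)     # remove the share scheduled for this step
        history.append(x)
    return history

sigmas = [10.0, 5.0, 2.0, 1.0, 0.5, 0.1, 0.0]  # illustrative noise levels, high to zero
history = sample(sigmas)
distances = [abs(x - MU) for x in history]     # how far each intermediate image is from clean data
```

Printing `distances` shows the value moving steadily toward the clean data as the noise level drops, which is the "cleaner and clearer" behavior described above. Different samplers differ mainly in how the update line inside the loop is computed.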
If you want to skip the slightly more complicated content, the general sampler recommendations are listed below:
- If you are looking for a fast sampling method that converges and provides reasonable image quality in a short time, excellent options are:
  - DPM++ 2M Karras with 20-30 steps
  - UniPC with 20-30 steps
Both these algorithms provide a good balance of speed and quality, with the ability to converge accurately in a limited number of sampling steps. However, there is always a tradeoff between performance and quality. Higher quality settings can be achieved with UniPC and other sampling methods, but those may come with increased run times.
- If final image quality matters more to you than convergence, or if you can afford to wait longer for sampling, good options are:
  - DPM++ SDE Karras with 10-15 steps (note: this is a slower sampler)
  - DDIM with 10-15 steps
- If you prefer stable, reproducible results, it is best to avoid ancestral samplers. They are more likely to produce unpredictable results, and the output image can change significantly when the number of steps changes.
- For simple samplers, Euler and Heun are excellent choices that are straightforward to use and provide a good balance of stability and reproducibility with good speed. It is possible to reduce the number of sampling steps for Heun to save time, which can be useful in cases where the number of available steps is limited.
The Denoising process
The process of denoising images is simple but interesting. One factor, called the noise schedule, affects the whole process.
The noise schedule controls the noise level at each sampling step. Noise is initially high, but gradually disappears the more sampling steps you perform. In the end, the noise is close to zero and the image you get is nice and clear.
Each step of the sampling process produces an output image that needs to match the noise level specified in the noise schedule. The schedule is what guides the sampling and noise reduction process, ensuring that noise gradually reduces with each step.
You may wonder what happens if we increase the number of sampling steps. The answer is that the noise reduction between consecutive steps gets smaller, which helps to reduce the truncation error of the sampling.
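The effect of step count on truncation error can be checked on a toy problem whose exact answer is known. The sketch below is an assumption-laden illustration, not a real diffusion model: the data are assumed to follow a Gaussian with standard deviation `S`, so the denoising equation becomes dx/dsigma = x * sigma / (S^2 + sigma^2), which has a closed-form solution, and the noise schedule is assumed to be linear.

```python
import math

S = 1.0  # standard deviation of the toy data distribution N(0, S^2)

def derivative(x, sigma):
    """dx/dsigma for the toy problem; plays the role of the predicted noise."""
    return x * sigma / (S**2 + sigma**2)

def euler_sample(x_start, sigma_max, n_steps):
    """Denoise from sigma_max down to 0 with n_steps equal-sized Euler steps."""
    sigmas = [sigma_max * (1 - i / n_steps) for i in range(n_steps + 1)]
    x = x_start
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        x += derivative(x, sigma) * (sigma_next - sigma)
    return x

def exact_sample(x_start, sigma_max):
    """Closed-form solution of the toy equation at sigma = 0."""
    return x_start * S / math.sqrt(S**2 + sigma_max**2)

# Truncation error for increasing step counts
errors = {
    n: abs(euler_sample(10.0, 10.0, n) - exact_sample(10.0, 10.0))
    for n in (5, 10, 20, 40)
}
```

In this toy setting the error shrinks each time the step count doubles, because each step removes a smaller slice of noise and the straight-line approximation inside a step becomes more accurate.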
Advantages of Sampling
- More efficient
Using sampling algorithms allows you to explore parameter combinations more efficiently, allowing faster convergence to the desired distribution.
By reducing the variance of the samples, sampling algorithms can produce estimates of the target distribution that are more precise, creating images with reduced noise and variation.
- React swiftly
Sampling algorithms possess a high degree of robustness, meaning they can adapt quickly to changes in the distribution over time. This is particularly useful when the distribution is constantly evolving, as the algorithm can react in real-time and adjust to the latest changes.
- Convenient
In cases where computational resources are limited, it's useful to have sampling methods that are less intensive in terms of memory and relatively easy to implement. This can significantly improve image generation processes when limited resources are available, making the image creation process more efficient and cost-effective.
Samplers Overview
There are a bunch of samplers available in Stable Diffusion. For better understanding, we divided those samplers into several categories.
Ancestral Samplers
Ancestral samplers add noise to the image at each step during the sampling.
An ancestral sampler can be tricky to use because it does not guarantee convergence. To illustrate this issue, let's look at the images generated by Euler a and Euler below.
Even though the two start from similar initial conditions, the generated images are quite different. This shows that an ancestral sampler can produce a very different result from its deterministic counterpart, and that convergence is far from assured.
Euler a does not converge at high sampling step counts, whereas images generated with Euler converge well.
Convergence is essential for reproducibility. If you want slight variations between images, using a variation seed works well.
Besides Euler a, DPM2 a, DPM++ 2S a, DPM++ 2S a Karras are all ancestral samplers.
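The difference between a deterministic step and an ancestral step can be sketched as follows. The variance split in `ancestral_split` follows the formula used by k-diffusion's ancestral samplers as far as we understand it; everything else (the one-number "image", the Gaussian toy `derivative`, the linear schedule) is an assumption made for illustration.

```python
import math
import random

S = 1.0  # standard deviation of the toy data distribution N(0, S^2)

def derivative(x, sigma):
    """Toy stand-in for the predicted noise direction."""
    return x * sigma / (S**2 + sigma**2)

def ancestral_split(sigma_from, sigma_to, eta=1.0):
    """Split the target noise level into a deterministic part and fresh noise."""
    sigma_up = min(
        sigma_to,
        eta * math.sqrt(sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2),
    )
    sigma_down = math.sqrt(sigma_to**2 - sigma_up**2)
    return sigma_down, sigma_up

def sample(x, sigmas, ancestral, seed=0):
    rng = random.Random(seed)
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        d = derivative(x, sigma)
        if ancestral:
            sigma_down, sigma_up = ancestral_split(sigma, sigma_next)
            x += d * (sigma_down - sigma)        # denoise further than the schedule asks
            x += rng.gauss(0.0, 1.0) * sigma_up  # then add fresh random noise back
        else:
            x += d * (sigma_next - sigma)        # plain Euler step, no randomness
    return x

sigmas = [10.0 * (1 - i / 20) for i in range(21)]  # illustrative linear schedule
euler_runs = [sample(10.0, sigmas, ancestral=False, seed=s) for s in (1, 2)]
euler_a_runs = [sample(10.0, sigmas, ancestral=True, seed=s) for s in (1, 2)]
```

With the same starting value, the two Euler runs are identical regardless of the seed, while the two ancestral runs differ because new noise enters at every step. That per-step noise is why ancestral samplers keep changing the image instead of settling on one result.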
ODE
Some of the samplers on the list have a long history, having been invented more than a hundred years ago. They are old-school methods for solving ordinary differential equations (ODEs).
These methods are based on traditional techniques for ODEs, making them less versatile than what's available today.
Euler
The first one is Euler, the simplest possible solver.
Euler is a very simple sampler to understand. It is identical to the mathematical technique of Euler's method for solving ordinary differential equations. Since it's entirely deterministic, no random noise is added during sampling. This means that it's entirely predictable and provides the same result each time.
Heun
Heun's method is a more accurate improvement on Euler's method. However, it needs to predict the noise twice in every step, which makes it about twice as slow as Euler.
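To make the accuracy and cost trade-off concrete, here is a minimal sketch comparing the two methods on a toy problem with a known exact answer. The Gaussian toy `derivative`, the linear schedule, and the call counter are all assumptions for illustration; in Stable Diffusion each call would be one pass through the noise predictor.

```python
import math

S = 1.0  # standard deviation of the toy data distribution N(0, S^2)

def make_derivative():
    """Return the toy derivative dx/dsigma plus a counter of 'model' calls."""
    counter = {"evals": 0}
    def derivative(x, sigma):
        counter["evals"] += 1
        return x * sigma / (S**2 + sigma**2)
    return derivative, counter

def euler(f, x, sigmas):
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        x += f(x, sigma) * (sigma_next - sigma)
    return x

def heun(f, x, sigmas):
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        dt = sigma_next - sigma
        d = f(x, sigma)                    # first noise prediction
        x_pred = x + d * dt                # provisional Euler step
        d_next = f(x_pred, sigma_next)     # second prediction at the provisional point
        x += 0.5 * (d + d_next) * dt       # step with the averaged slope
    return x

def run(method, n_steps, x_start=10.0, sigma_max=10.0):
    """Return (absolute error vs. the exact solution, number of model calls)."""
    f, counter = make_derivative()
    sigmas = [sigma_max * (1 - i / n_steps) for i in range(n_steps + 1)]
    exact = x_start * S / math.sqrt(S**2 + sigma_max**2)
    return abs(method(f, x_start, sigmas) - exact), counter["evals"]

euler_err, euler_evals = run(euler, 10)
heun_err, heun_evals = run(heun, 10)
euler20_err, euler20_evals = run(euler, 20)
```

On this toy problem Heun at 10 steps makes twice as many calls as Euler at 10 steps and is noticeably more accurate. It also beats Euler at 20 steps, which costs the same number of calls, although that result is specific to this example and should not be read as a general guarantee.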
LMS
Just like Euler's method, the linear multistep method (LMS) is a standard approach for solving differential equations. It aims at enhancing accuracy through the clever use of information from previous time steps. AUTOMATIC1111 is set to utilize up to 4 previous values by default.
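The idea of reusing information from previous steps can be illustrated with the simplest linear multistep scheme, a two-step Adams-Bashforth method. This is a simplified sketch and not the LMS implementation in AUTOMATIC1111: it assumes equal-sized steps, keeps only one previous value rather than up to four, and uses a toy Gaussian `derivative` in place of the noise predictor.

```python
import math

S = 1.0  # standard deviation of the toy data distribution N(0, S^2)

def derivative(x, sigma):
    """Toy stand-in for the predicted noise direction."""
    return x * sigma / (S**2 + sigma**2)

def euler(x, sigmas):
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        x += derivative(x, sigma) * (sigma_next - sigma)
    return x

def two_step_lms(x, sigmas):
    """Two-step Adams-Bashforth: combine the current and the previous derivative."""
    prev_d = None
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        dt = sigma_next - sigma
        d = derivative(x, sigma)                  # still one evaluation per step
        if prev_d is None:
            x += d * dt                           # no history yet: plain Euler step
        else:
            x += (1.5 * d - 0.5 * prev_d) * dt    # extrapolate using the stored value
        prev_d = d
    return x

n_steps, x_start, sigma_max = 10, 10.0, 10.0
sigmas = [sigma_max * (1 - i / n_steps) for i in range(n_steps + 1)]
exact = x_start * S / math.sqrt(S**2 + sigma_max**2)
euler_err = abs(euler(x_start, sigmas) - exact)
lms_err = abs(two_step_lms(x_start, sigmas) - exact)
```

Both loops evaluate the derivative once per step, so they cost the same, yet the multistep version ends closer to the exact answer in this toy setting because it recycles the previous evaluation instead of discarding it.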
Karras
The samplers labeled "Karras" use the noise schedule recommended in the Karras et al. article. If you look closely, you'll notice that the noise step sizes are smaller toward the end of the schedule. According to the Karras et al. article, this approach can improve the quality of generated images.
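The schedule itself is a short formula. The sketch below follows the form given in Karras et al. (2022), which interpolates linearly in sigma^(1/rho) with rho = 7; the values of `sigma_min` and `sigma_max` here are illustrative assumptions, not the settings any particular UI uses.

```python
def karras_sigmas(n, sigma_min=0.1, sigma_max=10.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, spaced as in Karras et al. (2022)."""
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    return [
        (max_inv_rho + i / (n - 1) * (min_inv_rho - max_inv_rho)) ** rho
        for i in range(n)
    ]

sigmas = karras_sigmas(10)
# Size of the noise reduction in each step
step_sizes = [a - b for a, b in zip(sigmas, sigmas[1:])]
```

Inspecting `step_sizes` shows large reductions at the start and progressively smaller ones toward the end, which is the pattern described above. A larger `rho` concentrates even more of the steps at low noise levels.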
DPM and DPM++
DPM (Diffusion Probabilistic Model solver) and DPM++ are a new family of samplers designed for diffusion models, released in 2022. They share a similar architecture and belong to the same family of solvers for the diffusion process.
DPM and DPM2 are similar except DPM2 is a second-order solver, which makes it more accurate but slower.
DPM++ is an improvement over DPM, offering a higher accuracy without sacrificing the speed of the calculations.
DDIM and PLMS
DDIM (Denoising Diffusion Implicit Models) and PLMS (Pseudo Linear Multi-Step method) are two samplers that shipped with the original Stable Diffusion v1. DDIM was one of the first samplers designed for diffusion models; it is now generally considered outdated and is no longer widely used. PLMS is a newer and faster alternative to DDIM.
UniPC
UniPC (Unified Predictor-Corrector) is a sampler released in 2023 that is based on the predictor-corrector method used in ODE solvers. It combines predictor and corrector steps to achieve higher accuracy and can produce high-quality images in just 5-10 steps.
K-diffusion
If you've come across the term k-diffusion, you might have wondered what it means. It refers to Katherine Crowson's k-diffusion GitHub repository and the samplers associated with it.
This repository is based on the samplers from the Karras et al. 2022 paper, implementing those samplers in a convenient GitHub repository.
Essentially, all the samplers in AUTOMATIC1111 other than DDIM, PLMS, and UniPC come from k-diffusion.
Samplers Comparison
So how do you choose a sampler? The following figures show each sampler's performance in terms of image convergence, speed, and perceptual quality, which should help you make the right choice.
Image Convergence
In the "Evaluating samplers" part of the article, the author generates the same image using different samplers with up to 40 sampling steps.
In this case, the final image generated at the 40th step is used as the reference that we are measuring the convergence speed against. The Euler method is used as the baseline reference point for comparison.
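A convergence measurement of this kind can be sketched as follows. The article does not state which distance was used, so the root-mean-square pixel difference here is an assumption, and `toy_euler_sample` is a hypothetical stand-in for running a real sampler with a given step count.

```python
import math

def toy_euler_sample(start, n_steps, sigma_max=10.0):
    """Hypothetical stand-in for a sampler: Euler on a toy ODE, one value per 'pixel'."""
    image = list(start)
    for i in range(n_steps):
        sigma = sigma_max * (1 - i / n_steps)
        sigma_next = sigma_max * (1 - (i + 1) / n_steps)
        image = [x + x * sigma / (1 + sigma**2) * (sigma_next - sigma) for x in image]
    return image

def rms_distance(a, b):
    """Root-mean-square difference between two images of equal size."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

def convergence_curve(sample_fn, step_counts, reference_steps=40):
    """Distance of each result from the image produced with reference_steps steps."""
    reference = sample_fn(reference_steps)
    return {n: rms_distance(sample_fn(n), reference) for n in step_counts}

start = [10.0, -5.0, 3.0]  # same starting noise for every run
curve = convergence_curve(lambda n: toy_euler_sample(start, n), [5, 10, 20, 40])
```

A sampler that converges produces a curve that falls toward zero as the step count approaches the reference; a curve that stays high or keeps fluctuating indicates poor convergence.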
Euler, DDIM, PLMS, LMS (Karras), and Heun are classified as a group since they are all ODE solvers or diffusion solvers. Even though DDIM converges at the same number of steps as Euler, the variation is slightly higher. This is because it injects random noise during its sampling steps, which can make the outcome slightly inconsistent.
PLMS did not perform well in this test. LMS Karras appears to have some difficulty converging and stabilized at a much higher baseline, suggesting less accuracy at lower sampling step counts.
Heun converges faster and is more precise than Euler, but each step is about twice as slow because it is a second-order solver. For a fair comparison at equal cost, we should therefore compare Heun at 15 steps with Euler at 30 steps.
Among the DPM samplers, DPM fast did not converge well: it is faster, but shows more variation.
DPM2 and DPM2 (Karras) did provide better convergence compared to Euler, but also had the drawbacks of being twice as slow.
On the other hand, DPM adaptive performs quite well and generates accurate images by choosing its step sizes adaptively. However, it takes considerably longer as a result.
UniPC is a relatively effective sampling method: its convergence is similar to Euler's, only slightly slower. This minor drawback is negligible given that run times remain fairly fast overall. Compared with most other sampling methods, UniPC is a fast, efficient, and precise option.
Speed & Quality
Except for DPM adaptive, which is the slowest, rendering times fall into two groups: samplers that take about the same amount of time, and samplers that take twice as long.
The slower group consists of samplers that use second-order solvers, which need to evaluate the denoising U-Net twice per step and therefore take about twice as long.
Now, let's observe the images produced by each sampler. We kept all generation parameters constant except for the sampler.
We can see that DPM++ failed. When comparing ancestral and deterministic samplers, the ancestral samplers appear to result in kitten-like images, while deterministic samplers seem to yield images that are closer to a full-grown cat. However, there is no inherently correct answer, and either result can be acceptable as long as they appear visually pleasing.
Perceptual Quality
Even if a generated image has not yet fully converged, it is still possible to obtain a high-quality result. Let's explore how quickly each sampling algorithm can produce acceptable levels of image quality.
BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator) is used to measure the perceptual quality of the generated images. It evaluates the quality of natural images without needing a reference image.
As the chart shows, DDIM can produce the highest quality image of the given options with a low sampling step count of just 8. While there are a couple of exceptions, the ancestral samplers perform similarly to Euler in terms of generating high-quality images.
The DPM2 and DPM2 (Karras) samplers perform slightly better than Euler in terms of image quality at all sampling step counts.
According to the results of this sample set, DPM++ SDE and DPM++ SDE (Karras) have demonstrated the highest performance in terms of image quality.
On the other hand, UniPC seems to perform slightly worse than Euler in the first few sampling steps, but it reaches a similar level of quality at higher step counts.