DDIM: Denoising Diffusion Implicit Models
Paper: Song, J., Meng, C., & Ermon, S. (2020). Denoising Diffusion Implicit Models. ICLR 2021.
Key Innovation: DDPM’s main weakness is slow sampling (~1000 neural network evaluations per image). DDIM solves this with deterministic sampling and step skipping, achieving 20-50x speedup with minimal quality loss.
The Problem with DDPM
DDPM sampling requires iterating through all T timesteps:
x_T → x_{T-1} → x_{T-2} → ... → x_1 → x_0
1000 steps = 1000 neural network forward passes = slow

For a U-Net taking 50ms per forward pass:
- 1000 steps × 50ms = 50 seconds per image 🐌
This is impractical for real-world applications.
DDIM’s Breakthrough
DDIM allows skipping timesteps while maintaining quality:
x_T → x_{900} → x_{800} → ... → x_{100} → x_0
50 steps = 50 neural network forward passes = 20x faster

For the same 50ms U-Net:
- 50 steps × 50ms = 2.5 seconds per image ⚡
Critical insight: No retraining needed! Use your existing DDPM model with DDIM sampling.
Core Idea: Deterministic vs Stochastic
DDPM (stochastic):
- Adds random noise at each step
- Different samples even with same starting noise
- Must follow all timesteps sequentially
DDIM (deterministic):
- No random noise added (controlled by the optional parameter eta)
- Same starting noise → same output (reproducible)
- Can skip timesteps freely
The DDIM Update Formula
Instead of the DDPM update, DDIM uses:
x_{t_prev} = √(ᾱ_{t_prev}) · x̂_0 + √(1 - ᾱ_{t_prev} - σ_t²) · ε_θ(x_t, t) + σ_t · ε

where the predicted clean image is

x̂_0 = (x_t - √(1 - ᾱ_t) · ε_θ(x_t, t)) / √(ᾱ_t)

Breaking this down:
- Predict x̂_0: Use the noise prediction ε_θ(x_t, t) to estimate the clean image
- Scale predicted x̂_0: Scale by √(ᾱ_{t_prev}), the appropriate alpha value for the target timestep
- Add noise component: Add the right amount of noise for the target timestep (with eta = 0, σ_t = 0 and this term vanishes)

Key insight: This formula works for any t_prev < t, enabling arbitrary step skipping!
Implementation
import torch

@torch.no_grad()
def sample_ddim(model, shape, noise_schedule, steps=50, eta=0.0, device='cuda'):
    """
    Fast sampling with DDIM.

    Args:
        model: trained diffusion model (same as DDPM!)
        shape: (batch_size, channels, height, width)
        noise_schedule: NoiseSchedule object
        steps: number of sampling steps (e.g., 50 instead of 1000)
        eta: stochasticity parameter
            eta=0 → fully deterministic (recommended)
            eta=1 → equivalent to DDPM (slow)
        device: device to sample on

    Returns:
        generated_images: (B, C, H, W)
    """
    model.eval()
    T = noise_schedule.T

    # Create timestep schedule (skip steps)
    # e.g., roughly [999, 979, 959, ..., 20, 0] for 50 steps
    timesteps = torch.linspace(0, T - 1, steps).long().flip(0)

    # Start from pure noise
    x = torch.randn(shape, device=device)

    # Iteratively denoise with larger steps
    for i in range(len(timesteps) - 1):
        t = timesteps[i].item()
        t_prev = timesteps[i + 1].item()

        # Create batch of timesteps
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)

        # Predict noise at current timestep
        predicted_noise = model(x, t_batch)

        # Get alpha values
        alpha_bar_t = noise_schedule.alpha_bar[t]
        alpha_bar_t_prev = noise_schedule.alpha_bar[t_prev]

        # Predict x0 from current x_t
        pred_x0 = (x - torch.sqrt(1 - alpha_bar_t) * predicted_noise) / torch.sqrt(alpha_bar_t)

        # Direction pointing to x_t
        dir_xt = torch.sqrt(1 - alpha_bar_t_prev) * predicted_noise

        # DDIM update (deterministic when eta=0; the eta > 0 branch is shown below)
        x = torch.sqrt(alpha_bar_t_prev) * pred_x0 + dir_xt

    return x

Adding Stochasticity (Optional)
The eta parameter controls randomness:
# Add optional noise for stochastic sampling
# (this replaces the deterministic update inside the sampling loop above)
if eta > 0 and t_prev > 0:
    sigma_t = eta * torch.sqrt(
        (1 - alpha_bar_t_prev) / (1 - alpha_bar_t) *
        (1 - alpha_bar_t / alpha_bar_t_prev)
    )
    noise = torch.randn_like(x)
    # Modified update with noise
    x = torch.sqrt(alpha_bar_t_prev) * pred_x0 + \
        torch.sqrt(1 - alpha_bar_t_prev - sigma_t**2) * predicted_noise + \
        sigma_t * noise

Spectrum of stochasticity:
- eta = 0: Fully deterministic (recommended for speed)
- eta = 0.5: Some randomness
- eta = 1: Equivalent to DDPM (slow but high quality)
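Once that branch is folded into sample_ddim, sweeping eta is straightforward. A minimal sketch, assuming the merged function (fixing the seed fixes the starting noise x_T; with eta > 0, fresh noise is still drawn inside the loop):

# Compare stochasticity levels from the same starting noise
for eta in [0.0, 0.5, 1.0]:
    torch.manual_seed(0)
    imgs = sample_ddim(model, (4, 3, 64, 64), noise_schedule, steps=50, eta=eta)
    # eta=0 reproduces exactly across runs; larger eta re-introduces randomness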
Choosing Number of Steps
Quality vs speed tradeoff:
| Steps | Speed | Quality | Use Case |
|---|---|---|---|
| 10 | Fastest | Lower | Previews, rapid iteration |
| 20-30 | Fast | Good | Most applications |
| 50 | Medium | Very good | High quality generation |
| 100-200 | Slower | Excellent | Research, best quality |
| 1000 | Slowest | Marginal improvement | Rarely needed |
Recommendation: 50 steps provides the best balance of quality and speed for most use cases. Use 20-30 for faster iteration during development.
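A quick way to find your own sweet spot is to time a sweep over step counts. A minimal sketch, assuming the sample_ddim function above:

import time

# Time generation at several step counts and eyeball the outputs
for steps in [10, 20, 50, 100]:
    start = time.time()
    _ = sample_ddim(model, (4, 3, 64, 64), noise_schedule, steps=steps)
    print(f"{steps:4d} steps: {time.time() - start:.2f}s")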
Performance Comparison
# Generate samples with both methods
model.eval()
# DDPM: 1000 steps (sample_ddpm is the full sampler from the DDPM article)
import time
start = time.time()
samples_ddpm = sample_ddpm(model, (4, 3, 64, 64), noise_schedule)
time_ddpm = time.time() - start
print(f"DDPM: {time_ddpm:.2f} seconds")
# DDIM: 50 steps
start = time.time()
samples_ddim = sample_ddim(model, (4, 3, 64, 64), noise_schedule, steps=50)
time_ddim = time.time() - start
print(f"DDIM: {time_ddim:.2f} seconds")
print(f"Speedup: {time_ddpm / time_ddim:.1f}x")Typical output:
DDPM: 50.3 seconds
DDIM: 2.5 seconds
Speedup: 20.1x

Why DDIM Works: Mathematical Insight
Key theoretical contribution:
DDPM defines a specific diffusion process (stochastic with noise at each step). DDIM shows there are infinitely many processes that:
- Have the same marginal distributions
- Can use the same trained noise predictor
- Allow deterministic sampling and step skipping
Practical insight:
- The model learns to predict noise
- We can use that prediction in different ways
- DDIM’s way is faster without retraining
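Concretely, the paper defines a whole family of non-Markovian inference distributions, indexed by σ_t. A sketch, in the same notation as the update formula above:

q_σ(x_{t-1} | x_t, x_0) = N( √(ᾱ_{t-1}) · x_0 + √(1 - ᾱ_{t-1} - σ_t²) · (x_t - √(ᾱ_t) · x_0) / √(1 - ᾱ_t),  σ_t² · I )

Every choice of σ_t ≥ 0 yields the same marginal q(x_t | x_0) = N(√(ᾱ_t) · x_0, (1 - ᾱ_t) · I) that the noise predictor was trained on; this is why one trained model serves the entire family, and σ_t = 0 recovers the deterministic DDIM sampler.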
No Retraining Needed!
The most amazing part of DDIM:
Use your existing DDPM model - just change the sampling algorithm! No need to retrain anything.
This is why DDIM had such immediate impact - everyone could instantly speed up their models 20x.
Conditional DDIM
DDIM works seamlessly with conditioning (text, class labels, etc.):
@torch.no_grad()
def sample_ddim_conditional(model, condition, shape, noise_schedule, steps=50, device='cuda'):
    """DDIM sampling with conditioning"""
    x = torch.randn(shape, device=device)
    # create_timestep_schedule is defined in the Advanced section below
    timesteps = create_timestep_schedule(noise_schedule.T, steps)
    for i in range(len(timesteps) - 1):
        t = timesteps[i].item()
        t_prev = timesteps[i + 1].item()
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        # Predict noise with conditioning
        predicted_noise = model(x, t_batch, condition)
        # DDIM update (same as unconditional; ddim_update is a helper that
        # applies the pred_x0 / dir_xt step from sample_ddim above)
        x = ddim_update(x, predicted_noise, t, t_prev)
    return x

Results and Impact
Quantitative Results
On ImageNet 256×256:
- 50 steps: Nearly identical FID to 1000-step DDPM
- 20 steps: Minor quality degradation, still excellent
- 10 steps: Noticeable but acceptable quality
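To run this kind of comparison on your own model, here is a minimal sketch using torchmetrics (assumptions: torchmetrics is installed, the model outputs images in [-1, 1], and real_images is a uint8 tensor of shape (N, 3, H, W)):

# FID vs. step count (sketch; see assumptions above)
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

for steps in [10, 20, 50]:
    fid = FrechetInceptionDistance(feature=2048)
    fake = sample_ddim(model, (64, 3, 64, 64), noise_schedule, steps=steps)
    fake_uint8 = ((fake.clamp(-1, 1) + 1) * 127.5).to(torch.uint8)  # [-1,1] → [0,255]
    fid.update(real_images.cpu(), real=True)
    fid.update(fake_uint8.cpu(), real=False)
    print(f"{steps} steps: FID = {fid.compute().item():.2f}")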
Impact on the Field
DDIM transformed diffusion models from research curiosities to practical tools:
- DALL-E 2: Uses DDIM-style sampling
- Stable Diffusion: DDIM for fast generation (20-50 steps)
- Midjourney: Optimized sampling based on DDIM insights
Without DDIM, diffusion would still be too slow for production use.
DDIM vs DDPM Summary
| Aspect | DDPM | DDIM |
|---|---|---|
| Sampling steps | 1000 | 20-50 |
| Sampling time | Slow | 20-50x faster |
| Stochastic | Yes | Optional (eta parameter) |
| Quality | Excellent | Nearly identical |
| Training | Standard | Use same DDPM model |
| Flexibility | Fixed steps | Skip any steps |
| Reproducibility | Stochastic | Deterministic (eta=0) |
When to Use DDIM
Use DDIM when:
- ✅ You need fast generation
- ✅ You want deterministic outputs (reproducibility)
- ✅ You’re doing iterative editing (same seed = same result)
- ✅ Deploying to production
- ✅ Real-time or interactive applications
Use DDPM when:
- ❓ You want maximum diversity (debatable)
- ❓ Research comparison (matching paper conditions)
In practice, DDIM is almost always preferred.
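The reproducibility point is easy to verify: with eta=0 the sampler is a deterministic function of the starting noise, so fixing the seed reproduces the image exactly (assuming the sample_ddim function from above):

# Same seed + eta=0 → identical outputs (the basis for iterative editing)
torch.manual_seed(42)
img_a = sample_ddim(model, (1, 3, 64, 64), noise_schedule, steps=50, eta=0.0)
torch.manual_seed(42)
img_b = sample_ddim(model, (1, 3, 64, 64), noise_schedule, steps=50, eta=0.0)
print(torch.allclose(img_a, img_b))  # True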
Practical Tips
- Start with 50 steps: Good default for quality/speed balance
- Use eta=0: Deterministic sampling is faster and often better
- Experiment with step schedules: Linear spacing is good, but quadratic or custom schedules can help
- Cache timestep embeddings: Precompute for efficiency (see the sketch after this list)
- Lower steps for previews: Use 10-20 steps during development
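On tip 4, a minimal sketch of the caching idea, assuming a standard transformer-style sinusoidal embedding (sinusoidal_embedding is a hypothetical helper; your U-Net's embedding module and dimension may differ):

import math
import torch

def sinusoidal_embedding(t, dim=256):
    # Standard sinusoidal timestep embedding (assumed form; match your model)
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    angles = t[:, None].float() * freqs[None, :]
    return torch.cat([angles.sin(), angles.cos()], dim=-1)

# Precompute once for the ~50 timesteps the sampler will actually visit
timesteps = torch.linspace(0, 999, 50).long()
emb_cache = {int(t): sinusoidal_embedding(torch.tensor([t])) for t in timesteps.tolist()}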
Advanced: Custom Timestep Schedules
Different schedules can improve quality:
import math
import torch

def create_timestep_schedule(T, steps, schedule_type='linear'):
    """Create a custom (descending) timestep schedule"""
    if schedule_type == 'linear':
        # Uniform spacing
        return torch.linspace(0, T - 1, steps).long().flip(0)
    elif schedule_type == 'quadratic':
        # Finer steps near t = 0 (the low-noise end of sampling)
        t = torch.linspace(0, 1, steps) ** 2
        return (t * (T - 1)).long().flip(0)
    elif schedule_type == 'cosine':
        # Cosine schedule (maps alpha_bar values back to timestep indices)
        t = torch.linspace(0, 1, steps)
        alpha_bar = torch.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2
        return (alpha_bar * (T - 1)).long()  # alpha_bar already descends, so no flip
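For example, a 50-step quadratic schedule for a T=1000 model (using the helper above):

ts = create_timestep_schedule(1000, 50, schedule_type='quadratic')
print(ts[0].item(), ts[-1].item())  # 999 0, with spacing that tightens toward t = 0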
Limitations and Extensions
Limitations:
- Still slower than GANs (which generate in 1 step)
- Quality slightly degrades with very few steps (<20)
Extensions:
- PLMS (2022): Further speedup with Pseudo Linear Multi-Step
- DPM-Solver (2022): Optimized ODE solvers for diffusion
- Consistency Models (2023): Single-step generation while maintaining quality
Related Concepts
- DDPM Paper - Original training algorithm
- Diffusion Fundamentals - Forward/reverse process
- Classifier-Free Guidance - Text conditioning (uses DDIM)
Key References
- Original Paper - DDIM paper
- Official Code - PyTorch implementation
- Hugging Face Diffusers - Production implementation
Learning Resources
Paper Explanations
- AI Coffee Break: DDIM Explained - Video walkthrough
- Yannic Kilcher: DDIM Paper Review - In-depth analysis
Implementation Guides
- Hugging Face Tutorial - Using DDIM in practice
- PyTorch Lightning Example - Complete training example
Mathematical Depth
- Lilian Weng: Diffusion Models - Section on DDIM
- Yang Song: Score-Based Models - Unified view