Diffusion Models — Explain Like I'm 5

Stable Diffusion and DALL-E don't 'draw' your images. They unscramble a mess of random noise, step by step, until a picture emerges. Here's the surprisingly simple idea behind it.

The Scrambled Egg That Unscrambles Itself

You know how scrambled eggs can never go back to being a whole egg? Once you’ve mixed everything up, the information is gone. Except… what if you learned exactly how eggs get scrambled, step by step?

If you watched thousands of eggs being scrambled in slow motion — every swirl, every break — you might get so good at recognizing the patterns that you could reverse it in your head. “This blob of yellow came from that part of the yolk. These white streaks used to be over there.”

Diffusion models are AIs that learned this exact trick. But instead of eggs, they do it with pictures.

How They Learn

During training, the AI watches millions of pictures get slowly buried under noise — like adding TV static one layer at a time until the image completely disappears into random colored dots.

Then it practices guessing: “If I see this amount of static, which pixels were probably hiding underneath?” It does this billions of times. After a while, it gets really good at spotting the ghost of an image inside the noise.
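To make the "burying under static" idea concrete, here is a minimal Python sketch. It is not the training code of any real model: the tiny 32x32 "picture", the 50-step noise schedule, and the helper names add_noise and similarity are all made up for illustration. It only shows the noising half. In real training, a neural network would then practice guessing the static from the noisy picture, over and over.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny stand-in "picture": a 32x32 grid with a bright square in the middle.
picture = np.zeros((32, 32))
picture[8:24, 8:24] = 1.0

# Assumed noise schedule (illustrative values, not taken from any real model).
num_steps = 50
betas = np.linspace(1e-4, 0.2, num_steps)   # how much static each step adds
keep = np.cumprod(1.0 - betas)              # how much of the picture is left

def add_noise(image, step, rng):
    """Return the image after step + 1 layers of static, plus the static used."""
    static = rng.standard_normal(image.shape)
    noisy = np.sqrt(keep[step]) * image + np.sqrt(1.0 - keep[step]) * static
    return noisy, static

def similarity(a, b):
    """Correlation between two images: near 1 is the same pattern, near 0 is unrelated."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

early, _ = add_noise(picture, 0, rng)                      # one thin layer of static
late, static = add_noise(picture, num_steps - 1, rng)      # fully buried

# During training, a model would be shown `late` and asked to guess `static`.
print("after 1 step:  ", similarity(early, picture))
print("after 50 steps:", similarity(late, picture))
```

The first number should come out close to 1 (the picture is barely touched) and the second close to 0 (almost nothing but static is left), which is the "image completely disappears" moment described above.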

How It Makes Your Picture

When you type “a cat wearing a space helmet,” here’s what actually happens:

1. The AI starts with a screen of pure random noise: total TV static.
2. It asks itself: what image could be hiding inside this static that would match a cat in a space helmet?
3. It cleans up the noise a tiny bit in the right direction.
4. Then it cleans a bit more. And more. Around 20–50 rounds of this.
5. Eventually, a cat astronaut materializes.

It’s less like drawing and more like developing a photograph — the picture was always potentially there, it just needed the static removed.
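The steps above can be sketched in a few lines of Python. This is a toy under loud assumptions, not how Stable Diffusion or DALL-E is actually implemented: the "picture that matches the prompt" is a hard-coded 8x8 square, the schedule is a simple made-up ramp, and toy_denoiser is a hypothetical cheat that already knows the answer. In a real system that function would be a trained neural network steered by your text. The update inside the loop follows the deterministic DDIM-style rule, one of several ways to do the "clean up a tiny bit" step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for "the picture that matches the prompt".
target = np.zeros((8, 8))
target[2:6, 2:6] = 1.0

num_steps = 30                                   # inside the 20-50 range
# Assumed schedule: fraction of picture present, from almost none to all of it.
keep = np.linspace(0.001, 1.0, num_steps + 1)

def toy_denoiser(noisy, kept):
    """Cheating oracle: guess the static in `noisy`, given how much picture is kept.

    A real model learns this guess from data; here we peek at `target`.
    """
    return (noisy - np.sqrt(kept) * target) / np.sqrt(1.0 - kept)

x = rng.standard_normal(target.shape)            # step 1: pure static
errors = []
for now, nxt in zip(keep[:-1], keep[1:]):
    static_guess = toy_denoiser(x, now)          # step 2: what is hiding in here?
    picture_guess = (x - np.sqrt(1.0 - now) * static_guess) / np.sqrt(now)
    # steps 3-4: move to a slightly cleaner mix of picture and static
    x = np.sqrt(nxt) * picture_guess + np.sqrt(1.0 - nxt) * static_guess
    errors.append(float(np.abs(x - target).mean()))

print("error after first round:", errors[0])
print("error after last round: ", errors[-1])    # step 5: the picture has emerged
```

Because the oracle is perfect, the loop lands exactly on the target. A real model only guesses well, not perfectly, which is part of why it needs many small rounds instead of one big jump.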

Why the Results Look So Good

The AI learned from billions of real images, each paired with a caption (the largest public training sets hold around five billion). So it knows, without being told, that cat fur has texture, that glass helmets have reflections, that space backgrounds are dark with pinprick stars. Those details sneak in automatically, because the AI absorbed them from real images.

It’s not copying any existing photo. It’s making something new that fits the pattern of what things look like.

One Thing to Remember

Diffusion models make images by learning to undo noise — not by learning to draw. They start with static and slowly reveal a picture that matches your description. Every image they make has been “developed” out of chaos.
