Contrastive Learning — Explain Like I'm 5

How AI learns which things are similar to each other, and which are not, without any labels, producing the representations behind image search and face recognition.

Learning By Comparison

Imagine you’re learning to sort photos without knowing any labels. Someone gives you 1,000 photos and this rule: “photos from the same camera burst (taken 0.5 seconds apart) should be grouped together; everything else should be separated.”

So you see a photo of a cat in a normal pose and a slightly tilted, brighter shot of the same cat from the same burst. You learn they should be grouped together. Then you see a photo of a dog from a different burst: definitely separate. After thousands of these comparisons, you’d start grouping photos by visual similarity on your own.

That’s contrastive learning. The “rule” is not “this is a cat”; it’s “these two views of the same thing should look similar, and everything else should look different.”
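The rule can be written down as a loss function. Below is a minimal sketch, assuming each photo has already been turned into a vector (an embedding) by some model. It scores one anchor against its positive (the other view) and a few negatives with an InfoNCE-style loss, the kind of objective used by methods such as SimCLR and CLIP. The vectors are made-up toy values and the temperature of 0.1 is an assumed setting, not taken from any specific system.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for a single anchor.

    Equals -log(softmax probability of the positive among all candidates),
    so it is small when the positive is far more similar than any negative.
    """
    pos_logit = cosine_similarity(anchor, positive) / temperature
    logits = [pos_logit] + [cosine_similarity(anchor, n) / temperature for n in negatives]
    # log-sum-exp, shifted by the max logit for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - pos_logit

# Toy embeddings (made-up numbers, not from a real model)
cat_view_1 = [0.9, 0.1, 0.0]
cat_view_2 = [0.8, 0.2, 0.1]  # same photo, different augmentation
dog = [0.1, 0.9, 0.2]
car = [0.0, 0.1, 0.9]

good = info_nce_loss(cat_view_1, cat_view_2, [dog, car])  # positive really is the other view
bad = info_nce_loss(cat_view_1, dog, [cat_view_2, car])   # a dog wrongly treated as the positive
print(good < bad)  # True
```

The loss is small when the anchor’s most similar vector is its own positive and large when a negative beats it; a lower temperature makes the loss concentrate more on the closest negatives.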

The Two Views Trick

The clever part: you don’t need labeled photos. You take one photo and make two different versions of it:

Crop it differently
Change the brightness
Flip it horizontally
Add a bit of blur

These are two “views” of the same image, so the model should map them to similar representations (embeddings). Every other photo in the batch should map to a different representation.
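To make the augmentation list concrete, here is a toy sketch in plain Python that builds two random views of one small grayscale image, stored as a list of rows with pixel values in [0, 1]. Real pipelines apply library transforms to full RGB images; the crop size, brightness range, 3x3 box blur, and 50% flip/blur probabilities here are illustrative assumptions.

```python
import random

def random_crop(img, size, rng):
    """Cut out a size x size window at a random position."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def adjust_brightness(img, factor):
    """Scale pixel values, clipping the result to the [0, 1] range."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]

def horizontal_flip(img):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in img]

def box_blur(img):
    """3x3 mean filter; edge pixels average only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        new_row = []
        for j in range(w):
            vals = [img[y][x]
                    for y in range(max(0, i - 1), min(h, i + 2))
                    for x in range(max(0, j - 1), min(w, j + 2))]
            new_row.append(sum(vals) / len(vals))
        out.append(new_row)
    return out

def make_view(img, rng, crop_size=4):
    """One random view: crop, change brightness, then maybe flip and maybe blur."""
    view = random_crop(img, crop_size, rng)
    view = adjust_brightness(view, rng.uniform(0.7, 1.3))
    if rng.random() < 0.5:
        view = horizontal_flip(view)
    if rng.random() < 0.5:
        view = box_blur(view)
    return view

rng = random.Random(0)
image = [[(r * 6 + c) / 35 for c in range(6)] for r in range(6)]  # toy 6x6 gradient
view_a = make_view(image, rng)  # first view of the image
view_b = make_view(image, rng)  # second, independently augmented view
```

Both views come from the same source pixels, so a model that maps them to nearby embeddings has to ignore crop position, brightness, orientation, and blur.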

By learning to pull the two views together and push everything else apart, the model develops rich visual representations that capture shapes, textures, and higher-level concepts, all without anyone labeling a single image.
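Within a batch, the same idea applies to every image at once. Here is a minimal sketch of a SimCLR-style batch loss (often called NT-Xent), assuming the batch has already been encoded into 2N embeddings ordered so that positions 2k and 2k + 1 hold the two views of image k. That ordering convention and the temperature of 0.5 are choices made for this example.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nt_xent_loss(views, temperature=0.5):
    """SimCLR-style (NT-Xent) loss averaged over a batch of 2N embeddings.

    views[2k] and views[2k + 1] are the two views of image k; for each
    embedding, the other 2N - 2 embeddings in the batch serve as negatives.
    """
    n = len(views)
    total = 0.0
    for i in range(n):
        partner = i + 1 if i % 2 == 0 else i - 1
        # similarity of view i to every other view in the batch (itself excluded)
        logits = {j: cosine_similarity(views[i], views[j]) / temperature
                  for j in range(n) if j != i}
        m = max(logits.values())
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits.values()))
        total += log_sum_exp - logits[partner]
    return total / n

# Two images, two views each, in the order [img0_a, img0_b, img1_a, img1_b]
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Same vectors, but now the "pairs" put views of different images together
mismatched = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(nt_xent_loss(aligned) < nt_xent_loss(mismatched))  # True
```

Larger batches supply more negatives per step, which is one reason contrastive methods tend to train with big batches.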

What It Enables

Contrastive learning is the training objective behind CLIP (OpenAI, 2021). CLIP was trained on 400 million (image, text) pairs collected from the internet. Here the two “views” are an image and its caption. The model learned to place each image close to its own caption in a shared representation space, and far from the other captions in the batch.
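The CLIP objective can be sketched the same way, with images and captions in place of two image views. This is a simplified sketch, not OpenAI’s code: it assumes N matched (image, caption) embedding pairs, builds the N×N similarity matrix, and averages a cross-entropy loss in both directions so that each image picks out its own caption and each caption picks out its own image. The temperature is fixed at 0.07 here for simplicity; CLIP learns it during training.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    """-log(softmax(logits)[target]), computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss for N matched (image, caption) pairs.

    Pair k is (image_embs[k], text_embs[k]); every other caption in the batch
    is a negative for image k, and every other image is a negative for caption k.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = scaled cosine similarity of image i and caption j
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # image -> text: each row should peak at its own caption (the diagonal)
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # text -> image: each column should peak at its own image
    text_to_image = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2

# Toy embeddings (made-up numbers): caption k roughly matches image k
images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
captions = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
print(clip_style_loss(images, captions) < clip_style_loss(images, captions[::-1]))  # True
```

Reversing the caption list pairs images with the wrong captions, which is exactly the situation the loss penalizes.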

The result? Text-to-image search: type “sunset at the beach” into a photo search box and find matching photos, even ones nobody ever labeled. This works because the model learned to represent text descriptions and visual content in the same space, so a description lands near the images it describes.
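Search then reduces to a nearest-neighbour lookup: embed the text query, embed every photo once, and rank photos by similarity. The sketch below uses hand-written three-dimensional vectors and hypothetical file names purely for illustration; in a real system the vectors would come from trained text and image encoders and would have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def search(query_emb, image_index, top_k=2):
    """Return the top_k image names ranked by similarity to the query embedding."""
    scored = [(cosine_similarity(query_emb, emb), name) for name, emb in image_index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

# Hypothetical photo embeddings (made-up numbers, not from a real encoder)
image_index = {
    "IMG_001.jpg": [0.9, 0.1, 0.3],  # beach at sunset
    "IMG_002.jpg": [0.1, 0.8, 0.1],  # cat on a sofa
    "IMG_003.jpg": [0.7, 0.0, 0.6],  # sunset over mountains
}
query = [0.85, 0.05, 0.4]  # pretend text embedding of "sunset at the beach"
print(search(query, image_index))  # ['IMG_001.jpg', 'IMG_003.jpg']
```

Because the photo embeddings do not depend on the query, they can be computed once and indexed; only the query has to be embedded at search time.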

One thing to remember: Contrastive learning defines similarity through comparison rather than labels — by learning what goes together and what doesn’t, models develop rich representations without needing human annotation.
