Generative AI — Core Concepts

From GANs to diffusion models to LLMs — how generative AI actually works, why it exploded in 2022, and what separates the hype from what's real.

What Generative AI Actually Is

Most AI is about classification. Show it a photo, it says “cat.” Feed it a transaction, it says “fraud” or “not fraud.” These systems learn to label or predict from existing data.

Generative AI flips this. Instead of labeling existing things, it creates new things. New text, new images, new audio, new video, new code. The input is a prompt or some noise; the output is something that didn’t exist before.

This isn’t new in principle — researchers were building generative models in the 1980s. What changed is scale. Between 2020 and 2023, compute got cheap enough, datasets got big enough, and architectures got smart enough that the output stopped looking like a blur of approximations and started looking like something a human made.

The Three Main Architectures

Generative Adversarial Networks (GANs) — 2014

Ian Goodfellow invented this while arguing with colleagues in a Montreal bar. The setup is adversarial: two neural networks compete.

The generator creates fake images. The discriminator tries to catch them. They train together — the generator learns to fool the discriminator, the discriminator learns to spot fakes. When they’re evenly matched, the generator is producing images so convincing the discriminator is basically guessing.
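The tug-of-war above comes down to two loss functions. Below is a minimal NumPy sketch of them, assuming the binary cross-entropy formulation from the original GAN paper, with the common "non-saturating" variant for the generator. The helper names are hypothetical and the discriminator outputs are made-up numbers, not results from a real training run. The point is what happens when the discriminator outputs 0.5 for everything.

```python
import numpy as np

# Hypothetical helper names; these are not from any library.
EPS = 1e-12  # keeps log() away from log(0)

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator.

    d_real: D's probability that each real image is real (D wants -> 1).
    d_fake: D's probability that each generated image is real (D wants -> 0).
    """
    return -np.mean(np.log(d_real + EPS)) - np.mean(np.log(1.0 - d_fake + EPS))

def generator_loss(d_fake):
    """Non-saturating generator loss: G wants D to call its fakes real."""
    return -np.mean(np.log(d_fake + EPS))

# Made-up discriminator outputs, not from a real training run.
# Early on, D easily separates real images from fakes.
early_d = discriminator_loss(np.array([0.95, 0.90]), np.array([0.05, 0.10]))
early_g = generator_loss(np.array([0.05, 0.10]))

# Evenly matched: D outputs 0.5 for everything, i.e. it is guessing.
even_d = discriminator_loss(np.full(4, 0.5), np.full(4, 0.5))
even_g = generator_loss(np.full(4, 0.5))

print(f"early: D loss {early_d:.3f}, G loss {early_g:.3f}")
print(f"even:  D loss {even_d:.3f} (2 ln 2 = {2 * np.log(2):.3f}), G loss {even_g:.3f}")
```

In real training, each loss is minimized by gradient descent on its own network’s weights, alternating between the two. The even-match point, where the discriminator loss settles at 2 ln 2 because D can do no better than a coin flip, is the equilibrium the paragraph describes.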

GANs gave us Nvidia’s StyleGAN, whose photorealistic fake faces went viral on sites like thispersondoesnotexist.com, and they fed into the deepfake boom. The problem: they’re unstable to train, and they specialize. A GAN trained on faces makes faces. You can’t just ask it to also write poetry.

Diffusion Models — went mainstream 2021–2022

This is how Stable Diffusion, DALL-E 2, and Midjourney work. The idea is almost backwards from what you’d expect.

Training involves systematically destroying an image by adding random noise, step by step, until it’s pure static. The model learns to reverse this process — to predict what the clean image looked like given the noisy version.
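The noising half of training can be written in a few lines. Here is a sketch assuming the linear schedule from the DDPM paper (Ho et al., 2020): 1,000 steps with the per-step noise level beta rising from 0.0001 to 0.02. Other models use other schedules, and `add_noise` is a hypothetical helper name. A useful property is that you can jump straight to any step t without looping. Many implementations, DDPM included, train the model to predict the added noise rather than the clean image; the two are interchangeable, because the clean image can be recovered from the noisy one once the noise is known.

```python
import numpy as np

# Linear noise schedule as in DDPM: T steps, beta_t from 1e-4 to 0.02.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # share of the original signal variance left after t steps

def add_noise(x0, t, rng):
    """Hypothetical helper: jump straight to step t of the forward process.

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
    Returns the noisy sample and the noise, which is the training target
    for a noise-predicting model.
    """
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise

rng = np.random.default_rng(0)
x0 = np.ones((8, 8))  # stand-in for a tiny all-white "image"

for t in [0, 250, 500, 999]:
    x_t, _ = add_noise(x0, t, rng)
    print(f"t={t:4d}  signal kept={np.sqrt(alpha_bars[t]):.4f}  mean pixel={x_t.mean():+.3f}")
```

By the last step almost none of the original signal survives, which is the "pure static" the paragraph describes. A training step would then ask the network to predict `noise` from `x_t` and `t`, typically scored with mean squared error.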

At inference time, you start with pure noise and ask the model to denoise it, guided by your text prompt. The result: an image that plausibly fits your description, generated from nothing but structured chaos.
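Generation runs the process in reverse. Below is a sketch of DDPM-style ancestral sampling, assuming the same linear schedule as DDPM. To keep it checkable, the neural network is replaced by a hypothetical "perfect" noise predictor for a toy dataset that contains exactly one image, `TARGET`. In a real system the predictor is a large network that also receives the text prompt as conditioning; Stable Diffusion, for example, steers it with a technique called classifier-free guidance, which is not shown here.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

# Hypothetical stand-in for a trained network: the exact noise predictor
# for a toy "dataset" containing a single image, TARGET.
TARGET = np.linspace(-1.0, 1.0, 16).reshape(4, 4)

def predict_noise(x_t, t):
    return (x_t - np.sqrt(alpha_bars[t]) * TARGET) / np.sqrt(1.0 - alpha_bars[t])

def sample(shape, rng):
    """DDPM ancestral sampling: start from pure noise and denoise step by step."""
    x = rng.standard_normal(shape)  # pure static
    for t in range(T - 1, -1, -1):
        eps_hat = predict_noise(x, t)
        # Remove this step's share of the predicted noise (the posterior mean).
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Re-inject a little fresh noise, as DDPM does (variance beta_t here).
            x = x + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

result = sample(TARGET.shape, np.random.default_rng(42))
print("max error vs. TARGET:", np.abs(result - TARGET).max())
```

With a one-image "dataset," every run lands on `TARGET` no matter what noise it starts from. A model trained on millions of images has learned a whole distribution, so different starting noise lands on different plausible images.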

Diffusion models are slower than GANs, since one image takes dozens of denoising steps instead of a single pass, but they are more controllable and more versatile. Stable Diffusion (released with open weights in August 2022) blew up partly because the whole 4 GB model could run on a $400 consumer graphics card.
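Where does "4 GB" come from? Model size on disk is roughly parameter count times bytes per parameter. The sketch below uses approximate, rounded public parameter counts for Stable Diffusion v1’s three components; treat them as assumptions rather than exact figures.

```python
# Approximate, rounded parameter counts for Stable Diffusion v1 (assumptions).
components = {
    "U-Net (the denoiser)": 860e6,
    "CLIP text encoder": 123e6,
    "VAE (image encoder/decoder)": 84e6,
}
total_params = sum(components.values())

def size_gb(params, bytes_per_param):
    """Weight storage in decimal gigabytes."""
    return params * bytes_per_param / 1e9

fp32_gb = size_gb(total_params, 4)  # 32-bit floats: 4 bytes each
fp16_gb = size_gb(total_params, 2)  # 16-bit floats: 2 bytes each

print(f"total parameters: {total_params / 1e9:.2f} B")
print(f"fp32 weights: {fp32_gb:.1f} GB")
print(f"fp16 weights: {fp16_gb:.1f} GB")
```

Roughly a billion parameters at 4 bytes each lands near 4 GB, and half-precision halves that. This counts weights only; running the model also needs memory for intermediate activations, which is why the graphics card’s VRAM mattered so much.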

Large Language Models (LLMs)

GPT-4, Claude, Gemini, Llama: these are text-first generative models built on the Transformer architecture (2017). They learn by reading enormous amounts of text and predicting which token (a word or word fragment) comes next. They’re so good at next-token prediction that when they chain hundreds or thousands of predictions together, they produce coherent paragraphs, arguments, code, and stories.
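The "predict the next token, append it, repeat" loop is easiest to see with a toy. The sketch below swaps the Transformer for a bigram counter, a much simpler technique that looks only at the single previous word and learns by counting instead of by gradient descent. The corpus and helper names are hypothetical. Only the generation loop carries over to real LLMs, which condition on thousands of previous tokens.

```python
from collections import Counter, defaultdict

# Tiny made-up training corpus, already split into tokens.
corpus = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "the capital of france is paris ."
).split()

# Count which token follows which (a bigram model: context = one previous token).
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def next_token_probs(prev):
    """Turn follow-up counts into a probability distribution over next tokens."""
    counts = next_counts[prev]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def generate(start, max_tokens=10):
    """Greedy decoding: repeatedly append the most likely next token."""
    tokens = [start]
    for _ in range(max_tokens):
        probs = next_token_probs(tokens[-1])
        if not probs:
            break
        tokens.append(max(probs, key=probs.get))
        if tokens[-1] == ".":
            break
    return " ".join(tokens)

print(next_token_probs("is"))
print(generate("the"))  # the capital of france is paris .
```

Greedy decoding always picks the top token. Chat models usually sample from the distribution instead, often reshaped by a "temperature" setting, which is why the same prompt can produce different answers.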

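Crucially, new abilities showed up as LLMs scaled. GPT-2 (2019) could continue a passage of text. GPT-3 (2020) could write essays and pick up a new task from a few examples in the prompt. GPT-4 (2023) scored around the top 10% of test takers on a simulated bar exam. Capabilities seemed to jump with scale rather than improve smoothly, though researchers still debate how much of that jump is real and how much comes from the way abilities are measured.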

Why It Exploded After 2022

Three things converged:

Compute: Nvidia’s A100 and H100 GPUs gave researchers far more processing power per dollar
Scale: Training runs started using trillions of tokens instead of billions
Instruction tuning + RLHF: Researchers learned how to align raw language models to follow instructions, not just complete them — making them actually usable
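The compute and scale points multiply together. A widely used rule of thumb puts training cost at about 6 floating-point operations per parameter per training token. The sketch below applies it to two publicly described runs, then converts the result to wall-clock time on a hypothetical cluster of 1,024 A100s. The 40% utilization figure is an assumption, and GPT-3 was not actually trained on A100s, so read the day counts as order-of-magnitude illustrations only.

```python
def training_flops(params, tokens):
    """Rule of thumb: ~6 FLOPs per parameter per training token
    (roughly 2 for the forward pass and 4 for the backward pass)."""
    return 6 * params * tokens

def gpu_days(flops, n_gpus, peak_flops_per_gpu, utilization):
    """Wall-clock days if each of n_gpus sustains utilization * peak throughput."""
    seconds = flops / (n_gpus * peak_flops_per_gpu * utilization)
    return seconds / 86_400

A100_BF16_PEAK = 312e12  # A100 dense BF16 tensor-core peak, FLOP/s
UTILIZATION = 0.4        # assumption: real runs sustain well under peak

runs = {
    "GPT-3 (175B params, ~300B tokens)": training_flops(175e9, 300e9),
    "Llama 2 70B (70B params, ~2T tokens)": training_flops(70e9, 2e12),
}
for name, flops in runs.items():
    days = gpu_days(flops, n_gpus=1024, peak_flops_per_gpu=A100_BF16_PEAK,
                    utilization=UTILIZATION)
    print(f"{name}: {flops:.2e} FLOPs, ~{days:.0f} days on 1,024 A100s")
```

Moving from hundreds of billions of tokens to trillions multiplies the bill directly, which is why cheaper compute and bigger datasets had to arrive together.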

ChatGPT launched in November 2022 and reached an estimated 100 million users in about two months, faster than any consumer app in history at the time.

What Generative AI Can and Can’t Do

Can do well:

Generate plausible-sounding text, images, audio
Transform content (translate, summarize, reformat)
Complete patterns it’s seen variations of in training
Code in widely used programming languages

Can’t do well:

Reliably get facts right (it’s not a database)
Reason about novel situations it has no training analog for
Count letters in a word, or do multi-digit arithmetic reliably, without tools
Know what’s happening in the world after its training cutoff
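Letter counting fails partly because an LLM never sees letters. Text is split into tokens first, and the model works with token IDs. The sketch below uses a hypothetical, hand-picked vocabulary and greedy longest-match splitting; real tokenizers (for example, byte-pair encoding) learn tens of thousands of pieces from data, and the actual split of any given word varies between models.

```python
# Hypothetical subword vocabulary, hand-picked for illustration.
VOCAB = {"straw", "berry", "st", "raw", "b", "e", "r", "y", "s", "t", "a", "w"}

def tokenize(word):
    """Greedy longest-match tokenization against VOCAB."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i]!r}")
    return tokens

word = "strawberry"
print(tokenize(word))    # ['straw', 'berry']
print(word.count("r"))   # 3
```

Ordinary code sees ten characters and counts three r’s trivially. The model sees two opaque units, so answering correctly depends on having absorbed how each token is spelled.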

The most important misconception is that generative AI models understand what they’re generating. They don’t, at least not in any human sense. An LLM producing the word “Paris” doesn’t know what Paris is. It has learned that “Paris” tends to follow “capital of France” in the text it was trained on. For most practical purposes, the distinction doesn’t matter. For some purposes (medical, legal, safety-critical), it matters a lot.

Real-World Applications Right Now

Text: Writing assistance (Grammarly, Notion AI), code generation (GitHub Copilot), customer service bots, search summaries

Images: Marketing visuals, concept art, product photography, medical imaging augmentation

Audio: Voice cloning, music generation (Suno, Udio), podcast production, accessibility tools

Video: Sora (OpenAI), Runway Gen-3 — still expensive and imperfect, but improving fast

The Legitimate Concerns

Generative AI can produce convincing disinformation at scale. Deepfake audio of politicians. Synthetic scientific papers. Personalized phishing emails.

Copyright is also genuinely murky. These models were trained on vast amounts of human-created content. When they produce work that resembles a specific artist’s style, is that fair use or theft? Courts in the US and EU are still sorting this out, and rulings expected in 2025–2026 could reshape how these models are built and licensed.

One Thing to Remember

Generative AI creates new content by learning the statistical patterns of human-made content. It’s powerful not because it understands, but because it imitates at a scale and resolution that humans often can’t distinguish from the real thing, which is both what makes it useful and what makes it risky.
