Artificial Intelligence — Core Concepts
What is Artificial Intelligence?
Artificial intelligence is a broad term for computer systems that perform tasks normally requiring human cognition — recognizing faces, translating languages, making decisions. The key word is “perform.” An AI translating Spanish to English doesn’t understand either language. It has processed millions of translated sentence pairs and learned statistical relationships between words.
The field started in 1956 at a Dartmouth workshop where researchers predicted they’d solve human-level intelligence within a generation. Nearly 70 years later, we have systems that can beat world champions at chess and Go, generate photorealistic images, and hold conversations — but still can’t reliably tell you whether a sentence is sarcastic.
How It Works
AI breaks into three broad approaches, each building on the last:
1. Rule-Based Systems (Good Old-Fashioned AI)
The earliest approach: humans write explicit rules. “IF the email contains ‘Nigerian prince’ AND asks for money, THEN it’s spam.” TurboTax is a giant rule-based AI — thousands of tax rules encoded as if-then logic.
Strength: Predictable, explainable. Weakness: Someone has to write every rule. The real world has too many edge cases.
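The spam rule above can be sketched in a few lines. Everything here is hand-written by a human; the rules are made up for illustration, and a real filter would need thousands of them:

```python
def is_spam(email: str) -> bool:
    """Hand-written rules: every condition encoded explicitly by a person."""
    text = email.lower()
    # Rule 1: classic advance-fee scam phrasing plus a money ask
    if "nigerian prince" in text and ("money" in text or "transfer" in text):
        return True
    # Rule 2: all-caps shouting combined with a dollar amount
    if email.isupper() and "$" in email:
        return True
    # No rule matched — but a scammer who avoids these exact phrases walks through
    return False

print(is_spam("Greetings, I am a Nigerian prince and need your money"))  # True
print(is_spam("Lunch at noon?"))  # False
```

The weakness is visible in the code itself: each `if` covers one edge case, and the world keeps producing new ones.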
2. Machine Learning
Instead of writing rules, you give the system data and let it find the rules itself. You feed a machine learning model 10,000 loan applications labeled “defaulted” or “repaid,” and it figures out which patterns predict default — income level, debt-to-income ratio, employment history.
Three flavors:
- Supervised learning — You provide labeled examples (this photo = cat, this photo = dog). The model learns to generalize. Most commercial AI uses this.
- Unsupervised learning — No labels. The model finds structure on its own. Spotify uses this to group songs with similar audio features into clusters, which powers Discover Weekly.
- Reinforcement learning — The model learns by trial and error, getting rewards for good outcomes. Google’s DeepMind used this to train AlphaGo, which defeated world champion Lee Sedol in 2016.
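The loan example above can be shown as supervised learning in miniature. Instead of a human writing the rule, the program searches for the debt-to-income cutoff that best separates labeled outcomes. The data is invented for illustration and the "model" is a single threshold, but the shape is the same as real training: adjust until the predictions match the labels.

```python
# Tiny supervised-learning sketch: learn one decision threshold from
# labeled examples instead of hand-writing it. Data is hypothetical.
applications = [  # (debt_to_income_ratio, outcome)
    (0.10, "repaid"), (0.15, "repaid"), (0.25, "repaid"),
    (0.45, "defaulted"), (0.55, "defaulted"), (0.60, "defaulted"),
]

def accuracy(threshold):
    """Fraction of examples the rule 'ratio > threshold => default' gets right."""
    correct = 0
    for ratio, outcome in applications:
        predicted = "defaulted" if ratio > threshold else "repaid"
        correct += predicted == outcome
    return correct / len(applications)

# "Training": try each observed ratio as a candidate threshold, keep the best.
best = max((ratio for ratio, _ in applications), key=accuracy)
print(best, accuracy(best))  # 0.25 1.0 — the pattern came from the data
```

A production model does the same thing with thousands of features and millions of parameters, but nobody ever typed in "ratio above 0.25 means risk" — the data did.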
3. Deep Learning
A subset of machine learning using neural networks — layers of mathematical functions loosely inspired by neurons. “Deep” means many layers. GPT-4 is estimated to have hundreds of billions of parameters across many layers (OpenAI hasn’t disclosed the exact count).
Deep learning is why AI suddenly got impressive around 2012. Three things converged:
- More data — the internet generated training data at scale
- More compute — GPUs originally built for video games turned out to be perfect for matrix math
- Better techniques — dropout, batch normalization, attention mechanisms
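A "layer" is less mysterious than it sounds: a weighted sum followed by a simple nonlinearity, stacked. Here is a minimal two-layer forward pass in plain Python. The weights are arbitrary numbers chosen for the example; in a real network, training adjusts them from data:

```python
# Minimal "deep" network: two layers, each a weighted sum plus a
# nonlinearity (ReLU). Weights are arbitrary here — training would set them.
def relu(x):
    return max(0.0, x)

def layer(inputs, weights, biases):
    # Each neuron: weighted sum of all inputs, shifted by a bias, then ReLU.
    return [relu(sum(w * i for w, i in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [1.0, 2.0]                                        # input features
h = layer(x, [[0.5, -0.2], [0.3, 0.8]], [0.1, -0.1])  # hidden layer, 2 neurons
y = layer(h, [[1.0, -1.0]], [0.0])                    # output layer, 1 neuron
print(y)
```

GPT-4 differs from this sketch in scale (billions of these weighted sums) and in how the weights are learned, not in the basic arithmetic.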
Key Concepts
Training vs. Inference
Training is the expensive part — processing massive datasets to adjust the model’s parameters. OpenAI reportedly spent over $100 million training GPT-4. Inference is using the trained model to make predictions. When you ask ChatGPT a question, that’s inference. Training happens once (or periodically); inference happens millions of times.
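The split can be shown on the smallest possible model: one parameter, fit by gradient descent. Training is the loop; inference is the one-line function that remains afterward. The data here is synthetic (the true rule is y = 2x):

```python
# Training vs. inference on a one-parameter model. Synthetic data: y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (x, y) pairs

w = 0.0                        # the model's single parameter
for _ in range(100):           # TRAINING: the expensive, repeated part —
    # gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= 0.05 * grad           # nudge w downhill

def predict(x):                # INFERENCE: cheap, runs as often as you like
    return w * x

print(round(w, 3), predict(10.0))  # w converges to 2.0
```

Scale the loop up by many orders of magnitude and you get the reported $100M training run; `predict` is what answers your ChatGPT question.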
The Data Problem
AI is only as good as its training data. Amazon built a hiring AI trained on 10 years of resumes — and it learned to penalize resumes containing the word “women’s” (as in “women’s chess club”) because the historical data reflected a male-dominated hiring pattern. They scrapped it. Garbage in, bias out.
Narrow AI vs. General AI
Every AI system today is narrow — it does one thing well. AlphaGo can play Go but can’t play tic-tac-toe unless separately trained. Artificial General Intelligence (AGI) — a system that can learn any intellectual task a human can — doesn’t exist yet. Despite headlines, there’s no scientific consensus on when or if it will.
The Transformer Revolution
In 2017, Google researchers published “Attention Is All You Need,” introducing the transformer architecture. It processes all parts of input simultaneously (rather than sequentially), making it far more efficient for language tasks. This is the foundation of GPT, BERT, Claude, Gemini, and virtually every modern language model.
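The core attention step can be sketched in plain Python. This is a simplified single-head version with toy 2-dimensional vectors, not a full transformer: each query scores every key at once (the "simultaneous" part), the scores become weights via softmax, and the output is a weighted average of the values.

```python
import math

# Scaled dot-product attention, single head, toy dimensions.
def attention(queries, keys, values):
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        exps = [math.exp(s) for s in scores]
        weights = [e / sum(exps) for e in exps]      # softmax -> sums to 1
        # Weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

q = [[1.0, 0.0]]                       # one query
k = [[1.0, 0.0], [0.0, 1.0]]           # two keys
v = [[10.0, 0.0], [0.0, 10.0]]         # two values
print(attention(q, k, v))              # leans toward the first value
```

Because the query matches the first key more closely, the output is pulled toward the first value — attention is literally a soft lookup, computed for all positions in parallel.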
Common Misconception
“AI understands what it’s doing.”
When ChatGPT writes a poem about grief, it hasn’t experienced loss. It has processed millions of texts about grief and learned which word patterns follow which. This matters because people trust AI outputs as if a reasoning entity produced them. A confident, well-structured wrong answer is more dangerous than an obviously broken one. The AI didn’t reason its way to the answer — it completed a pattern.
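"Completed a pattern" can be made concrete with the smallest possible language model: a bigram counter over a toy corpus. It picks whichever word most often followed the current one in its training text — no meaning involved, just counts. Real models are vastly larger and use transformers rather than raw counts, but the pattern-completion character is the same:

```python
from collections import Counter, defaultdict

# A miniature "language model": count which word follows which in a toy corpus.
corpus = "the cat sat on the mat the cat ran".split()

follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1             # "training" = counting co-occurrences

def complete(word):
    # "Inference" = emit the most frequent continuation. No understanding.
    return follows[word].most_common(1)[0][0]

print(complete("the"))  # "cat" — seen after "the" twice vs. "mat" once
```

The model "knows" that "cat" follows "the" in the same sense ChatGPT "knows" about grief: the statistics of its training text say so.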
One Thing to Remember
AI is a spectrum from simple rules to complex neural networks, but the core idea hasn’t changed: find patterns in data and use them to make predictions. The magic is in the scale, not in any spark of understanding.