Machine Learning — Core Concepts
What Is Machine Learning?
Machine learning (ML) is a method of building software that learns from data rather than from hand-written rules. Instead of programming a computer to recognize spam by listing keywords, you show it 100,000 examples of spam and non-spam emails, and it figures out the distinguishing patterns itself.
The result isn’t a list of rules you can read. It’s a mathematical structure — a model — that has encoded what spam tends to look like, in a form only the computer can use.
How Training Actually Works
Think of training like this: you start with a model that knows nothing. You feed it an example (say, a photo of a cat) and ask “what is this?” It guesses randomly — maybe “car.” You tell it it’s wrong, and by how much. The model adjusts its internal settings slightly in the direction of being less wrong. Then you feed it the next example. Repeat this millions of times.
This feedback loop is usually implemented with gradient descent: the model works out which direction of adjustment would reduce its error, then nudges its settings that way. By the end of training, it hasn’t been told “cats have pointy ears.” It has learned that pointy ears correlate with “cat” from seeing enough evidence.
Three elements make this work:
- Data — the examples you train on
- A model architecture — the mathematical structure that does the learning
- A loss function — how you measure “wrong” so the model knows which direction to adjust
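The three elements can be sketched in a few lines of plain Python. This is a minimal illustration rather than how production systems are built: the data points are made up, the "architecture" is just a straight line, and the loop uses every example at each step instead of one at a time. The names `predict`, `loss`, and `train` are chosen for this sketch only.

```python
# Data: (x, y) pairs that follow y = 2x + 1 (made-up values).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

def predict(w, b, x):
    """Model architecture: a straight line with two adjustable settings."""
    return w * x + b

def loss(w, b, examples):
    """Loss function: mean squared error, i.e. how wrong the line is on average."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in examples) / len(examples)

def train(examples, learning_rate=0.05, steps=2000):
    w, b = 0.0, 0.0  # start knowing nothing
    n = len(examples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in examples) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in examples) / n
        # Nudge each setting a little in the direction that lowers the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = train(data)
print(f"w={w:.3f}, b={b:.3f}, loss={loss(w, b, data):.6f}")
```

The learned `w` and `b` should end up close to 2 and 1, the values used to generate the data, even though the code never states that rule.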
The Three Types of Machine Learning
1. Supervised Learning
You provide labeled examples: this email is spam, that one isn’t. This photo is a dog, that one is a cat. The model learns to map inputs to the right label.
Real use: Google Photos recognizing your face in pictures. Netflix predicting whether you’ll like a movie. Credit card fraud detection (Visa reportedly uses supervised models that screen hundreds of millions of transactions per day).
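As a rough sketch of "mapping inputs to the right label", here is one of the simplest supervised methods, a 1-nearest-neighbor classifier. The feature choice (counts of exclamation marks and links) and every data point are invented for illustration; real spam filters use far richer inputs and models.

```python
# Labeled examples: (features, label). Features are made-up counts of
# (exclamation marks, links) in an email.
training_data = [
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((0, 1), "not spam"),
    ((5, 3), "spam"),
    ((7, 2), "spam"),
    ((6, 4), "spam"),
]

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def classify(features, examples):
    """Return the label of the closest labeled example (1-nearest-neighbor)."""
    _, label = min(examples, key=lambda ex: squared_distance(ex[0], features))
    return label

print(classify((6, 3), training_data))
print(classify((1, 1), training_data))
```

The labels do all the teaching here: change a label in `training_data` and the predictions near that example change with it.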
2. Unsupervised Learning
You provide examples with no labels. The model finds structure on its own — grouping similar things together, finding unusual patterns, compressing information.
Real use: Spotify grouping listeners by taste without anyone defining the genres. Customer segmentation in e-commerce (Amazon grouping shoppers by purchase behavior to target promotions).
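A small sketch of "finding structure on its own" is k-means clustering, shown here on one-dimensional data. The listening-hours values and the starting centers are assumptions made for the example; this is not a description of how Spotify or Amazon actually segment users.

```python
def k_means_1d(values, centers, iterations=10):
    """Group numbers around k centers; no labels are used anywhere."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value with its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its group.
        for i, group in enumerate(clusters):
            if group:  # leave an empty cluster's center where it is
                centers[i] = sum(group) / len(group)
    return centers, clusters

# Hypothetical data: weekly listening hours for nine users.
hours = [1.0, 2.0, 1.5, 2.5, 20.0, 22.0, 21.0, 19.0, 23.0]
centers, clusters = k_means_1d(hours, centers=[0.0, 10.0])
print(centers)
print(clusters)
```

Nobody told the algorithm that "casual" and "heavy" listeners exist; the two groups fall out of the numbers. Naming the groups afterward is still a human job.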
3. Reinforcement Learning
The model takes actions in an environment and receives rewards or penalties. No labeled data — just trial and error.
Real use: DeepMind’s AlphaGo, which learned to beat world champions at Go by first studying human expert games and then playing millions of games against itself (its successor, AlphaGo Zero, learned from self-play alone). OpenAI’s bots that learned to play Dota 2 at superhuman level, discovering strategies professional players had never tried.
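The reward-driven loop can be sketched with a multi-armed bandit, about the smallest reinforcement learning problem there is. This uses an epsilon-greedy strategy, which is one simple choice among many and far simpler than what AlphaGo or the Dota 2 bots used. The win probabilities, step count, and seed are arbitrary values picked for the example.

```python
import random

def run_bandit(win_probabilities, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit (a very small RL problem)."""
    rng = random.Random(seed)
    n_actions = len(win_probabilities)
    estimates = [0.0] * n_actions  # the agent's current guess of each action's value
    counts = [0] * n_actions
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try something at random
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        # The environment answers with a reward, not with the correct action.
        reward = 1.0 if rng.random() < win_probabilities[action] else 0.0
        counts[action] += 1
        # Incremental average: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print(best, counts)
```

The agent never sees the true probabilities. It only sees rewards, and its estimates drift toward the truth as it keeps trying actions.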
Key Concepts
Features
The inputs the model uses to make decisions. In a house price model, features might be: square footage, number of bedrooms, zip code, year built. Choosing the right features — called feature engineering — is often what separates a mediocre model from a great one.
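To make feature engineering concrete, the sketch below turns one raw house record into a list of numbers. The field names, the derived features (age, space per bedroom), and the zip codes are all hypothetical choices for this example, not a recommended recipe.

```python
from datetime import date

def house_features(house, known_zip_codes, today=None):
    """Turn a raw house record into a list of numbers a model can use."""
    today = today or date.today()
    features = [
        float(house["square_feet"]),
        float(house["bedrooms"]),
        # Engineered feature: age is often more useful than the raw year.
        float(today.year - house["year_built"]),
        # Engineered feature: space per bedroom.
        house["square_feet"] / max(house["bedrooms"], 1),
    ]
    # Zip codes are categories, not quantities, so one-hot encode them.
    features += [1.0 if house["zip_code"] == z else 0.0 for z in known_zip_codes]
    return features

house = {"square_feet": 1800, "bedrooms": 3, "zip_code": "98103", "year_built": 1995}
print(house_features(house, ["98101", "98103", "98105"], today=date(2025, 1, 1)))
```

Treating the zip code as a plain number would tell the model that 98105 is "bigger" than 98101, which means nothing. The one-hot columns avoid that.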
Overfitting
When a model memorizes the training data instead of learning from it. Like a student who memorizes every answer from last year’s exam but can’t solve a new problem. The model gets nearly perfect scores on data it’s seen, and fails on data it hasn’t. Techniques like regularization, early stopping, and collecting more data reduce it; cross-validation helps you detect it.
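An extreme toy case shows the gap between memorizing and learning. The "memorizer" below is a plain lookup table, and the threshold model is a deliberately simple alternative; both, along with the data, are invented for illustration.

```python
def train_memorizer(examples):
    """Overfit 'model': a lookup table of every training input."""
    table = dict(examples)
    return lambda x: table.get(x, "unknown")

def train_threshold(examples):
    """Simple model: one cutoff halfway between the two classes' averages."""
    small = [x for x, label in examples if label == "small"]
    large = [x for x, label in examples if label == "large"]
    cutoff = (sum(small) / len(small) + sum(large) / len(large)) / 2
    return lambda x: "large" if x >= cutoff else "small"

def accuracy(model, examples):
    return sum(model(x) == label for x, label in examples) / len(examples)

# Made-up data: numbers under 50 are "small", the rest are "large".
train_set = [(3, "small"), (12, "small"), (27, "small"), (41, "small"),
             (58, "large"), (66, "large"), (79, "large"), (94, "large")]
test_set = [(8, "small"), (35, "small"), (61, "large"), (88, "large")]

for name, trainer in [("memorizer", train_memorizer), ("threshold", train_threshold)]:
    model = trainer(train_set)
    print(name, accuracy(model, train_set), accuracy(model, test_set))
```

Both models are perfect on the training set. Only the one that captured the underlying pattern holds up on inputs it has never seen.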
Training vs. Inference
Training is the slow, expensive process of learning from data. Inference is applying the trained model to new inputs — this is what happens in real time when you ask Siri a question or Shazam recognizes a song. Training might take days on expensive hardware; inference happens in milliseconds on your phone.
The Train/Test Split
Before training begins, you hold back a chunk of your data — say 20% — that the model never sees during training. After training, you test the model on this held-out data to measure how well it generalizes. This is the closest thing to an honest performance measurement.
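A hold-out split can be sketched with the standard library alone. The function name `train_test_split`, the fixed seed, and the stand-in data are assumptions for this example; libraries such as scikit-learn ship their own, more capable versions.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data, then hold back the last `test_fraction` of it."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]

examples = list(range(100))  # stand-in for 100 labeled examples
train_set, test_set = train_test_split(examples)
print(len(train_set), len(test_set))
```

Shuffling before splitting matters when the data arrives in some order (by date, by class), since an unshuffled split could leave the test set unrepresentative. For time-ordered data, though, a chronological split is often the more honest choice.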
Common Misconception
“Machine learning models understand what they’re doing.”
They don’t. A language model that generates fluent text doesn’t understand language — it has learned statistical patterns over billions of words. A model that detects tumors in X-rays hasn’t learned medicine; it has found pixel patterns that correlate with pathologist labels.
This matters because it explains why ML models can fail in bizarre ways. A well-known 2016 study (Ribeiro et al., the paper that introduced the LIME explanation method) showed that a classifier trained to tell wolves from huskies was actually detecting snow, because the wolf photos in its deliberately skewed training set had snowy backgrounds. The model “learned” the wrong pattern because the pattern worked on training data.
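The snow shortcut is easy to reproduce in miniature. The sketch below is a toy reconstruction, not the study's actual setup: the yes/no features, the photos, and the "pick the single best-matching feature" learner are all made up to show the failure mode.

```python
def best_single_feature(examples):
    """Pick the one yes/no feature that agrees with the label most often."""
    feature_names = examples[0][0].keys()

    def agreement(name):
        return sum(features[name] == is_wolf for features, is_wolf in examples)

    return max(feature_names, key=agreement)

# Toy training set: every wolf photo happens to have snow in it.
train_set = [
    ({"snow": True,  "yellow_eyes": True},  True),
    ({"snow": True,  "yellow_eyes": False}, True),
    ({"snow": True,  "yellow_eyes": True},  True),
    ({"snow": False, "yellow_eyes": False}, False),
    ({"snow": False, "yellow_eyes": True},  False),
    ({"snow": False, "yellow_eyes": False}, False),
]

chosen = best_single_feature(train_set)
print(chosen)

# A dog photographed in snow: the shortcut gives the wrong answer.
dog_in_snow = {"snow": True, "yellow_eyes": False}
print("predicted wolf:", dog_in_snow[chosen])
```

On the training set the snow rule is flawless, so nothing in the training scores warns you. The problem only shows up on photos that break the coincidence.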
Why This Changed Everything
Before ML, automating a task meant someone had to understand and encode all the rules. A programmer had to know what spam looks like. A doctor had to define what a tumor looks like in pixels.
ML removed that bottleneck. Tasks that were impossible to rule-engineer — recognizing handwriting, translating languages, predicting protein structures — became tractable once you could throw enough labeled data at a model and let it find its own rules.
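DeepMind’s AlphaFold2 (2020) predicted 3D protein structures with accuracy approaching that of lab experiments, a breakthrough on a 50-year-old open problem in biology; by 2022 its public database held predicted structures for nearly every catalogued protein. It did it not by understanding biochemistry, but by learning patterns from the roughly 170,000 known protein structures in scientific databases.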
One Thing to Remember
Machine learning is the art of letting data write the rules. It works remarkably well — and fails in ways that can be remarkably hard to predict, because the model doesn’t know why its rules work.
See Also
- AI Hallucinations: ChatGPT sometimes makes up facts with total confidence. Here's the weird reason why, and why it's not as simple as 'the AI lied.'
- Artificial Intelligence: What is AI really? Think of it as a dog that learned tricks: impressive, but it doesn't know why it's doing them.
- Bias Variance Tradeoff: The fundamental tension in machine learning between being wrong in the same way vs. being wrong in different ways, and why the simplest model isn't always best.
- Deep Learning: Why your phone can spot your face in a messy photo album, and why that trick comes from practice, not magic.
- Embeddings: How do computers know that 'dog' and 'puppy' mean almost the same thing? They don't read definitions; they turn words into secret map coordinates, and nearby coordinates mean nearby meanings.