Scikit-Learn Clustering Algorithms — ELI5

Imagine you dump a box of 200 Lego bricks on the floor. Nobody tells you what to build — you just start sorting. You might group them by color, or by size, or by shape. Different people would make different groups, but most would end up with something sensible.

Clustering is when a computer does this with data. Instead of Lego bricks, imagine thousands of customers. Each customer has information attached: how often they shop, how much they spend, what they buy. Clustering finds natural groups — maybe “bargain hunters,” “luxury shoppers,” and “occasional browsers” — without anyone defining those categories in advance.

Nobody tells the computer what the groups should be. It discovers them by measuring how similar data points are to each other. Points that are close together, meaning their characteristics have similar values, end up in the same group.
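To make “close together” concrete, here is a minimal sketch that measures the straight-line distance between customers. The three customers and their numbers are made up for illustration, and straight-line (Euclidean) distance is only one common way to measure similarity.

```python
import math

# Hypothetical customers: (visits per month, average spend in dollars).
# These values are invented for illustration only.
alice = (12, 20.0)
bob = (11, 22.0)
carol = (1, 250.0)

# math.dist returns the straight-line (Euclidean) distance between two points.
d_alice_bob = math.dist(alice, bob)
d_alice_carol = math.dist(alice, carol)

print("Alice to Bob:  ", d_alice_bob)
print("Alice to Carol:", d_alice_carol)
```

Alice and Bob are a short distance apart, so a clustering method would likely put them in the same group, while Carol sits far away. In practice the columns are often rescaled first (for example with scikit-learn's `StandardScaler`) so that a large-valued feature like spend does not drown out the others.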

Think of it like a school cafeteria on the first day. Nobody assigns tables, but by lunch, friend groups have naturally formed — people who have things in common sit together.

There are different strategies for finding these groups:

The “pick captains” approach: Choose a few central points (captains), then assign everyone to their nearest captain. Move each captain to the middle of its group, then reassign everyone. Repeat until the groups stop changing.
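In scikit-learn, the “pick captains” idea corresponds to k-means, available as `KMeans`. The sketch below uses six made-up points, and the settings (`n_clusters=2`, `n_init=10`, `random_state=0`) are illustrative choices rather than recommendations.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up data: two tight groups of 2-D points.
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.1],   # group near (1, 1)
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.3],   # group near (8, 8)
])

# n_clusters is the number of "captains"; we have to choose it up front.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("Group for each point:", labels)
print("Final captain positions:")
print(kmeans.cluster_centers_)
```

The group numbers themselves (0 or 1) are arbitrary. What matters is which points share a number.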

The “connect the dots” approach: Start with each point alone. Connect the closest points, then the closest small groups, building bigger clusters from the bottom up.
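The “connect the dots” idea is known as agglomerative (bottom-up hierarchical) clustering, and scikit-learn provides it as `AgglomerativeClustering`. This is a small sketch on made-up points. Asking for `n_clusters=2` tells the merging to stop once two groups remain.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Made-up data: two tight groups of 2-D points.
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.3],
])

# Every point starts alone; the closest groups are merged step by step
# until only n_clusters groups are left.
agg = AgglomerativeClustering(n_clusters=2)
agg_labels = agg.fit_predict(points)

print("Group for each point:", agg_labels)
```

How “closest groups” is measured is controlled by the `linkage` parameter. The default is used here, and other choices can produce differently shaped groups.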

The “dense neighborhoods” approach: Find areas where points are packed tightly together. Sparse areas between dense spots become the boundaries between clusters.
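One scikit-learn method built on the “dense neighborhoods” idea is `DBSCAN`. The sketch below uses made-up points, and the values `eps=0.5` and `min_samples=3` are assumptions picked to suit this toy data. Real data usually needs some tuning.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up data: two dense groups plus one lonely point in between.
data = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [1.1, 1.2],   # dense spot A
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.3], [8.1, 8.1],   # dense spot B
    [4.5, 4.5],                                        # isolated point
])

# eps: how far to look for neighbors.
# min_samples: how many points (counting itself) make a spot "dense".
dbscan = DBSCAN(eps=0.5, min_samples=3)
db_labels = dbscan.fit_predict(data)

print("Group for each point:", db_labels)
```

Unlike the other two approaches, you do not tell `DBSCAN` how many groups to find. Points that do not belong to any dense area get the label `-1`, which means “noise.”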

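Each strategy finds different kinds of groups, which is why scikit-learn offers multiple clustering methods, including KMeans (pick captains), AgglomerativeClustering (connect the dots), and DBSCAN (dense neighborhoods).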
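To see two strategies disagree, here is a rough sketch that runs k-means and DBSCAN on the same data: two interlocking crescent shapes generated by scikit-learn's `make_moons`. The dataset, the parameter values, and the use of `adjusted_rand_score` to compare against the known crescents are all illustrative assumptions.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Synthetic data: 200 points forming two interlocking crescents.
# true_groups records which crescent each point really came from.
X_moons, true_groups = make_moons(n_samples=200, noise=0.05, random_state=0)

# "Pick captains": splits the points around two centers.
kmeans_guess = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_moons)

# "Dense neighborhoods": follows chains of closely packed points.
dbscan_guess = DBSCAN(eps=0.2, min_samples=5).fit_predict(X_moons)

# Adjusted Rand score: 1.0 means the groups match the crescents exactly,
# values near 0 mean the match is no better than chance.
kmeans_score = adjusted_rand_score(true_groups, kmeans_guess)
dbscan_score = adjusted_rand_score(true_groups, dbscan_guess)

print("k-means agreement with the crescents:", kmeans_score)
print("DBSCAN agreement with the crescents: ", dbscan_score)
```

On curved shapes like these, the density-based approach tends to trace each crescent, while the captain-based approach tends to cut straight across them. On round, evenly sized blobs the picture can flip, so the right method depends on the shape of your data.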

One thing to remember: Clustering finds natural groups in data without being told what to look for — the computer discovers the patterns on its own, just like you’d sort objects by the similarities you notice.

