Pandas Categorical Data — ELI5
Imagine you run a pizza shop and you track what size each customer orders: Small, Medium, or Large. Every single order has one of those three words written next to it.
Now, you have a million orders. That’s the word “Medium” written out hundreds of thousands of times. Every single letter stored separately. That’s wasteful — like writing the full word “pepperoni” on a million receipts instead of just writing “P” and keeping a note that says “P means pepperoni.”
That’s exactly what categorical data does. Instead of storing “Medium” a million times, Pandas stores one small number per order and keeps a tiny dictionary: 0 = Small, 1 = Medium, 2 = Large (Pandas starts counting at 0). Same information, way less space.
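Here is a minimal sketch of that idea. The column contents and the variable names (`sizes`, `size_type`, `sizes_cat`) are made up for illustration; the exact byte counts depend on your Pandas version, so none are shown.

```python
import pandas as pd

# Hypothetical order data: one of three sizes, repeated a million times.
sizes = pd.Series(["Small", "Medium", "Large", "Medium"] * 250_000)

# Spell out the "tiny dictionary" explicitly so the numbering is predictable.
size_type = pd.CategoricalDtype(categories=["Small", "Medium", "Large"])
sizes_cat = sizes.astype(size_type)

# The dictionary: each label is stored only once.
print(list(sizes_cat.cat.categories))        # ['Small', 'Medium', 'Large']

# The small numbers stored per row (positions in the dictionary, from 0).
print(sizes_cat.cat.codes.head(4).tolist())  # [0, 1, 2, 1]

# Memory in bytes; deep=True also counts the text itself.
text_bytes = sizes.memory_usage(deep=True)
cat_bytes = sizes_cat.memory_usage(deep=True)
print(text_bytes, cat_bytes)
```

If you skip the explicit category list and just call `astype("category")`, Pandas builds the dictionary itself from the sorted unique values, so the numbers may land on different labels.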
But there’s a second superpower. Regular text data has no meaningful built-in order. Is “Medium” bigger than “Small”? Your computer only knows alphabetical order, which would put “Large” first. But with categories, you can tell Pandas “Small comes before Medium comes before Large.” Now Pandas can sort correctly and compare sizes.
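A small sketch of ordered categories, assuming the same three pizza sizes. The names `orders` and `size_type` are hypothetical; `ordered=True` is the switch that tells Pandas the order matters.

```python
import pandas as pd

orders = pd.Series(["Large", "Small", "Medium", "Small"])

# As plain text, sorting is alphabetical, so "Large" comes first.
text_sorted = orders.sort_values().tolist()
print(text_sorted)        # ['Large', 'Medium', 'Small', 'Small']

# Tell Pandas the real order: Small < Medium < Large.
size_type = pd.CategoricalDtype(
    categories=["Small", "Medium", "Large"], ordered=True
)
orders_cat = orders.astype(size_type)

# Sorting now follows the declared order instead of the alphabet.
cat_sorted = orders_cat.sort_values().tolist()
print(cat_sorted)         # ['Small', 'Small', 'Medium', 'Large']

# Comparisons against a size work too.
bigger_than_small = (orders_cat > "Small").tolist()
print(bigger_than_small)  # [True, False, True, False]

# So does asking for the biggest order.
largest = orders_cat.max()
print(largest)            # Large
```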
Think of it like a filing cabinet with labeled folders. Without categories, every paper gets filed individually and you have to read each one to find what you want. With categories, papers go into pre-made folders, so grouping or finding everything in a category is usually much faster, because Pandas compares small numbers instead of whole words.
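The folder picture can be sketched like this. The data and the names (`orders`, `medium_prices`, `counts`) are invented for illustration. Passing `observed=False` asks Pandas to report every pre-made folder, including empty ones.

```python
import pandas as pd

size_type = pd.CategoricalDtype(categories=["Small", "Medium", "Large"])

# Hypothetical orders; nobody ordered a Large today.
orders = pd.DataFrame({
    "size": pd.Series(
        ["Small", "Medium", "Small", "Medium", "Medium"]
    ).astype(size_type),
    "price": [8.0, 10.0, 8.0, 10.0, 11.0],
})

# Pull out one folder: every Medium order.
medium_prices = orders.loc[orders["size"] == "Medium", "price"].tolist()
print(medium_prices)  # [10.0, 10.0, 11.0]

# Count the papers in every folder, even the empty "Large" one.
counts = orders.groupby("size", observed=False).size().to_dict()
print(counts)         # {'Small': 2, 'Medium': 3, 'Large': 0}
```

The empty "Large" folder still shows up because the category exists in the dictionary even though no row uses it.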
One thing to remember: Categorical data works best when you have a column that repeats the same few values over and over. The fewer unique values compared to total rows, the bigger the benefit.
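To get a feel for this rule of thumb, here is a rough sketch that compares a very repetitive column with a column where every value is different. The helper `size_ratio` and both example columns are hypothetical. The first ratio should come out far below 1. The second is expected to land near or above 1, since the dictionary ends up holding every value anyway.

```python
import pandas as pd

# A column with only three distinct values, repeated many times.
repetitive = pd.Series(["Small", "Medium", "Large"] * 30_000)

# A column where every value is unique (made-up order IDs).
unique_ids = pd.Series([f"order-{i}" for i in range(90_000)])

def size_ratio(column):
    """Categorical bytes divided by text bytes (below 1 means savings)."""
    text_bytes = column.memory_usage(index=False, deep=True)
    cat_bytes = column.astype("category").memory_usage(index=False, deep=True)
    return cat_bytes / text_bytes

repetitive_ratio = size_ratio(repetitive)
unique_ratio = size_ratio(unique_ids)

# Exact numbers depend on the Pandas version, so just print them.
print(f"repetitive column: {repetitive_ratio:.3f}")
print(f"all-unique column: {unique_ratio:.3f}")
```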