Database Indexing Strategies — ELI5

Imagine a library with 10,000 books, but no card catalog and no organization. To find a book by a specific author, you’d have to walk through every shelf and check every single book. That could take hours.

Now imagine someone creates an alphabetical list of authors with the shelf and position of each book. Suddenly, finding “Garcia Marquez” takes seconds — just look up “G” in the list and go straight to the right shelf.

That list is a database index.

A database stores your data in tables — like spreadsheets with thousands or millions of rows. When you ask “find all orders from customer #42,” the database has two choices:

  1. Without an index: Look at every single row in the table and check if it matches. With a million rows, that’s slow.
  2. With an index on customer_id: Jump straight to customer #42’s rows, like using the card catalog. Nearly instant.

So why not index everything? Because indexes have a cost. Every time you add, change, or delete a row, every related index needs to be updated too. It’s like having to update the card catalog every time the library gets a new book. A few catalogs are manageable. Fifty different catalogs (by author, by title, by year, by color, by page count…) means more time spent updating catalogs than actually shelving books.

The trick is choosing which columns to index based on how you actually search your data. If you always look up orders by customer, index that column. If nobody ever searches by order color, don’t waste an index on it.

One thing to remember: An index is a shortcut that helps the database find rows without scanning the entire table. Too few indexes make reads slow; too many indexes make writes slow. The art is picking the right ones.

pythondatabasesperformance

See Also

  • Python Aioredis Understand Aioredis through a practical analogy so your Python decisions become faster and clearer.
  • Python Alembic Understand Alembic through a practical analogy so your Python decisions become faster and clearer.
  • Python Asyncpg Database asyncpg is the fastest way for Python to talk to PostgreSQL without making your program sit around waiting.
  • Python Asyncpg Understand Asyncpg through a practical analogy so your Python decisions become faster and clearer.
  • Python Cassandra Python Understand Cassandra Python through a practical analogy so your Python decisions become faster and clearer.