Database Indexing Strategies — Core Concepts
Why indexing strategy matters
A well-indexed database can answer queries in milliseconds that would otherwise take seconds. At Shopify’s scale, a single missing index on a frequently queried column caused cascading slowdowns affecting millions of requests. The fix — adding one index — reduced query time by 99%.
But indexing isn’t just “add indexes to everything.” Each index consumes disk space, slows down INSERT/UPDATE/DELETE operations, and requires maintenance. A production database with hundreds of unnecessary indexes can have slower writes than one with no indexes at all.
Index types
B-tree (the default)
When you create an index without specifying a type, you get a B-tree index. It works like a sorted tree structure that enables:
- Equality lookups:
WHERE email = 'user@example.com' - Range queries:
WHERE created_at > '2026-01-01' - Sorting:
ORDER BY created_at DESC - Prefix matching:
WHERE name LIKE 'John%'
B-tree indexes don’t help with:
- Suffix matching:
WHERE name LIKE '%son' - Full-text search:
WHERE body CONTAINS 'database' - Non-equality on middle columns of composite indexes
Hash index
Optimized for exact equality lookups only. Slightly faster than B-tree for WHERE id = 42 but useless for ranges or sorting. PostgreSQL has supported hash indexes since version 10, but B-tree is almost always the better choice.
GIN (Generalized Inverted Index)
Used for values that contain multiple elements — arrays, JSONB fields, full-text search vectors. If you search inside JSONB data or use PostgreSQL’s full-text search, GIN indexes make those queries fast.
GiST (Generalized Search Tree)
Used for geometric data, range types, and proximity searches. If you query “find all restaurants within 5km,” a GiST index on the location column makes it efficient.
Choosing what to index
Follow your queries
The golden rule: index the columns that appear in your WHERE clauses, JOIN conditions, and ORDER BY clauses. If no query ever filters by middle_name, don’t index it.
High-selectivity columns first
An index on a column with only two possible values (like is_active: true/false) is nearly useless — the database has to scan half the table anyway. Indexes shine on columns with many distinct values: email addresses, user IDs, timestamps.
The most important indexes
For any Python web application, these are almost always needed:
- Foreign keys: Every
ForeignKeycolumn should have an index (Django creates these automatically; SQLAlchemy doesn’t) - Unique constraints: These create indexes automatically
- Columns used in WHERE clauses of frequent queries
- Columns used in ORDER BY on large result sets
Composite indexes
A composite (multi-column) index covers queries that filter on multiple columns:
An index on (customer_id, created_at) efficiently handles:
WHERE customer_id = 42(uses first column)WHERE customer_id = 42 AND created_at > '2026-01-01'(uses both)WHERE customer_id = 42 ORDER BY created_at(uses both)
It does NOT efficiently handle:
WHERE created_at > '2026-01-01'(skips first column)
This is the leftmost prefix rule — the database can only use a composite index starting from the leftmost column. Column order matters enormously.
Partial indexes
A partial index covers only rows matching a condition:
CREATE INDEX idx_active_users ON users (email) WHERE is_active = true;
If 90% of your queries filter for active users, this index is 10x smaller than a full index and faster to maintain. In Django, you define these with condition:
class Meta:
indexes = [
models.Index(
fields=['email'],
condition=Q(is_active=True),
name='idx_active_users',
),
]
Common misconception
“Adding more indexes always makes reads faster.”
Beyond a certain point, additional indexes provide diminishing returns while continuing to slow down writes. The query planner also becomes more complex — with many possible indexes, it may occasionally choose a suboptimal plan. Most tables need 3–8 indexes, rarely more.
One thing to remember: Start with indexes on foreign keys and columns you filter by. Use EXPLAIN ANALYZE on slow queries to verify the database actually uses your indexes. Remove indexes that no query uses — they’re pure overhead.
See Also
- Python Aioredis Understand Aioredis through a practical analogy so your Python decisions become faster and clearer.
- Python Alembic Understand Alembic through a practical analogy so your Python decisions become faster and clearer.
- Python Asyncpg Database asyncpg is the fastest way for Python to talk to PostgreSQL without making your program sit around waiting.
- Python Asyncpg Understand Asyncpg through a practical analogy so your Python decisions become faster and clearer.
- Python Cassandra Python Understand Cassandra Python through a practical analogy so your Python decisions become faster and clearer.