N+1 Query Problem — Deep Dive
Anatomy of N+1 in SQL
To understand the fixes, you need to see what the database actually does.
The N+1 pattern
-- Query 1: fetch all orders
SELECT id, customer_id, total FROM orders;
-- Returns 100 rows
-- Queries 2-101: one per order
SELECT id, name, email FROM customers WHERE id = 1;
SELECT id, name, email FROM customers WHERE id = 2;
SELECT id, name, email FROM customers WHERE id = 3;
-- ... 97 more
The JOIN fix (select_related / joinedload)
-- Single query with LEFT JOIN
SELECT orders.id, orders.total, customers.id, customers.name, customers.email
FROM orders
LEFT JOIN customers ON orders.customer_id = customers.id;
One query, all data. The trade-off: if a customer has 50 orders, the customer’s data is repeated 50 times in the result set, increasing network transfer size.
The IN-query fix (prefetch_related / selectinload)
-- Query 1: fetch orders
SELECT id, customer_id, total FROM orders;
-- Query 2: fetch all relevant customers in one shot
SELECT id, name, email FROM customers WHERE id IN (1, 2, 3, ..., 47);
Two queries regardless of order count. The ORM matches customers to orders in Python memory. This avoids data duplication and works well with one-to-many and many-to-many relationships.
SQLAlchemy strategies in depth
Preventing N+1 with lazy=“raise”
The most effective prevention is making lazy loading impossible:
class Order(Base):
__tablename__ = "orders"
id: Mapped[int] = mapped_column(primary_key=True)
customer_id: Mapped[int] = mapped_column(ForeignKey("customers.id"))
customer: Mapped["Customer"] = relationship(lazy="raise")
items: Mapped[list["OrderItem"]] = relationship(lazy="raise")
Now accessing order.customer without explicit eager loading raises InvalidRequestError:
# This crashes — forces you to think about loading
orders = session.scalars(select(Order)).all()
orders[0].customer # InvalidRequestError!
# This works — explicit loading strategy
stmt = select(Order).options(selectinload(Order.customer))
orders = session.scalars(stmt).all()
orders[0].customer # Customer object, loaded efficiently
This shifts N+1 detection from runtime monitoring to development-time errors. Every relationship access requires an explicit loading decision.
Chaining loading strategies for nested relationships
from sqlalchemy.orm import selectinload, joinedload
# Orders → Items → Product → Category
stmt = select(Order).options(
selectinload(Order.items)
.joinedload(OrderItem.product)
.selectinload(Product.category)
)
This executes:
SELECT * FROM orders(main query)SELECT * FROM order_items WHERE order_id IN (...)(selectinload)- For each item, the product is JOINed in query 2 (joinedload)
SELECT * FROM categories WHERE id IN (...)(selectinload)
Total: 3 queries regardless of data size.
contains_eager for custom JOINs
When you need to filter on a related table AND load it:
from sqlalchemy.orm import contains_eager
# Find orders with expensive items, and load those items
stmt = (
select(Order)
.join(Order.items)
.where(OrderItem.price > 100)
.options(contains_eager(Order.items))
)
contains_eager tells SQLAlchemy that the JOIN data should populate the relationship, avoiding a redundant second query.
Hybrid approach: load_only for partial columns
Sometimes you don’t need all columns from related objects:
from sqlalchemy.orm import selectinload, load_only
stmt = select(Order).options(
selectinload(Order.customer).load_only(Customer.name, Customer.email)
)
This fetches only name and email from customers, reducing data transfer.
Django strategies in depth
Prefetch with custom querysets
prefetch_related supports Prefetch objects for filtered or annotated related data:
from django.db.models import Prefetch
# Only prefetch active items with their reviews
orders = Order.objects.prefetch_related(
Prefetch(
'items',
queryset=OrderItem.objects.filter(
status='active'
).select_related('product')
)
)
Django’s N+1 in templates
Templates are particularly vulnerable because relationship access is invisible:
{% for order in orders %}
{{ order.customer.name }} {# lazy load per order #}
{% for item in order.items.all %} {# lazy load per order #}
{{ item.product.name }} {# lazy load per item #}
{% endfor %}
{% endfor %}
The view must set up all eager loading before passing data to the template:
def order_list(request):
orders = Order.objects.select_related('customer').prefetch_related(
Prefetch('items', queryset=OrderItem.objects.select_related('product'))
)
return render(request, 'orders/list.html', {'orders': orders})
django-auto-prefetch
For projects with widespread N+1 issues, django-auto-prefetch replaces Django’s model base class and automatically batches lazy loads:
import auto_prefetch
class Order(auto_prefetch.Model):
customer = auto_prefetch.ForeignKey(Customer, on_delete=models.CASCADE)
Instead of executing one query per lazy access, it batches all pending lazy loads for the same relationship into a single IN query. It’s not as efficient as explicit prefetching but eliminates the worst N+1 cases automatically.
Automated detection in CI
Using pytest-django with assertNumQueries
def test_order_list_no_n_plus_one(client, django_assert_num_queries):
# Create test data: 10 orders with 3 items each
create_orders(count=10, items_per_order=3)
with django_assert_num_queries(3): # orders + customers + items
response = client.get("/api/orders/")
assert response.status_code == 200
If someone removes an eager load, the test fails because query count increases.
SQLAlchemy query counting
from sqlalchemy import event
class QueryCounter:
def __init__(self, engine):
self.count = 0
event.listen(engine, "before_cursor_execute", self._increment)
def _increment(self, conn, cursor, stmt, params, context, executemany):
self.count += 1
def __enter__(self):
self.count = 0
return self
def __exit__(self, *args):
pass
# In tests
def test_no_n_plus_one():
with QueryCounter(engine) as counter:
orders = get_orders_with_items()
serialize_orders(orders)
assert counter.count <= 3, f"Expected ≤3 queries, got {counter.count}"
nplusone in CI
# conftest.py
import pytest
from nplusone.core import profiler
@pytest.fixture(autouse=True)
def no_n_plus_one(request):
with profiler.Profiler() as p:
yield
if p.queries:
# Filter for N+1 patterns only
n_plus_ones = [q for q in p.queries if q.type == 'lazy_load']
if n_plus_ones:
pytest.fail(
f"N+1 queries detected: {len(n_plus_ones)} lazy loads"
)
Performance comparison: real numbers
Benchmark with 100 orders, each with 5 items, PostgreSQL on localhost:
| Strategy | Queries | Time |
|---|---|---|
| Lazy loading (N+1) | 101 | 45ms |
| joinedload | 1 | 4ms |
| selectinload | 2 | 5ms |
| subqueryload | 2 | 6ms |
With remote database (5ms network latency):
| Strategy | Queries | Time |
|---|---|---|
| Lazy loading (N+1) | 101 | 520ms |
| joinedload | 1 | 12ms |
| selectinload | 2 | 18ms |
| subqueryload | 2 | 19ms |
Network latency is the multiplier. On localhost the difference is 10x; across a network it’s 30-40x.
GraphQL and N+1
GraphQL APIs are especially prone to N+1 because clients choose which fields to request, and each field resolver may trigger independent database queries.
The standard solution is DataLoader — a batching utility that collects all IDs requested in a single event loop tick and fetches them in one query:
from aiodataloader import DataLoader
async def batch_load_customers(customer_ids):
customers = await fetch_customers_by_ids(customer_ids)
return [customers.get(cid) for cid in customer_ids]
customer_loader = DataLoader(batch_load_customers)
# Resolvers use the loader instead of direct queries
async def resolve_customer(order, info):
return await customer_loader.load(order.customer_id)
Even with 100 orders resolving their customers simultaneously, DataLoader collapses it to a single WHERE id IN (...) query.
One thing to remember: The cost of N+1 is N × round_trip_latency. On localhost it’s annoying; across a network it’s catastrophic. Make relationship loading decisions explicit — use lazy="raise" in SQLAlchemy or assertNumQueries in Django tests to catch N+1 before production.
See Also
- Python Aioredis Understand Aioredis through a practical analogy so your Python decisions become faster and clearer.
- Python Alembic Understand Alembic through a practical analogy so your Python decisions become faster and clearer.
- Python Asyncpg Database asyncpg is the fastest way for Python to talk to PostgreSQL without making your program sit around waiting.
- Python Asyncpg Understand Asyncpg through a practical analogy so your Python decisions become faster and clearer.
- Python Cassandra Python Understand Cassandra Python through a practical analogy so your Python decisions become faster and clearer.