Python Pagination Patterns — Deep Dive
Offset Pagination Implementation
FastAPI with SQLAlchemy
from fastapi import FastAPI, Query
from sqlalchemy import select, func
from sqlalchemy.ext.asyncio import AsyncSession
app = FastAPI()
@app.get("/users")
async def list_users(
page: int = Query(1, ge=1),
per_page: int = Query(20, ge=1, le=100),
db: AsyncSession = Depends(get_db),
):
offset = (page - 1) * per_page
# Count total (expensive on large tables)
total = await db.scalar(select(func.count()).select_from(User))
# Fetch page
query = select(User).order_by(User.id).offset(offset).limit(per_page)
result = await db.execute(query)
users = result.scalars().all()
return {
"data": [user.to_dict() for user in users],
"pagination": {
"total": total,
"page": page,
"per_page": per_page,
"pages": (total + per_page - 1) // per_page,
},
}
The COUNT(*) query is the hidden cost. On PostgreSQL with millions of rows, this alone can take hundreds of milliseconds. Consider caching the count or returning an approximate count using reltuples from pg_stat_user_tables.
Django Paginator
Django has built-in offset pagination:
from django.core.paginator import Paginator
from django.http import JsonResponse
def user_list(request):
page_num = int(request.GET.get("page", 1))
per_page = min(int(request.GET.get("per_page", 20)), 100)
users = User.objects.all().order_by("id")
paginator = Paginator(users, per_page)
page = paginator.get_page(page_num)
return JsonResponse({
"data": list(page.object_list.values("id", "name", "email")),
"pagination": {
"total": paginator.count,
"page": page.number,
"per_page": per_page,
"pages": paginator.num_pages,
"has_next": page.has_next(),
"has_prev": page.has_previous(),
},
})
Django’s Paginator wraps OFFSET/LIMIT and handles edge cases like out-of-range pages.
Cursor Pagination Implementation
Encoding and Decoding Cursors
import base64
import json
from datetime import datetime
def encode_cursor(data: dict) -> str:
json_str = json.dumps(data, default=str)
return base64.urlsafe_b64encode(json_str.encode()).decode()
def decode_cursor(cursor: str) -> dict:
try:
json_str = base64.urlsafe_b64decode(cursor.encode()).decode()
return json.loads(json_str)
except Exception:
raise ValueError("Invalid cursor")
Use urlsafe_b64encode so cursors work in URL query parameters without escaping.
FastAPI Cursor Pagination
@app.get("/events")
async def list_events(
cursor: str | None = None,
limit: int = Query(20, ge=1, le=100),
db: AsyncSession = Depends(get_db),
):
query = select(Event).order_by(Event.created_at.desc(), Event.id.desc())
if cursor:
cursor_data = decode_cursor(cursor)
# Seek past the cursor position
query = query.where(
(Event.created_at < cursor_data["created_at"]) |
(
(Event.created_at == cursor_data["created_at"]) &
(Event.id < cursor_data["id"])
)
)
# Fetch one extra to check for next page
result = await db.execute(query.limit(limit + 1))
events = result.scalars().all()
has_next = len(events) > limit
events = events[:limit]
next_cursor = None
if has_next and events:
last = events[-1]
next_cursor = encode_cursor({
"created_at": last.created_at.isoformat(),
"id": last.id,
})
return {
"data": [e.to_dict() for e in events],
"pagination": {
"has_next": has_next,
"next_cursor": next_cursor,
"limit": limit,
},
}
The limit + 1 trick is critical: fetch one extra row to determine if there’s a next page without a separate count query.
Multi-Column Cursor Sort
Sorting by multiple columns requires compound cursor conditions. For ORDER BY created_at DESC, id DESC:
WHERE (created_at, id) < (:cursor_created_at, :cursor_id)
PostgreSQL supports tuple comparison natively. For databases that don’t, expand into the OR condition shown above.
Keyset Pagination
Keyset is cursor pagination with transparent, human-readable parameters:
@app.get("/products")
async def list_products(
after_id: int | None = None,
limit: int = Query(20, ge=1, le=100),
db: AsyncSession = Depends(get_db),
):
query = select(Product).order_by(Product.id)
if after_id is not None:
query = query.where(Product.id > after_id)
result = await db.execute(query.limit(limit + 1))
products = result.scalars().all()
has_next = len(products) > limit
products = products[:limit]
return {
"data": [p.to_dict() for p in products],
"pagination": {
"has_next": has_next,
"next_after_id": products[-1].id if has_next else None,
},
}
Keyset works beautifully when you have a natural unique ordered key (like auto-increment IDs or timestamps with tie-breakers).
Performance Analysis
The critical difference shows up on deep pages:
Offset pagination generates: SELECT * FROM events ORDER BY id LIMIT 20 OFFSET 100000
The database engine scans 100,020 rows, discards 100,000, and returns 20. On a table with 10 million rows, this gets progressively slower.
Cursor pagination generates: SELECT * FROM events WHERE id > 100000 ORDER BY id LIMIT 20
With an index on id, this is an index seek directly to row 100,001, then reads 20 rows. Constant time regardless of position.
Benchmark on a PostgreSQL table with 5 million rows:
| Page depth | Offset time | Cursor time |
|---|---|---|
| Page 1 | 2ms | 2ms |
| Page 100 | 8ms | 2ms |
| Page 10,000 | 450ms | 2ms |
| Page 100,000 | 4,200ms | 2ms |
The difference becomes dramatic beyond a few thousand pages.
Django REST Framework Integration
DRF provides built-in pagination classes:
# settings.py
REST_FRAMEWORK = {
"DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.CursorPagination",
"PAGE_SIZE": 20,
}
# Or per-view
from rest_framework.pagination import CursorPagination
class EventPagination(CursorPagination):
page_size = 20
ordering = "-created_at"
cursor_query_param = "cursor"
class EventViewSet(viewsets.ReadOnlyModelViewSet):
queryset = Event.objects.all()
serializer_class = EventSerializer
pagination_class = EventPagination
DRF’s CursorPagination handles encoding, decoding, and reverse pagination automatically.
Edge Cases
Empty results: Always return a valid response structure even when there are no results. {"data": [], "pagination": {"has_next": false}} is correct.
Deleted items: If a user paginates through results and items are deleted between requests, cursor pagination handles this gracefully (it just picks up from the cursor position). Offset pagination may skip items or show duplicates.
Concurrent inserts: For real-time feeds where new items appear frequently, cursor pagination prevents the “shifting window” problem where offset-based pages show the same item twice because a new item pushed everything down.
Bidirectional cursors: To support “previous page” in cursor pagination, include both next_cursor and prev_cursor:
prev_cursor = None
if cursor: # Not the first page
first = events[0]
prev_cursor = encode_cursor({
"created_at": first.created_at.isoformat(),
"id": first.id,
"direction": "prev",
})
The server uses the direction field to flip the sort and comparison operators.
Hybrid Approach
Some APIs use offset for the first N pages and switch to cursor for deeper results:
@app.get("/search")
async def search(
q: str,
page: int = Query(1, ge=1, le=100), # Max 100 pages via offset
cursor: str | None = None,
):
if cursor:
return await _cursor_search(q, cursor)
if page <= 100:
return await _offset_search(q, page)
raise HTTPException(400, "Use cursor for pages beyond 100")
Google Search does something similar — you can’t access page 50 of results directly. This avoids the performance cliff of deep offset pagination while keeping the simplicity of page numbers for common use cases.
The one thing to remember: Implement cursor pagination from the start using the limit + 1 trick and compound sort keys — it’s barely more complex than offset but scales to any dataset size without performance degradation.
See Also
- Python Aiohttp Client Understand Aiohttp Client through a practical analogy so your Python decisions become faster and clearer.
- Python Api Client Design Why building your own API client in Python is like creating a TV remote that only has the buttons you actually need.
- Python Api Documentation Swagger Swagger turns your Python API into an interactive playground where anyone can click buttons to try it out — no coding required.
- Python Api Mocking Responses Why testing with fake API responses is like rehearsing a play with stand-ins before the real actors show up.
- Python Api Pagination Clients Why APIs send data in pages, and how Python handles it — like reading a book one chapter at a time instead of swallowing the whole thing.