Python Read Replica Patterns — Core Concepts

Understand read replica architecture for Python apps — replication lag, routing strategies, and when replicas help vs hurt.

Why read replicas exist

Most web applications are read-heavy. A commonly cited ratio is roughly 90% reads to 10% writes, though the real figure varies by application. A single database server often becomes a bottleneck not because of writes, but because thousands of read queries compete for CPU, memory, and I/O.

Read replicas address this by duplicating the database. Writes go to one primary server, and reads are distributed across one or more replicas. Each replica is a full copy of the primary, usually updated asynchronously via replication.

How replication works

The primary database processes a write (INSERT, UPDATE, DELETE).
It records the change in its replication log (the write-ahead log, or WAL, in PostgreSQL; the binary log, or binlog, in MySQL).
Replicas continuously pull changes from this log and apply them locally.
After a brief delay (replication lag), the replica matches the primary.
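The four steps above can be illustrated with a toy in-memory model. This is only a sketch of the idea, not how PostgreSQL or MySQL implement replication: the `Primary` and `Replica` classes, the `catch_up` method, and the `max_entries` parameter are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Toy primary: logs each write, then applies it locally."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # stands in for the WAL / binlog

    def write(self, key, value):
        self.log.append((key, value))  # record the change first
        self.data[key] = value         # then apply it


@dataclass
class Replica:
    """Toy replica: replays log entries it has not applied yet."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # how many log entries have been replayed so far

    def catch_up(self, primary, max_entries=None):
        pending = primary.log[self.applied:]
        if max_entries is not None:
            pending = pending[:max_entries]  # simulate a replica that lags
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
        return len(pending)


primary, replica = Primary(), Replica()
primary.write("user:1", "alice")
primary.write("user:2", "bob")

replica.catch_up(primary, max_entries=1)  # replica is now one entry behind
stale = replica.data.get("user:2")        # None: the write has not arrived yet
replica.catch_up(primary)                 # replay the rest of the log
fresh = replica.data.get("user:2")        # "bob"
```

The window between the two `catch_up` calls is the toy equivalent of replication lag: the write exists on the primary but a read against the replica cannot see it yet.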

Replication lag

The gap between a write on the primary and its appearance on the replica is called replication lag. It is typically 10–100 milliseconds, but it can spike to seconds during heavy write load or network issues.
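On PostgreSQL, one rough way to observe lag from Python is to ask a replica when it last replayed a transaction. The sketch below assumes a DB-API cursor (for example from psycopg) that is already connected to a replica; `replica_lag_seconds` and `LAG_SQL` are illustrative names. `pg_last_xact_replay_timestamp()` is a built-in PostgreSQL function, but note that this estimate keeps growing while the primary is idle, even if the replica is fully caught up.

```python
# Approximate replay lag, measured on the replica itself (PostgreSQL).
LAG_SQL = (
    "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())"
)


def replica_lag_seconds(cursor):
    """Return the approximate replay lag in seconds, or None if unknown.

    `cursor` is assumed to be a DB-API cursor connected to a replica.
    PostgreSQL returns NULL when no transaction has been replayed yet
    (or when the server is not a replica), which maps to None here.
    """
    cursor.execute(LAG_SQL)
    (lag,) = cursor.fetchone()
    return None if lag is None else float(lag)
```

A caller would typically poll this every few seconds and expose the value as a metric, rather than running it on every request.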

Lag matters in specific scenarios:

Read-your-own-writes — a user submits a form, then immediately sees a page that reads from a replica. If the replica hasn’t caught up, the user sees old data.
Causal consistency — User A posts a comment and tells User B about it, but User B’s request hits a lagging replica and the comment is not there yet.

Routing strategies

Route all reads to replicas (simplest)

Every SELECT goes to a replica. Works for dashboards, reports, and pages where slight staleness is acceptable.
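A minimal sketch of this strategy, assuming connections are identified by plain strings and that the first keyword of the statement is enough to classify it. `ReplicaRouter` and `target_for` are hypothetical names. A real router would also need to keep reads inside a write transaction on the primary and treat statements such as `SELECT ... FOR UPDATE` as writes; this sketch does neither.

```python
import itertools


class ReplicaRouter:
    """Send SELECT statements to replicas round-robin, everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # itertools.cycle repeats the replica list forever; None means "no replicas".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql):
        words = sql.split()
        is_select = bool(words) and words[0].upper() == "SELECT"
        if is_select and self._replicas is not None:
            return next(self._replicas)
        return self.primary  # writes, and anything we cannot classify


router = ReplicaRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.target_for("SELECT * FROM products"),
    router.target_for("select name FROM users WHERE id = 1"),
    router.target_for("UPDATE users SET name = 'x' WHERE id = 1"),
    router.target_for("SELECT 1"),
]
# targets == ["replica-1", "replica-2", "primary", "replica-1"]
```

Anything that does not start with SELECT falls through to the primary, so an unrecognized statement is never sent to a read-only server.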

Read-after-write routing

After a write, route that user’s subsequent reads to the primary for a short window (5–10 seconds). As long as replication lag stays shorter than the window, the user always sees their own changes.
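One way to sketch the window is to remember when each user last wrote and compare against a clock on every read. `StickyPrimaryRouter`, `record_write`, and `target_for_read` are hypothetical names, and the in-process dictionary is an assumption that only holds for a single process; with several workers the timestamp would more likely live in the session, a cookie, or a shared cache.

```python
import time


class StickyPrimaryRouter:
    """Pin a user's reads to the primary for a short window after they write."""

    def __init__(self, window_seconds=5.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self._clock = clock        # injectable so the window can be tested
        self._last_write = {}      # user_id -> time of most recent write

    def record_write(self, user_id):
        self._last_write[user_id] = self._clock()

    def target_for_read(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < self.window_seconds:
            return "primary"
        return "replica"


# Demonstration with a fake clock so no real waiting is needed.
now = [100.0]
router = StickyPrimaryRouter(window_seconds=5.0, clock=lambda: now[0])

before = router.target_for_read("u1")  # "replica": no recent write
router.record_write("u1")
during = router.target_for_read("u1")  # "primary": inside the window
other = router.target_for_read("u2")   # "replica": a different user
now[0] += 6.0
after = router.target_for_read("u1")   # "replica": the window has expired
```

The window only affects the user who wrote, so the extra load on the primary stays proportional to write activity rather than to total traffic.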

Query-type routing

Route specific queries based on freshness requirements:

Listing pages → replica (slightly stale is fine)
Order status after payment → primary (must be current)
Admin reports → dedicated reporting replica
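In code, this often ends up as an explicit freshness hint that the caller passes with each query. The sketch below assumes three connection targets named after the list above; the `Freshness` enum, the `ROUTES` table, and `target_for` are illustrative names, not part of any library.

```python
from enum import Enum


class Freshness(Enum):
    STALE_OK = "stale_ok"                # listing pages
    MUST_BE_CURRENT = "must_be_current"  # order status after payment
    REPORTING = "reporting"              # admin reports


ROUTES = {
    Freshness.STALE_OK: "replica",
    Freshness.MUST_BE_CURRENT: "primary",
    Freshness.REPORTING: "reporting-replica",
}


def target_for(freshness=Freshness.MUST_BE_CURRENT):
    """Map a freshness requirement to a connection target.

    The default is the primary, so a caller that forgets to pass a
    hint gets current data rather than possibly stale data.
    """
    return ROUTES[freshness]


listing = target_for(Freshness.STALE_OK)             # "replica"
order_status = target_for(Freshness.MUST_BE_CURRENT) # "primary"
report = target_for(Freshness.REPORTING)             # "reporting-replica"
unspecified = target_for()                           # "primary"
```

Defaulting to the primary trades some read capacity for safety: a query only moves to a replica once someone has decided that staleness is acceptable for it.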

When replicas help

Read-heavy workloads — APIs serving product pages, user profiles, search results.
Reporting and analytics — heavy queries that would slow down the primary.
Geographic distribution — replicas in different regions reduce latency for distant users.
Failover — a replica can be promoted to primary if the original fails.

When replicas don’t help

Write-heavy workloads — replicas don’t reduce write load on the primary. Every write still goes to one server.
Small databases — if the primary isn’t under pressure, replicas add complexity without benefit.
Strong consistency requirements — if every read must return the latest data, replicas add lag that must be worked around.

The read replica landscape

Database	Replication type	Max replicas	Lag monitoring
PostgreSQL	Streaming (async/sync)	Unlimited	pg_stat_replication
MySQL	Binlog (async/semi-sync)	Unlimited	SHOW REPLICA STATUS (SHOW SLAVE STATUS before 8.0.22)
AWS RDS	Managed async	Up to 15 per source instance	CloudWatch (ReplicaLag)
Cloud SQL	Managed async	10	GCP metrics
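Whatever the monitoring source, a lag reading can feed back into routing. The sketch below assumes the application already has a recent lag value in seconds for each replica (or None when the value is unknown) and skips replicas that are too far behind. `choose_target` and the one-second default threshold are assumptions made for the illustration. This bounds staleness but does not give strong consistency.

```python
import random


def choose_target(replica_lags, max_lag_seconds=1.0):
    """Pick a replica whose lag is within bounds, else fall back to the primary.

    `replica_lags` maps a replica name to its last measured lag in seconds,
    or to None when the lag could not be measured.
    """
    healthy = [
        name
        for name, lag in replica_lags.items()
        if lag is not None and lag <= max_lag_seconds
    ]
    if not healthy:
        return "primary"
    return random.choice(healthy)  # spread reads across acceptable replicas


one_healthy = choose_target({"replica-1": 0.2, "replica-2": 4.0})   # "replica-1"
none_healthy = choose_target({"replica-1": 3.0, "replica-2": None}) # "primary"
```

Falling back to the primary keeps reads correct during a lag spike, but it also shifts read load onto the primary at exactly the moment it may be busiest, so the threshold deserves some care.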

Common misconception

Adding replicas doesn’t double your database capacity. Replicas only help with reads. If your bottleneck is write throughput, connection limits, or a poorly indexed query, replicas won’t solve it. Always profile your database workload before adding replicas — you might just need a better index.
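A first, very rough profiling step is to check how read-heavy the workload actually is. The sketch below counts statement types in a sample of SQL text, for example lines taken from a query log. `read_write_mix`, `read_fraction`, and the `WRITE_KEYWORDS` set are hypothetical names, and counting statements ignores how expensive each one is, so the database's own statistics views remain the better source for a real decision.

```python
from collections import Counter

WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "MERGE", "REPLACE"}


def read_write_mix(statements):
    """Count reads, writes, and other statements by their first keyword."""
    counts = Counter()
    for sql in statements:
        words = sql.split()
        if not words:
            continue  # skip blank lines
        first = words[0].upper()
        if first == "SELECT":
            counts["read"] += 1
        elif first in WRITE_KEYWORDS:
            counts["write"] += 1
        else:
            counts["other"] += 1  # BEGIN, COMMIT, DDL, and so on
    return counts


def read_fraction(counts):
    """Return reads / (reads + writes), or None if there were neither."""
    total = counts["read"] + counts["write"]
    return counts["read"] / total if total else None


sample = ["SELECT 1"] * 9 + ["INSERT INTO t VALUES (1)", "BEGIN"]
mix = read_write_mix(sample)
fraction = read_fraction(mix)  # 0.9
```

A high read fraction only says replicas might help; a single slow, unindexed query can still dominate load on its own.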

The one thing to remember: read replicas scale your database’s read capacity by distributing queries across copies, but they introduce replication lag that your Python application must account for in routing decisions.
