Python API Load Testing — Core Concepts

Why load testing matters

Your API passes all unit tests, integration tests, and manual QA. Then you launch, 500 users show up, and everything collapses. Load testing prevents this by revealing performance limits before production traffic does.

Types of load tests

Smoke test — A minimal load (5–10 users) to verify the system works under any load at all. Catches configuration problems.

Load test — Normal expected traffic (say 200 concurrent users) sustained for 10–30 minutes. Verifies the system handles daily volume.

Stress test — Traffic beyond expected peaks (2x–5x normal). Reveals the breaking point and how the system degrades.

Spike test — Sudden traffic burst (0 to 1000 users in seconds). Tests auto-scaling and queue handling.

Soak test — Normal load sustained for hours. Catches memory leaks, connection pool exhaustion, and gradual degradation.
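The five test types differ mainly in how the number of simulated users changes over time. The sketch below writes each one as a table of stages. Only the 200-user baseline, the 2x to 5x stress range, and the 0-to-1000 spike come from the descriptions above; every duration, spawn rate, and name (`PROFILES`, `target_at`) is hypothetical. `target_at` returns `(user_count, spawn_rate)` for a given moment, or `None` once the last stage ends. That matches the return contract of the `tick()` method on Locust's `LoadTestShape`, so a table like this could drive a custom shape class.

```python
from __future__ import annotations

# Hypothetical stage tables: (stage_end_seconds, target_users, spawn_rate_per_second).
# The numbers are illustrative, not recommendations.
Stage = tuple[int, int, float]

PROFILES: dict[str, list[Stage]] = {
    # Smoke: a handful of users for one minute, just to prove the system responds.
    "smoke": [(60, 5, 1)],
    # Load: ramp to normal traffic (200 users), then hold it for 20 more minutes.
    "load": [(120, 200, 2), (1320, 200, 2)],
    # Stress: step past the expected peak, from 2x (400) up to 5x (1000) normal.
    "stress": [(300, 200, 2), (600, 400, 2), (900, 600, 2), (1200, 1000, 5)],
    # Spike: quiet baseline, jump to 1000 users within seconds, then drop back.
    "spike": [(60, 10, 10), (180, 1000, 200), (300, 10, 200)],
    # Soak: normal load held for four hours.
    "soak": [(4 * 3600, 200, 2)],
}


def target_at(stages: list[Stage], run_time: float) -> tuple[int, float] | None:
    """Return (user_count, spawn_rate) for the current moment, or None when done.

    Same return contract as locust.LoadTestShape.tick().
    """
    for end, users, rate in stages:
        if run_time < end:
            return users, rate
    return None  # past the last stage: the test should stop
```

For example, a `LoadTestShape` subclass could implement `tick()` as `return target_at(PROFILES["spike"], self.get_run_time())`.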

Locust: Python-native load testing

Locust defines user behavior in Python classes:

from locust import HttpUser, task, between

class WebUser(HttpUser):
    wait_time = between(1, 3)  # 1-3 seconds between actions
    
    def on_start(self):
        """Called when a simulated user starts."""
        response = self.client.post("/auth/login", json={
            "email": "testuser@example.com",
            "password": "testpass123",
        })
        self.token = response.json()["access_token"]
        self.client.headers.update({"Authorization": f"Bearer {self.token}"})
    
    @task(3)  # 3x more likely than other tasks
    def browse_products(self):
        self.client.get("/products?page=1&limit=20")
    
    @task(2)
    def view_product(self):
        self.client.get("/products/42")
    
    @task(1)
    def create_order(self):
        self.client.post("/orders", json={
            "product_id": 42,
            "quantity": 1,
        })

The @task weights create realistic traffic distribution: browsing is three times more common than ordering.
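To see what the 3:2:1 weights imply, the sketch below simulates weighted selection with Python's `random.choices`. It approximates how Locust chooses each next task (every pick is independent and proportional to its weight); it is not Locust's own code, and `simulate_picks` is a hypothetical helper.

```python
import random
from collections import Counter

# Weights copied from the WebUser example above.
TASK_WEIGHTS = {"browse_products": 3, "view_product": 2, "create_order": 1}


def simulate_picks(n: int, seed: int = 42) -> Counter:
    """Pick n tasks, each with probability proportional to its weight."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    names = list(TASK_WEIGHTS)
    weights = list(TASK_WEIGHTS.values())
    return Counter(rng.choices(names, weights=weights, k=n))


picks = simulate_picks(60_000)
for name, count in picks.most_common():
    print(f"{name:16} {count / 60_000:.1%}")
```

The expected shares are 3/6, 2/6, and 1/6 (about 50%, 33%, and 17%), so roughly one simulated action in six is an order.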

Run it: locust -f locustfile.py --host http://localhost:8000

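Locust then serves a web dashboard, by default at http://localhost:8089, where you start the test with a chosen number of users and spawn rate and watch request rates, response times, and failure percentages in real time.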

Key metrics to watch

Response time percentiles — P50 (the median) is the typical experience: half of all requests finish faster. P95 is the latency that 95% of requests beat, so it describes the slowest 5%. P99 does the same for the slowest 1% and catches outlier issues. If P50 is 50ms but P99 is 5 seconds, most requests are fine but one in a hundred is painfully slow.
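Locust reports these percentiles for you, but a small calculation shows why they matter more than the average. The sketch below uses the nearest-rank definition of a percentile (Locust's exact method may differ slightly), and the latency data is illustrative: 98 fast requests and 2 pathological ones.

```python
import math
import statistics


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Illustrative data in milliseconds.
latencies_ms = [50] * 98 + [5000] * 2

print("mean", statistics.mean(latencies_ms))  # 149
print("P50 ", percentile(latencies_ms, 50))   # 50
print("P95 ", percentile(latencies_ms, 95))   # 50
print("P99 ", percentile(latencies_ms, 99))   # 5000
```

The mean of 149ms describes no real request: it hides the 5-second outliers that P99 exposes, and it overstates the 50ms most users actually see.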

Requests per second (RPS) — The throughput your API sustains. Watch for the point where RPS plateaus despite adding more users — that is your capacity ceiling.
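One way to locate the ceiling from a stepped test is to compare throughput between consecutive load levels. A minimal sketch, assuming you record one `(users, rps)` pair per level; the 5% "flat" threshold, the helper name `find_plateau`, and the measurements are all hypothetical.

```python
from __future__ import annotations


def find_plateau(steps: list[tuple[int, float]], min_gain: float = 0.05) -> tuple[int, float] | None:
    """Return the first (users, rps) step whose throughput grew by less than
    min_gain (as a fraction) over the previous step, or None if RPS kept scaling."""
    for (_, prev_rps), (users, rps) in zip(steps, steps[1:]):
        if prev_rps > 0 and (rps - prev_rps) / prev_rps < min_gain:
            return users, rps
    return None


# Hypothetical measurements: (concurrent users, sustained requests per second).
steps = [(50, 100.0), (100, 200.0), (200, 390.0), (300, 520.0), (400, 540.0), (500, 545.0)]
print(find_plateau(steps))  # (400, 540.0)
```

Here going from 300 to 400 users bought less than 4% more throughput, so the ceiling sits around 520 to 540 RPS; extra users beyond that mostly wait in queues.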

Error rate — The percentage of requests that fail: typically 5xx responses, timeouts, and dropped connections. A common target for a healthy system is under 0.1% at normal load.
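A minimal sketch of the calculation, assuming failures are server errors (5xx) plus requests that got no response at all, represented here as `None`. The helper names and the run data are hypothetical. Note that Locust's own failure count, by default, also includes 4xx responses.

```python
from __future__ import annotations


def is_failure(status: int | None) -> bool:
    """Count 5xx responses and missing responses (timeout, reset) as failures."""
    return status is None or status >= 500


def error_rate(statuses: list[int | None]) -> float:
    if not statuses:
        return 0.0
    return sum(is_failure(s) for s in statuses) / len(statuses)


# Illustrative run: 10,000 requests, a few expected 404s, 4 server errors, 1 timeout.
statuses = [200] * 9990 + [404] * 5 + [503] * 4 + [None]
rate = error_rate(statuses)
print(f"error rate {rate:.3%}")             # error rate 0.050%
print("within 0.1% target:", rate < 0.001)  # within 0.1% target: True
```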

Resource utilization — CPU, memory, database connections, and disk I/O on the server during the test. Sustained high CPU suggests the workload is compute-bound. Memory that keeps climbing under constant load suggests a leak. An exhausted connection pool points to a database bottleneck.
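Under constant load, a leak shows up as memory that keeps growing rather than memory that is merely high. A minimal sketch, assuming you already sample the server's resident memory at regular intervals (collecting it is outside this sketch): it fits a least-squares line and reports the growth rate. The helper name and the readings are hypothetical.

```python
import statistics


def memory_growth_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (minutes_elapsed, memory_mb) samples, in MB per hour."""
    xs = [t for t, _ in samples]
    ys = [m for _, m in samples]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx * 60  # slope is MB per minute; convert to MB per hour


# Hypothetical soak-test readings taken every 10 minutes.
readings = [(0, 512), (10, 530), (20, 548), (30, 566), (40, 584), (50, 602)]
print(f"{memory_growth_per_hour(readings):.0f} MB/hour")  # 108 MB/hour
```

A slope near zero after warm-up is what a healthy soak test looks like; a steady positive slope at constant traffic deserves a closer look.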

Interpreting results

A typical load test story:

  1. 0–100 users: Response times flat at 50ms. System is comfortable.
  2. 100–300 users: Response times creep to 150ms. CPU at 60%. Still acceptable.
  3. 300–500 users: Response times jump to 800ms. Database connections maxed out. Errors start appearing.
  4. 500+ users: Response times exceed 5 seconds. Error rate hits 30%. The system is overwhelmed.

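The inflection point (300 users in this example), the last load level before response times climb steeply, is where to focus optimization. In this story the database connections are the constraint, so typical fixes are tuning the connection pool, caching frequent reads, or scaling horizontally.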
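A minimal sketch of finding that point automatically, assuming one response-time measurement per load level and a hypothetical 500ms latency target. The pairs are taken from the story above (with 5500ms standing in for "exceeds 5 seconds"); `last_healthy_level` is a hypothetical helper.

```python
from __future__ import annotations


def last_healthy_level(results: list[tuple[int, float]], slo_ms: float) -> int | None:
    """Return the highest user count measured before latency first exceeded slo_ms.

    results: (concurrent_users, response_time_ms) pairs, sorted by users.
    """
    healthy = None
    for users, latency_ms in results:
        if latency_ms > slo_ms:
            break  # first breach: everything from here on is degraded
        healthy = users
    return healthy


story = [(100, 50), (300, 150), (500, 800), (600, 5500)]
print(last_healthy_level(story, slo_ms=500))  # 300
```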

Common mistakes

Testing against production — Load tests should hit a staging environment that mirrors production. Testing against production risks affecting real users.

Using unrealistic data — If all simulated users request the same product, cache hit rates will be artificially high. Use varied, realistic data.
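Instead of requesting `/products/42` every time, simulated users can draw product IDs from a skewed distribution so a few popular items get most of the traffic while the long tail still hits the database. A minimal sketch, assuming a hypothetical catalog of IDs 1 to 5000 and a Zipf-like skew; `make_product_picker` is a hypothetical helper, not part of Locust.

```python
from __future__ import annotations

import itertools
import random
from collections import Counter


def make_product_picker(product_ids: list[int], skew: float = 1.0, seed: int | None = None):
    """Return a zero-argument function that picks product IDs with a Zipf-like skew.

    The first ID in the list is treated as the most popular: the item at
    popularity rank r gets weight 1 / r**skew.
    """
    rng = random.Random(seed)
    weights = [1 / rank**skew for rank in range(1, len(product_ids) + 1)]
    cum_weights = list(itertools.accumulate(weights))  # precomputed once for fast picks

    def pick() -> int:
        return rng.choices(product_ids, cum_weights=cum_weights, k=1)[0]

    return pick


pick_product = make_product_picker(list(range(1, 5001)), seed=7)
sample = Counter(pick_product() for _ in range(10_000))
print(len(sample), "distinct products;", sample[1], "requests for the top product")
```

In a Locust task, the request could become `self.client.get(f"/products/{pick_product()}", name="/products/[id]")`; the `name` argument groups all product IDs under one row in Locust's statistics.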

Ignoring warm-up — The first requests are usually slower (connection pool initialization, empty caches, and JIT compilation on runtimes that use it). Ramp up gradually and exclude the warm-up period, for example the first minute, from your results.
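If you analyze raw samples yourself, excluding warm-up is a simple filter on timestamps. A minimal sketch, assuming samples are `(seconds_since_start, latency_ms)` pairs and a 60-second window; the data and helper name are illustrative. Locust's `--reset-stats` option, which resets statistics once spawning completes, is one built-in way to get a similar effect.

```python
def drop_warmup(samples: list[tuple[float, float]], warmup_s: float = 60.0) -> list[float]:
    """Keep only latencies recorded after the warm-up window.

    samples: (seconds_since_test_start, latency_ms) pairs.
    """
    return [latency for t, latency in samples if t >= warmup_s]


# Illustrative samples: slow cold-start requests, then steady state.
samples = [(1, 900), (5, 600), (30, 250), (61, 60), (90, 55), (120, 58)]
steady = drop_warmup(samples)
print(steady)                   # [60, 55, 58]
print(max(steady), "ms worst")  # 60 ms worst
```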

Testing only happy paths — Real traffic includes 404s, validation errors, and retries. Include error scenarios in your test scripts.
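When a test deliberately requests a missing product or sends an invalid order, the correct answer is an error status, and it should not be counted as a failure. A minimal sketch of that decision, assuming hypothetical scenario names and expected status codes (whether validation returns 400 or 422 depends on your framework):

```python
from __future__ import annotations

# Hypothetical scenarios and the status codes a correct API should return for them.
EXPECTED_STATUS = {
    "view_existing_product": {200},
    "view_missing_product": {404},       # unknown ID: 404 is the right answer
    "create_invalid_order": {400, 422},  # validation error
}


def classify(scenario: str, status: int | None) -> str:
    """Return 'ok' if the status matches what the scenario expects, else 'failure'.

    None stands for a timeout or dropped connection.
    """
    if status is not None and status in EXPECTED_STATUS[scenario]:
        return "ok"
    return "failure"


print(classify("view_missing_product", 404))   # ok
print(classify("view_existing_product", 404))  # failure
print(classify("create_invalid_order", 500))   # failure
```

In Locust, the same decision fits a request made with `catch_response=True`: call `response.success()` when the result is "ok" and `response.failure(...)` otherwise, so an intentional 404 does not inflate the error rate.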

Beyond Locust: k6 and alternatives

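k6 (by Grafana Labs) uses JavaScript for test scripts, but because it drives your API over HTTP, the language the API is written in does not matter. Its Go-based engine typically generates more load per machine than Locust, and it can stream metrics to backends such as Prometheus or InfluxDB for viewing in Grafana.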

Artillery is a YAML-based tool for quick load tests without writing code.

wrk and hey are lightweight HTTP benchmarking tools for simple endpoint testing.

Locust is the best choice when you need complex user behavior written in Python. k6 is better for raw throughput testing and Grafana integration.

Common misconception

Load testing is not a one-time activity. Every significant code change, infrastructure change, or traffic pattern shift can alter performance characteristics. The most effective teams run load tests in CI/CD pipelines, comparing results against baseline benchmarks automatically.
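A CI check can be as small as comparing a run's key metrics with a stored baseline. A minimal sketch, assuming the metrics were already extracted (for example from the CSV files Locust writes with `--csv`); the baseline values, the 10% tolerance, and the helper name are hypothetical.

```python
from __future__ import annotations

# Hypothetical baseline from a previously accepted run.
BASELINE = {"p95_ms": 180.0, "rps": 520.0, "error_rate": 0.0005}
TOLERANCE = 0.10  # allow 10% drift before flagging


def regressions(current: dict[str, float], baseline: dict[str, float] = BASELINE,
                tolerance: float = TOLERANCE) -> list[str]:
    """Describe each metric that got worse than the baseline by more than `tolerance`."""
    problems = []
    # Latency and error rate regress when they go up...
    for key in ("p95_ms", "error_rate"):
        if current[key] > baseline[key] * (1 + tolerance):
            problems.append(f"{key}: {baseline[key]} -> {current[key]}")
    # ...throughput regresses when it goes down.
    if current["rps"] < baseline["rps"] * (1 - tolerance):
        problems.append(f"rps: {baseline['rps']} -> {current['rps']}")
    return problems


run = {"p95_ms": 240.0, "rps": 515.0, "error_rate": 0.0004}
found = regressions(run)
print(found)  # ['p95_ms: 180.0 -> 240.0']
```

In a pipeline step, exiting non-zero when the list is non-empty (for example `sys.exit(1 if found else 0)`) fails the job and blocks the change.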

The one thing to remember: Load test early and often using realistic traffic patterns, watch percentile latencies (not averages), identify the inflection point where performance degrades, and fix bottlenecks before users find them.

