Python Load Testing with Locust — Deep Dive
Building a realistic load test
A production-quality Locust test models actual user behavior, not just endpoint hammering:
# locustfile.py
import random

from locust import HttpUser, task, between, tag


class WebsiteUser(HttpUser):
    """Simulates a typical e-commerce browsing session."""

    wait_time = between(1, 5)  # 1-5 seconds between actions

    def on_start(self):
        """Login when the user starts."""
        response = self.client.post("/api/auth/login", json={
            "email": "loadtest@example.com",
            "password": "test-password-123",
        })
        self.token = response.json().get("token", "")
        self.client.headers.update({"Authorization": f"Bearer {self.token}"})

    @tag("browse")
    @task(5)  # weight 5: picked 5x as often as a weight-1 task
    def browse_products(self):
        self.client.get("/api/products?page=1&limit=20")

    @tag("browse")
    @task(3)
    def view_product_detail(self):
        product_id = self._random_product_id()
        self.client.get(f"/api/products/{product_id}")

    @tag("purchase")
    @task(1)
    def add_to_cart(self):
        product_id = self._random_product_id()
        self.client.post("/api/cart/items", json={
            "product_id": product_id,
            "quantity": 1,
        })

    @tag("purchase")
    @task(1)
    def checkout(self):
        # catch_response lets the test decide what counts as success or failure
        with self.client.post(
            "/api/orders",
            json={"payment_method": "test"},
            catch_response=True,
        ) as response:
            if response.status_code == 201:
                response.success()
            elif response.status_code == 409:
                response.failure("Cart was empty")
            else:
                response.failure(f"Unexpected: {response.status_code}")

    def _random_product_id(self) -> int:
        return random.randint(1, 1000)
Task weights model real behavior: users browse much more than they buy. The @tag decorator lets you run subsets of tasks (locust --tags browse for browse-only tests).
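To make the weighting concrete, the sketch below converts the task weights from the locustfile above into the share of task executions each one should receive on average. It is plain arithmetic, not a Locust API; `task_mix` is a hypothetical helper name introduced here for illustration.

```python
def task_mix(weights: dict) -> dict:
    """Convert task weights into the expected fraction of task executions."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Weights copied from the WebsiteUser class above.
mix = task_mix({
    "browse_products": 5,
    "view_product_detail": 3,
    "add_to_cart": 1,
    "checkout": 1,
})

for name, share in mix.items():
    print(f"{name}: {share:.0%}")
```

With these weights, browsing tasks account for 80% of executions and each purchase task for 10%.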
Custom user types for mixed workloads
Real applications serve different types of users with different behaviors:
from locust import HttpUser, task, between, constant


class APIConsumer(HttpUser):
    """Simulates backend-to-backend API calls."""

    wait_time = constant(0.1)  # API clients are fast
    weight = 3  # 3x more API consumers than admin users

    @task
    def fetch_data(self):
        self.client.get("/api/v1/data/export",
                        headers={"X-API-Key": "test-key"})


class AdminUser(HttpUser):
    """Simulates admin dashboard usage."""

    wait_time = between(3, 10)  # Admins read dashboards slowly
    weight = 1

    @task(3)
    def view_dashboard(self):
        self.client.get("/admin/dashboard")

    @task(1)
    def generate_report(self):
        self.client.post("/admin/reports/generate", json={
            "type": "monthly",
            "format": "csv",
        })


class MobileUser(HttpUser):
    """Simulates mobile app API calls."""

    wait_time = between(2, 8)
    weight = 6  # Most traffic comes from mobile

    def on_start(self):
        self.client.headers.update({
            "User-Agent": "MyApp/2.1 (iOS 17.4)",
            "Accept": "application/json",
        })

    @task
    def sync_feed(self):
        self.client.get("/api/mobile/feed?since=2026-03-01")
The weight parameter controls the ratio of user types. With weights 3:1:6, a 1000-user test runs 300 API consumers, 100 admin users, and 600 mobile users.
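Note that weight sets the number of users per class, not the share of requests. The sketch below reproduces the 3:1:6 split and then estimates request rates with the closed-loop approximation users / (mean wait + mean response time). The 50 ms response time is an assumed placeholder, the helper names are hypothetical, and the split ignores how Locust distributes remainders when the total is not evenly divisible.

```python
def users_per_class(weights: dict, total_users: int) -> dict:
    """Split total_users across user classes in proportion to their weights."""
    total_weight = sum(weights.values())
    return {name: total_users * w // total_weight for name, w in weights.items()}


def approx_rps(users: int, mean_wait_s: float, mean_response_s: float) -> float:
    """Rough steady-state request rate for users that wait between requests."""
    return users / (mean_wait_s + mean_response_s)


counts = users_per_class({"APIConsumer": 3, "AdminUser": 1, "MobileUser": 6}, 1000)

# Mean of between(a, b) is (a + b) / 2; constant(0.1) is always 0.1.
mean_wait_s = {"APIConsumer": 0.1, "AdminUser": (3 + 10) / 2, "MobileUser": (2 + 8) / 2}
ASSUMED_RESPONSE_S = 0.05  # assumption: replace with a measured value

for name, users in counts.items():
    rps = approx_rps(users, mean_wait_s[name], ASSUMED_RESPONSE_S)
    print(f"{name}: {users} users, ~{rps:.0f} req/s")
```

Under these assumptions the API consumers make up 30% of users but generate the large majority of requests, which is worth checking against production traffic before trusting the mix.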
Distributed execution architecture
For large-scale tests, Locust runs in master-worker mode:
# Master node (coordinates workers, serves web UI)
# Note: --expect-workers only takes effect together with --headless or --autostart
locust --master --expect-workers=4
# Worker nodes (generate actual load)
locust --worker --master-host=master-ip
locust --worker --master-host=master-ip
locust --worker --master-host=master-ip
locust --worker --master-host=master-ip
For cloud deployment, containerize the workers:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY locustfile.py .
CMD ["locust", "--worker", "--master-host", "locust-master"]
# docker-compose.yml for local distributed testing
services:
  master:
    build: .
    command: locust --master --expect-workers=4
    ports:
      - "8089:8089"
  worker:
    build: .
    command: locust --worker --master-host=master
    deploy:
      replicas: 4
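As a rough guide, each worker can handle approximately 5,000-10,000 simulated users depending on test complexity and hardware, so four workers on modest machines can simulate 20,000-40,000 concurrent users. A Locust process uses only one CPU core, so run one worker per available core and measure your own test before relying on these figures.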
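The same arithmetic can be turned around to size a cluster for a target user count. This is a back-of-the-envelope sketch: `workers_needed` is a hypothetical helper, and the per-worker capacities are the rough estimates quoted above, not measured values.

```python
import math


def workers_needed(target_users: int, users_per_worker: int) -> int:
    """Round up so the last partially filled worker is still counted."""
    return math.ceil(target_users / users_per_worker)


# Bracket the estimate with the pessimistic and optimistic per-worker figures.
print(workers_needed(30_000, 5_000))
print(workers_needed(30_000, 10_000))
```

Planning for the pessimistic figure leaves headroom if the test script turns out to be heavier than expected.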
Headless execution and CI integration
For CI pipelines, run Locust without the web UI:
locust --headless \
    --users 500 \
    --spawn-rate 50 \
    --run-time 5m \
    --host https://staging.example.com \
    --csv results/loadtest \
    --html results/report.html
Parse the CSV output for automated pass/fail decisions:
# scripts/check_load_results.py
import csv
import sys


def check_results(csv_path: str) -> bool:
    """Fail CI if performance thresholds are exceeded."""
    thresholds = {
        "p95_response_time": 2000,  # ms
        "failure_rate": 0.01,       # 1%
        "avg_response_time": 500,   # ms
    }

    # Locust writes <prefix>_stats.csv; the "Aggregated" row summarizes all requests.
    with open(f"{csv_path}_stats.csv", newline="") as f:
        aggregated = next(
            (row for row in csv.DictReader(f) if row["Name"] == "Aggregated"),
            None,
        )

    if aggregated is None:
        # Without this guard, a missing summary row would pass silently.
        print("FAIL: no Aggregated row found in stats CSV")
        return False

    p95 = float(aggregated["95%"])
    avg = float(aggregated["Average Response Time"])
    failures = int(aggregated["Failure Count"])
    total = int(aggregated["Request Count"])
    failure_rate = failures / total if total > 0 else 0

    if p95 > thresholds["p95_response_time"]:
        print(f"FAIL: p95 response time {p95}ms > {thresholds['p95_response_time']}ms")
        return False
    if failure_rate > thresholds["failure_rate"]:
        print(f"FAIL: failure rate {failure_rate:.2%} > {thresholds['failure_rate']:.0%}")
        return False
    if avg > thresholds["avg_response_time"]:
        print(f"FAIL: avg response time {avg}ms > {thresholds['avg_response_time']}ms")
        return False

    print("PASS: All performance thresholds met")
    return True


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: check_load_results.py <csv-prefix>")
    sys.exit(0 if check_results(sys.argv[1]) else 1)
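The aggregated row can hide a single slow endpoint behind many fast ones. As a possible extension, the sketch below lists every individual endpoint whose p95 exceeds a limit. `slow_endpoints` is a hypothetical helper, and the sample data is invented and trimmed to the columns the script above reads; a real stats file contains more columns.

```python
import csv
import io


def slow_endpoints(stats_csv: str, p95_limit_ms: float) -> list:
    """Return (name, p95) for each non-aggregated row whose p95 exceeds the limit."""
    offenders = []
    for row in csv.DictReader(io.StringIO(stats_csv)):
        if row["Name"] == "Aggregated":
            continue  # only per-endpoint rows are of interest here
        p95 = float(row["95%"])
        if p95 > p95_limit_ms:
            offenders.append((row["Name"], p95))
    return offenders


# Invented sample data for illustration only.
SAMPLE = """Type,Name,Request Count,Failure Count,Average Response Time,95%
GET,/api/products,900,0,120,340
POST,/api/orders,100,2,1800,4200
,Aggregated,1000,2,288,2100
"""

for name, p95 in slow_endpoints(SAMPLE, p95_limit_ms=2000):
    print(f"SLOW: {name} p95={p95:.0f}ms")
```

Reporting offenders by name makes a failed CI run actionable without opening the HTML report.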
Custom event hooks for advanced monitoring
Locust provides event hooks for custom metrics and integrations:
from locust import events


@events.request.add_listener
def on_request(request_type, name, response_time, response_length,
               response, exception, context, **kwargs):
    """Send metrics to external monitoring."""
    if response_time > 5000:
        print(f"SLOW REQUEST: {name} took {response_time}ms")
    if exception:
        print(f"FAILED: {name} - {exception}")


@events.test_start.add_listener
def on_test_start(environment, **kwargs):
    print(f"Load test starting against {environment.host}")
    print(f"Target: {environment.runner.target_user_count} users")


@events.test_stop.add_listener
def on_test_stop(environment, **kwargs):
    stats = environment.runner.stats
    total = stats.total
    print(f"Test complete: {total.num_requests} requests, "
          f"{total.num_failures} failures, "
          f"avg {total.avg_response_time:.0f}ms")
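Event hooks can also decide the process exit code directly, which avoids parsing CSV files in CI. The sketch below uses Locust's `quitting` event and `environment.process_exit_code`; the threshold values and the `exit_code_for` helper are assumptions made for illustration, and the import guard exists only so the helper can be exercised without Locust installed.

```python
def exit_code_for(fail_ratio: float, avg_ms: float, p95_ms: float) -> int:
    """Return 1 if any of the (assumed) thresholds is breached, else 0."""
    if fail_ratio > 0.01:   # more than 1% failed requests
        return 1
    if avg_ms > 500:        # average response time above 500 ms
        return 1
    if p95_ms > 2000:       # 95th percentile above 2 s
        return 1
    return 0


try:
    from locust import events
except ImportError:  # allows testing exit_code_for without Locust
    events = None

if events is not None:
    @events.quitting.add_listener
    def set_exit_code(environment, **kwargs):
        """Translate the final aggregated stats into the process exit code."""
        total = environment.stats.total
        environment.process_exit_code = exit_code_for(
            total.fail_ratio,
            total.avg_response_time,
            total.get_response_time_percentile(0.95),
        )
```

Keeping the threshold logic in a plain function makes it easy to unit test separately from the load test itself.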
Performance analysis patterns
After running a load test, look for these patterns:
Linear degradation: Response times increase proportionally with users. This usually indicates CPU-bound processing — each request takes fixed time and requests queue up.
Cliff effect: Performance is fine up to N users, then suddenly collapses. This typically means a resource limit was hit — database connection pool exhausted, memory filled, or thread pool saturated.
Sawtooth pattern: Response times spike periodically then recover. Often caused by garbage collection pauses, cache expiration/rebuilds, or background job interference.
Flat then spike: Performance is constant regardless of load until a specific endpoint is hit. That endpoint is the bottleneck — often a database query missing an index, an N+1 query, or an unbounded data fetch.
Each pattern points to a different class of fix. Linear degradation benefits from caching or query optimization. Cliff effects need resource pool tuning. Sawtooth needs GC tuning or cache warming. Flat-then-spike needs endpoint-specific optimization.
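One way to tell linear degradation from a cliff is to compare how fast response time grows relative to how fast the user count grows between load steps. The sketch below is a heuristic, not a standard method: the function names, the threshold of 2.0, and the sample measurements are all assumptions for illustration. A ratio near 1 means response time scales proportionally with users; a ratio well above 1 suggests a resource limit was hit at that step.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (concurrent users, p95 response time in ms)


def degradation_ratios(samples: List[Sample]) -> List[Tuple[int, float]]:
    """For each load step, divide response-time growth by user-count growth."""
    ordered = sorted(samples)
    ratios = []
    for (u0, t0), (u1, t1) in zip(ordered, ordered[1:]):
        ratios.append((u1, (t1 / t0) / (u1 / u0)))
    return ratios


def find_cliff(samples: List[Sample], threshold: float = 2.0) -> Optional[int]:
    """Return the first user count where growth outpaces load by the threshold."""
    for users, ratio in degradation_ratios(samples):
        if ratio > threshold:
            return users
    return None


# Invented measurements from a hypothetical step-load test.
linear = [(100, 200.0), (200, 400.0), (300, 600.0), (400, 800.0)]
cliff = [(100, 200.0), (200, 210.0), (300, 220.0), (400, 3000.0)]

print("linear:", find_cliff(linear))
print("cliff:", find_cliff(cliff))
```

The heuristic needs measurements at several distinct user counts, so it pairs naturally with a test that ramps load in steps rather than jumping straight to the peak.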
One thing to remember: The most valuable load test result isn’t a number — it’s the story of how your system degrades. Understanding whether it degrades gracefully (slows down) or catastrophically (crashes) determines your production reliability.