Test Data Factories — Core Concepts
Why factories beat fixtures
Traditional test data approaches have real problems:
Hardcoded fixtures — JSON files or SQL dumps with static data. They’re brittle (change one field and 50 tests break), hard to customize per test, and grow into massive files nobody understands.
Manual construction — Building objects in each test function. Leads to duplicated setup code, verbose tests, and inconsistency when one test sets a field differently than another.
Production database copies — Using sanitized production data. Contains too much irrelevant data, can leak PII despite sanitization, and tests become environment-dependent.
Factories solve all three problems. They provide a single source of truth for how to create test data, with sensible defaults that individual tests can override.
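The idea can be sketched without any library. The following is a minimal hand-rolled illustration, not factory_boy itself; the `User` dataclass and the `make_user` helper are hypothetical names chosen for this example. It shows the two properties that matter: defaults live in one place, and each test overrides only the fields it cares about.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class User:  # hypothetical model for illustration
    name: str
    email: str
    role: str = "customer"


_user_ids = count(1)  # shared counter so every default user is distinct


def make_user(**overrides) -> User:
    """Build a User from central defaults; callers override any field."""
    n = next(_user_ids)
    defaults = {
        "name": f"User {n}",
        "email": f"user{n}@example.com",
        "role": "customer",
    }
    defaults.update(overrides)
    return User(**defaults)


default_user = make_user()       # every field comes from the defaults
admin = make_user(role="admin")  # only the relevant field is spelled out
```

A factory library generalizes this pattern with declarative fields, related objects, and realistic fake values.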
factory_boy: Python’s standard factory library
factory_boy is the most widely used factory library in Python. It integrates with Django and SQLAlchemy models, and its base factory.Factory class works with plain Python classes such as dataclasses:
import factory
from myapp.models import User, Order, Product


class UserFactory(factory.Factory):
    class Meta:
        model = User

    name = factory.Faker("name")
    email = factory.LazyAttribute(lambda obj: f"{obj.name.lower().replace(' ', '.')}@example.com")
    role = "customer"
    created_at = factory.Faker("date_time_this_year")


class ProductFactory(factory.Factory):
    class Meta:
        model = Product

    name = factory.Faker("catch_phrase")
    price = factory.Faker("pydecimal", left_digits=3, right_digits=2, positive=True)
    sku = factory.Sequence(lambda n: f"PROD-{n:06d}")
    in_stock = True
Usage is clean:
# Default user
user = UserFactory()
# Customized user
admin = UserFactory(role="admin", name="Admin User")
# Batch creation
customers = UserFactory.create_batch(50)
Key factory_boy features
Sequences generate unique values that increment with each factory call. Perfect for IDs, SKUs, and usernames:
username = factory.Sequence(lambda n: f"user_{n}")
LazyAttribute computes a value based on other fields:
email = factory.LazyAttribute(
    lambda obj: f"{obj.username}@company.com"
)
SubFactory creates related objects automatically:
class OrderFactory(factory.Factory):
    class Meta:
        model = Order

    customer = factory.SubFactory(UserFactory)
    product = factory.SubFactory(ProductFactory)
    quantity = factory.Faker("random_int", min=1, max=10)
Calling OrderFactory() creates a User, a Product, and an Order all in one call. Each test gets its own isolated data graph.
Traits define named presets for common variations:
class UserFactory(factory.Factory):
    class Meta:
        model = User

    name = factory.Faker("name")
    is_active = True
    subscription_tier = "free"

    class Params:
        premium = factory.Trait(
            subscription_tier="premium",
            payment_method="card_visa",
        )
        deactivated = factory.Trait(
            is_active=False,
            deactivated_at=factory.Faker("date_time_this_month"),
        )
Usage: UserFactory(premium=True) or UserFactory(deactivated=True).
Faker: the data generation engine
factory_boy uses Faker under the hood for generating realistic values. Faker provides hundreds of data types:
from faker import Faker
fake = Faker()
fake.name() # "Jennifer Martinez"
fake.email() # "david.wilson@yahoo.com"
fake.address() # "123 Elm Street, Springfield, IL 62704"
fake.credit_card_number() # "4111111111111111"
fake.date_this_year() # datetime.date(2026, 2, 14)
fake.paragraph() # Realistic lorem-style text
Faker supports locales for international data: Faker("ja_JP") generates Japanese names and addresses. This is important for testing internationalization.
Deterministic data with seeds
For reproducible tests, seed the random generator:
import factory.random  # explicit submodule import; also binds the name "factory"

factory.random.reseed_random(12345)
user1 = UserFactory()  # Always the same "random" user
user2 = UserFactory()  # Always the same second user
This makes the randomly generated values reproducible: the same seed yields the same data in the same order, which helps when debugging flaky tests.
Common misconception
Factories don’t replace all test fixtures. They excel at creating domain objects (users, orders, products). For configuration data, API response mocks, or complex nested JSON structures, dedicated fixtures or conftest files are often clearer. Use factories for entities that have identity and relationships; use fixtures for everything else.
The one thing to remember: Test data factories provide a single source of truth for creating test objects with sensible defaults, automatic relationships, and per-test customization — making test setup a single readable line instead of a wall of boilerplate.