Test Data Factories — Core Concepts

Master factory_boy and Faker patterns to generate consistent, realistic test data for Python applications.

Why factories beat fixtures

Traditional test data approaches have real problems:

Hardcoded fixtures — JSON files or SQL dumps with static data. They’re brittle (change one field and 50 tests break), hard to customize per test, and grow into massive files nobody understands.

Manual construction — Building objects in each test function. Leads to duplicated setup code, verbose tests, and inconsistency when one test sets a field differently than another.

Production database copies — Using sanitized production data. Contains too much irrelevant data, can leak PII despite sanitization, and tests become environment-dependent.

Factories solve all three problems. They provide a single source of truth for how to create test data, with sensible defaults that individual tests can override.
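To make "sensible defaults that individual tests can override" concrete, here is a minimal hand-rolled sketch that uses only the standard library. The names User and make_user are hypothetical and are not part of factory_boy; the library replaces this kind of helper with declarative classes, as the next section shows.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class User:  # hypothetical model used only for this sketch
    name: str
    email: str
    role: str


_user_ids = count(1)  # shared counter so each default user is unique


def make_user(**overrides) -> User:
    """Build a User from defaults; keyword arguments override any field."""
    n = next(_user_ids)
    fields = {
        "name": f"User {n}",
        "email": f"user{n}@example.com",
        "role": "customer",
    }
    fields.update(overrides)
    return User(**fields)


default_user = make_user()       # every field comes from the defaults
admin = make_user(role="admin")  # only the field under test is spelled out
```

Each test states only the fields it cares about, and the defaults live in one place.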

factory_boy: Python’s standard factory library

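factory_boy is one of the most widely used factory libraries in Python. It is a third-party package that works with plain Python classes (including dataclasses) and ships ORM-specific base classes for Django, SQLAlchemy, and MongoEngine. The examples below use the base factory.Factory, which simply calls the model's constructor. ORM-backed projects would typically subclass factory.django.DjangoModelFactory or factory.alchemy.SQLAlchemyModelFactory so that created objects are also saved: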

import factory
from myapp.models import User, Order, Product


class UserFactory(factory.Factory):
    class Meta:
        model = User

    name = factory.Faker("name")
    email = factory.LazyAttribute(lambda obj: f"{obj.name.lower().replace(' ', '.')}@example.com")
    role = "customer"
    created_at = factory.Faker("date_time_this_year")


class ProductFactory(factory.Factory):
    class Meta:
        model = Product

    name = factory.Faker("catch_phrase")
    price = factory.Faker("pydecimal", left_digits=3, right_digits=2, positive=True)
    sku = factory.Sequence(lambda n: f"PROD-{n:06d}")
    in_stock = True

Usage is clean:

# Default user
user = UserFactory()

# Customized user
admin = UserFactory(role="admin", name="Admin User")

# Batch creation
customers = UserFactory.create_batch(50)

Key factory_boy features

Sequences generate unique values that increment with each factory call. Perfect for IDs, SKUs, and usernames:

username = factory.Sequence(lambda n: f"user_{n}")

LazyAttribute computes a value based on other fields:

email = factory.LazyAttribute(
    lambda obj: f"{obj.username}@company.com"
)

SubFactory creates related objects automatically:

class OrderFactory(factory.Factory):
    class Meta:
        model = Order

    customer = factory.SubFactory(UserFactory)
    product = factory.SubFactory(ProductFactory)
    quantity = factory.Faker("random_int", min=1, max=10)

Calling OrderFactory() creates a User, a Product, and an Order all in one call. Each test gets its own isolated data graph.

Traits define named presets for common variations:

class UserFactory(factory.Factory):
    class Meta:
        model = User

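    name = factory.Faker("name")
    is_active = True
    subscription_tier = "free"
    # Explicit defaults for the fields that only the traits below populate
    payment_method = None
    deactivated_at = None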

    class Params:
        premium = factory.Trait(
            subscription_tier="premium",
            payment_method="card_visa",
        )
        deactivated = factory.Trait(
            is_active=False,
            deactivated_at=factory.Faker("date_time_this_month"),
        )

Usage: UserFactory(premium=True) or UserFactory(deactivated=True).

Faker: the data generation engine

factory.Faker is a thin wrapper around the Faker library, which does the actual work of generating realistic values. Faker can also be used directly, and its providers cover hundreds of data types:

from faker import Faker
fake = Faker()

# Example outputs only; actual values are random
fake.name()                # "Jennifer Martinez"
fake.email()               # "david.wilson@example.org" (safe example.* domains by default)
fake.address()             # "123 Elm Street\nSpringfield, IL 62704" (two lines)
fake.credit_card_number()  # "4111111111111111"
fake.date_this_year()      # datetime.date(2026, 2, 14)
fake.paragraph()           # Realistic lorem-style text

Faker supports locales for international data: Faker("ja_JP") generates Japanese names and addresses. This is important for testing internationalization.

Deterministic data with seeds

For reproducible tests, seed the random generator:

import factory.random  # import the submodule explicitly, as the factory_boy docs do

factory.random.reseed_random(12345)

user1 = UserFactory()  # Always the same "random" user
user2 = UserFactory()  # Always the same second user

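This makes the randomly generated values reproducible: the same seed always produces the same Faker output, which helps when debugging flaky tests. Two caveats apply. Sequence counters are not controlled by the seed, so call UserFactory.reset_sequence() if those must match as well. Providers that depend on the current date, such as date_time_this_year, can still vary from one run to the next.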

Common misconception

Factories don’t replace all test fixtures. They excel at creating domain objects (users, orders, products). For configuration data, API response mocks, or complex nested JSON structures, dedicated fixtures or conftest files are often clearer. Use factories for entities that have identity and relationships; use fixtures for everything else.

The one thing to remember: Test data factories provide a single source of truth for creating test objects with sensible defaults, automatic relationships, and per-test customization — making test setup a single readable line instead of a wall of boilerplate.
