Fuzz Testing — Deep Dive

Build production fuzz testing pipelines with atheris, Hypothesis, and OSS-Fuzz integration for Python codebases.

Google’s atheris: coverage-guided fuzzing for Python

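Atheris is Google's coverage-guided fuzzer for Python, built on top of libFuzzer. It instruments Python bytecode to record which code each input reaches (it can also fuzz native CPython extensions), and libFuzzer keeps any input that reaches new code and mutates it further to maximize code exploration. Coverage is only collected for instrumented code, so import the modules under test inside a with atheris.instrument_imports(): block, or call atheris.instrument_all().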
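To make "coverage-guided" concrete, here is a toy, pure-Python sketch of the feedback loop libFuzzer drives: pick an input from the corpus, mutate it, run the target, and keep the mutant only if it executes lines no earlier input reached. It is an illustration, not how atheris works internally (atheris instruments bytecode and libFuzzer's mutators are far richer). The sys.settrace line coverage, the three byte mutations, and all function names are assumptions made for this sketch.

```python
import random
import sys
from typing import Callable, List, Optional, Tuple

Target = Callable[[bytes], None]


def run_with_coverage(target: Target, data: bytes) -> Tuple[frozenset, bool]:
    """Run target(data); return the (file, line) pairs it executed and whether it crashed."""
    hits = set()

    def tracer(frame, event, arg):
        if event == "line":
            hits.add((frame.f_code.co_filename, frame.f_lineno))
        return tracer  # keep tracing inside every called frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    crashed = False
    try:
        target(data)
    except Exception:  # any exception escaping the target counts as a crash
        crashed = True
    finally:
        sys.settrace(previous)
    return frozenset(hits), crashed


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: insert, overwrite, or delete."""
    buf = bytearray(data)
    op = rng.randrange(3) if buf else 0  # an empty input can only grow
    if op == 0:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif op == 1:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(target: Target, seeds: List[bytes], iterations: int = 100_000,
         seed: int = 0) -> Tuple[List[bytes], Optional[bytes]]:
    """Coverage-guided loop; returns (corpus, first crashing input or None)."""
    rng = random.Random(seed)
    corpus = list(seeds) or [b""]
    seen: set = set()
    for s in corpus:
        seen |= run_with_coverage(target, s)[0]
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        coverage, crashed = run_with_coverage(target, candidate)
        if crashed:
            return corpus, candidate  # like libFuzzer, stop at the first crash
        if not coverage <= seen:  # reached new lines: keep it for further mutation
            seen |= coverage
            corpus.append(candidate)
    return corpus, None


def toy_target(data: bytes) -> None:
    # Each nested branch is new coverage, so the fuzzer can climb one byte at a time
    if len(data) > 0 and data[0] == ord("F"):
        if len(data) > 1 and data[1] == ord("U"):
            if len(data) > 2 and data[2] == ord("Z"):
                raise RuntimeError("bug reached")
```

Without coverage feedback, hitting b"FUZ" by blind guessing needs roughly one try in 256^3; with feedback, each correct byte is kept in the corpus, so the search only has to guess one byte at a time.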

Basic setup:

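# fuzz_json_parser.py
import sys

import atheris

# Instrument the code under test so atheris collects Python-level coverage for it.
# json's C accelerator (_json) is native code and is not instrumented by this.
with atheris.instrument_imports():
    import json


def test_one_input(data: bytes):
    fdp = atheris.FuzzedDataProvider(data)
    try:
        json.loads(fdp.ConsumeUnicodeNoSurrogates(256))
    except json.JSONDecodeError:
        pass  # Expected: malformed JSON is rejected; only unexpected exceptions are findings


if __name__ == "__main__":
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()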

Run with:

pip install atheris
python fuzz_json_parser.py -max_len=1024 -runs=100000

Key flags:

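-max_len: maximum length of a generated input in bytes; smaller inputs execute faster and keep the search focused
-runs: number of inputs to execute before stopping (the default, -1, runs indefinitely)
-dict: path to a dictionary file of tokens that guide mutation (useful for structured formats)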
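The -dict file uses libFuzzer's dictionary format: one double-quoted token per line (optionally prefixed by a name such as kw1=), with \\, \" and \xAB escapes, and # starting a comment line. A small helper can render such a file for JSON. The token list and function names below are illustrative assumptions, not something atheris ships.

```python
from typing import Iterable


def dict_entry(token: bytes) -> str:
    """Quote one token in libFuzzer dictionary syntax, escaping as needed."""
    out = []
    for byte in token:
        if byte == 0x5C:  # backslash
            out.append("\\\\")
        elif byte == 0x22:  # double quote
            out.append('\\"')
        elif 0x20 <= byte < 0x7F:
            out.append(chr(byte))  # printable ASCII is written as-is
        else:
            out.append(f"\\x{byte:02X}")  # everything else as a hex escape
    return '"' + "".join(out) + '"'


def render_dict(tokens: Iterable[bytes]) -> str:
    """Return the full dictionary file contents, one entry per line."""
    lines = ["# Generated libFuzzer dictionary"]
    lines += [dict_entry(t) for t in tokens]
    return "\n".join(lines) + "\n"


# Hypothetical token set for JSON: structural characters, literals, and the \u escape
JSON_TOKENS = [b"{", b"}", b"[", b"]", b":", b",", b'"', b"true", b"false", b"null", b"\\u"]
```

Write it with Path("json.dict").write_text(render_dict(JSON_TOKENS)) and pass -dict=json.dict to the fuzzer alongside the corpus directory.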

The FuzzedDataProvider API

Atheris provides FuzzedDataProvider to extract typed values from raw fuzzer bytes:

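import atheris

with atheris.instrument_imports():
    from myapp.users import process_user  # hypothetical: the function under test


def test_one_input(data: bytes):
    fdp = atheris.FuzzedDataProvider(data)

    # Each Consume* call draws bytes from the fuzzer's raw input
    name = fdp.ConsumeUnicodeNoSurrogates(64)  # str of up to 64 characters
    age = fdp.ConsumeIntInRange(0, 200)        # int in [0, 200]
    score = fdp.ConsumeFloat()                 # any float, may be nan or inf
    is_active = fdp.ConsumeBool()
    raw_bytes = fdp.ConsumeBytes(fdp.ConsumeIntInRange(0, 1024))

    # Now fuzz your function with structured-ish random data
    process_user(name=name, age=age, score=score, active=is_active, payload=raw_bytes)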

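This gives the fuzzer more structure to work with than raw bytes: mutation still happens at the byte level, but every input decodes into arguments of the right types, so fewer executions are wasted on inputs the function rejects immediately, and interesting code paths are reached sooner.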
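The idea behind FuzzedDataProvider fits in a few lines: each consume call carves the next bytes off the fuzzer's buffer and maps them into the requested type, falling back to a default once the bytes run out. The class below is a hypothetical, simplified stand-in written for illustration; atheris's real provider uses its own byte layout and offers many more methods.

```python
class MiniDataProvider:
    """Simplified, hypothetical stand-in for atheris.FuzzedDataProvider."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def _take(self, n: int) -> bytes:
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk  # may be shorter than n once the input is exhausted

    def remaining(self) -> int:
        return len(self._data) - self._pos

    def consume_int_in_range(self, lo: int, hi: int) -> int:
        span = hi - lo
        if span <= 0:
            return lo
        width = (span.bit_length() + 7) // 8  # bytes needed to cover the range
        raw = int.from_bytes(self._take(width), "little")  # b"" decodes to 0
        return lo + raw % (span + 1)

    def consume_bool(self) -> bool:
        return bool(self.consume_int_in_range(0, 1))

    def consume_bytes(self, count: int) -> bytes:
        return self._take(max(count, 0))

    def consume_str(self, max_chars: int) -> str:
        length = self.consume_int_in_range(0, max_chars)
        return self.consume_bytes(length).decode("latin-1")  # every byte maps to one char
```

Because an exhausted buffer yields defaults instead of raising, every byte string the fuzzer produces is a valid input to the harness, which is the property that makes this style of decoding work well with mutation.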

Hypothesis for structured fuzzing

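When your inputs have a known schema, Hypothesis strategies can describe it directly, so every generated input has the right shape, and Hypothesis shrinks any failing example to a minimal one. It is not coverage-guided by default, so it complements atheris rather than replacing it: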

from hypothesis import given, settings, strategies as st
from hypothesis import HealthCheck
from pydantic import ValidationError
from myapp.models import OrderRequest


order_strategy = st.fixed_dictionaries({
    "customer_id": st.text(min_size=0, max_size=100),
    "items": st.lists(
        st.fixed_dictionaries({
            "sku": st.text(min_size=0, max_size=50),
            "quantity": st.integers(min_value=-1000, max_value=1000),
            "price": st.floats(allow_nan=True, allow_infinity=True),
        }),
        min_size=0,
        max_size=20,
    ),
    "coupon_code": st.one_of(st.none(), st.text(max_size=200)),
    "shipping_tier": st.sampled_from(["standard", "express", "overnight", ""]),
})


@given(data=order_strategy)
@settings(
    max_examples=5000,
    suppress_health_check=[HealthCheck.too_slow],
)
def test_order_validation_never_crashes(data):
    """The validator may reject input but must never raise an unhandled exception."""
    try:
        OrderRequest(**data)
    except ValidationError:
        pass  # Pydantic rejection is fine
    # Any other exception = bug
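"Never crashes" is the weakest useful property. Where a function has an inverse, a round-trip property also catches silent data corruption. A minimal sketch against the standard json module, assuming Hypothesis is installed; the strategy and test names are illustrative:

```python
import json

from hypothesis import given, settings, strategies as st

# Recursive strategy for JSON-compatible values. NaN and infinities are excluded:
# they are not valid JSON, and nan != nan would fail the equality check even on a
# correct round trip. Integers are bounded to keep their decimal form short.
json_values = st.recursive(
    st.none()
    | st.booleans()
    | st.integers(min_value=-(2**63), max_value=2**63 - 1)
    | st.floats(allow_nan=False, allow_infinity=False)
    | st.text(),
    lambda children: st.lists(children, max_size=5)
    | st.dictionaries(st.text(), children, max_size=5),
    max_leaves=25,
)


@given(value=json_values)
@settings(max_examples=300)
def test_json_round_trip(value):
    # Serializing then parsing must give back an equal value
    assert json.loads(json.dumps(value)) == value
```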

Corpus management

Coverage-guided fuzzers maintain a corpus — a collection of inputs that achieve unique coverage. Managing this corpus is crucial for effective long-running fuzzing:

# Create corpus directory with seed inputs
mkdir -p corpus/json_parser
echo '{}' > corpus/json_parser/seed_empty
echo '{"key": [1, 2, 3]}' > corpus/json_parser/seed_nested
echo '{"deep": {"nested": {"value": null}}}' > corpus/json_parser/seed_deep

# Run fuzzer with corpus
python fuzz_json_parser.py corpus/json_parser/ -max_len=4096

The fuzzer reads seeds from the corpus, mutates them, and saves any inputs that discover new coverage back to the corpus directory. Over time, the corpus becomes a valuable asset — check it into version control so future fuzzing runs start from accumulated knowledge.

Continuous fuzzing with OSS-Fuzz

For open-source projects, Google’s OSS-Fuzz provides free continuous fuzzing infrastructure. For private projects, ClusterFuzz or custom CI integration achieves similar results:

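# .github/workflows/fuzz.yml
name: Continuous Fuzzing
on:
  schedule:
    - cron: '0 2 * * *'  # Nightly
  workflow_dispatch:     # Allow manual runs

jobs:
  fuzz:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false   # One crashing target should not cancel the others
      matrix:
        target: [json_parser, csv_reader, xml_handler]
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Restore corpus
        uses: actions/cache@v4
        with:
          path: corpus/${{ matrix.target }}
          key: fuzz-corpus-${{ matrix.target }}-${{ github.sha }}
          restore-keys: fuzz-corpus-${{ matrix.target }}-

      - run: pip install atheris
      - name: Fuzz for 10 minutes
        run: |
          # libFuzzer exits if the corpus directory is missing (first run, no cache)
          mkdir -p corpus/${{ matrix.target }}
          # -max_total_time stops the fuzzer cleanly; "|| true" lets the next
          # step decide pass/fail from the crash files
          python fuzz_targets/${{ matrix.target }}.py \
            corpus/${{ matrix.target }}/ \
            -max_len=4096 \
            -max_total_time=600 \
            -print_final_stats=1 || true

      - name: Check for crashes
        run: |
          if ls crash-* 1>/dev/null 2>&1; then
            echo "Crashes found!"
            for f in crash-*; do
              echo "=== $f ==="
              xxd "$f" | head -20
            done
            exit 1
          fi

      - name: Upload crash inputs
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: crashes-${{ matrix.target }}
          path: crash-*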

Differential fuzzing

Differential fuzzing runs two implementations of the same specification on the same input. If they disagree, at least one of them has a bug, or the specification leaves that behavior open:

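import sys

import atheris
import orjson  # native extension: not affected by Python instrumentation

with atheris.instrument_imports():
    import json

# A unique sentinel: unlike the string "PARSE_ERROR", it can never equal a parsed value
PARSE_ERROR = object()


def test_one_input(data: bytes):
    fdp = atheris.FuzzedDataProvider(data)
    text = fdp.ConsumeUnicodeNoSurrogates(512)

    try:
        result_json = json.loads(text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        result_json = PARSE_ERROR

    try:
        result_orjson = orjson.loads(text.encode())
    except ValueError:  # orjson.JSONDecodeError subclasses ValueError
        result_orjson = PARSE_ERROR

    # Compare values only when both accept the input: json accepts NaN and
    # Infinity, which orjson rejects by design, so accept/reject mismatches are expected
    if result_json is not PARSE_ERROR and result_orjson is not PARSE_ERROR:
        assert result_json == result_orjson, (
            f"Disagreement on input {text!r}: "
            f"json={result_json!r}, orjson={result_orjson!r}"
        )


if __name__ == "__main__":
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()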

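This technique has found real bugs in production JSON parsers, YAML libraries, and serialization frameworks. Treat each disagreement as a lead rather than a verdict, though: it may be a bug in either implementation or a documented divergence, such as json accepting NaN and Infinity, which orjson rejects.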

Fuzzing web endpoints

Combine Hypothesis with httpx to fuzz API endpoints:

import httpx
from hypothesis import given, settings, strategies as st

BASE_URL = "http://localhost:8000"


@given(
    payload=st.dictionaries(
        keys=st.text(max_size=30),
        values=st.one_of(
            st.none(),
            st.integers(),
            st.floats(allow_nan=True),
            st.text(max_size=200),
            st.lists(st.integers(), max_size=10),
        ),
        max_size=15,
    )
)
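@settings(max_examples=2000, deadline=None)  # network latency varies; disable the per-example deadline
def test_api_never_500s(payload):
    response = httpx.post(f"{BASE_URL}/api/process", json=payload, timeout=10.0)
    assert response.status_code < 500, (
        f"Server error {response.status_code} on payload: {payload}"
    )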

Any 5xx response means the server failed to handle an input, which is exactly what you want to find before attackers do.
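The same idea can be demonstrated end to end with only the standard library. The sketch below starts a hypothetical endpoint with a deliberate bug (it assumes "quantity" is a number), posts random JSON objects with urllib, and returns the first payload that produces a 5xx. It swaps in plain random testing with no shrinking, a simpler technique than Hypothesis; the handler, keys, and function names are all illustrative.

```python
import json
import random
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional


class BuggyHandler(BaseHTTPRequestHandler):
    """Hypothetical endpoint: assumes "quantity" is a number (the deliberate bug)."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            if not isinstance(payload, dict):
                return self._reply(400, {"error": "expected a JSON object"})
            total = payload.get("quantity", 0) * 2 + 1  # TypeError for None, str, list
            self._reply(200, {"total": total})
        except ValueError:
            self._reply(400, {"error": "invalid JSON"})
        except Exception:  # unhandled input becomes a 500, which is what we hunt for
            self._reply(500, {"error": "internal error"})

    def _reply(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args) -> None:  # silence per-request logging
        pass


def random_payload(rng: random.Random) -> dict:
    """Random JSON object drawn from a few keys and value types."""
    makers = [
        lambda: None,
        lambda: rng.randint(-1000, 1000),
        lambda: "".join(rng.choice("ab!é ") for _ in range(rng.randint(0, 5))),
        lambda: [rng.randint(0, 9) for _ in range(rng.randint(0, 3))],
    ]
    keys = rng.sample(["quantity", "sku", "note"], rng.randint(0, 3))
    return {key: rng.choice(makers)() for key in keys}


def fuzz_endpoint(url: str, attempts: int = 200, seed: int = 0) -> Optional[dict]:
    """POST random payloads; return the first one that triggers a 5xx, else None."""
    rng = random.Random(seed)
    for _ in range(attempts):
        payload = random_payload(rng)
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(request, timeout=5) as response:
                status = response.status
        except urllib.error.HTTPError as error:  # urllib raises for 4xx and 5xx
            status = error.code
        if status >= 500:
            return payload
    return None


def run_demo() -> Optional[dict]:
    server = ThreadingHTTPServer(("127.0.0.1", 0), BuggyHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fuzz_endpoint(f"http://127.0.0.1:{server.server_address[1]}/api/process")
    finally:
        server.shutdown()
        server.server_close()
```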

Performance and resource limits

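Fuzzing can consume significant resources. Start with libFuzzer's built-in limits, which atheris passes through from the command line: -timeout=5 reports any input that runs longer than 5 seconds, and -rss_limit_mb=512 reports any input that pushes memory use past 512 MB. libFuzzer implements -timeout with its own SIGALRM timer, so avoid calling signal.alarm inside the target. For a hard ceiling inside the process, cap the address space as well:

import resource
import sys

import atheris

with atheris.instrument_imports():
    from myapp.parser import process  # hypothetical: the code under test


def setup_limits():
    # Cap the address space at 512 MB (Unix only; enforcement varies by OS).
    # Allocations beyond the cap raise MemoryError, which atheris reports as a crash.
    limit = 512 * 1024 * 1024
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))


def test_one_input(data: bytes):
    try:
        process(data)
    except ValueError:
        pass  # Swallow only documented rejections: catching Exception would hide every finding


if __name__ == "__main__":
    setup_limits()
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()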

Triaging and reproducing findings

When a fuzzer finds a crash, it saves the triggering input as a file. Reproduce and minimize it:

# Reproduce the crash
python fuzz_json_parser.py crash-abc123

# Minimize the crashing input (-runs caps the number of minimization attempts)
python fuzz_json_parser.py -minimize_crash=1 -runs=10000 crash-abc123

Add the minimized crash as a regression test:

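from myapp.parser import process  # hypothetical: the code under test


def test_regression_crash_abc123():
    """Regression test for fuzz finding crash-abc123."""
    bad_input = b'\x00\xff\x80invalid'  # contents of the minimized crash file
    try:
        process(bad_input)
    except ValueError:
        pass  # Now handled gracefully; any other exception fails the test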

The one thing to remember: Effective fuzz testing combines coverage-guided tools like atheris for raw exploration, Hypothesis for schema-aware property testing, corpus management for accumulated knowledge, and CI integration for continuous discovery.
