Fuzz Testing — Deep Dive
Google’s atheris: coverage-guided fuzzing for Python
Atheris is Google’s coverage-guided fuzzer for Python, built on top of libFuzzer. It instruments CPython bytecode to track coverage and uses evolutionary algorithms to maximize code exploration.
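To make the coverage-guided, evolutionary idea concrete, here is a toy sketch in plain Python. It is not how Atheris or libFuzzer are implemented: it uses line coverage from sys.settrace as a stand-in for bytecode instrumentation, and the target, mutate, and fuzz names are all hypothetical.

```python
import random
import sys


def target(data: bytes) -> None:
    """Toy target: raises only when the input starts with b'FUZ'."""
    if len(data) > 0 and data[0] == ord("F"):
        if len(data) > 1 and data[1] == ord("U"):
            if len(data) > 2 and data[2] == ord("Z"):
                raise RuntimeError("bug reached")


def run_with_coverage(func, data: bytes):
    """Run func(data) and return (executed lines, exception or None)."""
    lines = set()

    def tracer(frame, event, arg):
        if event == "line":
            lines.add((frame.f_code.co_name, frame.f_lineno))
        return tracer

    sys.settrace(tracer)
    try:
        func(data)
        error = None
    except Exception as exc:
        error = exc
    finally:
        sys.settrace(None)
    return lines, error


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random edit: insert, overwrite, or delete a byte."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 or not buf:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif choice == 1:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(func, seeds, iterations: int, seed: int = 0):
    """Return (crashing input or None, final corpus)."""
    rng = random.Random(seed)
    corpus = list(seeds)
    seen = set()
    for entry in corpus:
        coverage, _ = run_with_coverage(func, entry)
        seen |= coverage
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        coverage, error = run_with_coverage(func, candidate)
        if error is not None:
            return candidate, corpus
        if not coverage <= seen:
            # New lines were executed: keep this input for further mutation.
            seen |= coverage
            corpus.append(candidate)
    return None, corpus
```

Because inputs that reach a new line are kept and mutated further, the loop can discover the three-byte prefix one byte at a time instead of having to guess all three bytes at once.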
Basic setup:
# fuzz_json_parser.py
import sys

import atheris

# Import the code under test inside instrument_imports() so Atheris can
# collect coverage for its pure-Python parts.
with atheris.instrument_imports():
    import json


def test_one_input(data: bytes):
    fdp = atheris.FuzzedDataProvider(data)
    try:
        json.loads(fdp.ConsumeUnicodeNoSurrogates(256))
    except (json.JSONDecodeError, UnicodeDecodeError):
        pass  # Expected; we only care about unexpected crashes


if __name__ == "__main__":
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()
Run with:
pip install atheris
python fuzz_json_parser.py -max_len=1024 -runs=100000
Key flags:
- -max_len: limits input size to prevent memory exhaustion
- -runs: number of iterations (-1, the default, runs indefinitely)
- -dict: provides a dictionary of tokens to guide mutation (useful for structured formats)
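A dictionary file for -dict lists one quoted token per line, with backslash escapes for quotes, backslashes, and non-printable bytes. The sketch below generates such a file for JSON. The helper names and the json.dict file name are assumptions made for illustration.

```python
from pathlib import Path


def format_token(token: bytes) -> str:
    """Quote one token, escaping quotes, backslashes, and non-printable bytes."""
    out = []
    for byte in token:
        char = chr(byte)
        if char in ('"', "\\"):
            out.append("\\" + char)
        elif 0x20 <= byte < 0x7F:
            out.append(char)
        else:
            out.append(f"\\x{byte:02x}")
    return '"' + "".join(out) + '"'


def write_dictionary(path, tokens) -> None:
    """Write a dictionary file with one quoted token per line."""
    lines = ["# JSON tokens for the -dict flag"]
    lines += [format_token(token) for token in tokens]
    Path(path).write_text("\n".join(lines) + "\n", encoding="ascii")


JSON_TOKENS = [b"{", b"}", b"[", b"]", b":", b",", b"true", b"false", b"null", b'"']

if __name__ == "__main__":
    write_dictionary("json.dict", JSON_TOKENS)
```

The resulting file would then be passed as python fuzz_json_parser.py -dict=json.dict.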
The FuzzedDataProvider API
Atheris provides FuzzedDataProvider to extract typed values from raw fuzzer bytes:
def test_one_input(data: bytes):
    fdp = atheris.FuzzedDataProvider(data)
    name = fdp.ConsumeUnicodeNoSurrogates(64)
    age = fdp.ConsumeIntInRange(0, 200)
    score = fdp.ConsumeFloat()
    is_active = fdp.ConsumeBool()
    raw_bytes = fdp.ConsumeBytes(fdp.ConsumeIntInRange(0, 1024))
    # Now fuzz your function with structured-ish random data
    process_user(name=name, age=age, score=score, active=is_active)
This gives the fuzzer more structure to work with than pure random bytes, leading to faster discovery of interesting code paths.
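The idea of carving typed values out of a byte string can be sketched with a toy provider. This is an illustration only, not Atheris's implementation: the ToyProvider class and its snake_case methods are hypothetical, and the real FuzzedDataProvider may consume bytes in a different order and layout.

```python
class ToyProvider:
    """Toy illustration of turning raw fuzzer bytes into typed values."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def remaining(self) -> int:
        return len(self._data) - self._pos

    def consume_bytes(self, count: int) -> bytes:
        """Return up to count bytes; shorter if the input runs out."""
        chunk = self._data[self._pos:self._pos + count]
        self._pos += len(chunk)
        return chunk

    def consume_int_in_range(self, low: int, high: int) -> int:
        """Map just enough bytes onto the closed range [low, high]."""
        if low > high:
            raise ValueError("low must not exceed high")
        span = high - low + 1
        needed = ((span - 1).bit_length() + 7) // 8
        raw = self.consume_bytes(needed)
        # An exhausted input yields b"", which decodes to 0 and maps to low.
        return low + int.from_bytes(raw, "big") % span

    def consume_bool(self) -> bool:
        return bool(self.consume_int_in_range(0, 1))


provider = ToyProvider(bytes([250, 1, 65, 66]))
age = provider.consume_int_in_range(0, 200)
is_active = provider.consume_bool()
rest = provider.consume_bytes(10)
```

The design point worth noting is that every call succeeds even when the input is exhausted, so a short input still produces a valid (if boring) set of arguments rather than an error inside the harness.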
Hypothesis for structured fuzzing
When your inputs have known schemas, Hypothesis strategies produce more targeted fuzzing:
from hypothesis import given, settings, strategies as st
from hypothesis import HealthCheck
from pydantic import ValidationError
from myapp.models import OrderRequest

order_strategy = st.fixed_dictionaries({
    "customer_id": st.text(min_size=0, max_size=100),
    "items": st.lists(
        st.fixed_dictionaries({
            "sku": st.text(min_size=0, max_size=50),
            "quantity": st.integers(min_value=-1000, max_value=1000),
            "price": st.floats(allow_nan=True, allow_infinity=True),
        }),
        min_size=0,
        max_size=20,
    ),
    "coupon_code": st.one_of(st.none(), st.text(max_size=200)),
    "shipping_tier": st.sampled_from(["standard", "express", "overnight", ""]),
})


@given(data=order_strategy)
@settings(
    max_examples=5000,
    suppress_health_check=[HealthCheck.too_slow],
)
def test_order_validation_never_crashes(data):
    """The validator may reject input but must never raise an unhandled exception."""
    try:
        OrderRequest(**data)
    except ValidationError:
        pass  # Pydantic rejection is fine
    # Any other exception = bug
Corpus management
Coverage-guided fuzzers maintain a corpus — a collection of inputs that achieve unique coverage. Managing this corpus is crucial for effective long-running fuzzing:
# Create corpus directory with seed inputs
mkdir -p corpus/json_parser
echo '{}' > corpus/json_parser/seed_empty
echo '{"key": [1, 2, 3]}' > corpus/json_parser/seed_nested
echo '{"deep": {"nested": {"value": null}}}' > corpus/json_parser/seed_deep
# Run fuzzer with corpus
python fuzz_json_parser.py corpus/json_parser/ -max_len=4096
The fuzzer reads seeds from the corpus, mutates them, and saves any inputs that discover new coverage back to the corpus directory. Over time, the corpus becomes a valuable asset — check it into version control so future fuzzing runs start from accumulated knowledge.
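libFuzzer names the corpus entries it writes after the SHA-1 of their content, so identical inputs collapse into one file. A checked-in corpus that also contains hand-named seeds can be tidied the same way. The following is a minimal sketch with hypothetical helper names; it removes byte-identical duplicates only and does not attempt coverage-based minimization.

```python
import hashlib
from pathlib import Path


def add_to_corpus(corpus_dir, data: bytes) -> Path:
    """Store data under its SHA-1 hex digest; identical inputs share one file."""
    corpus = Path(corpus_dir)
    corpus.mkdir(parents=True, exist_ok=True)
    path = corpus / hashlib.sha1(data).hexdigest()
    if not path.exists():
        path.write_bytes(data)
    return path


def deduplicate(corpus_dir) -> int:
    """Delete files whose content duplicates an earlier file; return the count."""
    seen = set()
    removed = 0
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue
        digest = hashlib.sha1(path.read_bytes()).hexdigest()
        if digest in seen:
            path.unlink()
            removed += 1
        else:
            seen.add(digest)
    return removed
```

Running deduplicate("corpus/json_parser") before committing keeps the repository from accumulating identical files under different names.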
Continuous fuzzing with OSS-Fuzz
For open-source projects, Google’s OSS-Fuzz provides free continuous fuzzing infrastructure. For private projects, ClusterFuzz or custom CI integration achieves similar results:
# .github/workflows/fuzz.yml
name: Continuous Fuzzing
on:
  schedule:
    - cron: '0 2 * * *' # Nightly
jobs:
  fuzz:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        target: [json_parser, csv_reader, xml_handler]
    steps:
      - uses: actions/checkout@v4
      - name: Restore corpus
        uses: actions/cache@v4
        with:
          path: corpus/${{ matrix.target }}
          key: fuzz-corpus-${{ matrix.target }}-${{ github.sha }}
          restore-keys: fuzz-corpus-${{ matrix.target }}-
      - run: pip install atheris
      - name: Fuzz for 10 minutes
        run: |
          # Make sure the corpus directory exists on the first run
          mkdir -p corpus/${{ matrix.target }}
          timeout 600 python fuzz_targets/${{ matrix.target }}.py \
            corpus/${{ matrix.target }}/ \
            -max_len=4096 \
            -print_final_stats=1 || true
      - name: Check for crashes
        run: |
          if ls crash-* 1>/dev/null 2>&1; then
            echo "Crashes found!"
            for f in crash-*; do
              echo "=== $f ==="
              xxd "$f" | head -20
            done
            exit 1
          fi
Differential fuzzing
Differential fuzzing compares two implementations against the same input. If they disagree, at least one has a bug:
import sys

import atheris

# Instrument the libraries under test so coverage can guide mutation;
# only pure-Python code is instrumented this way.
with atheris.instrument_imports():
    import json
    import orjson


def test_one_input(data: bytes):
    fdp = atheris.FuzzedDataProvider(data)
    text = fdp.ConsumeUnicodeNoSurrogates(512)
    try:
        result_json = json.loads(text)
    except (json.JSONDecodeError, ValueError):
        result_json = "PARSE_ERROR"
    try:
        result_orjson = orjson.loads(text.encode())
    except (orjson.JSONDecodeError, ValueError):
        result_orjson = "PARSE_ERROR"
    # Compare only when both parsers accepted the input
    if result_json != "PARSE_ERROR" and result_orjson != "PARSE_ERROR":
        assert result_json == result_orjson, (
            f"Disagreement on input {text!r}: "
            f"json={result_json}, orjson={result_orjson}"
        )


if __name__ == "__main__":
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()
This technique has found real bugs in production JSON parsers, YAML libraries, and serialization frameworks.
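One subtlety when comparing parser outputs with == is that Python treats 1, 1.0, and True as equal, and treats NaN as unequal to itself, so a plain assertion can both miss real disagreements and report false ones. A stricter comparison could look like the sketch below. The strictly_equal helper is hypothetical and covers only the types a JSON parser returns.

```python
import json
import math


def strictly_equal(a, b) -> bool:
    """Compare parsed JSON values, requiring matching types at every level."""
    if type(a) is not type(b):
        return False
    if isinstance(a, float):
        # Treat NaN as equal to NaN so identical outputs are not flagged.
        return (math.isnan(a) and math.isnan(b)) or a == b
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(strictly_equal(a[k], b[k]) for k in a)
    if isinstance(a, list):
        return len(a) == len(b) and all(strictly_equal(x, y) for x, y in zip(a, b))
    return a == b


int_result = json.loads('{"n": 1}')
float_result = json.loads('{"n": 1.0}')
```

Here int_result == float_result is true under plain equality, while strictly_equal reports the type difference, which is the kind of divergence a differential harness usually wants to see.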
Fuzzing web endpoints
Combine Hypothesis with httpx to fuzz API endpoints:
import httpx
from hypothesis import given, settings, strategies as st

BASE_URL = "http://localhost:8000"


@given(
    payload=st.dictionaries(
        keys=st.text(max_size=30),
        values=st.one_of(
            st.none(),
            st.integers(),
            # NaN and infinity are not valid JSON, so keep floats finite
            st.floats(allow_nan=False, allow_infinity=False),
            st.text(max_size=200),
            st.lists(st.integers(), max_size=10),
        ),
        max_size=15,
    )
)
# Network calls can exceed Hypothesis's default per-example deadline
@settings(max_examples=2000, deadline=None)
def test_api_never_500s(payload):
    response = httpx.post(f"{BASE_URL}/api/process", json=payload)
    assert response.status_code != 500, (
        f"Server error on payload: {payload}"
    )
Any 500 response means unhandled input — exactly what you want to find before attackers do.
Performance and resource limits
Fuzzing can consume significant resources. Set boundaries:
import resource
import signal
import sys

import atheris


def setup_limits():
    # Limit the address space to 512 MB
    limit = 512 * 1024 * 1024
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))


def test_one_input(data: bytes):
    signal.alarm(5)  # 5-second timeout per input
    try:
        process(data)  # process() stands in for the function under test
    except ValueError:
        pass  # Expected rejection; anything else should surface as a crash
    finally:
        signal.alarm(0)  # Cancel the pending alarm


if __name__ == "__main__":
    setup_limits()
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()
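A per-input timeout built on SIGALRM is easier to reason about when the alarm is turned into a Python exception. The context manager below is a minimal sketch of that idea. It is Unix-only, works only in the main thread, and assumes nothing else in the process relies on SIGALRM; a fuzzing engine may set its own timers (libFuzzer, for example, has a -timeout flag), so treat this as an illustration rather than a drop-in addition to a harness. The time_limit name is hypothetical.

```python
import signal
import time
from contextlib import contextmanager


@contextmanager
def time_limit(seconds: float):
    """Raise TimeoutError if the enclosed block runs longer than seconds."""

    def on_alarm(signum, frame):
        raise TimeoutError(f"input exceeded {seconds} s")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Cancel the timer, then restore whatever handler was there before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)


def slow_call() -> bool:
    """Return True if the time limit interrupted a deliberately slow block."""
    try:
        with time_limit(0.05):
            time.sleep(1)
    except TimeoutError:
        return True
    return False
```

Raising an exception gives the harness a chance to record which input was slow, whereas an unhandled SIGALRM simply terminates the process.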
Triaging and reproducing findings
When a fuzzer finds a crash, it saves the triggering input as a file. Reproduce and minimize it:
# Reproduce the crash
python fuzz_json_parser.py crash-abc123
# Minimize the crashing input
python fuzz_json_parser.py -minimize_crash=1 crash-abc123
Add the minimized crash as a regression test:
def test_regression_crash_abc123():
    """Regression test for fuzz finding crash-abc123."""
    bad_input = b'\x00\xff\x80invalid'
    try:
        process(bad_input)
    except ValueError:
        pass  # Now handled gracefully
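Once several findings have accumulated, replaying the saved crash files directly avoids hand-copying bytes into each test. The sketch below assumes the minimized files are kept in one directory under their crash-* names; the replay_artifacts helper, the directory layout, and the choice of ValueError as the expected rejection are all assumptions.

```python
from pathlib import Path


def replay_artifacts(directory, target, expected=(ValueError,)):
    """Run target on every crash-* file; return {file name: unexpected exception}."""
    failures = {}
    for path in sorted(Path(directory).glob("crash-*")):
        try:
            target(path.read_bytes())
        except expected:
            pass  # Graceful rejection counts as fixed
        except Exception as exc:
            failures[path.name] = exc
    return failures
```

A regression test could then assert that replay_artifacts("tests/regressions", process) returns an empty dictionary, where tests/regressions is a hypothetical location for the saved inputs.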
The one thing to remember: Effective fuzz testing combines coverage-guided tools like atheris for raw exploration, Hypothesis for schema-aware property testing, corpus management for accumulated knowledge, and CI integration for continuous discovery.