Python Feature Flag Strategies — Core Concepts

Learn the four types of feature flags, when to use each, and how to avoid the tech debt trap of flags that never get cleaned up.

More Than Just On/Off

Feature flags seem simple — if flag: do_thing(). But in practice, teams use them for fundamentally different purposes, and conflating those purposes leads to messy codebases.

The Four Types of Feature Flags

Pete Hodgson’s classification, published on Martin Fowler’s site, is widely adopted:

Type	Purpose	Lifespan	Who Controls
Release flags	Decouple deployment from release	Days to weeks	Developers
Experiment flags	A/B test variations	Weeks	Product/data team
Ops flags	Control operational behavior	Permanent	Operations
Permission flags	Gate access to features	Permanent	Business

Release Flags

The most common type. You merge incomplete features behind a flag, deploy continuously, and flip the flag when the feature is ready. This enables trunk-based development — no long-lived feature branches.

Lifecycle: Created when development starts, removed once the feature is stable and rolled out to 100%.
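
As a rough illustration of a release flag, the sketch below hides an unfinished code path behind a lookup. The in-memory `FLAGS` dict and the names `is_enabled`, `checkout`, `legacy_checkout`, and `new_checkout` are hypothetical, chosen only for this example; a real system would usually read flags from configuration or a flag service.

```python
# Hypothetical in-memory flag store (an assumption for this sketch).
FLAGS = {"new_checkout": False}


def is_enabled(name: str) -> bool:
    """Return the flag's value, treating unknown flags as off."""
    return FLAGS.get(name, False)


def legacy_checkout(cart: list[float]) -> float:
    return sum(cart)


def new_checkout(cart: list[float]) -> float:
    # Unfinished feature: merged to trunk, but unreachable while the flag is off.
    return round(sum(cart) * 0.95, 2)


def checkout(cart: list[float]) -> float:
    if is_enabled("new_checkout"):
        return new_checkout(cart)
    return legacy_checkout(cart)
```

Once the feature is stable at 100%, cleanup means deleting the flag, the `if`, and `legacy_checkout` together.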

Experiment Flags

Used for A/B testing. Users are randomly assigned to a variant, and metrics determine which variant wins. These flags need consistent assignment — the same user should always see the same variant.

Lifecycle: Active during the experiment (typically 2-4 weeks), removed once results are analyzed.
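
One common way to get stable assignment is to hash the user ID together with the experiment name. The following is a minimal sketch under that assumption; `assign_variant` and its parameters are illustrative names, not part of any particular library.

```python
import hashlib


def assign_variant(experiment: str, user_id: str, variants: list[str]) -> str:
    """Deterministically map a user to one variant of an experiment."""
    # Including the experiment name in the hash input keeps a user's
    # assignment in one experiment independent of their assignment in another.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    index = int(digest, 16) % len(variants)
    return variants[index]
```

Because the result depends only on the inputs, no assignment table needs to be stored, but changing the list of variants mid-experiment would reshuffle users.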

Ops Flags

Circuit breakers, kill switches, and performance knobs. “Disable the recommendation engine if it’s too slow.” These are intentionally long-lived because they provide runtime control over system behavior.

Lifecycle: Permanent or semi-permanent. Reviewed periodically.
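
A kill switch for the recommendation example might look like the sketch below. `OPS_FLAGS`, `fetch_recommendations`, and `homepage` are hypothetical names, and the dict stands in for whatever runtime configuration source operations would actually control.

```python
# Hypothetical ops flag store; assumed to be changeable at runtime
# without a deploy (for example, backed by a config service).
OPS_FLAGS = {"recommendations_enabled": True}


def fetch_recommendations(user_id: str) -> list[str]:
    # Stand-in for a slow or expensive downstream call.
    return [f"item-for-{user_id}"]


def homepage(user_id: str) -> dict:
    """Build the homepage payload, skipping recommendations when switched off."""
    if OPS_FLAGS.get("recommendations_enabled", True):
        recommendations = fetch_recommendations(user_id)
    else:
        recommendations = []  # Degrade gracefully instead of failing the page.
    return {"user": user_id, "recommendations": recommendations}
```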

Permission Flags

“Enterprise customers get this feature.” “Beta users see this.” These gate access based on user attributes rather than random assignment.

Lifecycle: Permanent, tied to business logic.
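
Permission flags can be sketched as predicates over user attributes. The `User` fields, the flag names, and `has_access` below are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    plan: str
    beta_tester: bool = False


# Each hypothetical flag maps to a predicate on the user, not a random draw.
PERMISSION_FLAGS = {
    "audit_log": lambda user: user.plan == "enterprise",
    "new_dashboard": lambda user: user.beta_tester,
}


def has_access(flag: str, user: User) -> bool:
    """Return True if the user's attributes satisfy the flag's predicate."""
    predicate = PERMISSION_FLAGS.get(flag)
    return predicate is not None and predicate(user)
```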

Evaluation Strategies

How does the system decide if a flag is on for a specific request?

Boolean — simplest. The flag is on or off globally.

Percentage rollout — 10% of users get the feature. The user ID is hashed deterministically into a bucket, so the same user always gets the same result and stays included as the percentage grows.
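
A percentage rollout can be approximated by hashing each user into one of 10,000 buckets and comparing against a threshold. This is a sketch under that assumption; `in_rollout` and the bucket count are illustrative choices rather than a standard API.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: float) -> bool:
    """Return True if the user falls inside the first `percent` of buckets."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    # Take 8 bytes of the hash and map them to a bucket in 0..9999,
    # which gives 0.01% granularity.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100
```

Since a user's bucket never changes, raising `percent` from 10 to 50 only adds users; nobody who already had the feature loses it.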

User targeting — specific user IDs or email domains get the feature. Useful for internal testing.
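
Targeting usually comes down to allowlist lookups. In this sketch the user IDs, the `example.com` domain, and the function name `is_targeted` are all placeholders.

```python
# Placeholder allowlists for illustration.
TARGETED_USER_IDS = {"u-1001", "u-1002"}
TARGETED_DOMAINS = {"example.com"}


def is_targeted(user_id: str, email: str) -> bool:
    """Return True for allowlisted user IDs or email domains."""
    _, separator, domain = email.rpartition("@")
    if not separator:
        domain = ""  # No "@" present, so there is no domain to match.
    return user_id in TARGETED_USER_IDS or domain.lower() in TARGETED_DOMAINS
```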

Rule-based — combine conditions: “Users in Europe AND on the premium plan AND not on mobile.” Rules evaluate against a context object containing user attributes.
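
The quoted rule can be expressed as small composable predicates over a context dict. The helper names (`attr_equals`, `negate`, `all_of`) and the context keys are assumptions for this sketch, not the vocabulary of any specific flag service.

```python
from typing import Any, Callable

Context = dict[str, Any]
Condition = Callable[[Context], bool]


def attr_equals(key: str, value: Any) -> Condition:
    """Condition that holds when context[key] equals value."""
    return lambda ctx: ctx.get(key) == value


def negate(condition: Condition) -> Condition:
    return lambda ctx: not condition(ctx)


def all_of(*conditions: Condition) -> Condition:
    """Condition that holds only when every sub-condition holds (logical AND)."""
    return lambda ctx: all(condition(ctx) for condition in conditions)


# "Users in Europe AND on the premium plan AND not on mobile."
eu_premium_desktop = all_of(
    attr_equals("region", "EU"),
    attr_equals("plan", "premium"),
    negate(attr_equals("platform", "mobile")),
)
```

Calling `eu_premium_desktop({"region": "EU", "plan": "premium", "platform": "desktop"})` evaluates the rule for one request's context.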

The Cleanup Problem

The biggest risk with feature flags isn’t the flags themselves — it’s flags that never get removed. A codebase with 200 stale flags becomes impossible to reason about. Every code path has invisible branches.

Prevention strategies:

Set an expiration date when creating every release/experiment flag
Track flags in a registry with an owner and review date
Run linting rules that flag (pun intended) old flags
Treat flag removal as part of the feature’s “definition of done”
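
The registry and expiration ideas above could be combined into a check that runs in CI. The sketch below assumes a hypothetical `FlagRecord` structure and `stale_flags` helper; the field names are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FlagRecord:
    name: str
    kind: str  # "release", "experiment", "ops", or "permission"
    owner: str
    expires: Optional[date] = None  # None for intentionally long-lived flags


def stale_flags(registry: list[FlagRecord], today: date) -> list[str]:
    """Return names of flags whose expiration date has passed."""
    return [
        record.name
        for record in registry
        if record.expires is not None and record.expires < today
    ]
```

A CI job could fail the build, or open a ticket for `owner`, whenever `stale_flags` returns a non-empty list.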

Common Misconception

“Feature flags replace proper testing.” Flags control who sees what, but the feature still needs to work correctly in both states. A flag that hides a broken feature doesn’t fix it — it just delays the problem. Test both the on and off paths.
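
Testing both paths can be as simple as running the same test under each flag state. This sketch assumes a hypothetical module-level `FLAGS` dict and `greet` function, and uses `unittest.mock.patch.dict` from the standard library to set and restore the flag.

```python
from unittest.mock import patch

# Hypothetical flag store and flagged function for illustration.
FLAGS = {"new_greeting": False}


def greet(name: str) -> str:
    if FLAGS.get("new_greeting", False):
        return f"Welcome back, {name}!"
    return f"Hello, {name}."


def test_greet_both_states() -> None:
    cases = [(False, "Hello, Ada."), (True, "Welcome back, Ada!")]
    for state, expected in cases:
        # patch.dict restores the original dict contents when the block exits.
        with patch.dict(FLAGS, {"new_greeting": state}):
            assert greet("Ada") == expected
```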

One thing to remember: Not all feature flags are the same. Release flags are temporary (clean them up!), while ops flags are permanent. Categorize your flags by type and set lifecycle expectations from day one.
