Python Feature Flag Strategies — Core Concepts
More Than Just On/Off
Feature flags seem simple — if flag: do_thing(). But in practice, teams use them for fundamentally different purposes, and conflating those purposes leads to messy codebases.
The Four Types of Feature Flags
The classification from Pete Hodgson’s article “Feature Toggles (aka Feature Flags)”, published on Martin Fowler’s site, is widely adopted:
| Type | Purpose | Lifespan | Who Controls |
|---|---|---|---|
| Release flags | Decouple deployment from release | Days to weeks | Developers |
| Experiment flags | A/B test variations | Weeks | Product/data team |
| Ops flags | Control operational behavior | Permanent or semi-permanent | Operations |
| Permission flags | Gate access to features | Permanent | Business |
Release Flags
The most common type. You merge incomplete features behind a flag, deploy continuously, and flip the flag when the feature is ready. This enables trunk-based development — no long-lived feature branches.
Lifecycle: Created when development starts, removed once the feature is stable and rolled out to 100%.
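A release flag can be as small as an environment-variable check around the new code path. The sketch below assumes a hypothetical FEATURE_<NAME> naming convention and made-up checkout functions; real projects usually read flags from a config service or a flag SDK instead.

```python
import os
from typing import List

TRUTHY = {"1", "true", "yes", "on"}


def is_enabled(flag_name: str, default: bool = False) -> bool:
    """Read a release flag from the environment.

    Hypothetical convention: FEATURE_NEW_CHECKOUT=1 turns on "new_checkout".
    An unset variable falls back to `default`, so unfinished work stays dark.
    """
    raw = os.environ.get(f"FEATURE_{flag_name.upper()}")
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def legacy_checkout_total(prices_cents: List[int]) -> int:
    return sum(prices_cents)


def new_checkout_total(prices_cents: List[int]) -> int:
    # Incomplete feature merged to trunk: a 5% discount still under review.
    return sum(prices_cents) * 95 // 100


def checkout_total(prices_cents: List[int]) -> int:
    # Both paths ship in every deploy; the flag decides which one runs.
    # Removing this check and the losing branch is the cleanup step.
    if is_enabled("new_checkout"):
        return new_checkout_total(prices_cents)
    return legacy_checkout_total(prices_cents)
```

Environment variables usually only change when a process restarts, which is one reason teams graduate to a flag service. Once the feature is at 100%, the `if` and `legacy_checkout_total` are deleted.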
Experiment Flags
Used for A/B testing. Users are randomly assigned to a variant, and metrics determine which variant wins. These flags need consistent assignment — the same user should always see the same variant.
Lifecycle: Active during the experiment (typically 2-4 weeks), removed once results are analyzed.
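Consistent assignment is usually done by hashing a stable identifier rather than calling a random number generator on each request. A minimal sketch, assuming SHA-256 over a hypothetical "experiment:user_id" key and an equal split across variants (weighted splits and exposure logging are left out):

```python
import hashlib
from collections import Counter
from typing import List


def assign_variant(experiment: str, user_id: str, variants: List[str]) -> str:
    """Deterministically map a user to one variant of an experiment.

    Hashing "experiment:user_id" (rather than user_id alone) keeps
    assignments independent across experiments.
    """
    if not variants:
        raise ValueError("an experiment needs at least one variant")
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# The same user always lands in the same variant...
print(assign_variant("checkout_button_color", "user-42", ["control", "green"]))

# ...and across many users the split comes out close to even.
counts = Counter(
    assign_variant("checkout_button_color", f"user-{i}", ["control", "green"])
    for i in range(10_000)
)
print(counts)
```

Because no assignment state is stored, any server can compute the variant on its own and the answer never changes mid-experiment.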
Ops Flags
Circuit breakers, kill switches, and performance knobs. “Disable the recommendation engine if it’s too slow.” These are intentionally long-lived because they provide runtime control over system behavior.
Lifecycle: Permanent or semi-permanent. Reviewed periodically.
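An ops flag is checked on every request so that flipping it takes effect immediately. The sketch below assumes a hypothetical in-process `OpsFlags` store and a stand-in `fetch_recommendations` call; the point is the shape of the kill switch and its graceful fallback, not the storage.

```python
import threading
from typing import Dict, List


class OpsFlags:
    """Hypothetical thread-safe, in-process flag store.

    In production the values would typically be refreshed from a config
    service so operators can change them without a deploy.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: Dict[str, bool] = {}

    def set(self, name: str, value: bool) -> None:
        with self._lock:
            self._values[name] = value

    def is_on(self, name: str, default: bool) -> bool:
        with self._lock:
            return self._values.get(name, default)


flags = OpsFlags()


def fetch_recommendations(user_id: str) -> List[str]:
    # Stand-in for a slow or flaky downstream service.
    return [f"item-for-{user_id}-1", f"item-for-{user_id}-2"]


def recommendations_for(user_id: str) -> List[str]:
    # Kill switch defaults to ON, so a missing value never disables the
    # feature by accident; operators set it to False to shed load.
    if not flags.is_on("recommendations_enabled", default=True):
        return []  # degrade gracefully: the page renders without recommendations
    return fetch_recommendations(user_id)
```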
Permission Flags
“Enterprise customers get this feature.” “Beta users see this.” These gate access based on user attributes rather than random assignment.
Lifecycle: Permanent, tied to business logic.
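Unlike experiment flags, permission flags are a pure function of user attributes, with no hashing involved. A minimal sketch, assuming a hypothetical `User` record and an in-code rule table (a real system would keep the rules in its flag or entitlement service):

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class User:
    user_id: str
    plan: str
    groups: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class PermissionRule:
    plans: FrozenSet[str] = frozenset()   # plans that always get access
    groups: FrozenSet[str] = frozenset()  # opt-in cohorts such as "beta"


# Hypothetical rule table: enterprise customers and beta users get the feature.
PERMISSIONS: Dict[str, PermissionRule] = {
    "advanced_reports": PermissionRule(
        plans=frozenset({"enterprise"}),
        groups=frozenset({"beta"}),
    ),
}


def has_access(flag_name: str, user: User) -> bool:
    """Grant access based on user attributes; unknown flags deny by default."""
    rule = PERMISSIONS.get(flag_name)
    if rule is None:
        return False
    return user.plan in rule.plans or bool(user.groups & rule.groups)
```

Denying unknown flags by default means a typo in a flag name hides a feature rather than giving it away.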
Evaluation Strategies
How does the system decide if a flag is on for a specific request?
- Boolean: the simplest strategy. The flag is on or off for everyone.
- Percentage rollout: a share of users (say 10%) gets the feature. Each user ID, usually combined with the flag name, is hashed into a stable bucket, so the same user always gets the same result, and raising the rollout from 10% to 20% typically keeps the first 10% enabled.
- User targeting: specific user IDs or email domains get the feature. Useful for internal testing.
- Rule-based: combine conditions, such as “Users in Europe AND on the premium plan AND not on mobile.” Rules evaluate against a context object containing user attributes.
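The four strategies compose naturally in one evaluation function. A minimal sketch, assuming a hypothetical `Flag` record and a plain dict as the context object. The precedence order here (master switch, then targeting, then rules, then percentage) is one reasonable choice, not a standard. The bucket hashes the flag name together with the user ID, so each flag rolls out to an independent slice of users.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set

Context = Dict[str, Any]
Rule = Callable[[Context], bool]


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


@dataclass
class Flag:
    name: str
    enabled: bool = False  # boolean master switch
    percentage: int = 100  # percentage rollout, 0 to 100
    users: Set[str] = field(default_factory=set)  # targeted user IDs
    domains: Set[str] = field(default_factory=set)  # targeted email domains
    rules: List[Rule] = field(default_factory=list)  # all must match


def evaluate(flag: Flag, ctx: Context) -> bool:
    # 1. Boolean: a disabled flag is off for everyone.
    if not flag.enabled:
        return False
    # 2. User targeting: listed users and email domains always get it.
    user_id = str(ctx.get("user_id", ""))
    email = str(ctx.get("email", ""))
    domain = email.rpartition("@")[2] if "@" in email else ""
    if user_id in flag.users or domain in flag.domains:
        return True
    # 3. Rule-based: every condition must hold for this context.
    if not all(rule(ctx) for rule in flag.rules):
        return False
    # 4. Percentage rollout: a threshold on the stable bucket, so raising
    #    the percentage only ever adds users.
    return rollout_bucket(flag.name, user_id) < flag.percentage


# "Users in Europe AND on the premium plan AND not on mobile."
eu_premium_desktop = Flag(
    name="new_dashboard",
    enabled=True,
    rules=[
        lambda c: c.get("region") == "EU",
        lambda c: c.get("plan") == "premium",
        lambda c: not c.get("is_mobile", False),
    ],
)
ctx = {"user_id": "u1", "region": "EU", "plan": "premium", "is_mobile": False}
print(evaluate(eu_premium_desktop, ctx))  # True
print(evaluate(eu_premium_desktop, {**ctx, "is_mobile": True}))  # False
```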
The Cleanup Problem
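The biggest risk with feature flags isn’t the flags themselves; it’s flags that never get removed. A codebase with 200 stale flags becomes impossible to reason about: every flag check hides a branch that may never run in production, and n independent boolean flags allow up to 2^n combinations of states, most of which nobody has tested.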
Prevention strategies:
- Set an expiration date when creating every release/experiment flag
- Track flags in a registry with an owner and review date
- Run linting rules that flag (pun intended) old flags
- Treat flag removal as part of the feature’s “definition of done”
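The registry and expiration-date items can be enforced mechanically. A minimal sketch, assuming a hypothetical in-code registry of `FlagRecord` entries; the flag names, owners, and dates are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

TEMPORARY_KINDS = {"release", "experiment"}


@dataclass(frozen=True)
class FlagRecord:
    name: str
    kind: str                # "release", "experiment", "ops" or "permission"
    owner: str               # team or person accountable for removal
    expires: Optional[date]  # None is acceptable only for long-lived kinds


def registry_problems(registry: List[FlagRecord], today: date) -> List[str]:
    """Return human-readable problems; an empty list means the registry is clean."""
    problems = []
    for record in registry:
        if record.kind in TEMPORARY_KINDS and record.expires is None:
            problems.append(f"{record.name}: {record.kind} flag has no expiration date")
        elif record.expires is not None and record.expires < today:
            problems.append(
                f"{record.name}: expired on {record.expires.isoformat()}, "
                f"ask {record.owner} to remove it"
            )
    return problems


# Hypothetical registry entries.
REGISTRY = [
    FlagRecord("new_checkout", "release", "payments-team", date(2024, 3, 1)),
    FlagRecord("button_color_test", "experiment", "growth-team", None),
    FlagRecord("recommendations_enabled", "ops", "sre-team", None),
]
```

A CI step could call `registry_problems(REGISTRY, date.today())` and fail the build when the list is non-empty, turning "review date" from a convention into a gate.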
Common Misconception
“Feature flags replace proper testing.” Flags control who sees what, but the feature still needs to work correctly in both states. A flag that hides a broken feature doesn’t fix it — it just delays the problem. Test both the on and off paths.
One thing to remember: Not all feature flags are the same. Release flags are temporary (clean them up!), while ops flags are permanent. Categorize your flags by type and set lifecycle expectations from day one.
See Also
- Python A/B Testing Framework: how tech companies test two versions of something to see which one wins, explained with a lemonade stand experiment.
- Python Configuration Hierarchy: how your Python app decides which settings to use, explained like layers of clothing on a cold day.
- Python Graceful Shutdown: why your Python app needs to say goodbye properly before it stops, explained with a restaurant closing analogy.
- Python Health Check Patterns: why your Python app needs regular check-ups, explained like a doctor's visit for software.
- Python Readiness Liveness Probes: the two questions every cloud platform asks your Python app, explained with a school attendance analogy.