Feature Flag Chaos — When Your Configuration Becomes Unmanageable

Introduction

Feature flags are one of the most powerful tools in modern software development — they let you decouple deployment from release, do gradual rollouts, and kill features without a deploy. They also accumulate like technical debt if you don't actively manage them. A codebase with 200 unmanaged flags is harder to reason about than one with no flags at all.

The Feature Flag Lifecycle

Every flag should move through a defined lifecycle. Without one, flags pile up indefinitely:

1. CREATED: Flag added for gradual rollout or A/B test
2. RAMPING: Gradually rolled out (10% → 50% → 100%)
3. PERMANENT: Rolled out to 100% of users
4. CLEANUP: Code path cleaned up, flag removed from codebase
5. ARCHIVED: Flag deleted from the flag system

Most teams never do steps 4 and 5.
The flag stays in the code as dead if/else branches forever.
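One way to make the lifecycle above enforceable is to model it as a small state machine, so a flag cannot be archived without passing through cleanup. The following is a minimal sketch, not a prescribed implementation: the names `FlagStage`, `NEXT_STAGE`, and `advance` are hypothetical, and rollbacks (for example RAMPING back to CREATED) are deliberately left out to keep it short.

```typescript
// Hypothetical names for illustration; the stages mirror the five steps listed above.
type FlagStage = 'CREATED' | 'RAMPING' | 'PERMANENT' | 'CLEANUP' | 'ARCHIVED'

// Each stage has exactly one successor; ARCHIVED is terminal.
const NEXT_STAGE: Record<FlagStage, FlagStage | null> = {
  CREATED: 'RAMPING',
  RAMPING: 'PERMANENT',
  PERMANENT: 'CLEANUP',
  CLEANUP: 'ARCHIVED',
  ARCHIVED: null,
}

// Move a flag forward by one stage, rejecting skipped or backward transitions.
function advance(current: FlagStage, target: FlagStage): FlagStage {
  if (NEXT_STAGE[current] !== target) {
    throw new Error(`Invalid transition: ${current} -> ${target}`)
  }
  return target
}
```

With this shape, a flag sitting at PERMANENT is visibly "waiting for cleanup" rather than silently done, which is the state most flags get stuck in.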

Fix 1: Flag Conventions and Ownership

// Every flag has an owner and a scheduled cleanup date
type FeatureFlag = {
  key: string
  description: string
  owner: string       // team or engineer responsible
  type: 'release' | 'experiment' | 'ops' | 'permission'
  cleanupBy: string   // ISO date — flag MUST be cleaned up by this date
  defaultValue: boolean
}

// flags registry (checked into version control)
const flags: FeatureFlag[] = [
  {
    key: 'new-checkout-flow',
    description: 'New checkout flow with guest support',
    owner: 'checkout-team',
    type: 'release',
    cleanupBy: '2026-04-15',  // 4 weeks from creation
    defaultValue: false,
  },
  {
    key: 'new-pricing-algorithm',
    description: 'ML-based dynamic pricing',
    owner: 'pricing-team',
    type: 'experiment',
    cleanupBy: '2026-05-01',
    defaultValue: false,
  },
]

// CI check: fail the build if any flag is past its cleanupBy date
function checkFlagExpiry() {
  const today = new Date().toISOString().split('T')[0]
  const expired = flags.filter(f => f.cleanupBy < today)
  if (expired.length > 0) {
    console.error('EXPIRED FLAGS — clean these up before merging:')
    expired.forEach(f => console.error(`  ${f.key} (owner: ${f.owner}, expired: ${f.cleanupBy})`))
    process.exit(1)
  }
}
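The expiry check is easier to unit test if the date is passed in rather than read from the clock, and a warning window gives owners notice before the build starts failing. A possible sketch follows; `FlagDeadline`, `classifyFlags`, and the 7-day default for `warnDays` are assumptions for illustration, not part of the registry above.

```typescript
// Hypothetical helper: a pure variant of the expiry check with an injected date.
type FlagDeadline = { key: string; owner: string; cleanupBy: string } // cleanupBy is 'YYYY-MM-DD'

// Split flags into already-expired and due-soon, relative to `today` ('YYYY-MM-DD').
function classifyFlags(flags: FlagDeadline[], today: string, warnDays = 7) {
  // Compute the last day of the warning window in UTC to avoid timezone drift.
  const cutoff = new Date(`${today}T00:00:00Z`)
  cutoff.setUTCDate(cutoff.getUTCDate() + warnDays)
  const warnUntil = cutoff.toISOString().slice(0, 10)

  // ISO dates in the same format compare correctly as plain strings.
  const expired = flags.filter(f => f.cleanupBy < today)
  const dueSoon = flags.filter(f => f.cleanupBy >= today && f.cleanupBy <= warnUntil)
  return { expired, dueSoon }
}
```

A CI job could fail on `expired` and only print a notice for `dueSoon`, so the deadline never arrives as a surprise.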

Fix 2: Clean Up After Rollout

// ❌ Flag code left forever after rollout
async function processCheckout(cart: Cart, userId: string) {
  if (await flags.isEnabled('new-checkout-flow', userId)) {
    return newCheckoutService.process(cart)
  } else {
    return oldCheckoutService.process(cart)
  }
}

// ✅ Flag fully enabled → remove the flag and old code path
// By the cleanupBy date, the function is reduced to this:
async function processCheckout(cart: Cart) {
  return newCheckoutService.process(cart)
  // Old code path deleted — no more dead branches
}
// And removes 'new-checkout-flow' from the flags service entirely

Fix 3: Typed Flag SDK to Avoid Typos

// ❌ String-based flags — typos silently default to false
if (await flags.isEnabled('new-checout-flow', userId)) {  // typo! "checout" not "checkout"
  // This never runs because the flag key doesn't match — no error
}

// ✅ Typed flag SDK — TypeScript catches typos at compile time
import { createFlagClient } from './flags'

const flags = createFlagClient<{
  'new-checkout-flow': boolean
  'new-pricing-algorithm': boolean
  'beta-search': boolean
}>()

// TypeScript error if you mistype the key
if (await flags.get('new-chekout-flow', userId)) {  // TS error: key not in type
  // Never compiles, so the typo is caught before it ships
}
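The article does not show what `createFlagClient` looks like inside `./flags`, so the following is only one way it could be written. In this sketch the client takes a defaults object (which lets TypeScript infer the key set) and a `resolve` function standing in for the real flag backend; both arguments and the `FlagResolver` type are assumptions, and the signature differs from the zero-argument call shown above.

```typescript
// Hypothetical sketch of './flags'. FlagResolver stands in for a real flag backend.
type FlagResolver = (key: string, userId: string) => Promise<boolean | undefined>

function createFlagClient<Schema extends Record<string, boolean>>(
  defaults: Schema,
  resolve: FlagResolver,
) {
  return {
    // `keyof Schema & string` restricts `key` to the names declared in the schema.
    async get<K extends keyof Schema & string>(key: K, userId: string): Promise<boolean> {
      const value = await resolve(key, userId)
      // Fall back to the declared default when the backend has no value.
      return value ?? defaults[key]
    },
  }
}

// Usage: the stub resolver only knows about 'beta-search'.
const typedFlags = createFlagClient(
  { 'new-checkout-flow': false, 'beta-search': false },
  async key => (key === 'beta-search' ? true : undefined),
)

// typedFlags.get('new-chekout-flow', 'user-1')  // would not compile: key is not in the schema
```

Passing defaults at construction also removes the "unknown key silently returns false" failure mode, since every key that type-checks has an explicit default.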

Fix 4: Flag Evaluation Observability

// Log flag evaluations to track actual usage
// This tells you which flags are actively changing behavior (vs dead code)
class FlagClient {
  async isEnabled(flagKey: string, userId: string): Promise<boolean> {
    const result = await this.evaluate(flagKey, userId)

    // Track flag evaluations to surface unused flags.
    // userId is left out of the tags: per-user tag values create unbounded metric cardinality.
    metrics.increment('flag.evaluated', {
      flag: flagKey,
      result: String(result),
    })

    return result
  }
}

// Query: which flags have had 0 evaluations in the last 30 days?
// Those are candidates for deletion; first confirm no rarely-run job (monthly, quarterly) still reads them.
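The "zero evaluations in 30 days" query depends on the metrics backend in use, so as a backend-neutral illustration here is an in-memory sketch of the same idea. `EvaluationTracker`, `record`, and `unusedSince` are hypothetical names; a real setup would more likely run the equivalent query against stored metrics.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000

// Hypothetical in-memory tracker: remembers when each flag was last evaluated.
class EvaluationTracker {
  private lastSeen = new Map<string, number>() // flag key -> epoch ms of last evaluation

  record(flagKey: string, now: number = Date.now()): void {
    this.lastSeen.set(flagKey, now)
  }

  // Registered keys with no evaluation inside the window ending at `now`.
  unusedSince(registeredKeys: string[], windowDays: number, now: number = Date.now()): string[] {
    const cutoff = now - windowDays * MS_PER_DAY
    // Keys that were never recorded count as unused.
    return registeredKeys.filter(key => (this.lastSeen.get(key) ?? -Infinity) < cutoff)
  }
}
```

Comparing against the registry of declared keys matters here: a flag that is never evaluated produces no metric at all, so it can only be found by looking for keys that are missing from the data.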

Feature Flag Checklist

  • ✅ Every flag has a declared owner and cleanup date
  • ✅ CI pipeline fails if any flag is past its cleanup date
  • ✅ Flags are typed — mistyped flag keys are compile errors
  • ✅ Flag evaluations are tracked — find unused flags
  • ✅ After rollout: remove the old code path AND the flag
  • ✅ Flag count monitored — alert when total exceeds 50 active flags
  • ✅ Regular "flag cleanup" sprint tasks alongside feature work
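The flag-count item in the checklist can be automated with a few lines over the registry. This is a rough sketch under assumptions: `OwnedFlag` and `flagBudgetReport` are hypothetical names, and the default of 50 simply mirrors the threshold in the checklist.

```typescript
// Hypothetical report over the flag registry: total count, budget status, and counts per owner.
type OwnedFlag = { key: string; owner: string }

function flagBudgetReport(flags: OwnedFlag[], maxActive = 50) {
  const perOwner = new Map<string, number>()
  for (const flag of flags) {
    perOwner.set(flag.owner, (perOwner.get(flag.owner) ?? 0) + 1)
  }
  return {
    total: flags.length,
    overBudget: flags.length > maxActive,
    // Owners sorted by flag count, largest first, to show where cleanup effort should go.
    byOwner: Array.from(perOwner.entries()).sort((a, b) => b[1] - a[1]),
  }
}
```

Breaking the count down by owner turns a generic "too many flags" alert into a concrete request for a specific team.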

Conclusion

Feature flags rot like any other code if you don't actively manage their lifecycle. The fix is to treat every flag like a ticket with a due date: create it with an owner and a cleanup deadline, alert when the deadline passes, and delete both the flag and the dead code path after rollout completes. A codebase with 20 well-managed flags is more maintainable than one with 200 flags nobody owns. Keep the count low, the owners clear, and the cleanup dates enforced.