Circuit Breaker Not Triggering — When Your Safety Net Has Holes

Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
You implemented a circuit breaker. The downstream payment service is failing 80% of its requests. But the circuit is still CLOSED — your service keeps hammering the failing dependency. The "safety net" is doing nothing.
Circuit breakers are one of the most misconfigured reliability patterns in distributed systems.
- Why Circuit Breakers Fail to Open
- Problem 1: Threshold Too High
- Problem 2: Not Counting Timeouts as Failures
- Problem 3: Per-Instance State (Not Shared)
- Problem 4: Different Error Types Not Classified
- Fix: Production-Ready Circuit Breaker
- Monitoring Circuit Breaker State
- Circuit Breaker Configuration Guide
- Conclusion
Why Circuit Breakers Fail to Open
Problem 1: Threshold Too High
// ❌ Circuit can't open until 50 requests are in the window — too slow
const slowBreaker = new CircuitBreaker(fn, {
  errorThresholdPercentage: 50,
  volumeThreshold: 50, // Need 50 requests to even evaluate
  resetTimeout: 30_000,
})
// Reality: by the time 50 requests have been observed, you've already
// caused serious damage. At 100 req/s, that's 500ms of cascading
// failures before the breaker can open.

// ✅ Lower thresholds, smaller windows
const breaker = new CircuitBreaker(fn, {
  errorThresholdPercentage: 25, // Open at a 25% failure rate
  volumeThreshold: 10, // Evaluate after just 10 requests
  timeout: 2000, // Count requests taking >2s as failures
  resetTimeout: 15_000, // Try half-open sooner
})
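To make the difference concrete, here is a back-of-the-envelope model (a sketch: it assumes a 100% failure rate and ignores window boundaries, so real numbers will vary):

```typescript
// With every request failing, the breaker can't open until at least
// `volumeThreshold` requests have been observed in the rolling window.
function msUntilOpen(volumeThreshold: number, reqPerSec: number): number {
  return (volumeThreshold / reqPerSec) * 1000
}

console.log(msUntilOpen(50, 100)) // 500 (ms of hammering with volumeThreshold: 50)
console.log(msUntilOpen(10, 100)) // 100 (ms with volumeThreshold: 10)
```

Dropping the volume threshold from 50 to 10 cuts the worst-case exposure fivefold at the same traffic rate.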
Problem 2: Not Counting Timeouts as Failures
// ❌ Relying on opossum's defaults — its default timeout is a generous 10s,
// so a service responding in 8–9s never registers as a failure
const slowBreaker = new CircuitBreaker(
  () => fetch('http://payment-service/charge'),
  { errorThresholdPercentage: 50 }
)
// The service is slow, not throwing errors
// Circuit sees ~0% error rate — stays CLOSED — queue backs up
// (Worse: fetch resolves even on 4xx/5xx, so HTTP errors never reach
// the breaker unless you check res.ok and throw)

// ✅ Configure a timeout so slow = failure, and throw on HTTP errors
const breaker = new CircuitBreaker(
  async () => {
    const res = await fetch('http://payment-service/charge')
    if (!res.ok) {
      // fetch resolves even on 5xx, so throw to make the breaker see it
      throw Object.assign(new Error(`HTTP ${res.status}`), { status: res.status })
    }
    return res
  },
  {
    timeout: 3000, // 3s timeout counts as a failure
    errorThresholdPercentage: 25,
    volumeThreshold: 10,
  }
)
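One caveat: a breaker timeout rejects the wrapped promise but does not cancel the in-flight request. For fetch you can pass `AbortSignal.timeout(3000)` as the `signal` option; for clients that don't accept a signal, a minimal promise-timeout wrapper (a sketch; the `TimeoutError` name is chosen to match the error classification later in this post) looks like:

```typescript
// Wrap any promise so that slowness becomes a countable failure.
// Note: this rejects the caller's promise; the underlying I/O keeps running.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      const err = new Error(`timed out after ${ms}ms`)
      err.name = 'TimeoutError'
      reject(err)
    }, ms)
    promise.then(
      (value) => { clearTimeout(timer); resolve(value) },
      (error) => { clearTimeout(timer); reject(error) }
    )
  })
}
```

Wrapping the call site (`withTimeout(client.charge(order), 3000)`) means the rejection is visible to the breaker even when the client library itself never gives up.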
Problem 3: Per-Instance State (Not Shared)
// ❌ Each server process has its OWN circuit breaker state
// Server 1 circuit: OPEN (saw failures)
// Server 2 circuit: CLOSED (hasn't seen enough)
// Server 3 circuit: CLOSED
// Net effect: 2/3 of requests still go to the failing service
class PaymentService {
  // This breaker lives only in this process's memory
  // (bind so `this` survives when the breaker invokes the method)
  private breaker = new CircuitBreaker(this.charge.bind(this), options)
}
// ✅ Shared state via Redis (client here is ioredis, whose
// pipeline().exec() resolves to [error, result] pairs)
import Redis from 'ioredis'

class CircuitOpenError extends Error {}

class RedisCircuitBreaker {
  private readonly FAILURE_KEY: string
  private readonly OPEN_KEY: string
  private readonly HALF_OPEN_KEY: string
  private readonly SUCCESS_KEY: string

  constructor(
    private redis: Redis,
    private serviceName: string,
    private options = {
      failureThreshold: 10,
      failureWindowMs: 60_000,
      openDurationMs: 30_000,
      successThreshold: 3,
    }
  ) {
    this.FAILURE_KEY = `cb:failures:${serviceName}`
    this.OPEN_KEY = `cb:open:${serviceName}`
    this.HALF_OPEN_KEY = `cb:halfopen:${serviceName}`
    this.SUCCESS_KEY = `cb:successes:${serviceName}`
  }

  async getState(): Promise<'CLOSED' | 'OPEN' | 'HALF_OPEN'> {
    // OPEN_KEY carries a TTL; when it expires, the lingering
    // HALF_OPEN_KEY moves us to HALF_OPEN instead of straight to CLOSED
    if (await this.redis.exists(this.OPEN_KEY)) return 'OPEN'
    if (await this.redis.exists(this.HALF_OPEN_KEY)) return 'HALF_OPEN'
    return 'CLOSED'
  }

  private async open(count: number): Promise<void> {
    await this.redis.set(this.OPEN_KEY, '1', 'PX', this.options.openDurationMs)
    await this.redis.set(this.HALF_OPEN_KEY, '1')
    await this.redis.del(this.FAILURE_KEY, this.SUCCESS_KEY)
    console.error(`Circuit OPENED for ${this.serviceName} — ${count} failures`)
  }

  async recordFailure(): Promise<void> {
    if ((await this.getState()) === 'HALF_OPEN') {
      // A single failed probe re-opens the circuit
      await this.open(1)
      return
    }
    const pipe = this.redis.pipeline()
    pipe.incr(this.FAILURE_KEY)
    pipe.pexpire(this.FAILURE_KEY, this.options.failureWindowMs)
    const [[, count]] = (await pipe.exec()) as any
    if (count >= this.options.failureThreshold) {
      await this.open(count)
    }
  }

  async recordSuccess(): Promise<void> {
    if ((await this.getState()) === 'HALF_OPEN') {
      const successes = await this.redis.incr(this.SUCCESS_KEY)
      if (successes >= this.options.successThreshold) {
        await this.redis.del(this.HALF_OPEN_KEY, this.SUCCESS_KEY, this.FAILURE_KEY)
        console.log(`Circuit CLOSED for ${this.serviceName} — recovered`)
      }
      return
    }
    // While CLOSED, a success resets the failure streak
    await this.redis.del(this.FAILURE_KEY)
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    const state = await this.getState()
    if (state === 'OPEN') {
      throw new CircuitOpenError(`${this.serviceName} circuit is OPEN`)
    }
    try {
      const result = await fn()
      await this.recordSuccess()
      return result
    } catch (err) {
      await this.recordFailure()
      throw err
    }
  }
}
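The Redis commands make the state machine hard to unit-test in isolation. The same CLOSED → OPEN → HALF_OPEN → CLOSED transitions can be sketched in memory (a hypothetical LocalBreaker, simplified to consecutive-failure counting with a manual half-open trigger standing in for the TTL expiry):

```typescript
type State = 'CLOSED' | 'OPEN' | 'HALF_OPEN'

// In-memory version of the same state machine, handy for unit tests
class LocalBreaker {
  private state: State = 'CLOSED'
  private failures = 0
  private successes = 0

  constructor(
    private failureThreshold = 3,
    private successThreshold = 2
  ) {}

  getState(): State { return this.state }

  // Called when the open-duration timer fires (the Redis TTL's stand-in)
  tryHalfOpen(): void {
    if (this.state === 'OPEN') this.state = 'HALF_OPEN'
  }

  recordFailure(): void {
    if (this.state === 'HALF_OPEN') { this.state = 'OPEN'; return }
    if (++this.failures >= this.failureThreshold) {
      this.state = 'OPEN'
      this.failures = 0
    }
  }

  recordSuccess(): void {
    if (this.state === 'HALF_OPEN') {
      if (++this.successes >= this.successThreshold) {
        this.state = 'CLOSED'
        this.successes = 0
      }
      return
    }
    this.failures = 0 // a success resets the streak while CLOSED
  }
}
```

Walking it through three failures, a half-open probe, and two successes exercises every transition without touching Redis.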
Problem 4: Different Error Types Not Classified
// ❌ All errors treated equally
// 404 Not Found counts as a failure — but it's a client bug, not a service outage
// 400 Bad Request counted as a failure — wrong!

// ✅ Only count errors that indicate service degradation
class SmartCircuitBreaker {
  private shouldCount(error: any): boolean {
    // Count: 5xx errors, timeouts, connection refused
    if (error.status >= 500) return true
    if (error.code === 'ECONNREFUSED') return true
    if (error.code === 'ETIMEDOUT') return true
    if (error.name === 'TimeoutError') return true
    // Don't count: client errors (4xx), validation failures
    if (error.status >= 400 && error.status < 500) return false
    // Unknown errors count: fail safe toward opening
    return true
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    try {
      return await fn()
    } catch (err) {
      if (this.shouldCount(err)) {
        await this.recordFailure()
      }
      throw err
    }
  }

  private async recordFailure(): Promise<void> {
    /* elided — same bookkeeping as RedisCircuitBreaker.recordFailure() */
  }
}
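The classification rules are easiest to verify pulled out as a standalone function (same logic as shouldCount above, no class required):

```typescript
interface ErrLike { status?: number; code?: string; name?: string }

// Same rules as SmartCircuitBreaker.shouldCount, extracted so they
// can be table-tested without a breaker instance
function shouldCount(error: ErrLike): boolean {
  if ((error.status ?? 0) >= 500) return true
  if (error.code === 'ECONNREFUSED' || error.code === 'ETIMEDOUT') return true
  if (error.name === 'TimeoutError') return true
  if ((error.status ?? 0) >= 400) return false // remaining 4xx: client's fault
  return true // unknown errors count: fail safe toward opening
}

console.log(shouldCount({ status: 503 })) // true: service degradation
console.log(shouldCount({ status: 404 })) // false: client bug
console.log(shouldCount({ code: 'ECONNREFUSED' })) // true
```

The default-true branch is a deliberate choice: an unclassified error is more likely an outage symptom than a client mistake.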
Fix: Production-Ready Circuit Breaker
import CircuitBreaker from 'opossum'

// metrics, alerting, ServiceUnavailableError, and app stand in for your
// own observability clients, error types, and Express app
function createBreaker<T>(
  name: string,
  fn: (...args: any[]) => Promise<T>
) {
  const breaker = new CircuitBreaker(fn, {
    name,
    timeout: 3000, // Fail requests > 3s
    errorThresholdPercentage: 25, // Open at 25% error rate
    volumeThreshold: 10, // Need 10 requests to evaluate
    resetTimeout: 15_000, // Try half-open after 15s
    errorFilter: (err) => {
      // Returning true tells opossum to IGNORE the error:
      // client errors (4xx) don't count against the breaker
      return err.status >= 400 && err.status < 500
    },
  })

  // Monitoring
  breaker.on('open', () => {
    console.error(`[CircuitBreaker] ${name} OPENED`)
    metrics.increment('circuit_breaker.open', { service: name })
    alerting.trigger(`Circuit breaker opened: ${name}`)
  })
  breaker.on('halfOpen', () => {
    console.warn(`[CircuitBreaker] ${name} HALF-OPEN — testing`)
    metrics.increment('circuit_breaker.half_open', { service: name })
  })
  breaker.on('close', () => {
    console.log(`[CircuitBreaker] ${name} CLOSED — recovered`)
    metrics.increment('circuit_breaker.closed', { service: name })
  })
  breaker.on('fallback', () => {
    metrics.increment('circuit_breaker.fallback', { service: name })
  })

  // Graceful fallback: surface a clear error instead of hanging
  breaker.fallback(() => {
    throw new ServiceUnavailableError(`${name} is currently unavailable`)
  })

  return breaker
}

// Usage
const paymentBreaker = createBreaker('payment-service', chargePayment)
const inventoryBreaker = createBreaker('inventory-service', checkStock)

// Health endpoint exposes circuit state
app.get('/health/circuits', (req, res) => {
  res.json({
    payment: paymentBreaker.status.stats,
    inventory: inventoryBreaker.status.stats,
  })
})
Monitoring Circuit Breaker State
// Track how often circuits open — a leading indicator of service health
app.get('/metrics', (req, res) => {
  const stats = paymentBreaker.status.stats
  res.json({
    state: paymentBreaker.opened ? 'OPEN' : 'CLOSED',
    failures: stats.failures,
    successes: stats.successes,
    timeouts: stats.timeouts,
    fallbacks: stats.fallbacks,
    rejects: stats.rejects, // Requests rejected while the circuit was open
    latency: {
      mean: stats.latencyMean, // latencyMean is an average, not a percentile
      p50: stats.percentiles['0.5'],
      p99: stats.percentiles['0.99'], // opossum keys percentiles by fraction
    },
  })
})
Circuit Breaker Configuration Guide
| Scenario | errorThresholdPercentage | volumeThreshold | timeout | resetTimeout |
|---|---|---|---|---|
| Critical payment service | 10% | 5 | 2s | 30s |
| Non-critical recommendations | 50% | 20 | 5s | 10s |
| External third-party API | 25% | 10 | 3s | 60s |
| Internal microservice | 20% | 10 | 1s | 15s |
| Database connection | 30% | 5 | 5s | 30s |
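If you keep these profiles in code rather than prose, the table maps directly onto opossum option objects (the profile names are illustrative):

```typescript
// opossum option objects matching the table above (durations in ms)
const breakerProfiles = {
  criticalPayment: { errorThresholdPercentage: 10, volumeThreshold: 5,  timeout: 2_000, resetTimeout: 30_000 },
  recommendations: { errorThresholdPercentage: 50, volumeThreshold: 20, timeout: 5_000, resetTimeout: 10_000 },
  thirdPartyApi:   { errorThresholdPercentage: 25, volumeThreshold: 10, timeout: 3_000, resetTimeout: 60_000 },
  internalService: { errorThresholdPercentage: 20, volumeThreshold: 10, timeout: 1_000, resetTimeout: 15_000 },
  database:        { errorThresholdPercentage: 30, volumeThreshold: 5,  timeout: 5_000, resetTimeout: 30_000 },
} as const
```

Passing one of these as the options argument to `new CircuitBreaker(fn, breakerProfiles.criticalPayment)` keeps tuning decisions reviewable in one place.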
Conclusion
Circuit breakers fail when thresholds are too high, timeouts aren't counted, state isn't shared across instances, or client errors are wrongly counted as service failures. A properly tuned circuit breaker uses a low volume threshold (10 requests) to evaluate quickly, counts timeouts as failures, filters out 4xx client errors, shares state via Redis across all instances, and fires alerts the moment it opens. Without these, your circuit breaker is theater — it looks safe but provides no protection.