Thundering Herd on Service Restart — The Restart That Kills Your System

Introduction

You deploy a critical hotfix. The pod restarts. The health check passes. Traffic flows in. And then — the new instance dies under a crushing wave of requests that had queued up during the 15-second restart window.

This is the Thundering Herd on Service Restart — a self-inflicted DDoS.

Why It Happens

During a service restart:

t=0s   Service goes down
t=0-15s  Requests queue at load balancer, retry logic fires, clients reconnect
         Queue builds: 10,000 pending requests...
t=15s  Service comes back up — HEALTHY
t=15s  ALL 10,000 queued requests hit the fresh instance simultaneously
t=15s  CPU 100%, memory spike, DB connections exhausted, service crashes again
t=16s  Restart loop begins

The service never gets a chance to warm up. It's crushed before it can handle anything.

Root Causes

  1. Long restart window — Cold JVM/Node.js startup takes time
  2. Client retry storms — Clients using exponential backoff without jitter compute the same retry schedule, so they all retry in lockstep the moment the service comes back
  3. No request queuing — Load balancer dumps everything at once
  4. No warm-up period — Service is marked healthy before it's actually ready
  5. Connection pool pre-fill — DB connection pool initializes hundreds of connections simultaneously on boot
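
Cause #2 is worth a closer look: exponential backoff alone doesn't help, because every client that failed at the same instant computes the same deterministic schedule (1s, 2s, 4s, ...) and re-synchronizes the herd. A minimal sketch of "full jitter" backoff — the helper name and defaults are illustrative, not from any particular library:

```typescript
// Exponential backoff with full jitter: instead of waiting exactly
// base * 2^attempt, wait a uniformly random time within that window,
// so retries from many clients spread out instead of arriving together.
function backoffDelayMs(
  attempt: number,                    // 0-based retry attempt
  baseMs = 1_000,                     // first retry window
  capMs = 30_000,                     // never wait longer than this
  rand: () => number = Math.random,   // injectable for testing
): number {
  const window = Math.min(capMs, baseMs * 2 ** attempt)
  return rand() * window              // "full jitter": uniform in [0, window)
}
```

With jitter, 10,000 recovering clients smear their retries across the backoff window instead of hitting the fresh instance in the same millisecond.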

Fix 1: Slow Start / Traffic Ramping in Load Balancer

Don't send 100% of traffic immediately — ramp up:

# Nginx upstream slow_start (note: slow_start requires NGINX Plus)
upstream backend {
  server app1:3000 slow_start=30s;  # Ramp traffic to this server over 30 seconds
  server app2:3000 slow_start=30s;
}
# Kubernetes — progressive traffic via Argo Rollouts
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      steps:
        - setWeight: 5    # 5% of traffic first
        - pause: { duration: 30s }
        - setWeight: 25
        - pause: { duration: 30s }
        - setWeight: 100

Fix 2: Readiness Probe with Warm-Up

Your health check should only pass once your app is actually ready — connections established, caches pre-warmed:

// Express warm-up before marking as ready
// (`db` and `cache` stand in for your app's own database and cache clients)
import express from 'express'

const app = express()
let isReady = false

async function warmUp() {
  console.log('Warming up...')

  // Pre-establish DB connection pool
  await db.connect()

  // Pre-warm critical caches
  await Promise.all([
    cache.prefetch('config:global'),
    cache.prefetch('feature-flags'),
    cache.prefetch('rate-limits'),
  ])

  // Run a test query to ensure DB is responsive
  await db.query('SELECT 1')

  isReady = true
  console.log('Warm-up complete — accepting traffic')
}

// Kubernetes readiness probe
app.get('/ready', (req, res) => {
  if (isReady) {
    res.status(200).json({ status: 'ready' })
  } else {
    res.status(503).json({ status: 'warming up' })
  }
})

// Kubernetes liveness probe (separate — just "am I alive?")
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'alive' })
})

app.listen(3000, async () => {
  try {
    await warmUp()
  } catch (err) {
    // If warm-up fails, exit so the orchestrator restarts the pod
    // instead of leaving it permanently "not ready"
    console.error('Warm-up failed', err)
    process.exit(1)
  }
})
# kubernetes deployment
readinessProbe:
  httpGet:
    path: /ready
    port: 3000
  initialDelaySeconds: 10   # Wait 10s before first check
  periodSeconds: 5
  failureThreshold: 6        # 30s to warm up before failing

livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 30
  periodSeconds: 10
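
On Kubernetes 1.18+, a dedicated startupProbe is a cleaner fit than a long initialDelaySeconds: it gates the readiness and liveness probes until the first successful check, so a slow cold start doesn't eat into the steady-state probe budget. A sketch, reusing the /ready endpoint above (the numbers are illustrative):

```yaml
startupProbe:
  httpGet:
    path: /ready
    port: 3000
  periodSeconds: 5
  failureThreshold: 12   # Up to 60s of warm-up before the pod is restarted
```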

Fix 3: Request Rate Limiting on Startup

Throttle incoming requests during the warm-up window:

import { RateLimiterMemory } from 'rate-limiter-flexible'

let startupLimiter: RateLimiterMemory | null = new RateLimiterMemory({
  points: 50,     // Only 50 req/s during warm-up
  duration: 1,
})

// After 60 seconds, remove the startup limiter
setTimeout(() => {
  startupLimiter = null
  console.log('Startup rate limit removed — running at full capacity')
}, 60_000)

app.use(async (req, res, next) => {
  if (!startupLimiter) return next()

  try {
    await startupLimiter.consume(req.ip ?? 'unknown')
    next()
  } catch {
    res.set('Retry-After', '5')  // Tell well-behaved clients when to come back
    res.status(503).json({ error: 'Service starting up, please retry' })
  }
})

Fix 4: Graceful Shutdown (Drain Before Restart)

Don't crash — finish in-flight requests before restarting:

const server = app.listen(3000)
let isShuttingDown = false

process.on('SIGTERM', async () => {
  console.log('SIGTERM received — graceful shutdown starting')
  isShuttingDown = true

  // 1. Stop accepting new connections
  server.close(async () => {
    console.log('HTTP server closed')

    // 2. Finish in-flight requests (already handled by server.close)
    // 3. Close DB connections
    await db.end()
    console.log('DB connections closed')

    process.exit(0)
  })

  // 4. Force-kill after 30s if graceful drain stalls
  setTimeout(() => {
    console.error('Graceful shutdown timeout — forcing exit')
    process.exit(1)
  }, 30_000)
})

// 5. Reject new requests during shutdown
// (register this middleware BEFORE your routes, or it will never run for them)
app.use((req, res, next) => {
  if (isShuttingDown) {
    res.setHeader('Connection', 'close')
    return res.status(503).json({ error: 'Service shutting down' })
  }
  next()
})

Fix 5: Circuit Breaker at the Client

If you control the clients, prevent retry storms with a circuit breaker:

import CircuitBreaker from 'opossum'

const options = {
  timeout: 3000,                  // Fail fast instead of queueing
  errorThresholdPercentage: 50,   // Open once 50% of requests fail
  resetTimeout: 30000,            // Try again (half-open) after 30s
  volumeThreshold: 10,            // Minimum requests in the window before the breaker can trip
}

// callDownstreamService is your async function that calls the downstream API
const breaker = new CircuitBreaker(callDownstreamService, options)

// Half-open: only let 1 request through to test recovery
breaker.on('halfOpen', () => console.log('Circuit half-open — testing recovery'))
breaker.on('close', () => console.log('Circuit closed — service healthy'))

Fix 6: Connection Pool Lazy Initialization

Spread out the DB connection pool initialization:

import { Pool } from 'pg'

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  min: 2,    // Keep at least 2 connections (`min` requires a recent pg release)
  max: 20,   // Grow to 20 max
  // Connections are created on demand, not all at startup
  idleTimeoutMillis: 30_000,
})

// Pre-create only minimum connections during warm-up
async function warmUpPool() {
  const warmUpConnections = 2
  const clients = await Promise.all(
    Array.from({ length: warmUpConnections }, () => pool.connect())
  )
  clients.forEach(c => c.release())
  console.log(`Pool pre-warmed with ${warmUpConnections} connections`)
}
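
To see why on-demand creation tames the boot spike, here is a toy lazy pool — illustrative only, not how pg is implemented: nothing is created until a borrower asks, and released connections are reused before new ones are opened.

```typescript
// Minimal lazy pool: creation is deferred until acquire(), capped at `max`,
// and released items are reused before anything new is created.
class LazyPool<T> {
  private idle: T[] = []
  private total = 0

  constructor(
    private create: () => T,   // factory, e.g. open a DB connection
    private max: number,       // hard cap on total items
  ) {}

  acquire(): T {
    const reused = this.idle.pop()
    if (reused !== undefined) return reused   // reuse before creating
    if (this.total >= this.max) throw new Error('pool exhausted')
    this.total++
    return this.create()                      // created on demand, not at boot
  }

  release(item: T): void {
    this.idle.push(item)
  }

  get size(): number {
    return this.total   // how many items have ever been created
  }
}
```

At boot, `size` is 0: the database sees connection attempts trickle in with traffic instead of a burst of hundreds at once.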

Kubernetes Rolling Deployment Best Practices

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # Only 1 new pod at a time
      maxUnavailable: 0     # Never take down a pod before new one is ready
  template:
    spec:
      containers:
        - name: app
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]  # Drain before SIGTERM
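
One assumption worth making explicit: the grace period must cover the preStop sleep plus the drain window, or Kubernetes will SIGKILL the pod mid-drain. With the numbers used in this article (5s preStop, 30s force-exit timer in Fix 4), the default 30s grace period is too short:

```yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 40   # > 5s preStop + 30s drain timeout
```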

Monitoring Restart Events

// Track restart-related metrics
const metrics = {
  startTime: Date.now(),
  warmUpCompleted: false,

  uptimeMs: () => Date.now() - metrics.startTime,

  isWarm: () => metrics.warmUpCompleted,
}

// Alert if the process dies shortly after starting. A per-process counter
// resets on every restart (SIGTERM fires at most once per process), so track
// short uptimes instead, or read the container restart count from your orchestrator.
process.on('SIGTERM', () => {
  if (metrics.uptimeMs() < 60_000) {
    logger.alert('Service terminated within 60s of start: possible thundering herd loop')
  }
})

Conclusion

A thundering herd on restart turns a routine deploy into a crash loop that can take down your entire service. The fixes work in layers: graceful shutdown ensures clean exits, readiness probes hold traffic until the app is actually warm, slow start ramps load up gradually, and startup rate limiting gives your service room to breathe. Implement all of them and a restart becomes a non-event instead of an incident.