CPU Spikes After Deployment — Diagnosing and Fixing Production Hotspots

Introduction

You ship a feature that looks completely innocent — a new API endpoint, a small formatting function, a package version bump. Within minutes of deployment, CPU jumps from 20% to 95%. Response times triple. Everything slows.

CPU spikes after deployment are among the most stressful production incidents because the cause is rarely obvious from the diff alone.

Common Root Causes

Before profiling, know what you're looking for:

  1. Catastrophic regex backtracking — An "evil regex" that's O(2ⁿ) on certain inputs
  2. Blocking the event loop — Synchronous CPU work in an async Node.js app
  3. JSON.parse/stringify on every request — Parsing huge payloads without streaming
  4. Inefficient algorithms — O(n²) loop hiding behind clean-looking code
  5. Accidental re-computation — Missing memoization on expensive operations
  6. Hot path with expensive dependencies — A new import that runs heavy logic on every call

Step 1: Confirm the Spike is Code, Not Infrastructure

# Check if it's your code or a noisy neighbor
top -p "$(pgrep -d',' -f node)"   # -d',' joins multiple PIDs into the list top expects

# Attach the inspector for a closer look (chrome://inspect → Performance)
node --inspect your-app.js

# Quick event loop monitor
setInterval(() => {
  const start = process.hrtime.bigint()
  setImmediate(() => {
    const lag = Number(process.hrtime.bigint() - start) / 1_000_000
    if (lag > 50) console.warn(`Event loop lag: ${lag.toFixed(1)}ms`)
  })
}, 1000)

Event loop lag > 50ms = blocking CPU work somewhere.

Step 2: CPU Profiling with Node.js Inspector

# Start with CPU profiler
node --cpu-prof --cpu-prof-interval=100 app.js

# Or attach to running process
node --inspect=0.0.0.0:9229 app.js
# Then open Chrome → chrome://inspect → Take CPU profile

// Programmatic CPU profiling in production
import { Session } from 'inspector'
import fs from 'fs'

app.get('/debug/cpu-profile', async (req, res) => {
  // NB: gate this route behind auth; never expose it publicly
  const session = new Session()
  session.connect()

  // Start profiling
  await new Promise(r => session.post('Profiler.enable', r))
  await new Promise(r => session.post('Profiler.start', r))

  // Run for 30 seconds
  await new Promise(r => setTimeout(r, 30_000))

  // Stop and get profile
  session.post('Profiler.stop', (err, { profile }) => {
    const filename = `cpu-${Date.now()}.cpuprofile`
    fs.writeFileSync(filename, JSON.stringify(profile))
    session.disconnect()
    res.json({ file: filename })
  })
})

Open the .cpuprofile in Chrome DevTools → Performance → load file. The flame graph shows exactly where CPU time is spent.
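If you'd rather triage without DevTools, the .cpuprofile file is plain JSON: an array of call-tree nodes, each with a `callFrame` and a `hitCount` (samples where that frame was at the top of the stack). A minimal sketch that ranks functions by self time (the `topFunctions` helper is ours, not a Node API):

```typescript
interface ProfileNode {
  id: number
  callFrame: { functionName: string; url: string }
  hitCount?: number
}

// Rank functions by self time (sample hit count) in a parsed .cpuprofile
function topFunctions(profile: { nodes: ProfileNode[] }, n = 10) {
  return [...profile.nodes]
    .filter(node => (node.hitCount ?? 0) > 0)
    .sort((a, b) => (b.hitCount ?? 0) - (a.hitCount ?? 0))
    .slice(0, n)
    .map(node => ({
      fn: node.callFrame.functionName || '(anonymous)',
      url: node.callFrame.url,
      hits: node.hitCount ?? 0,
    }))
}

// Usage:
//   const profile = JSON.parse(fs.readFileSync('cpu-123.cpuprofile', 'utf8'))
//   console.table(topFunctions(profile))
```

The top entry of this table is usually the same function the flame graph would point you at.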

Step 3: Diagnose the Most Common Culprits

Culprit 1: Evil Regex (ReDoS)

// ❌ This regex has catastrophic backtracking
const EMAIL_REGEX = /^([a-zA-Z0-9])(([a-zA-Z0-9])*\.?)*@/

// Input that triggers exponential backtracking
EMAIL_REGEX.test('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!')  // Hangs for minutes!

// ✅ FIX: Use a safe, non-backtracking regex
// Or use a battle-tested library
import { validate } from 'email-validator'

// ✅ Or sanitize/limit input length before regex
function validateEmail(input: string): boolean {
  if (input.length > 254) return false  // Max email length
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input)  // Simple, fast
}

Test your regexes with regex101.com — check the "catastrophic backtracking" warning.
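A cheap complement in CI: time each regex against an adversarial probe before it ships. A sketch (the threshold is illustrative, and we deliberately do not run the evil regex above, since it really would hang):

```typescript
// Time a single regex execution in milliseconds
function timeRegexMs(re: RegExp, input: string): number {
  const start = process.hrtime.bigint()
  re.test(input)
  return Number(process.hrtime.bigint() - start) / 1e6
}

// Adversarial probe: a long run of "valid" chars, then one that forces failure
const probe = 'a'.repeat(40) + '!'
const ms = timeRegexMs(/^[^\s@]+@[^\s@]+\.[^\s@]+$/, probe)
// A linear regex finishes this in well under a millisecond;
// an exponential one blows past any sane CI timeout
```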

Culprit 2: Blocking the Event Loop

// ❌ Synchronous computation blocks all other requests
app.get('/report', (req, res) => {
  // This runs synchronously — blocks event loop for 500ms!
  const result = computeHeavyReport(millionRows)
  res.json(result)
})

// ✅ FIX 1: Use worker threads for CPU-heavy work
import { Worker } from 'worker_threads'

app.get('/report', (req, res) => {
  const worker = new Worker('./report-worker.js', {
    workerData: { params: req.query }
  })

  worker.on('message', result => res.json(result))
  worker.on('error', err => res.status(500).json({ error: err.message }))
})

// report-worker.js
import { workerData, parentPort } from 'worker_threads'

// Runs on a separate thread, so heavy work here can't block the server
const result = computeHeavyReport(workerData.params)
parentPort.postMessage(result)

// ✅ FIX 2: Break work into chunks with setImmediate
async function processInChunks(items: any[]) {
  const CHUNK_SIZE = 100
  const results = []

  for (let i = 0; i < items.length; i += CHUNK_SIZE) {
    const chunk = items.slice(i, i + CHUNK_SIZE)
    results.push(...processChunk(chunk))

    // Yield to event loop between chunks
    await new Promise(r => setImmediate(r))
  }

  return results
}

Culprit 3: JSON Parse/Stringify on Every Request

// ❌ Parsing huge config on every request
app.use((req, res, next) => {
  const config = JSON.parse(fs.readFileSync('config.json', 'utf8'))
  // Disk read + JSON parse on EVERY request!
  req.config = config
  next()
})

// ✅ FIX: Parse once at startup, cache
let config: Config
function getConfig(): Config {
  if (!config) {
    config = JSON.parse(fs.readFileSync('config.json', 'utf8'))
  }
  return config
}

// ✅ FIX: Stream large JSON instead of parsing it all at once
import { parser } from 'stream-json'
import { streamArray } from 'stream-json/streamers/StreamArray'

app.post('/import', (req, res) => {
  const pipeline = req.pipe(parser()).pipe(streamArray())
  pipeline.on('data', ({ value }) => processRecord(value))  // One record at a time
  pipeline.on('end', () => res.json({ done: true }))
})
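If the cached config can change while the process runs, pair the startup cache with file-watch invalidation rather than falling back to per-request reads. A sketch (the `getConfig` shape here is illustrative):

```typescript
import fs from 'fs'

const cache = new Map<string, unknown>()
const watched = new Set<string>()

// Parse once per path; drop the cached value only when the file changes on disk
function getConfig(path: string): unknown {
  if (!cache.has(path)) {
    cache.set(path, JSON.parse(fs.readFileSync(path, 'utf8')))
    if (!watched.has(path)) {
      watched.add(path)
      fs.watchFile(path, { interval: 2000 }, () => cache.delete(path))
    }
  }
  return cache.get(path)
}
```

Requests pay the disk-read-plus-parse cost at most once per config change instead of once per request.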

Culprit 4: O(n²) Algorithm in Hot Path

// ❌ O(n²) — nested loop on every request
app.get('/users', async (req, res) => {
  const users = await db.users.findAll()    // 10,000 users
  const groups = await db.groups.findAll()  // 1,000 groups

  // O(users × groups) = 10,000,000 iterations on each request!
  const result = users.map(user => ({
    ...user,
    group: groups.find(g => g.id === user.groupId)
  }))

  res.json(result)
})

// ✅ FIX: Build a lookup map — O(n + m)
app.get('/users', async (req, res) => {
  const [users, groups] = await Promise.all([
    db.users.findAll(),
    db.groups.findAll(),
  ])

  // Build O(1) lookup map
  const groupMap = new Map(groups.map(g => [g.id, g]))

  const result = users.map(user => ({
    ...user,
    group: groupMap.get(user.groupId)  // O(1) lookup!
  }))

  res.json(result)
})

Culprit 5: Missing Memoization on Expensive Pure Functions

// ❌ Recomputing expensive result on every call
function parseMarkdown(content: string): string {
  // Heavy computation — parsing, AST transformation, rendering
  return marked(content)
}

app.get('/post/:id', async (req, res) => {
  const post = await db.post.findById(req.params.id)
  const html = parseMarkdown(post.content)  // Re-computed every request!
  res.json({ ...post, html })
})

// ✅ FIX: Memoize or cache the result
import { LRUCache } from 'lru-cache'

const markdownCache = new LRUCache<string, string>({ max: 500, ttl: 60_000 })

function parseMarkdownCached(content: string): string {
  const cacheKey = content.substring(0, 100) + content.length  // Fast, but can collide on shared prefixes
  if (markdownCache.has(cacheKey)) return markdownCache.get(cacheKey)!

  const result = marked(content)
  markdownCache.set(cacheKey, result)
  return result
}
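The prefix-plus-length key above can collide: two posts sharing the same first 100 characters and total length would serve each other's HTML. When that matters, hash the full content with Node's built-in crypto; hashing even large markdown takes microseconds, far less than re-rendering:

```typescript
import { createHash } from 'crypto'

// Collision-resistant cache key: SHA-256 over the full content
function contentKey(content: string): string {
  return createHash('sha256').update(content).digest('hex')
}
```

Swapping this in for the prefix key keeps the cache fast while making wrong-post cache hits effectively impossible.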

Continuous Profiling in Production

// Continuous flame graph sampling (Cloud Profiler shown; other agents work similarly)
import * as profiler from '@google-cloud/profiler'

// Sample CPU profiles automatically in the background
profiler.start({
  serviceContext: {
    service: 'my-api',
    version: process.env.DEPLOY_VERSION
  }
})

// Or simple event loop monitoring
import { monitorEventLoopDelay } from 'perf_hooks'

const histogram = monitorEventLoopDelay({ resolution: 20 })
histogram.enable()

setInterval(() => {
  const p99Lag = histogram.percentile(99) / 1e6  // Convert ns to ms
  if (p99Lag > 100) {
    logger.warn(`Event loop p99 delay: ${p99Lag.toFixed(1)}ms`)
  }
  histogram.reset()
}, 10_000)

Conclusion

CPU spikes after deployment almost always come down to one of five culprits: evil regex, blocking the event loop, parsing large data on every request, hidden O(n²) algorithms, or missing memoization. Profile first — don't guess. The CPU flame graph will point you directly at the hot function. Then fix the algorithm, not just the symptom.