CPU Spikes After Deployment — Diagnosing and Fixing Production Hotspots

Introduction

You ship a feature that looks completely innocent — a new API endpoint, a small formatting function, a package version bump. Within minutes of deployment, CPU jumps from 20% to 95%. Response times triple. Everything slows.

CPU spikes after deployment are among the most stressful production incidents because the cause is rarely obvious from the diff alone.

Common Root Causes

Before profiling, know what you're looking for:

  1. Catastrophic regex backtracking — An "evil regex" that's O(2ⁿ) on certain inputs
  2. Blocking the event loop — Synchronous CPU work in an async Node.js app
  3. JSON.parse/stringify on every request — Parsing huge payloads without streaming
  4. Inefficient algorithms — O(n²) loop hiding behind clean-looking code
  5. Accidental re-computation — Missing memoization on expensive operations
  6. Hot path with expensive dependencies — A new import that runs heavy logic on every call

Step 1: Confirm the Spike is Code, Not Infrastructure

# Check if it's your code or a noisy neighbor
top -p "$(pgrep -d',' -f node)"   # -d',' joins multiple PIDs into the list top expects

# Attach the inspector for a closer look (chrome://inspect → Performance)
node --inspect your-app.js

# Quick event loop monitor
setInterval(() => {
  const start = process.hrtime.bigint()
  setImmediate(() => {
    const lag = Number(process.hrtime.bigint() - start) / 1_000_000
    if (lag > 50) console.warn(`Event loop lag: ${lag.toFixed(1)}ms`)
  })
}, 1000)

Event loop lag > 50ms = blocking CPU work somewhere.

Step 2: CPU Profiling with Node.js Inspector

# Start with CPU profiler
node --cpu-prof --cpu-prof-interval=100 app.js

# Or attach to running process
node --inspect=0.0.0.0:9229 app.js
# Then open Chrome → chrome://inspect → Take CPU profile

// Programmatic CPU profiling in production
import { Session } from 'inspector'
import fs from 'fs'

app.get('/debug/cpu-profile', async (req, res) => {
  // NB: gate this route behind auth; never expose it publicly
  const session = new Session()
  session.connect()

  // Start profiling
  await new Promise(r => session.post('Profiler.enable', r))
  await new Promise(r => session.post('Profiler.start', r))

  // Run for 30 seconds
  await new Promise(r => setTimeout(r, 30_000))

  // Stop and get profile
  session.post('Profiler.stop', (err, { profile }) => {
    const filename = `cpu-${Date.now()}.cpuprofile`
    fs.writeFileSync(filename, JSON.stringify(profile))
    session.disconnect()
    res.json({ file: filename })
  })
})

Open the .cpuprofile in Chrome DevTools → Performance → load file. The flame graph shows exactly where CPU time is spent.
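If you'd rather triage without DevTools, the .cpuprofile file is plain JSON: an array of call-tree nodes, each with a `callFrame` and a `hitCount` (samples where that frame was at the top of the stack). A minimal sketch that ranks functions by self time (the `topFunctions` helper is ours, not a Node API):

```typescript
interface ProfileNode {
  id: number
  callFrame: { functionName: string; url: string }
  hitCount?: number
}

// Rank functions by self time (sample hit count) in a parsed .cpuprofile
function topFunctions(profile: { nodes: ProfileNode[] }, n = 10) {
  return [...profile.nodes]
    .filter(node => (node.hitCount ?? 0) > 0)
    .sort((a, b) => (b.hitCount ?? 0) - (a.hitCount ?? 0))
    .slice(0, n)
    .map(node => ({
      fn: node.callFrame.functionName || '(anonymous)',
      url: node.callFrame.url,
      hits: node.hitCount ?? 0,
    }))
}

// Usage:
//   const profile = JSON.parse(fs.readFileSync('cpu-123.cpuprofile', 'utf8'))
//   console.table(topFunctions(profile))
```

The top entry of this table is usually the same function the flame graph would point you at.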

Step 3: Diagnose the Most Common Culprits

Culprit 1: Evil Regex (ReDoS)

// ❌ This regex has catastrophic backtracking
const EMAIL_REGEX = /^([a-zA-Z0-9])(([a-zA-Z0-9])*\.?)*@/

// Input that triggers exponential backtracking
EMAIL_REGEX.test('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!')  // Hangs for minutes!

// ✅ FIX: Use a safe, non-backtracking regex
// Or use a battle-tested library
import { validate } from 'email-validator'

// ✅ Or sanitize/limit input length before regex
function validateEmail(input: string): boolean {
  if (input.length > 254) return false  // Max email length
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input)  // Simple, fast
}

Test your regexes with regex101.com — check the "catastrophic backtracking" warning.
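A cheap complement in CI: time each regex against an adversarial probe before it ships. A sketch (the threshold is illustrative, and we deliberately do not run the evil regex above, since it really would hang):

```typescript
// Time a single regex execution in milliseconds
function timeRegexMs(re: RegExp, input: string): number {
  const start = process.hrtime.bigint()
  re.test(input)
  return Number(process.hrtime.bigint() - start) / 1e6
}

// Adversarial probe: a long run of "valid" chars, then one that forces failure
const probe = 'a'.repeat(40) + '!'
const ms = timeRegexMs(/^[^\s@]+@[^\s@]+\.[^\s@]+$/, probe)
// A linear regex finishes this in well under a millisecond;
// an exponential one blows past any sane CI timeout
```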

Culprit 2: Blocking the Event Loop

// ❌ Synchronous computation blocks all other requests
app.get('/report', (req, res) => {
  // This runs synchronously — blocks event loop for 500ms!
  const result = computeHeavyReport(millionRows)
  res.json(result)
})

// ✅ FIX 1: Use worker threads for CPU-heavy work
import { Worker } from 'worker_threads'

app.get('/report', (req, res) => {
  const worker = new Worker('./report-worker.js', {
    workerData: { params: req.query }
  })

  worker.on('message', result => res.json(result))
  worker.on('error', err => res.status(500).json({ error: err.message }))
})

// report-worker.js
import { workerData, parentPort } from 'worker_threads'

// Runs on a separate thread, so heavy work here can't block the server
const result = computeHeavyReport(workerData.params)
parentPort.postMessage(result)

// ✅ FIX 2: Break work into chunks with setImmediate
async function processInChunks(items: any[]) {
  const CHUNK_SIZE = 100
  const results = []

  for (let i = 0; i < items.length; i += CHUNK_SIZE) {
    const chunk = items.slice(i, i + CHUNK_SIZE)
    results.push(...processChunk(chunk))

    // Yield to event loop between chunks
    await new Promise(r => setImmediate(r))
  }

  return results
}

Culprit 3: JSON Parse/Stringify on Every Request

// ❌ Parsing huge config on every request
app.use((req, res, next) => {
  const config = JSON.parse(fs.readFileSync('config.json', 'utf8'))
  // Disk read + JSON parse on EVERY request!
  req.config = config
  next()
})

// ✅ FIX: Parse once at startup, cache
let config: Config
function getConfig(): Config {
  if (!config) {
    config = JSON.parse(fs.readFileSync('config.json', 'utf8'))
  }
  return config
}

// ✅ FIX: Stream large JSON instead of parsing it all at once
import { parser } from 'stream-json'
import { streamArray } from 'stream-json/streamers/StreamArray'

app.post('/import', (req, res) => {
  const pipeline = req.pipe(parser()).pipe(streamArray())
  pipeline.on('data', ({ value }) => processRecord(value))  // One record at a time
  pipeline.on('end', () => res.json({ done: true }))
})
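If the cached config can change while the process runs, pair the startup cache with file-watch invalidation rather than falling back to per-request reads. A sketch (the `getConfig` shape here is illustrative):

```typescript
import fs from 'fs'

const cache = new Map<string, unknown>()
const watched = new Set<string>()

// Parse once per path; drop the cached value only when the file changes on disk
function getConfig(path: string): unknown {
  if (!cache.has(path)) {
    cache.set(path, JSON.parse(fs.readFileSync(path, 'utf8')))
    if (!watched.has(path)) {
      watched.add(path)
      fs.watchFile(path, { interval: 2000 }, () => cache.delete(path))
    }
  }
  return cache.get(path)
}
```

Requests pay the disk-read-plus-parse cost at most once per config change instead of once per request.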

Culprit 4: O(n²) Algorithm in Hot Path

// ❌ O(n²) — nested loop on every request
app.get('/users', async (req, res) => {
  const users = await db.users.findAll()    // 10,000 users
  const groups = await db.groups.findAll()  // 1,000 groups

  // O(users × groups) = 10,000,000 iterations on each request!
  const result = users.map(user => ({
    ...user,
    group: groups.find(g => g.id === user.groupId)
  }))

  res.json(result)
})

// ✅ FIX: Build a lookup map — O(n + m)
app.get('/users', async (req, res) => {
  const [users, groups] = await Promise.all([
    db.users.findAll(),
    db.groups.findAll(),
  ])

  // Build O(1) lookup map
  const groupMap = new Map(groups.map(g => [g.id, g]))

  const result = users.map(user => ({
    ...user,
    group: groupMap.get(user.groupId)  // O(1) lookup!
  }))

  res.json(result)
})

Culprit 5: Missing Memoization on Expensive Pure Functions

// ❌ Recomputing expensive result on every call
function parseMarkdown(content: string): string {
  // Heavy computation — parsing, AST transformation, rendering
  return marked(content)
}

app.get('/post/:id', async (req, res) => {
  const post = await db.post.findById(req.params.id)
  const html = parseMarkdown(post.content)  // Re-computed every request!
  res.json({ ...post, html })
})

// ✅ FIX: Memoize or cache the result
import { LRUCache } from 'lru-cache'

const markdownCache = new LRUCache<string, string>({ max: 500, ttl: 60_000 })

function parseMarkdownCached(content: string): string {
  const cacheKey = content.substring(0, 100) + content.length  // Fast, but can collide on shared prefixes
  if (markdownCache.has(cacheKey)) return markdownCache.get(cacheKey)!

  const result = marked(content)
  markdownCache.set(cacheKey, result)
  return result
}
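The prefix-plus-length key above can collide: two posts sharing the same first 100 characters and total length would serve each other's HTML. When that matters, hash the full content with Node's built-in crypto; hashing even large markdown takes microseconds, far less than re-rendering:

```typescript
import { createHash } from 'crypto'

// Collision-resistant cache key: SHA-256 over the full content
function contentKey(content: string): string {
  return createHash('sha256').update(content).digest('hex')
}
```

Swapping this in for the prefix key keeps the cache fast while making wrong-post cache hits effectively impossible.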

Continuous Profiling in Production

// Continuous flame graph sampling (Cloud Profiler shown; other agents work similarly)
import * as profiler from '@google-cloud/profiler'

// Sample CPU profiles automatically in the background
profiler.start({
  serviceContext: {
    service: 'my-api',
    version: process.env.DEPLOY_VERSION
  }
})

// Or simple event loop monitoring
import { monitorEventLoopDelay } from 'perf_hooks'

const histogram = monitorEventLoopDelay({ resolution: 20 })
histogram.enable()

setInterval(() => {
  const p99Lag = histogram.percentile(99) / 1e6  // Convert ns to ms
  if (p99Lag > 100) {
    logger.warn(`Event loop p99 delay: ${p99Lag.toFixed(1)}ms`)
  }
  histogram.reset()
}, 10_000)

Conclusion

CPU spikes after deployment almost always come down to one of five culprits: evil regex, blocking the event loop, parsing large data on every request, hidden O(n²) algorithms, or missing memoization. Profile first — don't guess. The CPU flame graph will point you directly at the hot function. Then fix the algorithm, not just the symptom.