CPU Spikes After Deployment — Diagnosing and Fixing Production Hotspots
By Sanjeev Sharma (@webcoderspeed1)
Introduction
You ship a feature that looks completely innocent — a new API endpoint, a small formatting function, a package version bump. Within minutes of deployment, CPU jumps from 20% to 95%. Response times triple. Everything slows.
CPU spikes after deployment are one of the most stressful production incidents because the cause isn't immediately obvious.
- Common Root Causes
- Step 1: Confirm the Spike is Code, Not Infrastructure
- Step 2: CPU Profiling with Node.js Inspector
- Step 3: Diagnose the Most Common Culprits
- Culprit 1: Evil Regex (ReDoS)
- Culprit 2: Blocking the Event Loop
- Culprit 3: JSON Parse/Stringify on Every Request
- Culprit 4: O(n²) Algorithm in Hot Path
- Culprit 5: Missing Memoization on Expensive Pure Functions
- Continuous Profiling in Production
- Conclusion
Common Root Causes
Before profiling, know what you're looking for:
- Catastrophic regex backtracking — An "evil regex" that's O(2ⁿ) on certain inputs
- Blocking the event loop — Synchronous CPU work in an async Node.js app
- JSON.parse/stringify on every request — Parsing huge payloads without streaming
- Inefficient algorithms — O(n²) loop hiding behind clean-looking code
- Accidental re-computation — Missing memoization on expensive operations
- Hot path with expensive dependencies — A new import that runs heavy logic on every call
Step 1: Confirm the Spike is Code, Not Infrastructure
# Check whether it's your code or a noisy neighbor
top -p $(pgrep -f "node")

# Attach the inspector so you can profile the running process (Step 2)
node --inspect your-app.js

# Quick event loop monitor — drop into your app's entry point
setInterval(() => {
  const start = process.hrtime.bigint()
  setImmediate(() => {
    const lag = Number(process.hrtime.bigint() - start) / 1_000_000
    if (lag > 50) console.warn(`Event loop lag: ${lag.toFixed(1)}ms`)
  })
}, 1000)
Event loop lag > 50ms = blocking CPU work somewhere.
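To sanity-check the monitor, you can deliberately block the loop and confirm the measured lag matches. A minimal sketch, plain Node with no extra dependencies:

```typescript
// Measure one round of event-loop lag: queue a setImmediate and time how
// long it takes to actually run. Anything queued behind blocking work waits.
function measureLoopLag(): Promise<number> {
  const start = process.hrtime.bigint()
  return new Promise(resolve =>
    setImmediate(() => resolve(Number(process.hrtime.bigint() - start) / 1e6))
  )
}

const lagPromise = measureLoopLag() // queue the measurement first
const deadline = Date.now() + 100
while (Date.now() < deadline) {} // simulate ~100ms of synchronous CPU work

lagPromise.then(lagMs => {
  // The busy-wait delayed the setImmediate callback by roughly 100ms
  console.log(`measured lag: ${lagMs.toFixed(1)}ms`)
})
```

Run this locally and the reported lag tracks the busy-wait duration almost exactly — the same signal the production monitor above surfaces.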
Step 2: CPU Profiling with Node.js Inspector
# Start with CPU profiler
node --cpu-prof --cpu-prof-interval=100 app.js
# Or attach to running process
node --inspect=0.0.0.0:9229 app.js
# Then open Chrome → chrome://inspect → Take CPU profile
// Programmatic CPU profiling in production — gate this route behind auth!
import { Session } from 'inspector'
import fs from 'fs'

app.get('/debug/cpu-profile', async (req, res) => {
  const session = new Session()
  session.connect()
  // Start profiling
  await new Promise(r => session.post('Profiler.enable', r))
  await new Promise(r => session.post('Profiler.start', r))
  // Sample for 30 seconds
  await new Promise(r => setTimeout(r, 30_000))
  // Stop and write the profile to disk
  session.post('Profiler.stop', (err, params) => {
    session.disconnect()
    if (err) return res.status(500).json({ error: err.message })
    const filename = `cpu-${Date.now()}.cpuprofile`
    fs.writeFileSync(filename, JSON.stringify(params.profile))
    res.json({ file: filename })
  })
})
Open the .cpuprofile in Chrome DevTools → Performance → load file. The flame graph shows exactly where CPU time is spent.
Step 3: Diagnose the Most Common Culprits
Culprit 1: Evil Regex (ReDoS)
// ❌ This regex has catastrophic backtracking
const EMAIL_REGEX = /^([a-zA-Z0-9])(([a-zA-Z0-9])*\.?)*@/
// Input that triggers exponential backtracking
EMAIL_REGEX.test('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!') // Hangs for minutes!
// ✅ FIX: Use a safe, non-backtracking regex
// Or use a battle-tested library
import { validate } from 'email-validator'
// ✅ Or sanitize/limit input length before regex
function validateEmail(input: string): boolean {
  if (input.length > 254) return false // RFC 5321 max email length
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input) // Simple, linear-time
}
Test your regexes with regex101.com — check the "catastrophic backtracking" warning.
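You can also catch a backtracking pattern before it ships with a crude timing probe: feed the regex adversarial near-miss inputs of growing length and watch whether match time explodes. A sketch — the evil pattern here is the classic textbook `(a+)+$` example, not the email regex above:

```typescript
// Time a single regex test in milliseconds.
function timeRegex(re: RegExp, input: string): number {
  const start = process.hrtime.bigint()
  re.test(input)
  return Number(process.hrtime.bigint() - start) / 1e6
}

// Classic catastrophic pattern: nested quantifier plus a forced mismatch at the end.
const evil = /^(a+)+$/

for (const n of [16, 19, 22]) {
  const ms = timeRegex(evil, 'a'.repeat(n) + '!')
  console.log(`n=${n}: ${ms.toFixed(2)}ms`)
}
// Time grows roughly 2x per extra character — exponential, not linear.
```

A safe pattern's time barely moves as input grows; an evil one doubles with every character. Wiring a probe like this into CI for user-facing regexes catches ReDoS before deployment.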
Culprit 2: Blocking the Event Loop
// ❌ Synchronous computation blocks all other requests
app.get('/report', (req, res) => {
  // This runs synchronously — blocks the event loop for 500ms!
  const result = computeHeavyReport(millionRows)
  res.json(result)
})

// ✅ FIX 1: Use worker threads for CPU-heavy work
import { Worker } from 'worker_threads'

app.get('/report', (req, res) => {
  const worker = new Worker('./report-worker.js', {
    workerData: { params: req.query }
  })
  worker.on('message', result => res.json(result))
  worker.on('error', err => res.status(500).json({ error: err.message }))
})

// report-worker.js
import { workerData, parentPort } from 'worker_threads'
const result = computeHeavyReport(workerData.params) // Runs on a separate thread
parentPort!.postMessage(result)
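If you want to experiment before extracting a separate worker file, `worker_threads` also accepts inline source via `eval: true`. A sketch with a stand-in computation (summing integers in place of `computeHeavyReport`):

```typescript
import { Worker } from 'worker_threads'

// Run a CPU-heavy stand-in task off the main thread using an inline worker.
function sumInWorker(n: number): Promise<number> {
  const src = `
    const { workerData, parentPort } = require('worker_threads')
    let acc = 0
    for (let i = 0; i < workerData; i++) acc += i // stand-in for heavy work
    parentPort.postMessage(acc)
  `
  return new Promise((resolve, reject) => {
    const worker = new Worker(src, { eval: true, workerData: n })
    worker.on('message', resolve)
    worker.on('error', reject)
  })
}

sumInWorker(1_000_000).then(sum => console.log(sum)) // computed off the event loop
```

For production traffic, prefer a worker pool (e.g. piscina) over spawning a fresh worker per request — thread startup has real cost.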
// ✅ FIX 2: Break work into chunks with setImmediate
async function processInChunks(items: any[]) {
  const CHUNK_SIZE = 100
  const results = []
  for (let i = 0; i < items.length; i += CHUNK_SIZE) {
    const chunk = items.slice(i, i + CHUNK_SIZE)
    results.push(...processChunk(chunk))
    // Yield to the event loop between chunks
    await new Promise(r => setImmediate(r))
  }
  return results
}
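A self-contained version of the chunking pattern behaves like this — `processChunk` here is a hypothetical stand-in for real per-item CPU work:

```typescript
// Stand-in for expensive per-item work (here: squaring numbers).
function processChunk(items: number[]): number[] {
  return items.map(n => n * n)
}

async function processInChunks(items: number[]): Promise<number[]> {
  const CHUNK_SIZE = 100
  const results: number[] = []
  for (let i = 0; i < items.length; i += CHUNK_SIZE) {
    results.push(...processChunk(items.slice(i, i + CHUNK_SIZE)))
    await new Promise(r => setImmediate(r)) // let pending I/O run between chunks
  }
  return results
}

processInChunks([1, 2, 3, 4]).then(out => console.log(out)) // [ 1, 4, 9, 16 ]
```

Total CPU time is unchanged; what improves is fairness — other requests and timers get a turn between chunks instead of waiting for the whole batch.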
Culprit 3: JSON Parse/Stringify on Every Request
// ❌ Parsing huge config on every request
app.use((req, res, next) => {
  const config = JSON.parse(fs.readFileSync('config.json', 'utf8'))
  // Disk read + JSON parse on EVERY request!
  req.config = config
  next()
})

// ✅ FIX: Parse once at startup, cache
let config: Config
function getConfig(): Config {
  if (!config) {
    config = JSON.parse(fs.readFileSync('config.json', 'utf8'))
  }
  return config
}
// ✅ FIX: Stream large JSON instead of parsing it all at once
import { parser } from 'stream-json'
import { streamArray } from 'stream-json/streamers/StreamArray'

app.post('/import', (req, res) => {
  const pipeline = req.pipe(parser()).pipe(streamArray())
  pipeline.on('data', ({ value }) => processRecord(value)) // One record at a time
  pipeline.on('end', () => res.json({ done: true }))
})
Culprit 4: O(n²) Algorithm in Hot Path
// ❌ O(n²) — nested loop on every request
app.get('/users', async (req, res) => {
  const users = await db.users.findAll() // 10,000 users
  const groups = await db.groups.findAll() // 1,000 groups
  // O(users × groups) = 10,000,000 iterations on each request!
  const result = users.map(user => ({
    ...user,
    group: groups.find(g => g.id === user.groupId)
  }))
  res.json(result)
})

// ✅ FIX: Build a lookup map — O(n + m)
app.get('/users', async (req, res) => {
  const [users, groups] = await Promise.all([
    db.users.findAll(),
    db.groups.findAll(),
  ])
  // Build a Map for O(1) lookups
  const groupMap = new Map(groups.map(g => [g.id, g]))
  const result = users.map(user => ({
    ...user,
    group: groupMap.get(user.groupId) // O(1) lookup!
  }))
  res.json(result)
})
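The difference is easy to see on synthetic data of the same shape — sizes here are assumed from the comments above:

```typescript
type Group = { id: number; name: string }
type User = { id: number; groupId: number }

// Synthetic data matching the shapes above: 10,000 users across 1,000 groups.
const groups: Group[] = Array.from({ length: 1_000 }, (_, i) => ({ id: i, name: `g${i}` }))
const users: User[] = Array.from({ length: 10_000 }, (_, i) => ({ id: i, groupId: i % 1_000 }))

const t0 = process.hrtime.bigint()
const slow = users.map(u => groups.find(g => g.id === u.groupId)) // O(n × m)
const t1 = process.hrtime.bigint()

const groupMap = new Map(groups.map(g => [g.id, g]))
const fast = users.map(u => groupMap.get(u.groupId)) // O(n + m)
const t2 = process.hrtime.bigint()

console.log(`find-in-loop: ${Number(t1 - t0) / 1e6}ms, Map lookup: ${Number(t2 - t1) / 1e6}ms`)
```

On a single request the nested version may still feel fast; under production concurrency, those wasted iterations are exactly what shows up as a CPU plateau in the flame graph.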
Culprit 5: Missing Memoization on Expensive Pure Functions
// ❌ Recomputing an expensive result on every call
function parseMarkdown(content: string): string {
  // Heavy computation — parsing, AST transformation, rendering
  return marked(content)
}

app.get('/post/:id', async (req, res) => {
  const post = await db.post.findById(req.params.id)
  const html = parseMarkdown(post.content) // Re-computed on every request!
  res.json({ ...post, html })
})

// ✅ FIX: Memoize or cache the result
import { LRUCache } from 'lru-cache' // older lru-cache versions export the class as default

const markdownCache = new LRUCache<string, string>({ max: 500, ttl: 60_000 })

function parseMarkdownCached(content: string): string {
  const cacheKey = content.substring(0, 100) + content.length // Fast key
  const cached = markdownCache.get(cacheKey)
  if (cached !== undefined) return cached
  const result = marked(content)
  markdownCache.set(cacheKey, result)
  return result
}
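One caveat with the substring-plus-length key: two different documents that share their first 100 characters and total length will collide. Hashing the full content with Node's built-in crypto is still cheap relative to markdown rendering. A sketch, with `renderCached` as a hypothetical generic memoizer:

```typescript
import { createHash } from 'crypto'

// Collision-resistant cache key: hash the full content instead of a prefix.
function cacheKey(content: string): string {
  return createHash('sha1').update(content).digest('hex')
}

// Minimal memoizer over a plain Map — swap in lru-cache for bounded memory.
const cache = new Map<string, string>()
function renderCached(content: string, render: (s: string) => string): string {
  const key = cacheKey(content)
  const hit = cache.get(key)
  if (hit !== undefined) return hit
  const result = render(content)
  cache.set(key, result)
  return result
}
```

The hash dominates only for tiny inputs; for anything a markdown renderer would actually chew on, it is noise compared to the render it avoids.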
Continuous Profiling in Production
// Periodic flame graph sampling (Cloud Profiler shown; Datadog, Pyroscope, etc. work similarly)
import * as profiler from '@google-cloud/profiler'

profiler.start({
  serviceContext: {
    service: 'my-api',
    version: process.env.DEPLOY_VERSION
  }
})

// Or simple event loop monitoring with perf_hooks
import { monitorEventLoopDelay } from 'perf_hooks'

const histogram = monitorEventLoopDelay({ resolution: 20 })
histogram.enable()

setInterval(() => {
  const p99Lag = histogram.percentile(99) / 1e6 // Convert ns to ms
  if (p99Lag > 100) {
    logger.warn(`Event loop p99 delay: ${p99Lag.toFixed(1)}ms`)
  }
  histogram.reset()
}, 10_000)
Conclusion
CPU spikes after deployment almost always come down to one of five culprits: evil regex, blocking the event loop, parsing large data on every request, hidden O(n²) algorithms, or missing memoization. Profile first — don't guess. The CPU flame graph will point you directly at the hot function. Then fix the algorithm, not just the symptom.