The Backend Performance Checklist for 2026 — From Database to Edge
By Sanjeev Sharma (@webcoderspeed1)
Introduction
Performance is still the most measurable, underrated optimization. A 100ms faster API isn't just better user experience: it's lower error rates, higher conversion, and less infrastructure cost.
This is a comprehensive checklist across all layers: database, application, caching, network, and edge. It's not theoretical. It's from production systems handling billions of requests.
- Database Layer
- Application Layer
- Caching Layer
- Network Layer
- AI Performance
- Observability for Performance
- Performance Budget Per Layer
- Continuous Performance Testing in CI
- Profiling Tools
- Common Performance Anti-Patterns
- Checklist
- Conclusion
Database Layer
Index Coverage
- All WHERE columns are indexed
- All JOIN columns are indexed
- No sequential scans in slow queries (EXPLAIN ANALYZE)
- Composite indexes for multi-column queries
- No redundant indexes (tool: pg_stat_statements)
Connection Pooling
- Using PgBouncer or similar (never raw connections)
- Pool size tuned (2x CPU cores is often right)
- Connection idle timeout configured
- Monitor active connections (should be < pool size)
Query Performance
- Run EXPLAIN ANALYZE on all slow queries
- Query plan shows index usage (no full scans)
- No N+1 queries (batch load or join)
- Pagination on large result sets (prefer keyset pagination over deep OFFSET)
- Use SELECT specific columns (not *)
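The pagination point deserves a sketch: deep `OFFSET` still scans and discards the skipped rows, while a keyset predicate seeks directly via an index. The table and column names here are illustrative, assuming an index on `(created_at, id)`.

```js
// Sketch: keyset (cursor) pagination instead of deep OFFSET.
// cursor is the last row of the previous page, or null for the first page.
function keysetPageQuery(cursor, pageSize = 50) {
  if (cursor === null) {
    return {
      text: 'SELECT id, created_at FROM posts ORDER BY created_at, id LIMIT $1',
      values: [pageSize],
    };
  }
  return {
    text: `SELECT id, created_at FROM posts
           WHERE (created_at, id) > ($1, $2)
           ORDER BY created_at, id LIMIT $3`,
    values: [cursor.createdAt, cursor.id, pageSize],
  };
}
```

Page 1000 costs the same as page 1: the row-value comparison seeks straight to the cursor position in the index.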
Read Replicas
- Read-heavy queries go to replicas
- Replication lag monitored (< 1s ideal)
- Failover tested and documented
- Load balanced across replicas
Application Layer
Async I/O
- No blocking database calls in main thread
- Database operations wrapped in promises/async-await
- External API calls are non-blocking
- Worker threads for CPU-bound work
Streaming
- Large responses streamed (not buffered in memory)
- Database result sets streamed
- File uploads streamed
- Response size monitored
Worker Threads for CPU
- Heavy computation offloaded to worker pool
- Worker pool size tuned (CPU cores = pool size)
- Monitor queue depth and processing time
Memory Management
- No memory leaks (heap profiler run regularly)
- Large objects not kept in memory
- Proper cleanup of listeners, timers, and open handles
- Memory usage stable over time
Caching Layer
In-Process Cache
- Simple data cached in process (LRU)
- Cache invalidation on data change
- TTL configured appropriately
- Cache size monitored (shouldn't grow unbounded)
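An LRU with TTL fits in a few lines on top of `Map`'s insertion order; this is a minimal sketch, not a replacement for a hardened library.

```js
// Minimal in-process LRU cache with TTL, built on Map's insertion order.
class LRUCache {
  constructor(maxSize = 1000, ttlMs = 60_000) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.map = new Map();
  }
  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) { this.map.delete(key); return undefined; }
    // Re-insert to mark as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    if (this.map.size > this.maxSize) {
      // Evict the least recently used entry (first in insertion order).
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```

The `maxSize` bound is what keeps the cache from growing unbounded; track evictions to know when it's too small.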
Redis Cache
- Expensive computations cached (embedding generation, LLM calls)
- Cache invalidation strategy documented
- Eviction policy set (usually allkeys-lru)
- Memory usage monitored
- Replication configured for HA
HTTP Caching
- Cache-Control headers set correctly
- ETag or Last-Modified for revalidation
- Public vs private cached content
- No caching private data
CDN / Edge Caching
- Static assets served from CDN
- Cache headers optimized per content type
- Purge strategy for updates
- Monitor cache hit ratio (< 80% = problem)
Network Layer
HTTP/2 or HTTP/3
- Upgrade from HTTP/1.1
- Multiplexing working (faster page loads)
- Server push avoided (deprecated in major browsers; prefer preload hints or 103 Early Hints)
- QUIC/HTTP/3 if infrastructure supports
Compression
- Gzip or Brotli enabled
- Compression ratio monitored
- CPU cost of compression vs bandwidth savings balanced
Connection Reuse
- Keep-alive enabled (requests don't reconnect)
- Connection pooling on the client side
- TLS session resumption working
DNS Performance
- DNS resolution time < 50ms
- DNS caching on client
- Separate domain for static assets only on HTTP/1.1 (domain sharding hurts under HTTP/2's multiplexing)
AI Performance
Semantic Caching
- Semantically similar queries served from cache (embedding distance under a threshold, not just exact matches)
- Cache hit rate monitored
- Can eliminate a large share of embedding and LLM cost on repetitive workloads
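The core of a semantic cache is a similarity check over embeddings; a sketch with plain arrays (in production the embeddings come from a model and the scan is replaced by a vector index):

```js
// Cosine similarity between two embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return a cached answer if any entry is close enough, else null.
function lookupSemanticCache(entries, queryEmbedding, threshold = 0.95) {
  for (const entry of entries) {
    if (cosine(entry.embedding, queryEmbedding) >= threshold) {
      return entry.answer; // cache hit: skip the LLM call entirely
    }
  }
  return null; // miss: call the LLM, then store { embedding, answer }
}
```

The threshold is the knob: too low and users get answers to slightly different questions; too high and the hit rate collapses.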
Streaming Responses
- Token-level streaming (not buffering full response)
- Client receives first token < 200ms
- Reduces perceived latency dramatically
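Token streaming over Server-Sent Events can be sketched in a few lines; the event format follows the SSE spec, while the function names and `[DONE]` sentinel are conventions I'm assuming here (OpenAI-style APIs use a similar one).

```js
// Format one token as an SSE event.
function sseEvent(token) {
  return `data: ${JSON.stringify({ token })}\n\n`;
}

// Forward tokens to the client as they arrive from the model.
async function streamTokens(tokenIterable, write) {
  for await (const token of tokenIterable) {
    write(sseEvent(token)); // flush each token immediately
  }
  write('data: [DONE]\n\n');
}
```

In an HTTP handler, set `Content-Type: text/event-stream`, disable response buffering, and pass `res.write.bind(res)` as `write`.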
Async LLM Calls
- LLM inference doesn''t block main request
- Queue for async processing
- Poll for results or webhook callback
- Timeout configured (don't wait forever)
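The timeout point is worth a generic sketch: wrap any slow promise so it rejects within a budget instead of hanging the request.

```js
// Reject if the wrapped promise doesn't settle within ms.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage with a hypothetical client: `await withTimeout(llm.complete(prompt), 30_000)`. Note the underlying call isn't cancelled, only abandoned; pass an `AbortSignal` to the client as well if it supports one.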
Model Selection
- Right-sized model for task (Llama 3 8B often beats GPT-4 on cost for simple tasks)
- Quantization reduces latency (Q4_K_M is usually good)
- Batch requests when possible
- Smaller models for lower-latency paths
Prompt Optimization
- Prompts are as short as possible (fewer tokens = faster)
- Few-shot examples included only if needed
- Avoid over-engineering prompts
Observability for Performance
Latency Percentiles
- Track p50, p95, p99 latency
- Not just averages (average hides problems)
- Disaggregated by endpoint
- Alert when p95 exceeds 1s (or your SLA)
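The percentile math itself is simple; a nearest-rank sketch over a window of latency samples (real systems use histograms, e.g. HDR histograms, to avoid storing every sample, but the idea is the same):

```js
// Nearest-rank percentile: p in [0, 100], samples in any order.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

This is exactly why averages hide problems: one slow dependency drags p99 far above the mean while p50 barely moves.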
Throughput
- Requests per second by endpoint
- Concurrent connections monitored
- Bottlenecks identified (database, cache, network)
Resource Utilization
- CPU usage < 70% (headroom for spikes)
- Memory usage stable
- Disk I/O monitored
- Network bandwidth headroom
Distributed Tracing
- End-to-end request flow traced
- Latency breakdown by service/operation
- Bottleneck identification
- Example: Datadog APM, Jaeger
Performance Budget Per Layer
Define acceptable latency per layer:
Total API latency budget: 100ms
- Database query: 20ms
- Cache lookup: 5ms
- Application logic: 30ms
- External API calls: 30ms
- Serialization: 15ms
If the database is slow, you don't have 30ms for logic. Enforce budgets in code reviews.
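Budget enforcement can even live in code: a sketch that checks per-layer timings against the numbers above (the layer names are illustrative; the timings would come from spans in your tracer).

```js
// Per-layer latency budget in milliseconds, matching the breakdown above.
const BUDGET_MS = { db: 20, cache: 5, logic: 30, external: 30, serialize: 15 };

// Return the layers that blew their budget for one request.
function overBudget(timings) {
  return Object.entries(timings)
    .filter(([layer, ms]) => ms > (BUDGET_MS[layer] ?? Infinity))
    .map(([layer]) => layer);
}
```

Log or alert on the returned layers; a budget nobody checks is just a wish.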
Continuous Performance Testing in CI
Load Testing
- Run before deployment (baseline)
- Compare to previous version
- Fail if performance degrades by more than 10%
- Tools: k6, JMeter, Locust
Benchmark Suite
- Critical path benchmarks
- Run on every PR
- Track results over time
- Alert on regressions
Example (k6 load test):

```js
import http from 'k6/http';
import { check } from 'k6';

export let options = {
  vus: 100,
  duration: '30s',
};

export default function () {
  let response = http.get('https://api.example.com/users');
  check(response, {
    'status is 200': (r) => r.status === 200,
    'latency < 100ms': (r) => r.timings.duration < 100,
  });
}
```
Run in CI on every PR. Fail if latency increases.
Profiling Tools
For Node.js:
- node --prof (CPU profiling)
- heapdump (memory profiling)
- Clinic.js (visualized profiling)
For Databases:
- EXPLAIN ANALYZE (query plans)
- pg_stat_statements (slow query log)
- New Relic or DataDog (APM)
Common Performance Anti-Patterns
Fetching and Discarding
```js
// Bad: fetch all rows, filter in application code
const allUsers = await db.query("SELECT * FROM users");
const active = allUsers.filter(u => u.status === 'active');

// Good: filter in the database
const active = await db.query("SELECT * FROM users WHERE status = 'active'");
```
N+1 Queries
```js
// Bad: 1 + N queries
const users = await db.query("SELECT * FROM users");
for (const user of users) {
  const posts = await db.query("SELECT * FROM posts WHERE user_id = ?", [user.id]);
}

// Good: 2 queries
const users = await db.query("SELECT * FROM users");
const posts = await db.query(
  "SELECT * FROM posts WHERE user_id IN (?)",
  [users.map(u => u.id)]
);
```
Serializing Large Objects
```js
// Bad: serialize the whole object in one blocking call
// and hold the full string in memory
response.end(JSON.stringify(largeObject));

// Good: stream elements so only one is serialized at a time
response.write('[');
largeObject.items.forEach((item, i) => {
  response.write((i ? ',' : '') + JSON.stringify(item));
});
response.end(']');
```
Checklist
Before Optimization:
- Measured baseline (p50, p95, p99 latency)
- Identified bottleneck (database? cache? network?)
- Set performance target (e.g., p95 < 100ms)
Database:
- Index coverage complete
- Connection pooling enabled
- Slow queries optimized
- Read replicas for heavy reads
Application:
- Async I/O throughout
- Streaming for large responses
- No memory leaks
- Worker threads for CPU work
Caching:
- In-process cache for hot data
- Redis for expensive operations
- HTTP caching headers set
- CDN configured
Network:
- HTTP/2 enabled
- Compression enabled
- Keep-alive working
- DNS fast
Monitoring:
- Percentile latency tracked (p50, p95, p99)
- Alerts on SLA breach
- Distributed tracing enabled
- Performance regressions caught in CI
Conclusion
Performance is systematic, not magic. Database indexing, async I/O, caching, and monitoring get you 90% of the way. The last 10% comes from profiling and targeted optimization.
Use this checklist before you start. Measure first. Optimize what actually matters. Ship faster responses and happier users.