The Backend Performance Checklist for 2026 — From Database to Edge
By Sanjeev Sharma (@webcoderspeed1)
Introduction
Performance is still the most measurable, underrated optimization. A 100ms faster API isn't just better user experience: it's lower error rates, higher conversion, and less infrastructure cost.
This is a comprehensive checklist across all layers: database, application, caching, network, and edge. It's not theoretical. It's from production systems handling billions of requests.
- Database Layer
- Application Layer
- Caching Layer
- Network Layer
- AI Performance
- Observability for Performance
- Performance Budget Per Layer
- Continuous Performance Testing in CI
- Profiling Tools
- Common Performance Anti-Patterns
- Checklist
- Conclusion
Database Layer
Index Coverage
- All WHERE columns are indexed
- All JOIN columns are indexed
- No sequential scans in slow queries (EXPLAIN ANALYZE)
- Composite indexes for multi-column queries
- No redundant indexes (tool: pg_stat_statements)
Connection Pooling
- Using PgBouncer or similar (never raw connections)
- Pool size tuned (2x CPU cores is often right)
- Connection idle timeout configured
- Monitor active connections (should be < pool size)
Query Performance
- Run EXPLAIN ANALYZE on all slow queries
- Query plan shows index usage (no full scans)
- No N+1 queries (batch load or join)
- Pagination on large result sets (prefer keyset pagination over deep OFFSET)
- Use SELECT specific columns (not *)
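The pagination point deserves a sketch: deep `OFFSET` still scans and discards the skipped rows, while a keyset predicate seeks directly via an index. The table and column names here are illustrative, assuming an index on `(created_at, id)`.

```js
// Sketch: keyset (cursor) pagination instead of deep OFFSET.
// cursor is the last row of the previous page, or null for the first page.
function keysetPageQuery(cursor, pageSize = 50) {
  if (cursor === null) {
    return {
      text: 'SELECT id, created_at FROM posts ORDER BY created_at, id LIMIT $1',
      values: [pageSize],
    };
  }
  return {
    text: `SELECT id, created_at FROM posts
           WHERE (created_at, id) > ($1, $2)
           ORDER BY created_at, id LIMIT $3`,
    values: [cursor.createdAt, cursor.id, pageSize],
  };
}
```

Page 1000 costs the same as page 1: the row-value comparison seeks straight to the cursor position in the index.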
Read Replicas
- Read-heavy queries go to replicas
- Replication lag monitored (< 1s ideal)
- Failover tested and documented
- Load balanced across replicas
Application Layer
Async I/O
- No blocking database calls in main thread
- Database operations wrapped in promises/async-await
- External API calls are non-blocking
- Worker threads for CPU-bound work
Streaming
- Large responses streamed (not buffered in memory)
- Database result sets streamed
- File uploads streamed
- Response size monitored
Worker Threads for CPU
- Heavy computation offloaded to worker pool
- Worker pool size tuned (CPU cores = pool size)
- Monitor queue depth and processing time
Memory Management
- No memory leaks (heap profiler run regularly)
- Large objects not kept in memory
- Proper cleanup of listeners, timers, and open handles
- Memory usage stable over time
Caching Layer
In-Process Cache
- Simple data cached in process (LRU)
- Cache invalidation on data change
- TTL configured appropriately
- Cache size monitored (shouldn't grow unbounded)
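An LRU with TTL fits in a few lines on top of `Map`'s insertion order; this is a minimal sketch, not a replacement for a hardened library.

```js
// Minimal in-process LRU cache with TTL, built on Map's insertion order.
class LRUCache {
  constructor(maxSize = 1000, ttlMs = 60_000) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.map = new Map();
  }
  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) { this.map.delete(key); return undefined; }
    // Re-insert to mark as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    if (this.map.size > this.maxSize) {
      // Evict the least recently used entry (first in insertion order).
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```

The `maxSize` bound is what keeps the cache from growing unbounded; track evictions to know when it's too small.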
Redis Cache
- Expensive computations cached (embedding generation, LLM calls)
- Cache invalidation strategy documented
- Eviction policy set (usually allkeys-lru)
- Memory usage monitored
- Replication configured for HA
HTTP Caching
- Cache-Control headers set correctly
- ETag or Last-Modified for revalidation
- Public vs private cached content
- No caching private data
CDN / Edge Caching
- Static assets served from CDN
- Cache headers optimized per content type
- Purge strategy for updates
- Monitor cache hit ratio (< 80% = problem)
Network Layer
HTTP/2 or HTTP/3
- Upgrade from HTTP/1.1
- Multiplexing working (faster page loads)
- Server push avoided (deprecated in major browsers; prefer preload hints or 103 Early Hints)
- QUIC/HTTP/3 if infrastructure supports
Compression
- Gzip or Brotli enabled
- Compression ratio monitored
- CPU cost of compression vs bandwidth savings balanced
Connection Reuse
- Keep-alive enabled (requests don't reconnect)
- Connection pooling on the client side
- TLS session resumption working
DNS Performance
- DNS resolution time < 50ms
- DNS caching on client
- Separate domain for static assets only on HTTP/1.1 (domain sharding hurts under HTTP/2's multiplexing)
AI Performance
Semantic Caching
- Semantically similar queries served from cache (embedding distance under a threshold, not just exact matches)
- Cache hit rate monitored
- Can eliminate a large share of embedding and LLM cost on repetitive workloads
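The core of a semantic cache is a similarity check over embeddings; a sketch with plain arrays (in production the embeddings come from a model and the scan is replaced by a vector index):

```js
// Cosine similarity between two embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return a cached answer if any entry is close enough, else null.
function lookupSemanticCache(entries, queryEmbedding, threshold = 0.95) {
  for (const entry of entries) {
    if (cosine(entry.embedding, queryEmbedding) >= threshold) {
      return entry.answer; // cache hit: skip the LLM call entirely
    }
  }
  return null; // miss: call the LLM, then store { embedding, answer }
}
```

The threshold is the knob: too low and users get answers to slightly different questions; too high and the hit rate collapses.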
Streaming Responses
- Token-level streaming (not buffering full response)
- Client receives first token < 200ms
- Reduces perceived latency dramatically
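Token streaming over Server-Sent Events can be sketched in a few lines; the event format follows the SSE spec, while the function names and `[DONE]` sentinel are conventions I'm assuming here (OpenAI-style APIs use a similar one).

```js
// Format one token as an SSE event.
function sseEvent(token) {
  return `data: ${JSON.stringify({ token })}\n\n`;
}

// Forward tokens to the client as they arrive from the model.
async function streamTokens(tokenIterable, write) {
  for await (const token of tokenIterable) {
    write(sseEvent(token)); // flush each token immediately
  }
  write('data: [DONE]\n\n');
}
```

In an HTTP handler, set `Content-Type: text/event-stream`, disable response buffering, and pass `res.write.bind(res)` as `write`.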
Async LLM Calls
- LLM inference doesn''t block main request
- Queue for async processing
- Poll for results or webhook callback
- Timeout configured (don't wait forever)
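The timeout point is worth a generic sketch: wrap any slow promise so it rejects within a budget instead of hanging the request.

```js
// Reject if the wrapped promise doesn't settle within ms.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage with a hypothetical client: `await withTimeout(llm.complete(prompt), 30_000)`. Note the underlying call isn't cancelled, only abandoned; pass an `AbortSignal` to the client as well if it supports one.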
Model Selection
- Right-sized model for task (Llama 3 8B often beats GPT-4 on cost for simple tasks)
- Quantization reduces latency (Q4_K_M is usually good)
- Batch requests when possible
- Smaller models for lower-latency paths
Prompt Optimization
- Prompts are as short as possible (fewer tokens = faster)
- Few-shot examples included only if needed
- Avoid over-engineering prompts
Observability for Performance
Latency Percentiles
- Track p50, p95, p99 latency
- Not just averages (average hides problems)
- Disaggregated by endpoint
- Alert when p95 exceeds 1s (or your SLA)
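The percentile math itself is simple; a nearest-rank sketch over a window of latency samples (real systems use histograms, e.g. HDR histograms, to avoid storing every sample, but the idea is the same):

```js
// Nearest-rank percentile: p in [0, 100], samples in any order.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

This is exactly why averages hide problems: one slow dependency drags p99 far above the mean while p50 barely moves.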
Throughput
- Requests per second by endpoint
- Concurrent connections monitored
- Bottlenecks identified (database, cache, network)
Resource Utilization
- CPU usage < 70% (headroom for spikes)
- Memory usage stable
- Disk I/O monitored
- Network bandwidth headroom
Distributed Tracing
- End-to-end request flow traced
- Latency breakdown by service/operation
- Bottleneck identification
- Example: Datadog APM, Jaeger
Performance Budget Per Layer
Define acceptable latency per layer:
Total API latency budget: 100ms
- Database query: 20ms
- Cache lookup: 5ms
- Application logic: 30ms
- External API calls: 30ms
- Serialization: 15ms
If the database is slow, you don't have 30ms for logic. Enforce budgets in code reviews.
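Budget enforcement can even live in code: a sketch that checks per-layer timings against the numbers above (the layer names are illustrative; the timings would come from spans in your tracer).

```js
// Per-layer latency budget in milliseconds, matching the breakdown above.
const BUDGET_MS = { db: 20, cache: 5, logic: 30, external: 30, serialize: 15 };

// Return the layers that blew their budget for one request.
function overBudget(timings) {
  return Object.entries(timings)
    .filter(([layer, ms]) => ms > (BUDGET_MS[layer] ?? Infinity))
    .map(([layer]) => layer);
}
```

Log or alert on the returned layers; a budget nobody checks is just a wish.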
Continuous Performance Testing in CI
Load Testing
- Run before deployment (baseline)
- Compare to previous version
- Fail if performance degrades by more than 10%
- Tools: k6, JMeter, Locust
Benchmark Suite
- Critical path benchmarks
- Run on every PR
- Track results over time
- Alert on regressions
Example (k6 load test):

```js
import http from 'k6/http';
import { check } from 'k6';

export let options = {
  vus: 100,
  duration: '30s',
};

export default function () {
  let response = http.get('https://api.example.com/users');
  check(response, {
    'status is 200': (r) => r.status === 200,
    'latency < 100ms': (r) => r.timings.duration < 100,
  });
}
```
Run in CI on every PR. Fail if latency increases.
Profiling Tools
For Node.js:
- node --prof (CPU profiling)
- heapdump (memory profiling)
- Clinic.js (visualized profiling)
For Databases:
- EXPLAIN ANALYZE (query plans)
- pg_stat_statements (slow query log)
- New Relic or DataDog (APM)
Common Performance Anti-Patterns
Fetching and Discarding
```js
// Bad: fetch all rows, filter in application code
const allUsers = await db.query("SELECT * FROM users");
const active = allUsers.filter(u => u.status === 'active');

// Good: filter in the database
const active = await db.query("SELECT * FROM users WHERE status = 'active'");
```
N+1 Queries
```js
// Bad: 1 + N queries
const users = await db.query("SELECT * FROM users");
for (const user of users) {
  const posts = await db.query("SELECT * FROM posts WHERE user_id = ?", [user.id]);
}

// Good: 2 queries
const users = await db.query("SELECT * FROM users");
const posts = await db.query(
  "SELECT * FROM posts WHERE user_id IN (?)",
  [users.map(u => u.id)]
);
```
Serializing Large Objects
```js
// Bad: serialize the whole object in one blocking call
// and hold the full string in memory
response.end(JSON.stringify(largeObject));

// Good: stream elements so only one is serialized at a time
response.write('[');
largeObject.items.forEach((item, i) => {
  response.write((i ? ',' : '') + JSON.stringify(item));
});
response.end(']');
```
Checklist
Before Optimization:
- Measured baseline (p50, p95, p99 latency)
- Identified bottleneck (database? cache? network?)
- Set performance target (e.g., p95 < 100ms)
Database:
- Index coverage complete
- Connection pooling enabled
- Slow queries optimized
- Read replicas for heavy reads
Application:
- Async I/O throughout
- Streaming for large responses
- No memory leaks
- Worker threads for CPU work
Caching:
- In-process cache for hot data
- Redis for expensive operations
- HTTP caching headers set
- CDN configured
Network:
- HTTP/2 enabled
- Compression enabled
- Keep-alive working
- DNS fast
Monitoring:
- Percentile latency tracked (p50, p95, p99)
- Alerts on SLA breach
- Distributed tracing enabled
- Performance regressions caught in CI
Conclusion
Performance is systematic, not magic. Database indexing, async I/O, caching, and monitoring get you 90% of the way. The last 10% comes from profiling and targeted optimization.
Use this checklist before you start. Measure first. Optimize what actually matters. Ship faster responses and happier users.