Read Replicas in Production — Offloading Reads, Handling Lag, and Routing Queries

Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Read replicas are your first scaling tool: they offload reads from the primary without sharding. Yet many teams deploy replicas and see zero improvement because they mishandle replication lag or hit write bottlenecks. This post covers production replica patterns, lag detection, and knowing when replicas aren't the answer.
- Replication Lag Detection and Monitoring
- Read-After-Write Consistency Problem
- Replica Routing Middleware
- Replica Monitoring and Health Checks
- When Replicas Don't Help (Write-Heavy Workloads)
- ANALYZE on Replicas
- Read Replicas Checklist
- Conclusion
Replication Lag Detection and Monitoring
Replicas lag behind the primary. Detect this lag to avoid serving stale data:
-- Query on REPLICA to measure lag
SELECT
now() - pg_last_xact_replay_timestamp() AS replay_lag,
pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS replay_lag_bytes;
-- Expected output:
-- replay_lag: 00:00:00.456 (456ms lag)
-- replay_lag_bytes: 0 (all received WAL has been applied)
-- Note: on an idle primary the time-based lag keeps growing even though
-- nothing is missing; check the byte-based diff before alerting.
-- On PRIMARY: monitor replica progress
SELECT
application_name,
client_addr,
state,
write_lag,
flush_lag,
replay_lag,
sync_state
FROM pg_stat_replication;
-- Typical output:
-- write_lag: 00:00:00.012 (time until the replica wrote the WAL)
-- flush_lag: 00:00:00.045 (time until the replica flushed the WAL to disk)
-- replay_lag: 00:00:00.234 (time until the replica applied the WAL)
-- Set up continuous monitoring
CREATE TABLE replica_lag_history (
measured_at TIMESTAMP DEFAULT NOW(),
replica_name TEXT,
lag_ms BIGINT
);
-- Background job, run on the PRIMARY (e.g., a scheduler firing every 10 seconds)
INSERT INTO replica_lag_history (replica_name, lag_ms)
SELECT
application_name,
EXTRACT(EPOCH FROM replay_lag) * 1000
FROM pg_stat_replication;
Monitor lag continuously. Alert when any replica exceeds your SLA (typically 1-5 seconds).
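The SLA check can also live in application code. Below is a minimal sketch of the alerting decision; the row shape mirrors the pg_stat_replication query above, while the names and the 5-second threshold are illustrative:

```typescript
// Row shape matching the lag query above: one entry per connected replica.
interface ReplicaLag {
  applicationName: string;
  replayLagMs: number;
}

// Returns the replicas whose replay lag exceeds the SLA threshold,
// i.e. the ones an alerting job should page on.
function findLaggingReplicas(rows: ReplicaLag[], thresholdMs: number): ReplicaLag[] {
  return rows.filter(r => r.replayLagMs > thresholdMs);
}

const lagging = findLaggingReplicas(
  [
    { applicationName: 'replica-1', replayLagMs: 234 },
    { applicationName: 'replica-2', replayLagMs: 7200 },
  ],
  5000 // 5-second SLA, as suggested above
);
console.assert(lagging.length === 1 && lagging[0].applicationName === 'replica-2');
```

A periodic job would feed this from the pg_stat_replication query shown earlier and forward the result to the team's alerting channel.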
Read-After-Write Consistency Problem
Users expect to see their own writes immediately. Replicas break this guarantee:
// PROBLEMATIC: User writes to primary, reads from replica
async function updateUserProfile(userId: string, newName: string) {
// 1. Write to primary
await primaryDb.query(
'UPDATE users SET name = $1 WHERE id = $2',
[newName, userId]
);
// 2. Immediately read from replica (might not have replicated yet)
const user = await replicaDb.query(
'SELECT * FROM users WHERE id = $1',
[userId]
);
// user.name might still be old value if replica lagging
return user.rows[0];
}
// SOLUTION 1: Pin to primary after write
async function updateUserProfile(userId: string, newName: string) {
await primaryDb.query(
'UPDATE users SET name = $1 WHERE id = $2',
[newName, userId]
);
// Read from PRIMARY, not replica
const user = await primaryDb.query(
'SELECT * FROM users WHERE id = $1',
[userId]
);
return user.rows[0];
}
// SOLUTION 2: Session-based pinning (tracked at the application layer)
async function updateUserProfile(userId: string, newName: string, session: Session) {
await primaryDb.query(
'UPDATE users SET name = $1 WHERE id = $2',
[newName, userId]
);
// Mark session to use primary for next 30 seconds
session.pinnedToPrimaryUntil = Date.now() + 30 * 1000;
// Subsequent queries from this session use the primary until the pin expires
const db = Date.now() < session.pinnedToPrimaryUntil ? primaryDb : replicaDb;
const user = await db.query(
'SELECT * FROM users WHERE id = $1',
[userId]
);
return user.rows[0];
}
// SOLUTION 3: Write-through cache
// Always read from cache after write, cache is always fresh
async function updateUserProfile(userId: string, newName: string) {
// Write to primary
const result = await primaryDb.query(
'UPDATE users SET name = $1 WHERE id = $2 RETURNING *',
[newName, userId]
);
// Update cache immediately
const user = result.rows[0];
await cache.set(`user:${userId}`, user, 3600);
return user;
}
Choose pinning or write-through cache. Don't read from replicas immediately after writes.
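A fourth option is to track WAL positions instead of wall-clock time: after a write, record `pg_current_wal_lsn()` from the primary; before reading from a replica, compare it against the replica's `pg_last_wal_replay_lsn()` and fall back to the primary if the replica has not caught up. The comparison helper below is the testable core; wiring it into the query paths follows the patterns above, and the LSN format assumed is Postgres's `XXXXXXXX/XXXXXXXX`:

```typescript
// Compare LSNs of the form "XXXXXXXX/XXXXXXXX" (high 32 bits / low 32 bits).
// Each half fits in a double, so no bigint arithmetic is needed.
function lsnGte(a: string, b: string): boolean {
  const [aHi, aLo] = a.split('/').map(part => parseInt(part, 16));
  const [bHi, bLo] = b.split('/').map(part => parseInt(part, 16));
  return aHi !== bHi ? aHi > bHi : aLo >= bLo;
}

// True if the replica has replayed at least up to the write's LSN,
// i.e. a read from it is guaranteed to see the write.
function replicaCaughtUp(replayLsn: string, writeLsn: string): boolean {
  return lsnGte(replayLsn, writeLsn);
}

console.assert(replicaCaughtUp('0/16B3750', '0/16B3748'));    // replayed past the write
console.assert(!replicaCaughtUp('0/FFFFFFFF', '1/00000010')); // still behind
```

Compared with time-based pinning, this never serves a stale read and never pins longer than necessary, at the cost of one extra round trip to check the replica's replay position.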
Replica Routing Middleware
Route reads to replicas, writes to primary, transparently:
import { Pool } from 'pg';
class ReplicaRouter {
primaryPool: Pool;
replicaPools: Pool[];
currentReplicaIndex: number = 0;
sessionPins: Map<string, number> = new Map();
constructor(primaryUrl: string, replicaUrls: string[]) {
this.primaryPool = new Pool({ connectionString: primaryUrl });
this.replicaPools = replicaUrls.map(
url => new Pool({ connectionString: url })
);
}
private selectReplica(): Pool {
// Round-robin across replicas
const pool = this.replicaPools[this.currentReplicaIndex];
this.currentReplicaIndex = (this.currentReplicaIndex + 1) % this.replicaPools.length;
return pool;
}
async query(
sql: string,
params: any[] = [],
options: { sessionId?: string; forceWrite?: boolean } = {}
) {
// Naive write detection; extend for BEGIN, WITH ... INSERT, etc. as needed
const isWrite = /^\s*(INSERT|UPDATE|DELETE|CREATE|ALTER|DROP)/i.test(sql);
const isSessionPinned = options.sessionId !== undefined &&
(this.sessionPins.get(options.sessionId) ?? 0) > Date.now();
if (isWrite || options.forceWrite || isSessionPinned) {
// Route writes and pinned reads to primary
const client = await this.primaryPool.connect();
try {
// Pin session to primary for 30 seconds after write
if (isWrite && options.sessionId) {
this.sessionPins.set(options.sessionId, Date.now() + 30 * 1000);
}
return await client.query(sql, params);
} finally {
client.release();
}
} else {
// Route read-only queries to replica
const client = await this.selectReplica().connect();
try {
return await client.query(sql, params);
} catch (err) {
// Fallback to primary if replica down
console.warn('Replica failed, falling back to primary', err);
const primaryClient = await this.primaryPool.connect();
try {
return await primaryClient.query(sql, params);
} finally {
primaryClient.release();
}
}
}
}
}
// Usage
const router = new ReplicaRouter(
'postgresql://primary.db',
[
'postgresql://replica-1.db',
'postgresql://replica-2.db',
'postgresql://replica-3.db'
]
);
// Transparent routing
const sessionId = 'user-session-123';
// Write routes to the primary and pins the session
await router.query(
'UPDATE users SET name = $1 WHERE id = $2',
['Alice', 1],
{ sessionId, forceWrite: true }
);
// Read from replica (or primary if session pinned)
const result = await router.query(
'SELECT * FROM users WHERE id = $1',
[1],
{ sessionId }
);
Middleware makes replica routing transparent to application code.
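The round-robin above treats every replica as equally healthy. One refinement, sketched here with illustrative names, is to skip replicas whose last measured lag exceeds the SLA, fed by the lag-monitoring job from earlier:

```typescript
// Lag-aware replica selection: round-robin, but skip replicas whose last
// known lag exceeds maxLagMs. Returns the chosen replica index, or -1 to
// signal "route to the primary" when every replica is too stale.
function pickReplica(
  lagMs: number[],    // last known lag per replica, by index
  startIndex: number, // round-robin cursor
  maxLagMs: number
): number {
  const n = lagMs.length;
  for (let i = 0; i < n; i++) {
    const idx = (startIndex + i) % n;
    if (lagMs[idx] <= maxLagMs) return idx;
  }
  return -1; // all replicas too stale: fall back to the primary
}

console.assert(pickReplica([100, 9000, 200], 1, 5000) === 2); // index 1 lagging, skip to 2
console.assert(pickReplica([9000, 9000], 0, 5000) === -1);    // all lagging → primary
```

Inside the router, `selectReplica` would consult this before handing out a pool, so a lagging replica stops receiving reads without any application change.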
Replica Monitoring and Health Checks
Monitor replica health continuously:
#!/bin/bash
# Monitor replication lag across all replicas
check_replica_lag() {
local replicas=("replica-1.db" "replica-2.db" "replica-3.db")
local lag_threshold_ms=5000
for replica in "${replicas[@]}"; do
lag_ms=$(psql -h "$replica" -tA -c "
SELECT COALESCE(EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) * 1000, 0) AS lag_ms;
")
if (( $(echo "$lag_ms > $lag_threshold_ms" | bc -l) )); then
echo "ALERT: $replica replication lag ${lag_ms}ms exceeds threshold"
# Notify ops team
fi
done
}
# Monitor replica connection count
check_replica_connections() {
local replicas=("replica-1.db" "replica-2.db" "replica-3.db")
local max_connections=500
for replica in "${replicas[@]}"; do
connections=$(psql -h "$replica" -c "
SELECT count(*) FROM pg_stat_activity;
" -t)
if (( connections > max_connections )); then
echo "ALERT: $replica connection count $connections exceeds $max_connections"
fi
done
}
# Readiness check for load balancer
readiness_probe() {
psql -h localhost -c "SELECT 1" > /dev/null 2>&1
exit $?
}
# Liveness check (replica can accept connections)
liveness_probe() {
timeout 5 psql -h localhost -c "SELECT 1" > /dev/null 2>&1
exit $?
}
# Add to Kubernetes
# livenessProbe:
# exec:
# command:
# - /bin/sh
# - -c
# - ./liveness_probe.sh
# initialDelaySeconds: 30
# periodSeconds: 10
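The same probes can be expressed application-side. The sketch below combines connectivity and lag into a single readiness decision; `queryLagMs` stands in for running the replay-lag SQL from earlier and is injected so the logic is testable without a database:

```typescript
// Readiness decision for a replica behind a load balancer: it must both
// accept connections (queryLagMs resolves) and be within the lag SLA.
async function isReplicaReady(
  queryLagMs: () => Promise<number>, // runs the replay-lag query on the replica
  maxLagMs = 5000
): Promise<boolean> {
  try {
    const lag = await queryLagMs(); // throws if the replica is unreachable
    return lag <= maxLagMs;         // reachable, but only ready within the SLA
  } catch {
    return false;                   // connection failure → not ready
  }
}

isReplicaReady(async () => 230).then(ok => console.assert(ok === true));
isReplicaReady(async () => 12_000).then(ok => console.assert(ok === false));
```

Removing a lagging replica from rotation this way trades some read capacity for a hard upper bound on staleness, which is usually the right trade for user-facing reads.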
When Replicas Don't Help (Write-Heavy Workloads)
Read replicas scale reads, not writes. Recognize when you need something else:
-- Analyze read vs write ratio
SELECT
schemaname, relname,
seq_scan + idx_scan AS total_reads,
n_tup_ins + n_tup_upd + n_tup_del AS total_writes,
ROUND(100.0 * (n_tup_ins + n_tup_upd + n_tup_del) /
NULLIF(seq_scan + idx_scan, 0), 1) AS write_to_read_ratio
FROM pg_stat_user_tables
WHERE (seq_scan + idx_scan) > 0
ORDER BY (n_tup_ins + n_tup_upd + n_tup_del) DESC;
-- If write_to_read_ratio > 50, replicas won't help much
-- Example output:
-- orders: 1M read scans, 500K writes → write_to_read_ratio = 50
-- → Replicas absorb the reads, but the primary is still saturated by writes
-- Solutions for write-heavy workloads:
-- 1. Write coalescing: batch and de-duplicate writes to reduce frequency
-- 2. Partitioning: split data across multiple tables by shard key
-- 3. Citus: distributed database for horizontal write scaling
-- 4. Message queue: defer non-critical writes (analytics, logging)
-- Example: batch analytics writes to avoid saturating the primary
WITH batched AS (
SELECT
user_id,
event_type,
COUNT(*) AS count,
MAX(created_at) AS latest_time
FROM events_temp
WHERE created_at > NOW() - INTERVAL '1 minute'
GROUP BY user_id, event_type
)
INSERT INTO events_summary (user_id, event_type, count, updated_at)
SELECT user_id, event_type, count, latest_time FROM batched
ON CONFLICT (user_id, event_type)
DO UPDATE SET count = events_summary.count + EXCLUDED.count, updated_at = NOW();
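Point 1 (write coalescing) can be sketched in application code: aggregate per-key event counts in memory and flush one upsert per (user, event) pair instead of one row per event. Class and field names here are illustrative, and the flush would execute an upsert like the SQL example above:

```typescript
// In-memory write batcher: many record() calls collapse into a few
// upsert rows at flush time, cutting write traffic on the primary.
class EventBatcher {
  private counts = new Map<string, number>();

  record(userId: string, eventType: string): void {
    // Assumes ':' never appears in userId; use a safer key in production.
    const key = `${userId}:${eventType}`;
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  // Returns the batched rows and resets the buffer; a real implementation
  // would run the ON CONFLICT upsert here on a flush interval.
  flush(): Array<{ userId: string; eventType: string; count: number }> {
    const rows = [...this.counts.entries()].map(([key, count]) => {
      const [userId, eventType] = key.split(':');
      return { userId, eventType, count };
    });
    this.counts.clear();
    return rows;
  }
}

const batcher = new EventBatcher();
batcher.record('u1', 'click');
batcher.record('u1', 'click');
batcher.record('u2', 'view');
console.assert(batcher.flush().length === 2); // 3 events collapse into 2 upsert rows
```

The trade-off is durability: events buffered in memory are lost on a crash, which is why this pattern suits analytics and logging rather than transactional writes.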
ANALYZE on Replicas
Keep planner statistics current for queries that run on replicas:
-- NOTE: a streaming (physical) replica is read-only, so running ANALYZE there
-- fails with "cannot execute ANALYZE during recovery". Run ANALYZE on the
-- PRIMARY; the updated statistics replicate to replicas through WAL.
ANALYZE;
-- ANALYZE specific tables (on the primary)
ANALYZE users;
ANALYZE orders;
-- Verify statistics freshness (run on the primary)
SELECT
schemaname, relname,
last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_analyze DESC NULLS LAST;
-- The same applies to VACUUM: run it on the primary; replicas replay it.
-- Logical replicas are ordinary writable databases and need their own
-- ANALYZE and autovacuum on the subscriber side.
Statistics maintained on the primary (or on a logical subscriber) keep replica query plans accurate.
Read Replicas Checklist
- Replicas in separate availability zones/regions
- Replication lag monitored (alert if > 5 seconds)
- Session pinning implemented after writes
- Replica routing middleware deployed
- Fallback to primary if replica fails
- Read/write ratio analyzed (replicas help if >70% reads)
- ANALYZE scheduled on the primary (or on logical subscribers)
- Replica connection limits monitored
- Write-heavy queries identified (sharding alternative)
- Health checks in load balancer/application
Conclusion
Read replicas are your first scaling lever. They excel when your bottleneck is read throughput, not write throughput. Handle replication lag carefully through session pinning or write-through caching to maintain read-after-write consistency. Monitor lag continuously and implement middleware to route reads/writes transparently. If your workload is write-heavy (>50% writes), replicas won't solve your problem—you'll need partitioning, Citus, or message queues instead.