Slow Queries That Only Appear at Scale — The Indexing Problem
Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
"But it worked fine in development!" — famous last words before a production incident.
Your API endpoint hits a PostgreSQL query that returns in 2ms locally (1,000 rows). In production with 10 million rows, the same query takes 8 seconds. Full table scan. Every. Single. Request.
Slow queries that only appear at scale are predictable, preventable, and completely fixable.
- Why Queries Get Slow at Scale
- Step 1: Find Slow Queries
- Step 2: EXPLAIN ANALYZE — See What the DB is Doing
- Fix 1: Add the Right Indexes
- Fix 2: Query Rewriting
- Fix 3: Covering Indexes
- Fix 4: Pagination Without OFFSET
- Fix 5: Partitioning for Very Large Tables
- Index Maintenance
- The Indexing Checklist
- Conclusion
Why Queries Get Slow at Scale
A query without an index performs a sequential scan — it reads every row in the table:
Table: users (10 million rows)
SELECT * FROM users WHERE email = 'alice@example.com';
Without index: reads all 10,000,000 rows → 8 seconds
With index: reads 1 row (B-tree lookup) → 0.2ms
The gap widens with table size: a sequential scan is O(n), a B-tree lookup is O(log n). At 1,000 rows, the sequential scan takes 1ms — you don't even notice. At 10 million rows, it's catastrophic.
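The plan flip is easy to reproduce. Here's a toy sketch using Python's built-in sqlite3 as a stand-in for PostgreSQL (both use B-tree indexes; `EXPLAIN QUERY PLAN` is SQLite's rough analogue of `EXPLAIN`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)

def plan(sql: str) -> str:
    # Flatten the EXPLAIN QUERY PLAN rows into one string
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM users WHERE email = 'user42@example.com'"
p_before = plan(query)
conn.execute("CREATE INDEX idx_users_email ON users (email)")
p_after = plan(query)

print(p_before)  # plan says SCAN: every row is read
print(p_after)   # plan says SEARCH using idx_users_email: one B-tree lookup
```

The same query goes from reading all 10,000 rows to a single index probe, purely because an index now exists.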
Step 1: Find Slow Queries
-- Enable slow query logging in PostgreSQL
ALTER SYSTEM SET log_min_duration_statement = 100; -- Log queries > 100ms
SELECT pg_reload_conf();
-- Find the worst queries with pg_stat_statements
-- (requires the extension: add it to shared_preload_libraries, then CREATE EXTENSION pg_stat_statements)
SELECT
query,
calls,
total_exec_time / 1000 AS total_seconds,
mean_exec_time AS avg_ms,
stddev_exec_time AS stddev_ms,
rows / calls AS avg_rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- Find tables with sequential scans (red flag!)
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;
Step 2: EXPLAIN ANALYZE — See What the DB is Doing
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.*, o.total
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.email LIKE '%gmail.com'
AND o.created_at > NOW() - INTERVAL '30 days'
ORDER BY o.total DESC
LIMIT 50;
Read the output:
Gather (cost=1000.00..80000.00 rows=500 width=120) (actual time=5234.0..8120.5 rows=50 loops=1)
-> Seq Scan on users (cost=0..79000.00 rows=10000000) (actual time=0.1..7890.0)
Filter: (email ~~ '%gmail.com')
Rows Removed by Filter: 9999950 ← 10M rows scanned, 50 used
Seq Scan on a large table = missing index. Rows Removed by Filter: 9,999,950 = this query is killing your DB.
Fix 1: Add the Right Indexes
-- Single column index
CREATE INDEX idx_users_email ON users (email);
-- Composite index (column order matters: equality columns first, then range/sort columns)
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at DESC);
-- Partial index (index only the rows you query)
CREATE INDEX idx_users_active ON users (email)
WHERE is_active = true; -- Only indexes active users!
-- Expression index
CREATE INDEX idx_users_email_lower ON users (lower(email));
-- Now: WHERE lower(email) = lower('Alice@Example.com') uses the index
-- Index for LIKE prefix searches (works for 'alice%', NOT '%alice')
CREATE INDEX idx_users_name_prefix ON users (name varchar_pattern_ops);
After adding indexes, verify with EXPLAIN:
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';
-- Should show: Index Scan on idx_users_email
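These index types are not PostgreSQL-exclusive. As a quick illustration, SQLite (via Python's stdlib sqlite3) also supports expression indexes, and the same rule applies: the WHERE clause must use the exact indexed expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
# Expression index, mirroring idx_users_email_lower from the article
conn.execute("CREATE INDEX idx_users_email_lower ON users (lower(email))")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [("Alice@Example.com",), ("bob@example.com",)])

def plan(sql: str) -> str:
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Case-insensitive lookup through the expression index
q = "SELECT email FROM users WHERE lower(email) = lower('ALICE@example.COM')"
print(plan(q))                     # index search, not a scan
print(conn.execute(q).fetchone())  # ('Alice@Example.com',)
```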
Fix 2: Query Rewriting
Sometimes the query itself is the problem:
-- ❌ LIKE with leading wildcard — CANNOT use index
WHERE email LIKE '%gmail.com'
-- Full table scan always, even with index on email
-- ✅ FIX: Use a reverse expression index or a trigram index
CREATE INDEX idx_users_email_reverse ON users (reverse(email) text_pattern_ops);
-- text_pattern_ops lets the B-tree serve LIKE prefix matches regardless of collation
WHERE reverse(email) LIKE reverse('%gmail.com')
-- = WHERE reverse(email) LIKE 'moc.liamg%' ← Can use index!
-- Or a trigram index (pg_trgm), which handles arbitrary LIKE/ILIKE patterns
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_email_trgm ON users USING gin (email gin_trgm_ops);
WHERE email LIKE '%gmail.com' -- Uses the trigram index directly
-- ❌ Function on indexed column breaks index usage
WHERE EXTRACT(YEAR FROM created_at) = 2026
-- ✅ Use range instead
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01'
-- ❌ Implicit type cast
WHERE user_id = '123' -- user_id is int, '123' is string → cast → no index
-- ✅ Match types exactly
WHERE user_id = 123 -- Integer literal
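The function-on-column trap reproduces outside PostgreSQL too. A sketch with Python's sqlite3 (its `strftime` standing in for `EXTRACT`) shows the plan flip between the two forms:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE INDEX idx_orders_created ON orders (created_at)")
conn.executemany("INSERT INTO orders (created_at) VALUES (?)",
                 [(f"2026-{m:02d}-15",) for m in range(1, 13)] * 10)

def plan(sql: str) -> str:
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Function wrapped around the column: the index is unusable
bad = plan("SELECT * FROM orders WHERE strftime('%Y', created_at) = '2026'")
# Range predicate on the bare column: index range scan
good = plan("SELECT * FROM orders WHERE created_at >= '2026-01-01' "
            "AND created_at < '2027-01-01'")

print(bad)   # full scan, index name absent
print(good)  # search using idx_orders_created
```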
Fix 3: Covering Indexes
A covering index includes all columns needed by a query — the DB never touches the actual table:
-- Query fetches id, name, email, status
SELECT id, name, email, status FROM users WHERE status = 'active' LIMIT 100;
-- Regular index: reads index to find rows, then reads table for other columns
CREATE INDEX idx_users_status ON users (status);
-- Covering index: all needed columns are IN the index — no table read!
CREATE INDEX idx_users_status_covering ON users (status)
INCLUDE (id, name, email);
-- PostgreSQL 11+: INCLUDE adds non-key columns to the index
-- Much faster — index-only scan
EXPLAIN SELECT id, name, email, status FROM users WHERE status = 'active';
-- Shows: Index Only Scan on idx_users_status_covering
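The index-only-scan effect can be sketched with Python's sqlite3 too. SQLite has no INCLUDE clause, so the extra columns go into the index key, but its planner reports the same idea as a "covering index":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (
    id INTEGER PRIMARY KEY, name TEXT, email TEXT, status TEXT)""")
# No INCLUDE in SQLite: appending the columns to the key covers the query
# (id is the rowid and is stored in every index implicitly)
conn.execute(
    "CREATE INDEX idx_users_status_covering ON users (status, name, email)")

def plan(sql: str) -> str:
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

p = plan("SELECT id, name, email, status FROM users WHERE status = 'active'")
print(p)  # reports a COVERING INDEX: the table itself is never touched
```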
Fix 4: Pagination Without OFFSET
OFFSET at large values is another scale trap:
-- ❌ OFFSET becomes slower with each page
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 100000;
-- PostgreSQL reads 100,020 rows and discards 100,000 — O(n)!
-- ✅ Cursor-based (keyset) pagination — cost stays flat at any depth
-- First page
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20;
-- Returns last post's created_at = '2026-03-10T10:00:00'
-- Next page — use last seen value as cursor
SELECT * FROM posts
WHERE created_at < '2026-03-10T10:00:00'
ORDER BY created_at DESC
LIMIT 20;
-- Uses index on created_at — lightning fast regardless of depth
-- (if timestamps can collide, add id as a tiebreaker: ORDER BY created_at DESC, id DESC)
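Wired into application code, the cursor loop looks like this sketch (Python with stdlib sqlite3 as a stand-in database; `fetch_page` is a hypothetical helper, not a library API):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE INDEX idx_posts_created ON posts (created_at)")
conn.executemany("INSERT INTO posts (created_at) VALUES (?)",
                 [(f"2026-03-{d:02d}T10:00:00",) for d in range(1, 31)])

def fetch_page(cursor=None, size=10):
    # Keyset pagination: filter past the last-seen value instead of OFFSET
    if cursor is None:
        sql, args = ("SELECT id, created_at FROM posts "
                     "ORDER BY created_at DESC LIMIT ?", (size,))
    else:
        sql, args = ("SELECT id, created_at FROM posts WHERE created_at < ? "
                     "ORDER BY created_at DESC LIMIT ?", (cursor, size))
    page = conn.execute(sql, args).fetchall()
    next_cursor = page[-1][1] if page else None  # last row's created_at
    return page, next_cursor

seen, cursor = [], None
while True:
    page, cursor = fetch_page(cursor)
    if not page:
        break
    seen.extend(page)

print(len(seen))  # 30: every post visited exactly once, newest first
```

Each page is an indexed range scan from the cursor value, so page 5,000 costs the same as page 1.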
Fix 5: Partitioning for Very Large Tables
When a table has 100M+ rows, even indexed queries can be slow. Partitioning splits the table:
-- Partition orders table by month
CREATE TABLE orders (
id BIGSERIAL,
user_id INT,
total DECIMAL,
created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2026_01
PARTITION OF orders
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE orders_2026_02
PARTITION OF orders
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
-- Queries with date filter only scan relevant partition
SELECT * FROM orders
WHERE created_at >= '2026-01-01' AND created_at < '2026-02-01';
-- Only scans orders_2026_01 partition — ignores all others!
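Monthly partitions must exist before rows arrive, so teams typically generate them from a scheduled job. A hypothetical helper (the function name is illustrative; it assumes the orders table above) can emit the DDL:

```python
from datetime import date

def month_partition_ddl(year: int, month: int) -> str:
    """Emit the CREATE TABLE ... PARTITION OF statement for one month."""
    start = date(year, month, 1)
    # Upper bound is exclusive, so it is simply the first day of the next month
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    name = f"orders_{year}_{month:02d}"
    return (f"CREATE TABLE {name} PARTITION OF orders "
            f"FOR VALUES FROM ('{start}') TO ('{end}');")

print(month_partition_ddl(2026, 1))
# CREATE TABLE orders_2026_01 PARTITION OF orders FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
```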
Index Maintenance
-- Check index usage — are your indexes being used?
SELECT
indexrelname AS index_name,
idx_scan AS times_used,
idx_tup_read AS rows_read,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC; -- Unused indexes at top
-- Remove unused indexes (they slow down writes!)
DROP INDEX IF EXISTS idx_users_old_column;
-- Rebuild bloated indexes (CONCURRENTLY requires PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;
-- Check index bloat
SELECT
tablename,
indexname,
pg_size_pretty(pg_relation_size(quote_ident(indexname)::regclass)) AS size
FROM pg_indexes
WHERE tablename = 'users'
ORDER BY pg_relation_size(quote_ident(indexname)::regclass) DESC;
The Indexing Checklist
Before your query goes to production, verify:
- ✅ WHERE columns have indexes
- ✅ JOIN columns have indexes on both sides
- ✅ ORDER BY + LIMIT columns are indexed together
- ✅ No leading-wildcard LIKE patterns
- ✅ No function calls on indexed columns in WHERE
- ✅ Type consistency (no implicit casts)
- ✅ EXPLAIN shows Index Scan, not Seq Scan, on large tables
- ✅ Pagination uses cursors, not OFFSET
Conclusion
Slow queries at scale are almost always about missing indexes or query patterns that prevent index usage. Run EXPLAIN ANALYZE on every non-trivial query before it reaches production. Index the WHERE, JOIN, and ORDER BY columns your hot queries actually use (unused indexes only slow down writes). Use cursor pagination instead of OFFSET. And monitor pg_stat_statements continuously so slow queries are caught before users feel them.