Slow Queries That Only Appear at Scale — The Indexing Problem

Introduction

"But it worked fine in development!" — famous last words before a production incident.

Your API endpoint hits a PostgreSQL query that returns in 2ms locally (1,000 rows). In production with 10 million rows, the same query takes 8 seconds. Full table scan. Every. Single. Request.

Slow queries that only appear at scale are predictable, preventable, and completely fixable.

Why Queries Get Slow at Scale

A query without an index performs a sequential scan — it reads every row in the table:

Table: users (10 million rows)

SELECT * FROM users WHERE email = 'alice@example.com';

Without index: reads all 10,000,000 rows → 8 seconds
With index:    reads 1 row (B-tree lookup) → 0.2ms

The gap grows with table size: a sequential scan is O(n), while a B-tree lookup is O(log n). At 1,000 rows, the sequential scan takes 1ms — you don't even notice. At 10 million rows, it's catastrophic, while the index lookup stays near-constant.
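You can reproduce the effect on a scratch database with a synthetic table — a sketch where the table name, row count, and timings are illustrative:

```sql
-- Build a large synthetic table (names and sizes are illustrative)
CREATE TABLE scan_demo (id INT, email TEXT);
INSERT INTO scan_demo
SELECT i, 'user' || i || '@example.com'
FROM generate_series(1, 1000000) AS i;
ANALYZE scan_demo;

EXPLAIN ANALYZE SELECT * FROM scan_demo WHERE email = 'user999999@example.com';
-- Expect: Seq Scan — every row read

CREATE INDEX idx_scan_demo_email ON scan_demo (email);
EXPLAIN ANALYZE SELECT * FROM scan_demo WHERE email = 'user999999@example.com';
-- Expect: Index Scan — milliseconds instead of seconds
```

Running both EXPLAIN ANALYZE statements back to back makes the scan-vs-lookup difference concrete before you ever hit it in production.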

Step 1: Find Slow Queries

-- Enable slow query logging in PostgreSQL
ALTER SYSTEM SET log_min_duration_statement = 100;  -- Log queries > 100ms
SELECT pg_reload_conf();

-- Find the worst queries with pg_stat_statements
SELECT
  query,
  calls,
  total_exec_time / 1000 AS total_seconds,
  mean_exec_time AS avg_ms,
  stddev_exec_time AS stddev_ms,
  rows / calls AS avg_rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Find tables with sequential scans (red flag!)
SELECT
  schemaname,
  tablename,
  seq_scan,
  seq_tup_read,
  idx_scan,
  idx_tup_fetch
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;
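The pg_stat_statements view used above ships with PostgreSQL but is not enabled by default — a sketch of the setup, assuming superuser access (the preload setting requires a server restart):

```sql
-- Requires a server restart to take effect:
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- After restart, enable the view in each database you want to track:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Reset counters when you start a new measurement window:
SELECT pg_stat_statements_reset();
```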

Step 2: EXPLAIN ANALYZE — See What the DB is Doing

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.*, o.total
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.email LIKE '%gmail.com'
  AND o.created_at > NOW() - INTERVAL '30 days'
ORDER BY o.total DESC
LIMIT 50;

Read the output:

Gather  (cost=1000.00..80000.00 rows=500 width=120) (actual time=5234.0..8120.5 rows=50 loops=1)
  ->  Seq Scan on users  (cost=0..79000.00 rows=10000000) (actual time=0.1..7890.0)
      Filter: (email ~~ '%gmail.com')
      Rows Removed by Filter: 9999950    ← 10M rows scanned, 50 used

Seq Scan on a large table means the planner couldn't use an index — either the index is missing, or the predicate (here, a leading-wildcard LIKE) prevents its use. Rows Removed by Filter: 9,999,950 = this query is killing your DB.

Fix 1: Add the Right Indexes

-- Single column index
CREATE INDEX idx_users_email ON users (email);

-- Composite index (column order matters — equality columns first, then range/sort columns)
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at DESC);

-- Partial index (index only the rows you query)
CREATE INDEX idx_users_active ON users (email)
WHERE is_active = true;  -- Only indexes active users!

-- Expression index
CREATE INDEX idx_users_email_lower ON users (lower(email));
-- Now: WHERE lower(email) = lower('Alice@Example.com') uses the index

-- Index for LIKE prefix searches (works for 'alice%', NOT '%alice')
CREATE INDEX idx_users_name_prefix ON users (name varchar_pattern_ops);

After adding indexes, verify with EXPLAIN:

EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';
-- Should show: Index Scan on idx_users_email
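One production caveat: a plain CREATE INDEX takes a lock that blocks writes for the duration of the build. PostgreSQL's CONCURRENTLY option avoids that, at the cost of a slower build:

```sql
-- Builds without blocking INSERT/UPDATE/DELETE
-- (cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);

-- If a concurrent build fails, it leaves an INVALID index behind — drop and retry:
DROP INDEX CONCURRENTLY IF EXISTS idx_users_email;
```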

Fix 2: Query Rewriting

Sometimes the query itself is the problem:

-- ❌ LIKE with leading wildcard — CANNOT use index
WHERE email LIKE '%gmail.com'
-- Full table scan always, even with index on email

-- ✅ FIX: Reverse the string so the wildcard becomes a prefix
-- (text_pattern_ops lets LIKE prefix matches use the index regardless of collation)
CREATE INDEX idx_users_email_reverse ON users ((reverse(email)) text_pattern_ops);
WHERE reverse(email) LIKE reverse('%gmail.com')
-- = WHERE reverse(email) LIKE 'moc.liamg%'  ← Can use index!

-- Or use the pg_trgm extension for arbitrary substring matches
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_email_trgm ON users USING gin (email gin_trgm_ops);
WHERE email LIKE '%gmail.com';  -- the trigram index handles leading wildcards

-- ❌ Function on indexed column breaks index usage
WHERE EXTRACT(YEAR FROM created_at) = 2026

-- ✅ Use range instead
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01'

-- ❌ Cast on the indexed column
WHERE user_id::text = '123'  -- casting the column defeats the index
-- (Also happens implicitly when a driver binds a parameter with the wrong type)

-- ✅ Match types exactly
WHERE user_id = 123  -- Integer comparison, index usable

Fix 3: Covering Indexes

A covering index includes all columns needed by a query — the DB never touches the actual table:

-- Query fetches id, name, email, status
SELECT id, name, email, status FROM users WHERE status = 'active' LIMIT 100;

-- Regular index: reads index to find rows, then reads table for other columns
CREATE INDEX idx_users_status ON users (status);

-- Covering index: all needed columns are IN the index — no table read!
CREATE INDEX idx_users_status_covering ON users (status)
INCLUDE (id, name, email);
-- PostgreSQL 11+: INCLUDE adds non-key columns to the index

-- Much faster — index-only scan
EXPLAIN SELECT id, name, email, status FROM users WHERE status = 'active';
-- Shows: Index Only Scan on idx_users_status_covering
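One caveat: an index-only scan can skip the table only for pages the visibility map marks all-visible; otherwise it falls back to the heap for visibility checks. If EXPLAIN (ANALYZE) shows a high "Heap Fetches" count, vacuum the table:

```sql
-- Keep the visibility map fresh so index-only scans stay index-only
VACUUM (ANALYZE) users;

-- Then re-check: EXPLAIN (ANALYZE) output includes "Heap Fetches: N" —
-- a high N means the scan is still visiting the table
```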

Fix 4: Pagination Without OFFSET

OFFSET at large values is another scale trap:

-- ❌ OFFSET becomes slower with each page
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 100000;
-- PostgreSQL reads 100,020 rows and discards 100,000 — O(n)!

-- ✅ Cursor-based (keyset) pagination — cost per page stays flat at any depth
-- First page
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20;
-- Returns last post's created_at = '2026-03-10T10:00:00'

-- Next page — use last seen value as cursor
SELECT * FROM posts
WHERE created_at < '2026-03-10T10:00:00'
ORDER BY created_at DESC
LIMIT 20;
-- Uses index on created_at — lightning fast regardless of depth
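If created_at is not unique, a cursor on that column alone can skip or repeat rows at page boundaries. A common fix (sketched here; the cursor values are illustrative) is a composite cursor that uses the primary key as a tie-breaker:

```sql
-- Composite keyset: (created_at, id) gives a total order
CREATE INDEX idx_posts_created_id ON posts (created_at DESC, id DESC);

-- Row-value comparison matches the composite index
SELECT * FROM posts
WHERE (created_at, id) < ('2026-03-10T10:00:00', 41876)  -- last row of previous page
ORDER BY created_at DESC, id DESC
LIMIT 20;
```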

Fix 5: Partitioning for Very Large Tables

When a table has 100M+ rows, even indexed queries can be slow. Partitioning splits the table:

-- Partition orders table by month
CREATE TABLE orders (
  id BIGSERIAL,
  user_id INT,
  total DECIMAL,
  created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2026_01
  PARTITION OF orders
  FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE orders_2026_02
  PARTITION OF orders
  FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

-- Queries with date filter only scan relevant partition
SELECT * FROM orders
WHERE created_at >= '2026-01-01' AND created_at < '2026-02-01';
-- Only scans the orders_2026_01 partition — ignores all others!
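Rows that match no partition are rejected with an error, so partitions must exist before data arrives — or you can add a DEFAULT catch-all. Partitioning also makes retention cheap, since dropping a partition is instant. A sketch:

```sql
-- Catch-all for rows outside every defined range (PostgreSQL 11+)
CREATE TABLE orders_default PARTITION OF orders DEFAULT;

-- Expire an old month instantly instead of DELETE-ing millions of rows
ALTER TABLE orders DETACH PARTITION orders_2026_01;
DROP TABLE orders_2026_01;
```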

Index Maintenance

-- Check index usage — are your indexes being used?
SELECT
  indexrelname AS index_name,
  idx_scan AS times_used,
  idx_tup_read AS rows_read,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;  -- Unused indexes at top

-- Remove unused indexes (they slow down writes!)
DROP INDEX IF EXISTS idx_users_old_column;

-- Rebuild bloated indexes
REINDEX INDEX CONCURRENTLY idx_users_email;  -- CONCURRENTLY requires PostgreSQL 12+

-- Check index bloat
SELECT
  tablename,
  indexname,
  pg_size_pretty(pg_relation_size(quote_ident(indexname)::regclass)) AS size
FROM pg_indexes
WHERE tablename = 'users'
ORDER BY pg_relation_size(quote_ident(indexname)::regclass) DESC;

The Indexing Checklist

Before your query goes to production, verify:

  • ✅ WHERE columns have indexes
  • ✅ JOIN columns have indexes on both sides
  • ✅ ORDER BY + LIMIT columns are indexed together
  • ✅ No leading wildcard LIKE patterns
  • ✅ No function calls on indexed columns in WHERE
  • ✅ Type consistency (no implicit casts)
  • ✅ EXPLAIN shows Index Scan, not Seq Scan on large tables
  • ✅ Pagination uses cursor, not OFFSET

Conclusion

Slow queries at scale are almost always about missing indexes or query patterns that prevent index usage. Run EXPLAIN ANALYZE on every non-trivial query before it reaches production. Add indexes on all WHERE, JOIN, and ORDER BY columns. Use cursor pagination instead of OFFSET. And monitor pg_stat_statements continuously so slow queries are caught before users feel them.