Missing Database Index — Why Your App Slows Down as It Grows

Introduction

Indexes are the single most impactful database optimization you can make. A well-placed index turns a 30-second query into a 3-millisecond one. But they're easy to miss: you add a foreign key column, forget to index it, and queries work fine — until the table has 10 million rows and every join becomes a full scan.

How to Find Missing Indexes

PostgreSQL tracks every sequential scan. Queries that do lots of seq scans on big tables are your index candidates:

-- Find tables with lots of sequential scans (potential missing indexes)
SELECT
  schemaname,
  tablename,
  seq_scan,
  seq_tup_read,
  idx_scan,
  n_live_tup AS row_count,
  ROUND(seq_scan::numeric / NULLIF(idx_scan + seq_scan, 0) * 100, 2) AS seq_scan_pct
FROM pg_stat_user_tables
WHERE n_live_tup > 10000      -- only care about big tables
  AND seq_scan > 100          -- being scanned frequently
ORDER BY seq_tup_read DESC    -- worst first
LIMIT 20;

-- Find indexes that are never used (waste space, slow down writes)
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan AS index_scans
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexname NOT LIKE '%pkey%'  -- skip primary keys
  -- caution: unique indexes enforce constraints even at idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

Common Missing Index Patterns

1. Foreign Keys Without Indexes

-- ❌ Foreign key with no index — every JOIN does a seq scan on orders
CREATE TABLE orders (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),  -- No index!
  created_at TIMESTAMPTZ
);

-- Every query like this scans ALL orders to find the user's:
SELECT * FROM orders WHERE user_id = '123';

-- ✅ Index the foreign key
CREATE INDEX ON orders (user_id);
-- Query now: Index Scan — O(log n) instead of O(n)
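You can watch this flip from sequential scan to index scan without a Postgres server. As a rough sketch, Python's built-in sqlite3 shows the same planner behavior through EXPLAIN QUERY PLAN (a simplified stand-in schema; SQLite's planner, but the seq-scan-vs-index-scan distinction is analogous):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "total" sits outside the index, so SELECT * always has to visit the table
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, total REAL)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan text in their last column
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE user_id = 123"
before = plan(query)  # full table scan: the plan says SCAN
conn.execute("CREATE INDEX orders_user_id_idx ON orders (user_id)")
after = plan(query)   # index lookup: the plan says SEARCH ... USING INDEX
print(before)
print(after)
```

The same before/after check works in Postgres with EXPLAIN: the Seq Scan node becomes an Index Scan once the index exists.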

2. Timestamp Columns Used in Range Queries

-- ❌ No index on created_at — date range query scans everything
SELECT * FROM events WHERE created_at > NOW() - INTERVAL '7 days';

-- ✅ Index the timestamp column
CREATE INDEX ON events (created_at);

-- Or if you always filter on both status AND date:
CREATE INDEX ON events (status, created_at);
-- The composite index covers: WHERE status = 'pending' AND created_at > ...

3. Status/Enum Columns Used in WHERE Clauses

-- ❌ Scanning millions of jobs to find the 100 that are 'pending'
SELECT * FROM jobs WHERE status = 'pending' ORDER BY created_at LIMIT 100;

-- ✅ Partial index — only indexes rows where status = 'pending'
-- Much smaller than a full index, faster updates
CREATE INDEX ON jobs (created_at) WHERE status = 'pending';
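SQLite supports partial indexes with the same syntax, so the effect is easy to sketch from Python's stdlib (illustrative table and index names; the rule that the query's WHERE must imply the index's WHERE is the same as in Postgres):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT, payload TEXT)")
# Partial index: only rows with status = 'pending' are indexed at all
conn.execute("CREATE INDEX jobs_pending_idx ON jobs (created_at) WHERE status = 'pending'")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# The query's WHERE implies the index's WHERE, so the partial index applies
pending = plan("SELECT * FROM jobs WHERE status = 'pending' ORDER BY created_at LIMIT 100")
# 'failed' rows are simply not in the index, so this query can't use it
failed = plan("SELECT * FROM jobs WHERE status = 'failed' ORDER BY created_at LIMIT 100")
print(pending)
print(failed)
```

The non-matching query falls back to a scan plus an explicit sort step, which is exactly the behavior the partial index was added to avoid for the hot status.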

4. Columns Used in ORDER BY

-- ❌ Sorting 5 million rows without an index — filesort in memory
SELECT * FROM articles ORDER BY published_at DESC LIMIT 20;

-- ✅ Index the sort column
CREATE INDEX ON articles (published_at DESC);

-- Or with a filter:
CREATE INDEX ON articles (published_at DESC) WHERE published = true;
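The win here is that the index already stores rows in sorted order, so the sort step disappears from the plan entirely. A minimal sketch with Python's stdlib sqlite3 (illustrative schema; SQLite names its sort step "USE TEMP B-TREE FOR ORDER BY", Postgres shows a "Sort" node):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, published_at TEXT)")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM articles ORDER BY published_at DESC LIMIT 20"
before = plan(query)  # no index: an explicit sort step appears in the plan
conn.execute("CREATE INDEX articles_published_at_idx ON articles (published_at DESC)")
after = plan(query)   # the index delivers rows already in order; no sort step
print(before)
print(after)
```

With the index in place, the LIMIT 20 also means the scan can stop after 20 rows instead of sorting the whole table first.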

Multi-Column Indexes — Order Matters

-- Query: WHERE tenant_id = ? AND status = ? ORDER BY created_at DESC
-- ❌ Wrong index — can't use it for the full query
CREATE INDEX ON orders (created_at, tenant_id, status);

-- ✅ Right order: equality columns first, range/sort column last
CREATE INDEX ON orders (tenant_id, status, created_at DESC);
-- Rules: equality filters first, then range/ORDER BY columns

The "leftmost prefix" rule: a composite index on (a, b, c) can serve queries that filter on a, on a + b, or on a + b + c, but not (efficiently) on b alone or c alone.
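The leftmost-prefix rule is easy to verify by hand. A sketch using Python's stdlib sqlite3 (illustrative schema and index name; SQLite's planner applies the same prefix rule to composite indexes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "amount" is deliberately outside the index, so SELECT * can never be
# answered from the index alone (no covering-index shortcut)
conn.execute("CREATE TABLE orders (tenant_id INT, status TEXT, created_at TEXT, amount REAL)")
conn.execute("CREATE INDEX orders_tsc_idx ON orders (tenant_id, status, created_at)")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Filtering on the leading column: the composite index is usable
leading = plan("SELECT * FROM orders WHERE tenant_id = 1")
# Filtering on the second column alone: prefix broken, back to a full scan
skipping = plan("SELECT * FROM orders WHERE status = 'pending'")
print(leading)
print(skipping)
```

The same experiment in Postgres (EXPLAIN on both queries) shows an Index Scan for the leading-column filter and a Seq Scan for the prefix-breaking one.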

Adding Indexes Without Downtime

In PostgreSQL, a regular CREATE INDEX locks the table for writes during the build. On a large table, this can take minutes:

-- ❌ Locks table for writes during index build (minutes on large table)
CREATE INDEX ON orders (user_id);

-- ✅ CONCURRENTLY — builds index without write lock
-- Slower, can't run in a transaction, but doesn't block
CREATE INDEX CONCURRENTLY ON orders (user_id);

// In migrations, always use CONCURRENTLY for large tables.
// knex example: CONCURRENTLY can't run inside a transaction, and knex
// wraps each migration in one by default, so opt out for this migration:
exports.config = { transaction: false }

exports.up = async (knex) => {
  // Run concurrently to avoid locking production writes
  await knex.raw('CREATE INDEX CONCURRENTLY IF NOT EXISTS orders_user_id_idx ON orders (user_id)')
}

Automatically Suggested Indexes

PostgreSQL has no built-in index advisor, but its planner and statistics views point you straight at the candidates:

-- After running a slow query, check pg_stat_user_tables
-- But better: ask the query planner directly

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE user_id = '123' AND status = 'pending';

-- If you see "Seq Scan" with a high cost on a selective filter, the
-- planner had no index to use: a strong hint that one would help

-- Enable pg_stat_statements and look for high mean_exec_time:
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
WHERE mean_exec_time > 100  -- queries averaging > 100ms
ORDER BY mean_exec_time DESC;

Index Bloat — When Indexes Grow Stale

-- Dead rows from UPDATE/DELETE accumulate in indexes
-- Check index bloat:
SELECT
  tablename,
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

-- Rebuild bloated indexes (CONCURRENTLY, PostgreSQL 12+, avoids locks):
REINDEX INDEX CONCURRENTLY orders_user_id_idx;

-- A well-tuned autovacuum keeps bloat from accumulating, but only
-- REINDEX reclaims space an index has already lost

Missing Index Checklist

  • ✅ Index every foreign key column
  • ✅ Index columns used in WHERE, ORDER BY, GROUP BY on large tables
  • ✅ Use partial indexes for low-cardinality status columns
  • ✅ Use composite indexes with equality columns first, range/sort last
  • ✅ Use CREATE INDEX CONCURRENTLY in production migrations
  • ✅ Run EXPLAIN ANALYZE on slow queries to confirm index usage
  • ✅ Check pg_stat_user_tables regularly for high seq_scan counts
  • ✅ Drop unused indexes (idx_scan = 0) to reduce write overhead (but keep unique indexes, which enforce constraints even when never scanned)

Conclusion

Missing indexes are the most common cause of "it was fast in dev, slow in prod" — the data volume difference hides the problem. The fix is usually two SQL lines: find the missing index with EXPLAIN ANALYZE or pg_stat_user_tables, then CREATE INDEX CONCURRENTLY. The hardest part is knowing which columns to index: foreign keys always, timestamp columns used in range queries, status columns with partial indexes, and ORDER BY columns. Set up slow query logging and check pg_stat_statements weekly — you'll catch problems long before users do.