Missing Database Index — Why Your App Slows Down as It Grows

Introduction

Indexes are the single most impactful database optimization you can make. A well-placed index turns a 30-second query into a 3-millisecond one. But they're easy to miss: you add a foreign key column, forget to index it, and queries work fine — until the table has 10 million rows and every join becomes a full scan.

How to Find Missing Indexes

PostgreSQL tracks every sequential scan. Queries that do lots of seq scans on big tables are your index candidates:

-- Find tables with lots of sequential scans (potential missing indexes)
SELECT
  schemaname,
  tablename,
  seq_scan,
  seq_tup_read,
  idx_scan,
  n_live_tup AS row_count,
  ROUND(seq_scan::numeric / NULLIF(idx_scan + seq_scan, 0) * 100, 2) AS seq_scan_pct
FROM pg_stat_user_tables
WHERE n_live_tup > 10000      -- only care about big tables
  AND seq_scan > 100          -- being scanned frequently
ORDER BY seq_tup_read DESC    -- worst first
LIMIT 20;

-- Find indexes that are never used (waste space, slow down writes)
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan AS index_scans
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexname NOT LIKE '%pkey%'  -- skip primary keys
  -- caution: unique indexes enforce constraints even at idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

Common Missing Index Patterns

1. Foreign Keys Without Indexes

-- ❌ Foreign key with no index — every JOIN does a seq scan on orders
CREATE TABLE orders (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),  -- No index!
  created_at TIMESTAMPTZ
);

-- Every query like this scans ALL orders to find the user's:
SELECT * FROM orders WHERE user_id = '123';

-- ✅ Index the foreign key
CREATE INDEX ON orders (user_id);
-- Query now: Index Scan — O(log n) instead of O(n)
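You can watch this flip from sequential scan to index scan without a Postgres server. As a rough sketch, Python's built-in sqlite3 shows the same planner behavior through EXPLAIN QUERY PLAN (a simplified stand-in schema; SQLite's planner, but the seq-scan-vs-index-scan distinction is analogous):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "total" sits outside the index, so SELECT * always has to visit the table
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, total REAL)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan text in their last column
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE user_id = 123"
before = plan(query)  # full table scan: the plan says SCAN
conn.execute("CREATE INDEX orders_user_id_idx ON orders (user_id)")
after = plan(query)   # index lookup: the plan says SEARCH ... USING INDEX
print(before)
print(after)
```

The same before/after check works in Postgres with EXPLAIN: the Seq Scan node becomes an Index Scan once the index exists.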

2. Timestamp Columns Used in Range Queries

-- ❌ No index on created_at — date range query scans everything
SELECT * FROM events WHERE created_at > NOW() - INTERVAL '7 days';

-- ✅ Index the timestamp column
CREATE INDEX ON events (created_at);

-- Or if you always filter on both status AND date:
CREATE INDEX ON events (status, created_at);
-- The composite index covers: WHERE status = 'pending' AND created_at > ...

3. Status/Enum Columns Used in WHERE Clauses

-- ❌ Scanning millions of jobs to find the 100 that are 'pending'
SELECT * FROM jobs WHERE status = 'pending' ORDER BY created_at LIMIT 100;

-- ✅ Partial index — only indexes rows where status = 'pending'
-- Much smaller than a full index, faster updates
CREATE INDEX ON jobs (created_at) WHERE status = 'pending';
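SQLite supports partial indexes with the same syntax, so the effect is easy to sketch from Python's stdlib (illustrative table and index names; the rule that the query's WHERE must imply the index's WHERE is the same as in Postgres):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT, payload TEXT)")
# Partial index: only rows with status = 'pending' are indexed at all
conn.execute("CREATE INDEX jobs_pending_idx ON jobs (created_at) WHERE status = 'pending'")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# The query's WHERE implies the index's WHERE, so the partial index applies
pending = plan("SELECT * FROM jobs WHERE status = 'pending' ORDER BY created_at LIMIT 100")
# 'failed' rows are simply not in the index, so this query can't use it
failed = plan("SELECT * FROM jobs WHERE status = 'failed' ORDER BY created_at LIMIT 100")
print(pending)
print(failed)
```

The non-matching query falls back to a scan plus an explicit sort step, which is exactly the behavior the partial index was added to avoid for the hot status.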

4. Columns Used in ORDER BY

-- ❌ Sorting 5 million rows without an index — filesort in memory
SELECT * FROM articles ORDER BY published_at DESC LIMIT 20;

-- ✅ Index the sort column
CREATE INDEX ON articles (published_at DESC);

-- Or with a filter:
CREATE INDEX ON articles (published_at DESC) WHERE published = true;
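The win here is that the index already stores rows in sorted order, so the sort step disappears from the plan entirely. A minimal sketch with Python's stdlib sqlite3 (illustrative schema; SQLite names its sort step "USE TEMP B-TREE FOR ORDER BY", Postgres shows a "Sort" node):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, published_at TEXT)")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM articles ORDER BY published_at DESC LIMIT 20"
before = plan(query)  # no index: an explicit sort step appears in the plan
conn.execute("CREATE INDEX articles_published_at_idx ON articles (published_at DESC)")
after = plan(query)   # the index delivers rows already in order; no sort step
print(before)
print(after)
```

With the index in place, the LIMIT 20 also means the scan can stop after 20 rows instead of sorting the whole table first.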

Multi-Column Indexes — Order Matters

-- Query: WHERE tenant_id = ? AND status = ? ORDER BY created_at DESC
-- ❌ Wrong index — can't use it for the full query
CREATE INDEX ON orders (created_at, tenant_id, status);

-- ✅ Right order: equality columns first, range/sort column last
CREATE INDEX ON orders (tenant_id, status, created_at DESC);
-- Rules: equality filters first, then range/ORDER BY columns

The "leftmost prefix" rule: a composite index on (a, b, c) can serve queries that filter on a, on a + b, or on a + b + c, but not (efficiently) on b alone or c alone.
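The leftmost-prefix rule is easy to verify by hand. A sketch using Python's stdlib sqlite3 (illustrative schema and index name; SQLite's planner applies the same prefix rule to composite indexes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "amount" is deliberately outside the index, so SELECT * can never be
# answered from the index alone (no covering-index shortcut)
conn.execute("CREATE TABLE orders (tenant_id INT, status TEXT, created_at TEXT, amount REAL)")
conn.execute("CREATE INDEX orders_tsc_idx ON orders (tenant_id, status, created_at)")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Filtering on the leading column: the composite index is usable
leading = plan("SELECT * FROM orders WHERE tenant_id = 1")
# Filtering on the second column alone: prefix broken, back to a full scan
skipping = plan("SELECT * FROM orders WHERE status = 'pending'")
print(leading)
print(skipping)
```

The same experiment in Postgres (EXPLAIN on both queries) shows an Index Scan for the leading-column filter and a Seq Scan for the prefix-breaking one.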

Adding Indexes Without Downtime

In PostgreSQL, a regular CREATE INDEX locks the table for writes during the build. On a large table, this can take minutes:

-- ❌ Locks table for writes during index build (minutes on large table)
CREATE INDEX ON orders (user_id);

-- ✅ CONCURRENTLY — builds index without write lock
-- Slower, can't run in a transaction, but doesn't block
CREATE INDEX CONCURRENTLY ON orders (user_id);

// In migrations, always use CONCURRENTLY for large tables.
// knex example: CONCURRENTLY can't run inside a transaction, and knex
// wraps each migration in one by default, so opt out for this migration:
exports.config = { transaction: false }

exports.up = async (knex) => {
  // Run concurrently to avoid locking production writes
  await knex.raw('CREATE INDEX CONCURRENTLY IF NOT EXISTS orders_user_id_idx ON orders (user_id)')
}

Automatically Suggested Indexes

PostgreSQL has no built-in index advisor, but its planner and statistics views point you straight at the candidates:

-- After running a slow query, check pg_stat_user_tables
-- But better: ask the query planner directly

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE user_id = '123' AND status = 'pending';

-- If you see "Seq Scan" with a high cost on a selective filter, the
-- planner had no index to use: a strong hint that one would help

-- Enable pg_stat_statements and look for high mean_exec_time:
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
WHERE mean_exec_time > 100  -- queries averaging > 100ms
ORDER BY mean_exec_time DESC;

Index Bloat — When Indexes Grow Stale

-- Dead rows from UPDATE/DELETE accumulate in indexes
-- Check index bloat:
SELECT
  tablename,
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

-- Rebuild bloated indexes (CONCURRENTLY, PostgreSQL 12+, avoids locks):
REINDEX INDEX CONCURRENTLY orders_user_id_idx;

-- A well-tuned autovacuum keeps bloat from accumulating, but only
-- REINDEX reclaims space an index has already lost

Missing Index Checklist

  • ✅ Index every foreign key column
  • ✅ Index columns used in WHERE, ORDER BY, GROUP BY on large tables
  • ✅ Use partial indexes for low-cardinality status columns
  • ✅ Use composite indexes with equality columns first, range/sort last
  • ✅ Use CREATE INDEX CONCURRENTLY in production migrations
  • ✅ Run EXPLAIN ANALYZE on slow queries to confirm index usage
  • ✅ Check pg_stat_user_tables regularly for high seq_scan counts
  • ✅ Drop unused indexes (idx_scan = 0) to reduce write overhead (but keep unique indexes, which enforce constraints even when never scanned)

Conclusion

Missing indexes are the most common cause of "it was fast in dev, slow in prod" — the data volume difference hides the problem. The fix is usually two SQL lines: find the missing index with EXPLAIN ANALYZE or pg_stat_user_tables, then CREATE INDEX CONCURRENTLY. The hardest part is knowing which columns to index: foreign keys always, timestamp columns used in range queries, status columns with partial indexes, and ORDER BY columns. Set up slow query logging and check pg_stat_statements weekly — you'll catch problems long before users do.