Accidental Full Table Scan — The Query That Brought Down Production

Introduction

A full table scan on a 50-million row table doesn't fail — it succeeds, eventually, while consuming all available I/O, blocking other queries, and making your app look like it's down. The worst part: it was fast in staging. The data volume difference between environments hides these bugs completely.

The Symptoms

Database CPU: 100%
Active queries: 1 query running for 4 minutes
Other queries: waiting
App: timing out on all requests
Cause: SELECT * FROM orders WHERE DATE(created_at) = '2026-03-14'

This query runs fine with 10,000 rows. With 50 million rows it reads every single row because DATE(created_at) wraps the indexed column in a function — making the index on created_at unusable.

Cause 1: Function Wrapping an Indexed Column

-- ❌ Wrapping indexed column in a function — index not used
SELECT * FROM orders WHERE DATE(created_at) = '2026-03-14'
-- PostgreSQL cannot use the index on created_at because DATE() is applied first

-- ❌ Same problem
SELECT * FROM users WHERE LOWER(email) = 'user@example.com'
-- Index on email not used because LOWER() transforms the column

-- ✅ Rewrite to let the index work
SELECT * FROM orders
WHERE created_at >= '2026-03-14 00:00:00'
  AND created_at < '2026-03-15 00:00:00'

-- ✅ Or use a functional index that matches the query
CREATE INDEX ON users (LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'user@example.com'
-- Now the functional index is used
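
In application code, the same rewrite is easy to get wrong by hand. A small helper that turns a calendar date into the half-open timestamp range used above keeps the comparison on the raw column (a sketch; the helper name and the query are illustrative):

```typescript
// Build a half-open [start, end) range for one calendar day (UTC), so the
// predicate compares the raw created_at column and the index stays usable.
function dayRange(date: string): { start: string; end: string } {
  const start = new Date(`${date}T00:00:00Z`)
  const end = new Date(start.getTime() + 24 * 60 * 60 * 1000)
  const fmt = (d: Date) => d.toISOString().slice(0, 19).replace('T', ' ')
  return { start: fmt(start), end: fmt(end) }
}

// Usage with a parameterized query (no function wrapping the column):
// const { start, end } = dayRange('2026-03-14')
// db.query('SELECT * FROM orders WHERE created_at >= $1 AND created_at < $2', [start, end])
```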

Cause 2: Implicit Type Cast

-- Table: users.phone is VARCHAR
-- ❌ Integer literal against a VARCHAR column: depending on the database,
-- every row's phone is cast for the comparison (index skipped) or the query errors
SELECT * FROM users WHERE phone = 1234567890

-- ✅ Match the column type exactly
SELECT * FROM users WHERE phone = '1234567890'

// In Node.js, the same trap
// ❌ userId is a number, but the column is VARCHAR
const result = await db.query('SELECT * FROM sessions WHERE user_id = $1', [req.user.id])
// If user_id column is VARCHAR and req.user.id is a number, PostgreSQL casts and may skip index

// ✅ Ensure types match
const result = await db.query('SELECT * FROM sessions WHERE user_id = $1', [String(req.user.id)])
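
One way to make the type contract explicit is a tiny normalizer applied to every parameter bound to a text column (a sketch; `asTextParam` is an illustrative name, not a library function):

```typescript
// Coerce values destined for VARCHAR/TEXT columns to strings up front,
// so the parameter type always matches the column type.
function asTextParam(value: string | number | null): string | null {
  if (value === null) return null
  return String(value)
}

// const result = await db.query(
//   'SELECT * FROM sessions WHERE user_id = $1',
//   [asTextParam(req.user.id)]
// )
```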

Cause 3: Missing WHERE Clause

// ❌ Bug: filter variable undefined → silent failure or full scan
async function getOrders(userId?: string) {
  return db.query('SELECT * FROM orders WHERE user_id = $1', [userId])
  // If userId is undefined, the driver sends NULL: user_id = NULL matches
  // nothing, so this silently returns zero rows. With string-built queries
  // the filter can vanish entirely, and that is a genuine full table scan.
}

// ✅ Guard before querying
async function getOrders(userId: string) {
  if (!userId) throw new Error('userId is required')
  return db.query('SELECT * FROM orders WHERE user_id = $1', [userId])
}
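
The full-scan variant of this bug usually comes from hand-rolled query builders: code that skips undefined filters can drop the WHERE clause altogether. A minimal sketch of the trap (`buildOrdersQuery` is hypothetical, and the string interpolation is for illustration only; real code should use parameterized queries):

```typescript
// A naive query builder that skips undefined filters. If every filter is
// undefined, the WHERE clause vanishes and the query scans the whole table.
function buildOrdersQuery(filters: { userId?: string; status?: string }): string {
  const clauses: string[] = []
  if (filters.userId !== undefined) clauses.push(`user_id = '${filters.userId}'`)
  if (filters.status !== undefined) clauses.push(`status = '${filters.status}'`)
  const where = clauses.length > 0 ? ` WHERE ${clauses.join(' AND ')}` : ''
  return `SELECT * FROM orders${where}`
}
```

With `{ userId: '123' }` this produces a filtered query; with `{}` it quietly produces `SELECT * FROM orders`.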

Cause 4: OR Clause Breaking Index

-- ❌ OR with a non-indexed column forces full scan
SELECT * FROM orders
WHERE user_id = '123'
   OR notes LIKE '%refund%'  -- notes has no index → full scan

-- ✅ Use UNION instead of OR (if both sides can use indexes separately)
SELECT * FROM orders WHERE user_id = '123'
UNION
SELECT * FROM orders WHERE notes LIKE '%refund%'

-- Or add a GIN index for full-text search on notes:
CREATE INDEX ON orders USING GIN (to_tsvector('english', notes));
SELECT * FROM orders WHERE to_tsvector('english', notes) @@ to_tsquery('refund')
-- Note: full-text search matches whole (stemmed) words, not substrings.
-- For true '%refund%' substring matching, the pg_trgm extension's
-- gin_trgm_ops index is the better fit.

Diagnosing with EXPLAIN ANALYZE

Always run EXPLAIN ANALYZE before deploying queries on large tables:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE DATE(created_at) = '2026-03-14';

-- Output:
-- Seq Scan on orders  (cost=0.00..289432.00 rows=50000 width=156)
--                     (actual time=0.043..42134.521 rows=1247 loops=1)
--   Filter: (date(created_at) = '2026-03-14'::date)
--   Rows Removed by Filter: 49998753
-- Planning Time: 0.4 ms
-- Execution Time: 42187.3 ms   ← 42 seconds, full scan

-- ✅ After fix:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE created_at >= '2026-03-14' AND created_at < '2026-03-15';

-- Output:
-- Index Scan using orders_created_at_idx on orders
--   (actual time=0.043..1.521 rows=1247 loops=1)
-- Execution Time: 2.1 ms   ← 20,000x faster

Look for: Seq Scan (bad) vs Index Scan / Index Only Scan (good). Rows Removed by Filter tells you how many rows were read and discarded.
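
This check can be automated in CI: run EXPLAIN for each new query against a staging database seeded with production-like row counts, and fail the build when the plan contains a sequential scan. A minimal sketch that inspects the text-format plan:

```typescript
// Returns true when an EXPLAIN (text format) plan contains a sequential scan.
// Feed it the output of `EXPLAIN <query>` from your test database.
function planHasSeqScan(explainOutput: string): boolean {
  return explainOutput.split('\n').some((line) => line.includes('Seq Scan'))
}
```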

Catch Slow Queries in Production

-- Enable slow query logging in PostgreSQL
-- postgresql.conf:
-- log_min_duration_statement = 1000  (log queries > 1 second)

-- Find slow queries via pg_stat_statements
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

// Node.js: log slow queries automatically
import { Pool } from 'pg'

const pool = new Pool({ connectionString: process.env.DATABASE_URL })

pool.on('connect', (client) => {
  const originalQuery = client.query.bind(client)

  // Patch query() to time every call (cast needed to override the typed method)
  ;(client as any).query = (...args: any[]) => {
    const start = Date.now()
    const result = originalQuery(...args)

    if (result && typeof result.then === 'function') {
      // Log on success and on failure; errors still propagate to the caller
      const logIfSlow = () => {
        const duration = Date.now() - start
        if (duration > 500) {
          console.warn(`Slow query (${duration}ms):`, args[0])
        }
      }
      result.then(logIfSlow, logIfSlow)
    }

    return result
  }
})

Full Table Scan Checklist

  • ✅ Run EXPLAIN ANALYZE on every new query before deploying
  • ✅ Never wrap indexed columns in functions (DATE(), LOWER(), etc.)
  • ✅ Ensure parameter types match column types exactly
  • ✅ Guard against undefined parameters that produce WHERE col = NULL
  • ✅ Use pg_stat_statements to find slow queries in production
  • ✅ Set log_min_duration_statement = 1000 to log slow queries
  • ✅ Add indexes on all frequently filtered columns

Conclusion

Full table scans are a ticking time bomb — hidden by small data volumes in dev, exposed when production data grows. The fix for most cases is one of: rewrite the query to avoid function-wrapping the indexed column, match parameter types exactly, or add the right index. EXPLAIN ANALYZE is your best tool: run it before deploying any query that touches a large table. One minute of query analysis can save hours of production incident response.