Large Offset Query Slowness — The Export Job That Takes 6 Hours

Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
When you need to process all rows in a large table — for exports, data migrations, report generation, or ETL pipelines — the naive approach is to paginate with OFFSET. It works for the first few thousand rows, but each subsequent batch has a larger OFFSET, and larger OFFSET means more rows read and discarded. By the end, you're reading millions of rows to process hundreds.
- The Problem
- Fix 1: Keyset Pagination (Cursor-Based) for Bulk Processing
- Fix 2: PostgreSQL Server-Side Cursor (For Very Large Exports)
- Fix 3: Parallel Chunking by ID Range
- Fix 4: COPY for Maximum Export Speed
- Large Offset Checklist
- Conclusion
The Problem
```typescript
// ❌ OFFSET pagination for bulk processing — gets slower every iteration
async function exportAllOrders() {
  const pageSize = 1000
  let page = 0
  let processed = 0
  while (true) {
    const offset = page * pageSize
    // Batch 1 (offset 0): reads 1,000 rows → 50ms
    // Batch 100 (offset 99,000): reads 100,000 rows, discards 99,000 → 2s
    // Batch 1000 (offset 999,000): reads 1,000,000 rows, discards 999,000 → 45s
    // Batch 10000 (offset 9,999,000): reads 10M rows → 8+ minutes per batch!
    const result = await db.query(`
      SELECT * FROM orders ORDER BY id LIMIT $1 OFFSET $2
    `, [pageSize, offset])
    if (result.rows.length === 0) break
    await processRows(result.rows)
    processed += result.rows.length
    page++
  }
}
```
Fix 1: Keyset Pagination (Cursor-Based) for Bulk Processing
```typescript
// ✅ Cursor-based: always constant time per batch
async function exportAllOrdersCursor() {
  const pageSize = 1000
  let lastId: string | null = null
  let processed = 0
  while (true) {
    const result = lastId
      ? await db.query(`
          SELECT * FROM orders
          WHERE id > $1  -- Only reads the NEXT 1,000 rows, no scanning
          ORDER BY id
          LIMIT $2
        `, [lastId, pageSize])
      : await db.query(`
          SELECT * FROM orders ORDER BY id LIMIT $1
        `, [pageSize])
    if (result.rows.length === 0) break
    await processRows(result.rows)
    lastId = result.rows[result.rows.length - 1].id
    processed += result.rows.length
    console.log(`Processed ${processed} orders`)
    // Every batch takes ~50ms regardless of position — constant time!
  }
}
```
The `WHERE id > lastId` clause, combined with an index on `id`, tells PostgreSQL exactly where to start — no rows are read and discarded.
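Keyset pagination also works when you want to sort by a non-unique column such as `created_at`, as long as you add a tiebreaker to make the ordering total. Below is a minimal sketch, assuming a composite index on `(created_at, id)` and the same node-postgres-style `db` pool as above; `nextBatchQuery` is a hypothetical helper name, not part of any library:

```typescript
// Composite-key cursor: (created_at, id) makes the ordering total even when
// created_at has duplicate values. Assumes a B-tree index on (created_at, id).
type BatchCursor = { createdAt: string; id: string } | null

// Pure helper (hypothetical): builds the query text and parameters
// for the next batch, given the last row seen.
function nextBatchQuery(cursor: BatchCursor, pageSize: number) {
  if (cursor === null) {
    return {
      text: 'SELECT * FROM orders ORDER BY created_at, id LIMIT $1',
      values: [pageSize],
    }
  }
  // Row-value comparison: rows strictly after the last (created_at, id) seen.
  return {
    text: `SELECT * FROM orders
           WHERE (created_at, id) > ($1, $2)
           ORDER BY created_at, id
           LIMIT $3`,
    values: [cursor.createdAt, cursor.id, pageSize],
  }
}

// Usage inside the export loop (same shape as exportAllOrdersCursor):
//   const { text, values } = nextBatchQuery(lastCursor, 1000)
//   const result = await db.query(text, values)
//   const last = result.rows[result.rows.length - 1]
//   lastCursor = { createdAt: last.created_at, id: last.id }
```

PostgreSQL's row-value comparison `(a, b) > ($1, $2)` matches the composite index directly, so each batch stays an index range scan just like the single-column case.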
Fix 2: PostgreSQL Server-Side Cursor (For Very Large Exports)
For truly massive exports, use a server-side cursor to stream rows without holding them all in memory:
```typescript
import { Pool } from 'pg'

async function streamLargeTable(
  pool: Pool,
  query: string,
  processRow: (row: any) => Promise<void>
): Promise<void> {
  const client = await pool.connect()
  try {
    // Open a server-side cursor — PostgreSQL holds the result set.
    // Cursors only live inside a transaction.
    await client.query('BEGIN')
    // Note: `query` is interpolated into the DECLARE statement, so only
    // pass trusted, application-defined SQL here — never user input.
    await client.query(`DECLARE export_cursor CURSOR FOR ${query}`)
    let done = false
    while (!done) {
      // Fetch 500 rows at a time from the server cursor
      const result = await client.query('FETCH FORWARD 500 FROM export_cursor')
      if (result.rows.length === 0) {
        done = true
      } else {
        for (const row of result.rows) {
          await processRow(row)
        }
      }
    }
    await client.query('CLOSE export_cursor')
    await client.query('COMMIT')
  } catch (err) {
    await client.query('ROLLBACK')
    throw err
  } finally {
    client.release()
  }
}

// Usage
await streamLargeTable(
  pool,
  'SELECT * FROM orders ORDER BY created_at',
  async (order) => {
    await writeToExportFile(order)
  }
)
```
The server-side cursor is ideal for this job: the surrounding transaction keeps the snapshot consistent for the whole export, rows are fetched in small chunks without the OFFSET problem, and the entire result set is never loaded into Node.js memory.
Fix 3: Parallel Chunking by ID Range
For maximum throughput, split the table by ID range and process chunks in parallel:
```typescript
async function parallelExport(parallelism = 4) {
  // Get the ID range and total row count
  const range = await db.query(`
    SELECT MIN(id)::text AS min_id, MAX(id)::text AS max_id,
           COUNT(*) AS total
    FROM orders
  `)
  const { min_id, max_id, total } = range.rows[0]
  // Divide rows into evenly sized chunks. The ROW_NUMBER() pass below works
  // for any orderable primary key (including UUIDs); for gapless sequential
  // IDs you can skip it and split the numeric [min_id, max_id] range directly.
  const chunkSize = Math.ceil(parseInt(total, 10) / parallelism)
  const chunks = await db.query(`
    SELECT
      MIN(id) AS chunk_start,
      MAX(id) AS chunk_end
    FROM (
      SELECT id, ((ROW_NUMBER() OVER (ORDER BY id) - 1) / $1) AS chunk_num
      FROM orders
    ) numbered
    GROUP BY chunk_num
    ORDER BY chunk_num
  `, [chunkSize])
  // Process chunks in parallel — each worker checks out its own pool connection
  await Promise.all(
    chunks.rows.map(({ chunk_start, chunk_end }) =>
      processChunk(chunk_start, chunk_end)
    )
  )
}

async function processChunk(startId: string, endId: string) {
  // Note: this loads the whole chunk into memory. For very large chunks,
  // apply keyset pagination (Fix 1) within the [startId, endId] range instead.
  const rows = await db.query(`
    SELECT * FROM orders WHERE id >= $1 AND id <= $2 ORDER BY id
  `, [startId, endId])
  for (const row of rows.rows) {
    await processRow(row)
  }
}
```
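When the primary key is a gapless (or nearly gapless) integer sequence, the ROW_NUMBER() pass above can be skipped entirely: the chunk boundaries can be computed arithmetically from the min and max ID. A minimal sketch under that assumption; `idRanges` is a hypothetical helper name:

```typescript
// Split [minId, maxId] into `parallelism` contiguous inclusive ranges.
// Assumes numeric, roughly gapless IDs — with large gaps, some chunks
// will simply contain fewer rows, which is harmless but uneven.
function idRanges(
  minId: number,
  maxId: number,
  parallelism: number
): Array<[number, number]> {
  const span = maxId - minId + 1
  const chunk = Math.ceil(span / parallelism)
  const ranges: Array<[number, number]> = []
  for (let start = minId; start <= maxId; start += chunk) {
    ranges.push([start, Math.min(start + chunk - 1, maxId)])
  }
  return ranges
}

// Usage: feed each [start, end] pair to processChunk(start, end).
```

This avoids a full-table scan just to find the boundaries, at the cost of assuming the IDs are dense.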
Fix 4: COPY for Maximum Export Speed
When you need to export the entire table as fast as possible:
```typescript
import { createWriteStream } from 'fs'
import { pipeline } from 'stream/promises'
import { to as copyTo } from 'pg-copy-streams'

async function exportToCsv(outputPath: string) {
  const client = await pool.connect()
  try {
    // COPY is the fastest way to read all rows — bypasses row-by-row overhead.
    // Plain client.query() doesn't stream COPY output; pg-copy-streams wraps
    // the COPY command in a readable stream.
    const copyStream = client.query(copyTo(
      `COPY (SELECT * FROM orders ORDER BY created_at) TO STDOUT WITH (FORMAT CSV, HEADER true)`
    ))
    const outputStream = createWriteStream(outputPath)
    await pipeline(copyStream, outputStream)
    console.log(`Export complete: ${outputPath}`)
  } finally {
    client.release()
  }
}
```
PostgreSQL's COPY command is 5-10x faster than equivalent SELECT queries for bulk exports — it bypasses much of the query parsing and row serialization overhead.
Large Offset Checklist
- ✅ Replace `OFFSET` with keyset pagination (`WHERE id > lastId`) for bulk processing
- ✅ Use server-side cursors for memory-efficient streaming of large result sets
- ✅ Use `COPY TO` for maximum export throughput
- ✅ Split large tables into ID-range chunks for parallel processing
- ✅ Add a `created_at` or `id` index if doing sequential table scans
- ✅ Log progress — large jobs should report row count and ETA
- ✅ Run bulk exports as background jobs, not in API request handlers
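The progress-logging item above can be sketched as a small pure helper, assuming the total row count is fetched up front (e.g. with `COUNT(*)`); `progressReport` is a hypothetical name:

```typescript
// Format a progress line with percentage, throughput, and a naive ETA
// based on the average rate so far.
function progressReport(
  processed: number,
  totalRows: number,
  elapsedMs: number
): string {
  const pct = totalRows === 0 ? 100 : (processed / totalRows) * 100
  const rate = elapsedMs > 0 ? processed / (elapsedMs / 1000) : 0 // rows/sec
  const remaining = rate > 0 ? (totalRows - processed) / rate : 0 // seconds
  return `${processed}/${totalRows} rows (${pct.toFixed(1)}%), ` +
    `${Math.round(rate)} rows/s, ETA ${Math.round(remaining)}s`
}

// Called once per batch from any of the loops above, e.g.:
//   console.log(progressReport(processed, total, Date.now() - startedAt))
```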
Conclusion
`OFFSET` pagination is a hidden performance trap for bulk processing jobs. As the offset grows, each batch gets linearly slower, so total runtime grows quadratically with table size — an export that processes 10 million rows with OFFSET might take hours while cursor-based pagination would take minutes. The fix is to replace `ORDER BY id LIMIT N OFFSET M` with `WHERE id > lastId ORDER BY id LIMIT N`: the `WHERE` clause with an index makes every batch constant-time regardless of position. For maximum throughput, use PostgreSQL's native `COPY TO` command or a server-side cursor with `FETCH FORWARD`.