Cron Job Running Twice — When Your Scheduled Job Has Duplicate Instances

Introduction

Run node-cron (or any other in-process scheduler) in an app deployed as multiple instances and you hit the same problem every time: every instance runs the job. If your cron sends emails, charges subscriptions, or generates reports, running it three times causes real damage.

The Problem

App deployed on 3 Kubernetes pods
Each pod runs node-cron: '0 0 * * *' (daily at midnight)

00:00:00  Pod 1: starts billing job
00:00:00  Pod 2: starts billing job (DUPLICATE!)
00:00:00  Pod 3: starts billing job (DUPLICATE!)

All 3 pods query unpaid invoices → all see the same 500 invoices
All 3 charge customers → 3x charges
All 3 send receipts → 3x emails
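The duplication needs no real scheduler to demonstrate. A stripped-down simulation (FakePod and tick are stand-ins I've made up for pods and the midnight trigger; no node-cron involved):

```typescript
// Three identical "pods", each registering the same scheduled handler
type Handler = () => void

class FakePod {
  private handlers: Handler[] = []
  schedule(handler: Handler) { this.handlers.push(handler) }
  tick() { this.handlers.forEach((h) => h()) }  // "midnight" arrives on this pod
}

let charges = 0
const pods = [new FakePod(), new FakePod(), new FakePod()]

// Identical deployment: every pod schedules the billing job on startup
for (const pod of pods) {
  pod.schedule(() => { charges += 1 })
}

// The cron expression matches at the same instant on all three pods
pods.forEach((pod) => pod.tick())

console.log(charges)  // prints 3: one customer charge per pod
```

Nothing in the naive setup tells pod 2 that pod 1 already ran the job; every fix below is some way of adding that coordination.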

Fix 1: Distributed Lock with Redis

import { Redis } from 'ioredis'
import cron from 'node-cron'

class DistributedCron {
  constructor(private redis: Redis, private instanceId: string = process.env.HOSTNAME ?? 'unknown') {}

  schedule(name: string, cronExpression: string, handler: () => Promise<void>) {
    cron.schedule(cronExpression, async () => {
      const lockKey = `cron:lock:${name}`
      const lockValue = this.instanceId
      const lockTTLSeconds = 300  // 5 minutes max job duration

      // Try to acquire lock — only one instance will succeed
      const acquired = await this.redis.set(
        lockKey,
        lockValue,
        'EX',
        lockTTLSeconds,
        'NX'  // Only set if not exists
      )

      if (!acquired) {
        console.log(`[Cron] ${name} already running on another instance — skipping`)
        return
      }

      const startTime = Date.now()
      console.log(`[Cron] ${name} started on ${this.instanceId}`)

      try {
        await handler()
        console.log(`[Cron] ${name} completed in ${Date.now() - startTime}ms`)
      } catch (err) {
        // Log, don't rethrow: a throw here would surface as an unhandled
        // rejection inside the scheduler callback
        console.error(`[Cron] ${name} failed:`, err)
      } finally {
        // Release lock only if we own it
        await this.redis.eval(`
          if redis.call('get', KEYS[1]) == ARGV[1] then
            return redis.call('del', KEYS[1])
          else
            return 0
          end
        `, 1, lockKey, lockValue)
      }
    })
  }
}

// Usage
const distributedCron = new DistributedCron(redis)

distributedCron.schedule('daily-billing', '0 0 * * *', async () => {
  const invoices = await db.invoice.findUnpaid()
  for (const invoice of invoices) {
    await stripe.charge(invoice)
    await emailService.sendReceipt(invoice)
  }
})
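This works only because Redis SET ... NX is atomic: of N concurrent attempts on the same key, exactly one succeeds. A self-contained model of that semantic (FakeRedis is an in-memory stand-in I wrote for illustration, not part of ioredis):

```typescript
// Minimal in-memory model of SET key value EX ttl NX
class FakeRedis {
  private store = new Map<string, { value: string; expiresAt: number }>()

  setNX(key: string, value: string, ttlSeconds: number): boolean {
    const entry = this.store.get(key)
    if (entry && entry.expiresAt > Date.now()) return false  // live key exists: NX fails
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 })
    return true
  }
}

const fakeRedis = new FakeRedis()

// Three instances race for the same lock at midnight
const results = ['pod-1', 'pod-2', 'pod-3'].map((id) =>
  fakeRedis.setNX('cron:lock:daily-billing', id, 300)
)

const winners = results.filter(Boolean).length
console.log(winners)  // prints 1: only the first caller gets the lock
```

Against real Redis the same guarantee holds across processes, because the server executes each SET atomically.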

Fix 2: Job Queue with Bull

import Bull from 'bull'
import cron from 'node-cron'

// Bull ensures only one instance processes a job at a time
const billingQueue = new Bull('billing', {
  redis: { host: 'localhost', port: 6379 },
})

// All instances run this processor; Bull pops each job atomically,
// so exactly one worker processes any given job
billingQueue.process('monthly-billing', 1, async (job) => {
  console.log(`Processing billing for month ${job.data.month}`)
  const invoices = await db.invoice.findUnpaidForMonth(job.data.month)

  for (const [i, invoice] of invoices.entries()) {
    await processInvoice(invoice)
    await job.progress(Math.round(((i + 1) / invoices.length) * 100))
  }
})

// Schedule: a deterministic jobId makes the add idempotent. Bull ignores
// an add() whose jobId already exists, so all instances can safely call it.
cron.schedule('0 1 * * *', async () => {
  const today = new Date().toISOString().slice(0, 10)

  await billingQueue.add(
    'monthly-billing',
    { month: today, scheduledAt: Date.now() },
    { jobId: `monthly-billing:${today}` }
  )
})
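As written, a failed billing job is never retried. Bull supports retries through its JobOptions; a sketch of the options I'd pass as the third argument to billingQueue.add() (the values are illustrative, not tuned):

```typescript
// Documented Bull JobOptions for retry behaviour (values are examples)
const retryOpts = {
  attempts: 3,                                       // total tries, including the first
  backoff: { type: 'exponential', delay: 60_000 },   // wait 1 min, then 2 min, between retries
  removeOnComplete: 100,                             // keep only the last 100 finished jobs
}

console.log(retryOpts.attempts)  // prints 3
```

With these options Bull re-runs the processor on failure instead of silently dropping the invoice batch.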

Fix 3: Kubernetes CronJob (One Job at a Time)

# Instead of running cron inside your app, use Kubernetes CronJob
# K8s CronJob spins up a single pod for each run — no duplicates

apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-billing
spec:
  schedule: "0 0 * * *"
  concurrencyPolicy: Forbid    # Don't start if previous run still active
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 2          # Retry failed jobs up to 2 times
      activeDeadlineSeconds: 3600  # Kill job if it runs > 1 hour
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: billing
              image: my-app:latest
              command: ["node", "scripts/run-billing.js"]
              env:
                - name: JOB_TYPE
                  value: "billing"

// run-billing.js — standalone script (not long-running server)
async function main() {
  console.log('Starting billing job')

  await db.connect()

  const invoices = await db.invoice.findUnpaid()
  console.log(`Processing ${invoices.length} invoices`)

  for (const invoice of invoices) {
    await processInvoice(invoice)
  }

  console.log('Billing job complete')
  process.exit(0)  // Exit when done — pod terminates
}

main().catch((err) => {
  console.error('Billing job failed:', err)
  process.exit(1)  // Non-zero exit → Kubernetes marks job as failed → retries
})

Fix 4: Database Advisory Locks

// PostgreSQL advisory locks — no Redis required

async function runWithAdvisoryLock(lockId: number, fn: () => Promise<void>) {
  const client = await db.connect()
  let acquired = false

  try {
    // Try to acquire a session-level advisory lock.
    // Returns true if acquired, false if another session holds it.
    const result = await client.query(
      'SELECT pg_try_advisory_lock($1) AS acquired',
      [lockId]
    )
    acquired = result.rows[0].acquired

    if (!acquired) {
      console.log(`Advisory lock ${lockId} already held, skipping`)
      return
    }

    await fn()
  } finally {
    // Session-level locks are released when the session ends, but pooled
    // connections keep their session alive, so release explicitly. Only
    // unlock if we actually hold the lock: unlocking someone else's lock
    // emits a PostgreSQL warning and returns false.
    if (acquired) {
      await client.query('SELECT pg_advisory_unlock($1)', [lockId])
    }
    client.release()
  }
}

// Different jobs get different lock IDs
const LOCK_IDS = {
  DAILY_BILLING: 1001,
  WEEKLY_REPORT: 1002,
  CLEANUP: 1003,
}

cron.schedule('0 0 * * *', () => {
  runWithAdvisoryLock(LOCK_IDS.DAILY_BILLING, async () => {
    await processDailyBilling()
  })
})
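The LOCK_IDS table works, but it has to be maintained by hand, and a copy-paste collision silently makes two jobs share one lock. An alternative is hashing the job name into the integer the lock expects; a sketch using a 32-bit FNV-1a hash (lockIdFor is a helper name I made up):

```typescript
// FNV-1a 32-bit hash: deterministic across processes and restarts,
// so every instance derives the same lock ID for the same job name.
function lockIdFor(jobName: string): number {
  let hash = 0x811c9dc5  // FNV offset basis
  for (let i = 0; i < jobName.length; i++) {
    hash ^= jobName.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)  // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0  // force unsigned: fits comfortably in Postgres's bigint lock key
}

console.log(lockIdFor('daily-billing') === lockIdFor('daily-billing'))  // prints true
```

Then runWithAdvisoryLock(lockIdFor('daily-billing'), ...) needs no central table. The residual risk is a 32-bit hash collision between two job names, which is why some teams keep the explicit table anyway.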

Fix 5: Leader Election for Master Instance

// Elect one instance as "master" — only master runs cron jobs

class LeaderElection {
  private isLeader = false
  private leaseKey = 'cron:leader'
  private leaseTTL = 30  // 30 seconds
  private renewInterval: NodeJS.Timeout | null = null

  constructor(
    private redis: Redis,
    private instanceId: string = process.env.HOSTNAME ?? 'pod-' + Math.random()
  ) {}

  async start(): Promise<void> {
    await this.tryBecomeLeader()

    // Periodically try to become leader / renew lease
    this.renewInterval = setInterval(() => this.tryBecomeLeader(), 10_000)
  }

  private async tryBecomeLeader(): Promise<void> {
    if (this.isLeader) {
      // Renew the lease only if we still own it. A plain EXPIRE would
      // extend another instance's lease after ours has lapsed.
      const renewed = await this.redis.eval(`
        if redis.call('get', KEYS[1]) == ARGV[1] then
          return redis.call('expire', KEYS[1], ARGV[2])
        else
          return 0
        end
      `, 1, this.leaseKey, this.instanceId, this.leaseTTL)

      if (!renewed) {
        this.isLeader = false
        console.log(`[Leader] ${this.instanceId} lost leadership`)
      }
      return
    }

    // Try to acquire the leader lease
    const acquired = await this.redis.set(
      this.leaseKey,
      this.instanceId,
      'EX',
      this.leaseTTL,
      'NX'
    )

    if (acquired) {
      this.isLeader = true
      console.log(`[Leader] ${this.instanceId} is now leader`)
    }
  }

  getIsLeader(): boolean { return this.isLeader }

  stop(): void {
    if (this.renewInterval) clearInterval(this.renewInterval)
  }
}

const election = new LeaderElection(redis)
await election.start()

// Only leader schedules and runs cron jobs
cron.schedule('0 0 * * *', async () => {
  if (!election.getIsLeader()) {
    console.log('Not leader — skipping cron')
    return
  }
  await processDailyBilling()
})
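Leadership can lapse mid-job: the lease expires while the old leader is stalled in a long GC pause or network partition, a new leader takes over, and for a moment both believe they lead. Fencing tokens close that window: each new leader takes a strictly increasing token, and downstream writes reject anything stale. A self-contained sketch (the in-memory tokenCounter stands in for a Redis INCR; FencedResource is my own illustration):

```typescript
// Monotonic fencing tokens: each leadership acquisition gets a higher token,
// and the protected resource remembers the highest token it has seen.
let tokenCounter = 0                 // stand-in for a Redis INCR
const nextToken = () => ++tokenCounter

class FencedResource {
  private highestSeen = 0

  write(token: number, action: () => void): boolean {
    if (token < this.highestSeen) return false  // stale leader: reject the write
    this.highestSeen = token
    action()
    return true
  }
}

const billing = new FencedResource()

const oldLeaderToken = nextToken()  // first leader
const newLeaderToken = nextToken()  // elected after the old lease lapsed

let writes = 0
billing.write(newLeaderToken, () => { writes += 1 })  // accepted
const staleAccepted = billing.write(oldLeaderToken, () => { writes += 1 })

console.log(writes, staleAccepted)  // prints "1 false": the stale write is rejected
```

The catch is that the downstream system (database, payment API wrapper) has to participate by checking the token; a lock or lease alone cannot stop a paused process that wakes up and keeps writing.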

Cron Safety Checklist

| Risk | Solution |
| --- | --- |
| Multiple instances run the same job | Redis distributed lock or leader election |
| Job takes longer than the cron interval | concurrencyPolicy: Forbid in K8s, or lock TTL > max job duration |
| Job fails with no retry | Bull queue with retries, or K8s backoffLimit |
| Long-running job never finishes | activeDeadlineSeconds, lock timeout, progress tracking |
| Lock holder crashes without releasing | Lock TTL ensures auto-expiry |

Conclusion

Running cron in a multi-instance deployment without distributed locking is a guaranteed bug. The simplest fix is a Redis distributed lock: the first instance to SET NX runs the job, the rest skip. For complex workflows, use a Bull queue to decouple scheduling from execution: any instance can process the job, but only one will at a time. For clean separation, move cron jobs out of the app entirely and into Kubernetes CronJobs with concurrencyPolicy: Forbid. Whatever approach you choose, the lock TTL must be longer than the maximum expected job duration; otherwise the lock expires while the job is still running, and a second instance can start.
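When job duration is genuinely unpredictable, an alternative to an oversized TTL is a heartbeat: hold a short TTL and extend it periodically while the job runs. A sketch of the generic wrapper (withHeartbeat is my own helper; in production the renew callback would run an ownership-checking EXPIRE against Redis, and here an in-memory counter stands in for it):

```typescript
// Run `job` while invoking `renew` every `intervalMs` until the job settles
async function withHeartbeat<T>(
  job: () => Promise<T>,
  renew: () => Promise<void>,
  intervalMs: number
): Promise<T> {
  const timer = setInterval(() => { renew().catch(() => { /* log in real code */ }) }, intervalMs)
  try {
    return await job()
  } finally {
    clearInterval(timer)  // stop extending the moment the job settles
  }
}

// Demo: a counter stands in for a real lock-extension call
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms))

let renewals = 0
const demo = withHeartbeat(
  async () => { await sleep(120); return 'done' },  // a "long" job
  async () => { renewals += 1 },                    // would re-extend the lock TTL
  30                                                // renew every 30ms
)

demo.then((result) => console.log(result, renewals >= 2))  // prints "done true"
```

A short TTL plus a heartbeat means a crashed holder frees the lock within seconds, while a healthy long-running job keeps it for as long as it needs.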