Race Conditions in Microservices — When Two Services Agree on Something Wrong

Introduction

Two users click "Buy" at the same moment. Your inventory service reads stock: 1 remaining. Both requests see it. Both decrement. Stock goes to -1. You've oversold.

In a single process, you'd use a mutex. Across distributed services, you need database-level locking or optimistic concurrency.

Common Race Condition Patterns

1. Check-Then-Act (TOCTOU)

// ❌ Read-then-write — classic TOCTOU race
async function purchaseItem(userId: string, itemId: string) {
  // t=0: Both requests read stock = 1
  const item = await db.item.findById(itemId)

  if (item.stock <= 0) {
    throw new Error('Out of stock')
  }

  // t=1: Both see stock > 0, both proceed
  await db.item.update(itemId, { stock: item.stock - 1 })
  // t=2: Both write stock = 0 (the second write overwrites the first)
  // The row says 0, but two units were sold from a stock of 1 (oversold!)
}
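
The interleaving above can be reproduced without a database. The following is a minimal sketch under stated assumptions: `FakeItemStore`, `tick`, `racyPurchase`, and `demoRace` are hypothetical names, and the store only mimics an async `findById`/`update` pair by yielding to the event loop. It is not the `db` client used in this article.

```typescript
// Hypothetical in-memory stand-in for a database table. Every call yields
// to the event loop, the way a real network round trip would.
type Item = { id: string; stock: number }

const tick = (): Promise<void> => new Promise(resolve => setTimeout(resolve, 0))

class FakeItemStore {
  private items = new Map<string, Item>()

  constructor(seed: Item[]) {
    for (const item of seed) this.items.set(item.id, { ...item })
  }

  async findById(id: string): Promise<Item> {
    await tick()
    const item = this.items.get(id)
    if (!item) throw new Error(`Item ${id} not found`)
    return { ...item } // a copy, like a row snapshot
  }

  async update(id: string, patch: Partial<Item>): Promise<void> {
    await tick()
    const item = this.items.get(id)
    if (!item) throw new Error(`Item ${id} not found`)
    this.items.set(id, { ...item, ...patch })
  }
}

// Same read-then-write shape as the racy purchaseItem above.
async function racyPurchase(store: FakeItemStore, itemId: string): Promise<boolean> {
  const item = await store.findById(itemId)
  if (item.stock <= 0) return false // out of stock
  await store.update(itemId, { stock: item.stock - 1 })
  return true
}

// Runs two purchases concurrently against one unit of stock and reports
// how many succeeded and what the stored stock ended up as.
async function demoRace(): Promise<{ sold: number; storedStock: number }> {
  const store = new FakeItemStore([{ id: 'sku-1', stock: 1 }])
  const outcomes = await Promise.all([
    racyPurchase(store, 'sku-1'),
    racyPurchase(store, 'sku-1'),
  ])
  const sold = outcomes.filter(ok => ok).length
  const storedStock = (await store.findById('sku-1')).stock
  return { sold, storedStock }
}
```

Calling `demoRace()` reports two successful sales while the stored stock reads 0: both requests read the row before either wrote it.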

2. Lost Update

// ❌ Both read balance = $100
// Both add $50
// Both write $150
// Net result: $50 lost (should be $200)

const balance = await db.wallet.findById(userId)
await db.wallet.update(userId, { balance: balance.amount + 50 })

Fix 1: Atomic Database Operations

// ✅ Atomic decrement — database handles concurrency
async function purchaseItem(userId: string, itemId: string) {
  const updated = await db.raw(`
    UPDATE items
    SET stock = stock - 1
    WHERE id = ? AND stock > 0
    RETURNING id, stock
  `, [itemId])

  if (updated.rows.length === 0) {
    throw new OutOfStockError(`Item ${itemId} is out of stock`)
  }

  // Stock never goes below 0 — database guarantees atomicity
  return updated.rows[0]
}

// ✅ Atomic increment (wallet credit)
await db.raw(`
  UPDATE wallets
  SET balance = balance + ?
  WHERE user_id = ?
`, [amount, userId])
// Database applies this as a single atomic operation
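
To see why folding the guard into the write closes the race, here is an in-memory analogy, offered as a sketch rather than a database implementation. `AtomicStockTable`, `decrementIfAvailable`, and `demoAtomic` are hypothetical names. The important property is that the check and the write happen in one step with no `await` between them, which is the guarantee a single `UPDATE ... WHERE stock >= ?` statement gives you.

```typescript
// Hypothetical in-memory table. The guard and the write run in one
// uninterrupted step, mirroring:
//   UPDATE items SET stock = stock - ? WHERE id = ? AND stock >= ?
class AtomicStockTable {
  private stock: Map<string, number>

  constructor(seed: [string, number][]) {
    this.stock = new Map<string, number>(seed)
  }

  // Resolves to the remaining stock, or null when the guard matched no row.
  async decrementIfAvailable(id: string, quantity: number): Promise<number | null> {
    // Simulated round trip: the request is "in flight" here.
    await new Promise<void>(resolve => setTimeout(resolve, 0))

    // From here on there is no await, so nothing can interleave.
    const current = this.stock.get(id)
    if (current === undefined || current < quantity) return null
    const remaining = current - quantity
    this.stock.set(id, remaining)
    return remaining
  }

  peek(id: string): number | undefined {
    return this.stock.get(id)
  }
}

// Two concurrent purchases against one unit of stock.
async function demoAtomic(): Promise<{ outcomes: (number | null)[]; finalStock: number | undefined }> {
  const table = new AtomicStockTable([['sku-1', 1]])
  const outcomes = await Promise.all([
    table.decrementIfAvailable('sku-1', 1),
    table.decrementIfAvailable('sku-1', 1),
  ])
  return { outcomes, finalStock: table.peek('sku-1') }
}
```

In this sketch the first request gets `0` back and the second gets `null`, which corresponds to the empty `rows` result that `purchaseItem` turns into an `OutOfStockError`.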

Fix 2: Optimistic Locking

// ✅ Read with version, write with version check
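// If version changed between read and write → someone else modified it → retry

// Helper used for the backoff delay between retries
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))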

async function purchaseWithOptimisticLock(userId: string, itemId: string) {
  const MAX_RETRIES = 3

  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    const item = await db.item.findById(itemId)  // Includes `version` field

    if (item.stock <= 0) throw new OutOfStockError()

    // Update only if version hasn't changed
    const updated = await db.raw(`
      UPDATE items
      SET stock = ?, version = version + 1
      WHERE id = ? AND version = ?
    `, [item.stock - 1, itemId, item.version])

    if (updated.rowCount > 0) {
      return { success: true, remainingStock: item.stock - 1 }
    }

    // Version changed — someone else modified the item — retry
    const delay = 50 * Math.pow(2, attempt) + Math.random() * 50
    await sleep(delay)
  }

  throw new ConcurrentModificationError(`Failed after ${MAX_RETRIES} attempts`)
}
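
The version check and retry loop can also be exercised in isolation. The sketch below assumes a hypothetical `VersionedStore` whose `writeIfVersion` plays the role of `UPDATE ... WHERE id = ? AND version = ?`; `updateWithRetry` and `demoOptimistic` are likewise illustrative names. Backoff is left out so the interleaving stays deterministic.

```typescript
type Versioned<T> = { value: T; version: number }

const pause = (): Promise<void> => new Promise(resolve => setTimeout(resolve, 0))

// Hypothetical in-memory store with a version column on every row.
class VersionedStore<T> {
  private rows = new Map<string, Versioned<T>>()

  constructor(seed: [string, T][]) {
    for (const [id, value] of seed) this.rows.set(id, { value, version: 1 })
  }

  async read(id: string): Promise<Versioned<T>> {
    await pause()
    const row = this.rows.get(id)
    if (!row) throw new Error(`Row ${id} not found`)
    return { ...row }
  }

  // Writes only if the row still has the version the caller read.
  async writeIfVersion(id: string, expectedVersion: number, value: T): Promise<boolean> {
    await pause()
    const row = this.rows.get(id)
    if (!row || row.version !== expectedVersion) return false
    this.rows.set(id, { value, version: row.version + 1 })
    return true
  }
}

// Read, compute, conditional write; re-read and retry when the write loses.
async function updateWithRetry<T>(
  store: VersionedStore<T>,
  id: string,
  change: (current: T) => T,
  maxAttempts = 3
): Promise<{ value: T; attempts: number }> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const row = await store.read(id)
    const next = change(row.value)
    if (await store.writeIfVersion(id, row.version, next)) {
      return { value: next, attempts: attempt }
    }
  }
  throw new Error(`Gave up on ${id} after ${maxAttempts} attempts`)
}

// Two concurrent +50 credits on a balance of 100.
async function demoOptimistic(): Promise<{ balance: number; version: number; attempts: number[] }> {
  const store = new VersionedStore<number>([['wallet-1', 100]])
  const results = await Promise.all([
    updateWithRetry(store, 'wallet-1', balance => balance + 50),
    updateWithRetry(store, 'wallet-1', balance => balance + 50),
  ])
  const final = await store.read('wallet-1')
  return { balance: final.value, version: final.version, attempts: results.map(r => r.attempts) }
}
```

Here both credits land and the balance ends at 200: one caller succeeds on its first attempt, the other loses the version check once, re-reads, and succeeds on its second.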

Fix 3: Pessimistic Locking (SELECT FOR UPDATE)

// ✅ Lock the row exclusively — other transactions must wait

async function purchaseWithPessimisticLock(userId: string, itemId: string) {
  return db.transaction(async (trx) => {
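    // Lock the row: other transactions that try to UPDATE, DELETE, or
    // SELECT ... FOR UPDATE it must wait until we commit or roll back.
    // (In MVCC databases such as PostgreSQL, plain SELECTs are not blocked.)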
    const item = await trx.raw(`
      SELECT id, stock, price
      FROM items
      WHERE id = ?
      FOR UPDATE  -- Exclusive lock
    `, [itemId]).then(r => r.rows[0])

    if (!item) throw new NotFoundError()
    if (item.stock <= 0) throw new OutOfStockError()

    // Safe to update — we hold the lock
    await trx('items').where({ id: itemId }).decrement('stock', 1)

    const order = await trx('orders').insert({
      user_id: userId,
      item_id: itemId,
      price: item.price,
      created_at: new Date(),
    }).returning('*')

    return order[0]
    // Lock released on transaction commit
  })
}
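
An in-process analogy may help show what the row lock buys you. The sketch below is an assumption-heavy illustration, not a substitute for `SELECT ... FOR UPDATE`: `KeyedLock`, `runExclusive`, and `demoPessimistic` are hypothetical names, and the lock only works inside a single Node.js process. The shape is the same, though: take the lock first, then read, decide, and write while holding it.

```typescript
// Hypothetical per-key async lock: callers for the same key run one at a
// time, in arrival order. Callers for different keys do not block each other.
class KeyedLock {
  private tails = new Map<string, Promise<void>>()

  async runExclusive<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve()

    let release: () => void = () => {}
    const current = new Promise<void>(resolve => { release = resolve })

    // The next caller will wait on this tail, which settles when we release.
    const tail = previous.then(() => current)
    this.tails.set(key, tail)

    await previous // wait for every earlier holder of this key
    try {
      return await fn()
    } finally {
      release()
      if (this.tails.get(key) === tail) this.tails.delete(key)
    }
  }
}

// Two concurrent purchases of the last unit, serialized by the lock.
async function demoPessimistic(): Promise<{ outcomes: boolean[]; finalStock: number | undefined }> {
  const lock = new KeyedLock()
  const stock = new Map<string, number>([['sku-1', 1]])

  const purchase = (itemId: string): Promise<boolean> =>
    lock.runExclusive(`items:${itemId}`, async () => {
      const current = stock.get(itemId) ?? 0 // read while holding the lock
      await new Promise<void>(resolve => setTimeout(resolve, 0)) // simulated work
      if (current <= 0) return false
      stock.set(itemId, current - 1) // write while still holding the lock
      return true
    })

  const outcomes = await Promise.all([purchase('sku-1'), purchase('sku-1')])
  return { outcomes, finalStock: stock.get('sku-1') }
}
```

Without `runExclusive`, both callers would read a stock of 1 before either wrote. With it, the second caller reads only after the first has written, so exactly one purchase succeeds.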

Fix 4: Distributed Lock with Redis

// When the resource spans multiple services or databases

import { randomUUID } from 'node:crypto'
import { Redis } from 'ioredis'

class DistributedLock {
  constructor(private redis: Redis) {}

  async withLock<T>(
    resource: string,
    ttlMs: number,
    fn: () => Promise<T>
  ): Promise<T> {
    const lockKey = `lock:${resource}`
    // Unique, unguessable token so that only the owner can release the lock
    const lockValue = randomUUID()

    // Acquire lock — SET NX with TTL
    const acquired = await this.redis.set(
      lockKey, lockValue, 'PX', ttlMs, 'NX'
    )

    if (!acquired) {
      throw new LockNotAcquiredError(`Resource ${resource} is locked`)
    }

    try {
      return await fn()
    } finally {
      // Release only if we own the lock (Lua script for atomicity)
      await this.redis.eval(`
        if redis.call('get', KEYS[1]) == ARGV[1] then
          return redis.call('del', KEYS[1])
        else
          return 0
        end
      `, 1, lockKey, lockValue)
    }
  }
}

const lock = new DistributedLock(redis)

async function reserveFlightSeat(flightId: string, seatNumber: string, userId: string) {
  return lock.withLock(`flight:${flightId}:seat:${seatNumber}`, 5000, async () => {
    // Only one caller holds this lock at a time, provided the work below
    // finishes within the 5-second TTL
    const seat = await db.seat.findOne({ flightId, seatNumber })
    if (seat.reserved) throw new SeatTakenError()

    await db.seat.update({ flightId, seatNumber }, { reserved: true, userId })
    return { seatNumber, userId }
  })
}
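
The owner check in the release step matters most when a lock expires while its holder is still working. The sketch below models that case in memory, under stated assumptions: `FakeLockStore`, `setIfAbsent`, `deleteIfValueMatches`, and `demoExpiredOwner` are hypothetical names that imitate `SET ... NX PX` and the compare-and-delete Lua script, and the clock is injected so the scenario is deterministic. It does not talk to Redis.

```typescript
// Hypothetical in-memory model of the two Redis operations used above.
class FakeLockStore {
  private entries = new Map<string, { value: string; expiresAt: number }>()
  private now: () => number

  constructor(now: () => number) {
    this.now = now
  }

  // Models SET key value NX PX ttlMs: succeeds only if no live entry exists.
  setIfAbsent(key: string, value: string, ttlMs: number): boolean {
    const existing = this.entries.get(key)
    if (existing && existing.expiresAt > this.now()) return false
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs })
    return true
  }

  // Models the Lua script: delete only if the stored token is the caller's.
  deleteIfValueMatches(key: string, value: string): boolean {
    const existing = this.entries.get(key)
    if (!existing || existing.expiresAt <= this.now()) return false
    if (existing.value !== value) return false
    this.entries.delete(key)
    return true
  }

  holder(key: string): string | undefined {
    const existing = this.entries.get(key)
    return existing && existing.expiresAt > this.now() ? existing.value : undefined
  }
}

// Client A overruns its TTL, client B takes the lock, then A tries to release.
function demoExpiredOwner() {
  let clock = 0
  const store = new FakeLockStore(() => clock)
  const key = 'lock:flight:42:seat:12A'

  const aAcquired = store.setIfAbsent(key, 'token-A', 5000)
  const bWhileHeld = store.setIfAbsent(key, 'token-B', 5000)

  clock = 6000 // A's work took longer than the 5000 ms TTL

  const bAfterExpiry = store.setIfAbsent(key, 'token-B', 5000)
  const aReleased = store.deleteIfValueMatches(key, 'token-A')

  return { aAcquired, bWhileHeld, bAfterExpiry, aReleased, holder: store.holder(key) }
}
```

In this scenario A's late release is a no-op and B keeps the lock. Note what the sketch also reveals: between the expiry and A finishing, both clients believed they held the lock. The TTL therefore needs to exceed the worst-case duration of the protected work, and it is prudent to keep the final write guarded as well, for example with a conditional update.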

Fix 5: Saga Pattern for Multi-Service Transactions

// When a transaction spans multiple microservices
// Use saga with compensating transactions

class PurchaseSaga {
  async execute(order: Order) {
    const steps: SagaStep[] = [
      {
        name: 'reserve_inventory',
        execute: () => inventoryService.reserve(order.itemId, order.quantity),
        compensate: () => inventoryService.release(order.itemId, order.quantity),
      },
      {
        name: 'charge_payment',
        execute: () => paymentService.charge(order.userId, order.total),
        compensate: () => paymentService.refund(order.paymentId),
      },
      {
        name: 'create_shipment',
        execute: () => shippingService.schedule(order),
        compensate: () => shippingService.cancel(order.shipmentId),
      },
    ]

    const completed: SagaStep[] = []

    for (const step of steps) {
      try {
        await step.execute()
        completed.push(step)
      } catch (err) {
        // Roll back completed steps in reverse order
        console.error(`Saga step ${step.name} failed, compensating...`)
        for (const done of completed.reverse()) {
          await done.compensate()
        }
        throw err
      }
    }
  }
}
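
The compensation order is easier to verify with stubbed services. The following sketch is illustrative only: `runSaga` and `demoSaga` are hypothetical names, the steps just append to a log instead of calling real inventory, payment, or shipping services, and the choice to keep compensating after one compensation fails is an assumption rather than something the class above does.

```typescript
type SagaStep = {
  name: string
  execute: () => Promise<void>
  compensate: () => Promise<void>
}

// Runs steps in order. On failure, compensates the completed steps in
// reverse order and rethrows the original error.
async function runSaga(steps: SagaStep[]): Promise<void> {
  const completed: SagaStep[] = []

  for (const step of steps) {
    try {
      await step.execute()
      completed.push(step)
    } catch (err) {
      for (const done of [...completed].reverse()) {
        try {
          await done.compensate()
        } catch (compensationError) {
          // Assumption: keep going so later compensations still run;
          // a failed compensation needs manual or queued follow-up.
          console.error(`Compensation for ${done.name} failed`, compensationError)
        }
      }
      throw err
    }
  }
}

// Third step fails, so the first two are undone in reverse order.
async function demoSaga(): Promise<{ log: string[]; error: string }> {
  const log: string[] = []
  const steps: SagaStep[] = [
    {
      name: 'reserve_inventory',
      execute: async () => { log.push('reserve') },
      compensate: async () => { log.push('release') },
    },
    {
      name: 'charge_payment',
      execute: async () => { log.push('charge') },
      compensate: async () => { log.push('refund') },
    },
    {
      name: 'create_shipment',
      execute: async () => { throw new Error('carrier unavailable') },
      compensate: async () => { log.push('cancel') },
    },
  ]

  try {
    await runSaga(steps)
    return { log, error: '' }
  } catch (err) {
    return { log, error: err instanceof Error ? err.message : String(err) }
  }
}
```

The resulting log is `reserve`, `charge`, `refund`, `release`: the failed shipment step is never compensated because it never completed, and the payment is refunded before the inventory is released.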

Race Condition Prevention Cheatsheet

| Scenario | Solution |
| --- | --- |
| Single-row counter (inventory, seats) | Atomic SQL: WHERE stock > 0 |
| Complex read-modify-write | Optimistic locking with version number |
| High contention + performance critical | Pessimistic SELECT FOR UPDATE |
| Cross-service resource (booking, allocation) | Distributed Redis lock |
| Multi-service transaction | Saga with compensating transactions |
| Wallet balance | Append-only ledger, compute balance on read |
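
The last row of the cheatsheet is not covered by the fixes above, so here is a minimal sketch of the idea. `WalletLedger`, `LedgerEntry`, and `demoLedger` are hypothetical names, and the array stands in for an insert-only table. Because each credit is a new entry rather than a rewrite of a stored balance, two concurrent credits cannot overwrite each other.

```typescript
type LedgerEntry = { walletId: string; amountCents: number; reason: string }

// Hypothetical append-only ledger: entries are added, never updated.
class WalletLedger {
  private entries: LedgerEntry[] = []

  // Analogous to an INSERT: it never reads or rewrites a stored balance.
  append(entry: LedgerEntry): void {
    if (!Number.isInteger(entry.amountCents)) {
      throw new Error('amountCents must be an integer number of cents')
    }
    this.entries.push({ ...entry })
  }

  // Analogous to SELECT SUM(amount_cents) ... WHERE wallet_id = ?
  balanceOf(walletId: string): number {
    let total = 0
    for (const entry of this.entries) {
      if (entry.walletId === walletId) total += entry.amountCents
    }
    return total
  }
}

// The lost-update scenario from earlier: $100 plus two $50 credits.
function demoLedger(): number {
  const ledger = new WalletLedger()
  ledger.append({ walletId: 'w1', amountCents: 10000, reason: 'opening deposit' })
  ledger.append({ walletId: 'w1', amountCents: 5000, reason: 'credit from request A' })
  ledger.append({ walletId: 'w1', amountCents: 5000, reason: 'credit from request B' })
  return ledger.balanceOf('w1')
}
```

`demoLedger()` returns 20000 cents, the $200 that the lost-update example should have produced. Debits that must not overdraw the wallet still need a guard, since checking the balance and then appending is itself a check-then-act sequence.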

Conclusion

Race conditions in distributed systems can't be solved with in-process mutexes; you need the database or a distributed coordination service to provide atomicity. Atomic SQL operations handle simple increment/decrement races. Optimistic locking suits low-contention scenarios where conflicts are rare and a retry is cheap. Pessimistic locking suits high-contention scenarios where retries would pile up and waiting on a row lock gives more predictable latency. Distributed locks coordinate across services, and sagas undo partial work when a multi-service transaction fails midway. The key insight: never do a read-then-write in separate statements when correctness depends on nothing changing between them. Push both operations into a single atomic database statement, or guard the write with a version check or a lock.