Hot Partition in Distributed Databases — When One Shard Gets All the Heat

Introduction

You horizontally scaled your database to 10 shards to handle load. You expect even distribution. Instead, 90% of all traffic hammers shard #3 while the other 9 nodes sit idle. Writes queue up, latency spikes, and you've added 9 shards of overhead for no benefit.

This is the hot partition problem.

Why Hot Partitions Happen

Distributed databases (DynamoDB, Cassandra, CockroachDB, Redis Cluster) route data to shards based on a partition key. If your partition key has skewed distribution, one shard gets all the traffic:

Bad partition key: user_country = "US"

Shard 1 (US):   ████████████████████  90% of traffic
Shard 2 (EU):   ████  7%
Shard 3 (APAC): ██  3%

Most data + most traffic → shard 1 is HOT
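
The skew is easy to reproduce in miniature. The sketch below (a hypothetical 10-shard hash router, not any particular database's internals) shows that hashing cannot rescue a skewed key distribution: with country as the partition key, at most 3 of 10 shards ever see traffic, and one absorbs at least 90% of it.

```typescript
import { createHash } from 'node:crypto'

// Hash routing, roughly what a distributed database does internally:
// hash the partition key, take it modulo the shard count.
function shardFor(partitionKey: string, shardCount: number): number {
  const digest = createHash('md5').update(partitionKey).digest()
  return digest.readUInt32BE(0) % shardCount
}

// Simulated traffic mirroring the chart above: 90% US, 7% EU, 3% APAC.
const traffic = [
  ...Array(90).fill('US'),
  ...Array(7).fill('EU'),
  ...Array(3).fill('APAC'),
]

const hits = new Map<number, number>()
for (const country of traffic) {
  const shard = shardFor(country, 10)
  hits.set(shard, (hits.get(shard) ?? 0) + 1)
}

// At most 3 of 10 shards receive any traffic; the US shard gets 90+ hits.
console.log([...hits.entries()].sort((a, b) => b[1] - a[1]))
```

More shards change nothing here: the key only takes three values, so the effective shard count is capped at three no matter the cluster size.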

Common causes of hot partitions:

  • Sequential keys (timestamps, auto-increment IDs) → all writes go to the "latest" shard
  • Popular entities (a celebrity's Twitter profile, a viral product listing)
  • Low-cardinality keys (status field with 3 values → max 3 effective shards)
  • Business logic patterns (all orders for a major client go to same partition)

Fix 1: Add Random Salt / Hash Suffix to Keys

Spread a hot key across multiple sub-partitions:

// DynamoDB example: celebrity with 10M followers writing to same partition
const SHARD_COUNT = 20

// ❌ Hot partition: all reads/writes for celebrity go to same partition
const hotKey = `user#${celebId}`

// ✅ Spread reads/writes across 20 sub-partitions
function getShardedKey(userId: string): string {
  const shard = Math.floor(Math.random() * SHARD_COUNT)
  return `user#${userId}#${shard}`
}

// Write: random shard
async function writePost(userId: string, post: Post) {
  const key = getShardedKey(userId)
  await dynamodb.put({ TableName: 'posts', Item: { pk: key, ...post } })
}

// Read all: scatter-gather across all shards (the read cost of the random write spread)
async function getAllPosts(userId: string): Promise<Post[]> {
  const keys = Array.from({ length: SHARD_COUNT }, (_, i) =>
    `user#${userId}#${i}`
  )

  const results = await Promise.all(
    keys.map(key =>
      dynamodb.query({ TableName: 'posts', KeyConditionExpression: 'pk = :pk',
        ExpressionAttributeValues: { ':pk': key } })
    )
  )

  return results.flatMap(r => r.Items as Post[])
}
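
Random suffixes force every read to scatter-gather. A common variant is a calculated suffix, as in DynamoDB's documented write-sharding pattern: derive the shard deterministically from an attribute known at both write and read time (the post date in this sketch, a hypothetical choice), so a read for a known date hits exactly one sub-partition.

```typescript
import { createHash } from 'node:crypto'

const SHARD_COUNT = 20

// Calculated suffix: hash an attribute that is known at write AND read
// time, instead of picking a random shard. Same inputs, same key.
function getCalculatedKey(userId: string, postDate: string): string {
  const digest = createHash('md5').update(postDate).digest()
  const shard = digest.readUInt32BE(0) % SHARD_COUNT
  return `user#${userId}#${shard}`
}

// A read for a known date recomputes the same suffix: one query,
// no scatter-gather.
const key = getCalculatedKey('celeb42', '2024-03-15')
```

Listing all posts still requires querying all 20 sub-partitions, so this variant pays off when most reads are scoped by the suffix attribute.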

Fix 2: Time-Based Key with Shard Suffix (Write-Heavy)

For time-series data, sequential timestamps create hot write partitions:

// ❌ Monotonically increasing key → in a range-partitioned store, every
// write lands on the newest range, effectively one hot shard
const key = `events#${Date.now()}`

// ✅ Prefix a deterministic shard bucket so writes fan out
const WRITE_SHARDS = 16

function getTimeSeriesKey(timestamp: number): string {
  const bucket = timestamp % WRITE_SHARDS  // 0-15
  return `events#${bucket}#${timestamp}`
}

// Cassandra example (cassandra-driver); prepare the statement so the
// driver serializes params with the right types
async function insertEvent(event: Event) {
  const bucket = event.timestamp % WRITE_SHARDS
  await cassandra.execute(
    'INSERT INTO events (bucket, ts, data) VALUES (?, ?, ?)',
    [bucket, event.timestamp, event.data],
    { prepare: true }
  )
}

// Read: query all buckets in parallel
async function getEvents(startTs: number, endTs: number) {
  const queries = Array.from({ length: WRITE_SHARDS }, (_, i) =>
    cassandra.execute(
      'SELECT * FROM events WHERE bucket = ? AND ts >= ? AND ts <= ?',
      [i, startTs, endTs],
      { prepare: true }
    )
  )
  const results = await Promise.all(queries)
  return results.flatMap(r => r.rows).sort((a, b) => a.ts - b.ts)
}

Fix 3: Caching for Extremely Hot Reads

For celebrity/viral content — cache the hot partition at the application layer:

import { LRUCache } from 'lru-cache'

const hotCache = new LRUCache<string, any>({
  max: 10_000,
  ttl: 5_000,  // 5-second TTL — fresh enough, but protects the DB
})

async function getProductListing(productId: string) {
  // Check in-process cache first (no network hop)
  const cached = hotCache.get(productId)
  if (cached) return cached

  // Check Redis
  const redisCached = await redis.get(`product:${productId}`)
  if (redisCached) {
    const data = JSON.parse(redisCached)
    hotCache.set(productId, data)
    return data
  }

  // DB query — this is the hot partition
  const product = await db.products.findById(productId)
  await redis.setex(`product:${productId}`, 30, JSON.stringify(product))
  hotCache.set(productId, product)
  return product
}
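
One failure mode remains: when the 5-second TTL expires on a viral product, every concurrent request misses at once and stampedes the hot partition. A minimal single-flight guard (a sketch; `loader` stands in for the DB call) coalesces concurrent misses into one query:

```typescript
// Coalesce concurrent cache misses for the same key into one in-flight
// call. Later callers piggyback on the pending promise instead of
// issuing their own query against the hot partition.
const inFlight = new Map<string, Promise<unknown>>()

async function singleFlight<T>(key: string, loader: () => Promise<T>): Promise<T> {
  const pending = inFlight.get(key)
  if (pending) return pending as Promise<T>

  const promise = loader().finally(() => inFlight.delete(key))
  inFlight.set(key, promise)
  return promise
}
```

Wrapping the `db.products.findById` call in `singleFlight(productId, ...)` turns a thundering herd of identical misses into a single query.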

Fix 4: DynamoDB Adaptive Capacity (Automatic)

DynamoDB has built-in hot partition mitigation:

// DynamoDB's adaptive capacity is always on: it automatically shifts
// throughput toward hot partitions, with no setting to enable. It smooths
// imbalance but can't rescue a single partition key that exceeds the hard
// per-partition limits (3,000 RCU / 1,000 WCU).

// At the application level, retry throttled requests with backoff:
const client = new DynamoDBClient({
  maxAttempts: 5,  // SDK v3 applies exponential backoff with jitter on
                   // ProvisionedThroughputExceededException
})

// Monitor for hot partitions (AWS SDK v3): sustained throttling while the
// table as a whole is under its provisioned capacity is the classic signature
const metrics = await cloudwatch.send(new GetMetricStatisticsCommand({
  Namespace: 'AWS/DynamoDB',
  MetricName: 'WriteThrottleEvents',  // ReadThrottleEvents for the read side
  Dimensions: [{ Name: 'TableName', Value: 'posts' }],
  StartTime: new Date(Date.now() - 3600_000),
  EndTime: new Date(),
  Period: 300,
  Statistics: ['Sum'],
}))

Fix 5: Hash Slots and Hash Tags (Redis Cluster)

Redis Cluster doesn't use consistent hashing: it maps every key to one of 16,384 fixed hash slots (CRC16 of the key, mod 16384), and each node owns a range of slots. To control which slot a key maps to:

// ❌ These keys hash to different slots — likely different nodes, no co-location
const key1 = `user:${userId}:profile`
const key2 = `user:${userId}:feed`

// ✅ Use hash tags {} to force the same slot
const taggedKey1 = `{user:${userId}}:profile`  // Only "user:<id>" is hashed
const taggedKey2 = `{user:${userId}}:feed`     // Same slot, same node

// Now MULTI/EXEC works across these keys (Redis requires them in one slot).
// Caveat: piling too many keys behind one hash tag re-creates the very hot
// partition you were trying to avoid.
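
To see why tagged keys co-locate, here is a sketch of the slot function the Redis Cluster spec defines: CRC16 (the XMODEM variant) of the key modulo 16,384, hashing only the content of the first non-empty `{...}` tag when one is present.

```typescript
// CRC16/XMODEM, the checksum Redis Cluster uses for slot assignment
// (polynomial 0x1021, initial value 0; ASCII keys assumed here).
function crc16(s: string): number {
  let crc = 0
  for (let i = 0; i < s.length; i++) {
    crc ^= s.charCodeAt(i) << 8
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff
    }
  }
  return crc
}

// Slot assignment: if the key has a non-empty {tag}, only the tag is
// hashed, which is why hash-tagged keys land together.
function keySlot(key: string): number {
  const open = key.indexOf('{')
  if (open !== -1) {
    const close = key.indexOf('}', open + 1)
    if (close > open + 1) key = key.slice(open + 1, close)
  }
  return crc16(key) % 16384
}

// Both tagged keys reduce to hashing "user:42", so they share a slot.
console.log(keySlot('{user:42}:profile') === keySlot('{user:42}:feed'))  // true
```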

Detecting Hot Partitions

# DynamoDB — check consumed capacity per partition via CloudWatch
aws cloudwatch get-metric-statistics \
  --metric-name ConsumedWriteCapacityUnits \
  --namespace AWS/DynamoDB \
  --dimensions Name=TableName,Value=posts \
  --start-time "$(date -u -d '1 hour ago' +%FT%TZ)" \
  --end-time "$(date -u +%FT%TZ)" \
  --period 300 --statistics Sum

# Cassandra — check node load imbalance
nodetool status
# If one node has 60%+ of data → hot partition

# Redis Cluster — check slot and key distribution across nodes
redis-cli --cluster info localhost:7000

# Redis — sample for hot keys (requires an LFU maxmemory-policy)
redis-cli --hotkeys

// Application-level monitoring (sample or cap this in production: the map
// grows unbounded and the totals scan below is O(n) per hit)
const partitionHitCounts = new Map<string, number>()

function trackPartitionHit(key: string) {
  const partition = getPartitionKey(key)
  const count = (partitionHitCounts.get(partition) ?? 0) + 1
  partitionHitCounts.set(partition, count)

  // Alert if one partition is getting > 50% of traffic
  const total = [...partitionHitCounts.values()].reduce((a, b) => a + b, 0)
  if (count / total > 0.5) {
    logger.warn(`Hot partition detected: ${partition} getting ${((count/total)*100).toFixed(1)}% of traffic`)
  }
}

Conclusion

Hot partitions are a partition-key design problem, not a hardware problem. Adding more shards won't help if all traffic still routes to one of them. The fixes: salt your partition keys to spread hot entities, use sharded time-series keys to avoid sequential write hotspots, cache extremely hot reads, and monitor partition distribution continuously. Design your partition key for uniform access — it's the most impactful decision in a distributed database schema.