Load Balancer Algorithms — Round Robin, Least Connections, and Consistent Hashing
- Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Load balancers sit between clients and application servers, distributing traffic across them. Choosing the wrong algorithm wastes capacity: some servers overflow with queued requests while others sit idle. Round-robin treats all servers as equal (often untrue), least-connections tracks actual workload (better), and consistent hashing preserves cache affinity (critical for distributed caches). Understanding these algorithms can prevent unnecessary infrastructure scaling.
- Round-Robin and Its Failure Modes
- Least-Connections Load Balancing
- Least-Response-Time Algorithm
- Consistent Hashing for Cache Affinity
- Session Affinity (Sticky Sessions)
- Health Checks and Load Balancer Integration
- L4 vs L7 Load Balancing
- Checklist
- Conclusion
Round-Robin and Its Failure Modes
Round-robin distributes requests equally, assuming all servers are equally capable. This assumption breaks in reality:
// Round-robin implementation
class RoundRobinLoadBalancer {
private servers: string[];
private currentIndex = 0;
constructor(servers: string[]) {
this.servers = servers;
}
selectServer(): string {
const server = this.servers[this.currentIndex];
this.currentIndex = (this.currentIndex + 1) % this.servers.length;
return server;
}
}
// Problem 1: Heterogeneous workloads
// Server 1: 2 CPU cores, 8GB RAM
// Server 2: 4 CPU cores, 16GB RAM
// Round-robin sends equal traffic to both
// Result: Server 1 overloaded, Server 2 underutilized
// Problem 2: Persistent connections (WebSockets, HTTP keep-alive)
// Client 1 connects; round-robin assigns it to Server 1
// Client 2 connects; round-robin assigns it to Server 2
// With HTTP keep-alive, a second request reuses the connection to Server 1 (correct)
// But if the client opens a new connection, it may land on Server 3 (cache miss, lost state)
// Problem 3: Long-running requests
// A heavy processing request lands on Server 1 and ties it up
// Round-robin keeps cycling through Servers 2-5 and back to Server 1
// Because the algorithm never looks at in-flight work, Server 1
// keeps receiving its full share of new requests while overloaded
const servers = ['api1.example.com', 'api2.example.com', 'api3.example.com'];
const rrLb = new RoundRobinLoadBalancer(servers);
// Requests cycle through indices: [0, 1, 2, 0, 1, 2, ...]
const requests = Array.from({ length: 9 }, () => rrLb.selectServer());
console.log(requests); // [api1, api2, api3, api1, api2, api3, api1, api2, api3]
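Problem 1 (heterogeneous capacity) is commonly mitigated with weighted round-robin: the cycle stays deterministic, but higher-capacity servers get proportionally more turns. A minimal sketch — the hostnames and weights are illustrative, and production implementations (e.g., nginx) use smooth weighted round-robin for more even interleaving:

```typescript
// Weighted round-robin: each server appears in the rotation `weight` times.
// This naive expansion clumps a server's turns together; smooth weighted
// round-robin spreads them out, but the long-run proportions are the same.
class WeightedRoundRobinLoadBalancer {
  private schedule: string[] = [];
  private currentIndex = 0;

  constructor(servers: { host: string; weight: number }[]) {
    // Expand each server into `weight` slots in the rotation
    for (const { host, weight } of servers) {
      for (let i = 0; i < weight; i++) {
        this.schedule.push(host);
      }
    }
  }

  selectServer(): string {
    const server = this.schedule[this.currentIndex];
    this.currentIndex = (this.currentIndex + 1) % this.schedule.length;
    return server;
  }
}

const wrr = new WeightedRoundRobinLoadBalancer([
  { host: 'small.example.com', weight: 1 }, // 2 CPU cores
  { host: 'large.example.com', weight: 2 }, // 4 CPU cores
]);
// Over 3 requests, large gets 2 turns and small gets 1
const picks = Array.from({ length: 3 }, () => wrr.selectServer());
console.log(picks); // ['small.example.com', 'large.example.com', 'large.example.com']
```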
Least-Connections Load Balancing
Least-connections respects actual server load:
interface ServerMetrics {
host: string;
activeConnections: number;
totalCapacity: number;
utilizationPercent: number;
}
class LeastConnectionsLoadBalancer {
private servers: Map<string, ServerMetrics>;
constructor(
serverHosts: string[],
capacities: Map<string, number> = new Map()
) {
this.servers = new Map(
serverHosts.map(host => [
host,
{
host,
activeConnections: 0,
totalCapacity: capacities.get(host) || 1000, // Default capacity
utilizationPercent: 0,
},
])
);
}
selectServer(): string {
// Pick the server with the lowest utilization (connections relative to
// capacity), so higher-capacity servers receive proportionally more traffic
let minServer: ServerMetrics | null = null;
for (const server of this.servers.values()) {
const utilization = server.activeConnections / server.totalCapacity;
if (!minServer || utilization < minServer.activeConnections / minServer.totalCapacity) {
minServer = server;
}
}
if (!minServer) throw new Error('No servers available');
minServer.activeConnections++;
minServer.utilizationPercent = (minServer.activeConnections / minServer.totalCapacity) * 100;
return minServer.host;
}
releaseConnection(server: string): void {
const metrics = this.servers.get(server);
if (metrics && metrics.activeConnections > 0) {
metrics.activeConnections--;
metrics.utilizationPercent = (metrics.activeConnections / metrics.totalCapacity) * 100;
}
}
getMetrics(): ServerMetrics[] {
return Array.from(this.servers.values());
}
}
// Example: 3 servers with different capacities
const capacities = new Map([
['api1.example.com', 100], // 2 CPU
['api2.example.com', 200], // 4 CPU
['api3.example.com', 100], // 2 CPU
]);
const lcLb = new LeastConnectionsLoadBalancer(
['api1.example.com', 'api2.example.com', 'api3.example.com'],
capacities
);
// 400 requests distributed in proportion to capacity
// api1: ~100, api2: ~200, api3: ~100
const distribution = new Map<string, number>();
for (let i = 0; i < 400; i++) {
const server = lcLb.selectServer();
distribution.set(server, (distribution.get(server) || 0) + 1);
}
console.log('Distribution:', Object.fromEntries(distribution));
// Output:
// api1: 100 (100% utilized)
// api2: 200 (100% utilized)
// api3: 100 (100% utilized)
Least-Response-Time Algorithm
Routes each request to the server with the lowest recent latency:
interface ServerHealth {
host: string;
responseTime: number; // milliseconds
activeConnections: number;
errorCount: number;
weightedScore: number;
}
class LeastResponseTimeLoadBalancer {
private servers: Map<string, ServerHealth>;
private windowSize = 100; // Track last N requests
constructor(serverHosts: string[]) {
this.servers = new Map(
serverHosts.map(host => [
host,
{
host,
responseTime: 0,
activeConnections: 0,
errorCount: 0,
weightedScore: 0,
},
])
);
}
selectServer(): string {
let bestServer: ServerHealth | null = null;
for (const server of this.servers.values()) {
// Weighted score: response time scaled up by connection count,
// penalizing servers that are slow or carrying many active connections
const score = server.responseTime * (1 + server.activeConnections / 100);
server.weightedScore = score;
if (!bestServer || score < bestServer.weightedScore) {
bestServer = server;
}
}
if (!bestServer) throw new Error('No servers available');
bestServer.activeConnections++;
return bestServer.host;
}
recordResponse(server: string, responseTime: number, success: boolean): void {
const metrics = this.servers.get(server);
if (!metrics) return;
// Exponential moving average for response time
metrics.responseTime = metrics.responseTime * 0.8 + responseTime * 0.2;
if (success) {
metrics.errorCount = Math.max(0, metrics.errorCount - 1);
} else {
metrics.errorCount++;
}
metrics.activeConnections = Math.max(0, metrics.activeConnections - 1);
}
getMetrics(): ServerHealth[] {
return Array.from(this.servers.values());
}
}
// Usage
const lrtLb = new LeastResponseTimeLoadBalancer([
'api1.example.com',
'api2.example.com',
'api3.example.com',
]);
// api1 is slow (200ms response time)
lrtLb.recordResponse('api1.example.com', 200, true);
// api2 is fast (50ms response time)
lrtLb.recordResponse('api2.example.com', 50, true);
// api3 is medium (100ms response time)
lrtLb.recordResponse('api3.example.com', 100, true);
// Next request routes to api2 (lowest response time)
const selected = lrtLb.selectServer();
console.log('Selected:', selected); // api2.example.com
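The exponential moving average in recordResponse (alpha = 0.2) matters: it smooths latency spikes so a single slow response doesn't immediately deprioritize a server. A standalone illustration of the smoothing behavior:

```typescript
// EMA with alpha = 0.2: each new sample moves the estimate 20% of the way
// toward the observed value, so spikes decay over several observations
function ema(previous: number, sample: number, alpha = 0.2): number {
  return previous * (1 - alpha) + sample * alpha;
}

let estimate = 100; // steady-state latency estimate in ms
estimate = ema(estimate, 500); // a single 500ms spike
console.log(estimate.toFixed(1)); // 180.0 — pulled up, but far below the raw spike
estimate = ema(estimate, 100);
estimate = ema(estimate, 100);
console.log(estimate.toFixed(1)); // 151.2 — decaying back toward 100
```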
Consistent Hashing for Cache Affinity
Distributes requests to same server based on client/key, preserving cache locality:
class ConsistentHashRing {
private ring = new Map<number, string>();
private sortedKeys: number[] = [];
private virtualNodes = 150; // Replicas per server
private servers = new Set<string>();
private hash(key: string): number {
// Simple 32-bit string hash for illustration; production rings typically
// use MurmurHash or similar for better distribution
let hash = 0;
for (let i = 0; i < key.length; i++) {
const char = key.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash; // Convert to 32-bit integer
}
return Math.abs(hash);
}
addServer(server: string): void {
this.servers.add(server);
// Add virtual nodes
for (let i = 0; i < this.virtualNodes; i++) {
const virtualKey = `${server}:${i}`;
const hash = this.hash(virtualKey);
this.ring.set(hash, server);
}
// Re-sort keys
this.sortedKeys = Array.from(this.ring.keys()).sort((a, b) => a - b);
}
removeServer(server: string): void {
this.servers.delete(server);
// Remove virtual nodes
for (let i = 0; i < this.virtualNodes; i++) {
const virtualKey = `${server}:${i}`;
const hash = this.hash(virtualKey);
this.ring.delete(hash);
}
this.sortedKeys = Array.from(this.ring.keys()).sort((a, b) => a - b);
}
getServer(key: string): string | null {
if (this.ring.size === 0) return null;
const hash = this.hash(key);
// Find next server clockwise from hash
for (const ringKey of this.sortedKeys) {
if (ringKey >= hash) {
return this.ring.get(ringKey) || null;
}
}
// Wrap around to first server
return this.ring.get(this.sortedKeys[0]) || null;
}
// When a server is added/removed, only a fraction of keys should remap;
// this helper measures how many sample keys map differently between two rings
static keysMoved(oldRing: ConsistentHashRing, newRing: ConsistentHashRing): number {
let moved = 0;
const testKeys = Array.from({ length: 1000 }, (_, i) => `key:${i}`);
for (const testKey of testKeys) {
if (oldRing.getServer(testKey) !== newRing.getServer(testKey)) {
moved++;
}
}
return moved;
}
}
// Example: Cache affinity for Redis cluster
const hashRing = new ConsistentHashRing();
hashRing.addServer('redis-1.example.com');
hashRing.addServer('redis-2.example.com');
hashRing.addServer('redis-3.example.com');
// User 123's data always maps to the same Redis node
const userCacheKey = `user:123`;
const cacheNode = hashRing.getServer(userCacheKey);
// Later, user 456's request maps independently
const userCacheKey2 = `user:456`;
const cacheNode2 = hashRing.getServer(userCacheKey2);
// Benefit: if redis-1 goes down, only keys that mapped to redis-1 rehash;
// keys on redis-2 and redis-3 are unaffected (locality preserved)
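To see why consistent hashing is worth the complexity, compare it with naive modulo placement (`hash(key) % serverCount`), where removing a server changes the modulus and remaps most keys. A standalone sketch; the hash function and key names are illustrative:

```typescript
// Naive modulo placement: server = hash(key) % serverCount.
// Removing a server changes the modulus, so most keys move.
function simpleHash(key: string): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash << 5) - hash + key.charCodeAt(i);
    hash = hash & hash; // keep as 32-bit integer
  }
  return Math.abs(hash);
}

function moduloServer(key: string, serverCount: number): number {
  return simpleHash(key) % serverCount;
}

const keys = Array.from({ length: 10000 }, (_, i) => `user:${i}`);
let moved = 0;
for (const key of keys) {
  // Going from 3 servers to 2 changes the placement of most keys
  if (moduloServer(key, 3) !== moduloServer(key, 2)) moved++;
}
console.log(`${((moved / keys.length) * 100).toFixed(1)}% of keys moved`);
// Roughly two-thirds of keys move under modulo hashing;
// consistent hashing would move only the failed server's share (~one-third)
```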
Session Affinity (Sticky Sessions)
WebSocket and other stateful connections require routing each client to the same server:
class SessionAffinityLoadBalancer {
private servers: string[];
private sessionToServer = new Map<string, string>();
private serverLoads = new Map<string, number>();
constructor(servers: string[]) {
this.servers = servers;
servers.forEach(s => this.serverLoads.set(s, 0));
}
selectServer(sessionId?: string): string {
// If session exists, return same server
if (sessionId && this.sessionToServer.has(sessionId)) {
return this.sessionToServer.get(sessionId)!;
}
// Otherwise, pick server with lowest load
let minServer = this.servers[0];
let minLoad = this.serverLoads.get(minServer) || 0;
for (const server of this.servers) {
const load = this.serverLoads.get(server) || 0;
if (load < minLoad) {
minServer = server;
minLoad = load;
}
}
// Store session affinity
if (sessionId) {
this.sessionToServer.set(sessionId, minServer);
}
// Update load
this.serverLoads.set(minServer, minLoad + 1);
return minServer;
}
releaseSession(sessionId: string): void {
const server = this.sessionToServer.get(sessionId);
if (server) {
const load = (this.serverLoads.get(server) || 1) - 1;
this.serverLoads.set(server, load);
this.sessionToServer.delete(sessionId);
}
}
handleServerDown(server: string): void {
// Reassign all sessions from failed server
const affectedSessions: string[] = [];
for (const [sessionId, assignedServer] of this.sessionToServer.entries()) {
if (assignedServer === server) {
affectedSessions.push(sessionId);
}
}
// Reassign to healthy servers
for (const sessionId of affectedSessions) {
this.sessionToServer.delete(sessionId);
// Clients will reconnect with new session
}
}
}
// WebSocket handshake example (assumes Express session middleware is configured)
import express from 'express';
const app = express();
const saLb = new SessionAffinityLoadBalancer([
'ws://server1:8080',
'ws://server2:8080',
'ws://server3:8080',
]);
app.get('/connect', (req, res) => {
// Client asks which WebSocket server to connect to
const sessionId = req.sessionID; // From session middleware
const wsServer = saLb.selectServer(sessionId);
// Cookie with routing hint for subsequent requests
res.setHeader('Set-Cookie', `X-Server=${wsServer}; Path=/`);
res.json({ wsUrl: wsServer, sessionId });
});
// Nginx example (cookie-based affinity via consistent hashing on a session cookie)
// upstream backend {
// hash $cookie_sessionid consistent;
// server api1.example.com;
// server api2.example.com;
// }
//
// server {
// location / {
// proxy_pass http://backend;
// }
// }
// (NGINX Plus additionally offers the `sticky cookie` directive)
Health Checks and Load Balancer Integration
Prevent routing to failed servers:
interface HealthCheckConfig {
interval: number; // milliseconds
timeout: number;
healthyThreshold: number; // consecutive successes before healthy
unhealthyThreshold: number; // consecutive failures before unhealthy
path: string;
expectedStatus: number;
}
class HealthCheckManager {
private servers: Map<string, { healthy: boolean; failureCount: number; successCount: number }>;
private config: HealthCheckConfig;
constructor(servers: string[], config: HealthCheckConfig) {
this.config = config;
this.servers = new Map(servers.map(s => [s, { healthy: true, failureCount: 0, successCount: 0 }]));
this.startHealthChecks();
}
private startHealthChecks(): void {
setInterval(() => this.checkAllServers(), this.config.interval);
}
private async checkAllServers(): Promise<void> {
for (const [server, status] of this.servers.entries()) {
const healthy = await this.checkServer(server);
if (healthy) {
status.failureCount = 0;
status.successCount++;
// Require consecutive successes before restoring traffic
if (!status.healthy && status.successCount >= this.config.healthyThreshold) {
status.healthy = true;
console.log(`Server ${server} recovered`);
}
} else {
status.successCount = 0;
status.failureCount++;
if (status.failureCount >= this.config.unhealthyThreshold && status.healthy) {
status.healthy = false;
console.warn(`Server ${server} marked unhealthy (${status.failureCount} failures)`);
}
}
}
}
private async checkServer(server: string): Promise<boolean> {
try {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), this.config.timeout);
const response = await fetch(`http://${server}${this.config.path}`, {
signal: controller.signal,
});
clearTimeout(timeout);
return response.status === this.config.expectedStatus;
} catch (error) {
console.error(`Health check failed for ${server}:`, error);
return false;
}
}
getHealthyServers(): string[] {
return Array.from(this.servers.entries())
.filter(([_, status]) => status.healthy)
.map(([server]) => server);
}
getUnhealthyServers(): string[] {
return Array.from(this.servers.entries())
.filter(([_, status]) => !status.healthy)
.map(([server]) => server);
}
}
// Usage
const healthManager = new HealthCheckManager(
['api1.example.com', 'api2.example.com', 'api3.example.com'],
{
interval: 10000,
timeout: 5000,
healthyThreshold: 2,
unhealthyThreshold: 3,
path: '/health',
expectedStatus: 200,
}
);
// Load balancer uses only healthy servers
function selectServer(): string {
const healthyServers = healthManager.getHealthyServers();
if (healthyServers.length === 0) {
throw new Error('No healthy servers available');
}
return healthyServers[Math.floor(Math.random() * healthyServers.length)];
}
L4 vs L7 Load Balancing
Different layers provide different benefits:
// L4 (Transport layer): IP + Port based routing
// - Fast (hardware accelerated)
// - Limited routing rules (can't see HTTP headers)
// - Good for: High throughput, simple routing
// L7 (Application layer): HTTP-based routing
// - Slower (software-based)
// - Rich routing (path, headers, host)
// - Good for: Complex rules, session awareness
// L7 example: Route based on path
class HTTPLoadBalancer {
private routes = new Map<string, string[]>();
addRoute(pathPattern: string, servers: string[]): void {
this.routes.set(pathPattern, servers);
}
selectServer(path: string, headers: Record<string, string>): string {
// Route /api/* to the api pool
if (path.startsWith('/api')) {
const apiServers = this.routes.get('/api/*') || [];
if (apiServers.length > 0) {
return apiServers[Math.floor(Math.random() * apiServers.length)];
}
}
// Route /static/* to the CDN
if (path.startsWith('/static')) {
return 'cdn.example.com'; // Actual CDN hostname, not an origin server
}
// Route WebSocket upgrades to dedicated servers
if (headers.upgrade === 'websocket') {
const wsServers = this.routes.get('/ws') || [];
if (wsServers.length > 0) {
return wsServers[Math.floor(Math.random() * wsServers.length)];
}
}
// Default routing
return this.routes.get('default')?.[0] || 'default.example.com';
}
}
Checklist
- Choose load balancing algorithm based on workload (round-robin → least-conn → LRT)
- Implement health checks to remove failed servers
- Use consistent hashing for cache affinity
- Enable session affinity for stateful connections
- Monitor load balancer metrics (error rate, latency)
- Test failover procedures (server down, slow response)
- Use L7 load balancing for path-based routing
- Avoid single point of failure (replicate load balancer)
- Implement circuit breaker to prevent cascading failures
- Review algorithm choice quarterly (scale changes workload patterns)
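The circuit-breaker item above can be sketched minimally: count consecutive failures, stop sending traffic once a threshold trips, and allow a trial request after a cool-down. The thresholds and the injectable clock below are illustrative choices, not from the original:

```typescript
// Minimal circuit breaker: closed (normal) -> open (reject) -> half-open (trial)
class CircuitBreaker {
  private failures = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  private openedAt = 0;

  constructor(
    private failureThreshold = 3,
    private resetTimeoutMs = 30_000,
    private now: () => number = Date.now // injectable clock for testing
  ) {}

  canRequest(): boolean {
    if (this.state === 'open') {
      // After the cool-down, let one trial request through (half-open)
      if (this.now() - this.openedAt >= this.resetTimeoutMs) {
        this.state = 'half-open';
        return true;
      }
      return false;
    }
    return true;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  recordFailure(): void {
    this.failures++;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }

  getState(): string {
    return this.state;
  }
}

// Usage with a manual clock for deterministic behavior
let now = 0;
const breaker = new CircuitBreaker(3, 1000, () => now);
breaker.recordFailure();
breaker.recordFailure();
breaker.recordFailure();
console.log(breaker.canRequest()); // false — circuit is open
now = 1000; // cool-down elapsed
console.log(breaker.canRequest()); // true — half-open trial request allowed
breaker.recordSuccess();
console.log(breaker.getState()); // closed
```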
Conclusion
Load balancers are critical infrastructure, but the algorithm matters as much as the hardware. Round-robin works only when servers are identical and requests are homogeneous — rare in production. Least-connections respects real workloads, response-time algorithms adapt to dynamic conditions, and consistent hashing preserves cache locality. Choosing the right algorithm can substantially increase effective capacity without adding servers.