Load Balancer Algorithms — Round Robin, Least Connections, and Consistent Hashing
- Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Load balancers sit between clients and application servers, distributing traffic across them. Choosing the wrong algorithm wastes capacity: some servers overflow with queued requests while others sit idle. Round-robin treats all servers as equal (often untrue), least-connections tracks actual workload (better), and consistent hashing preserves cache affinity (critical for distributed caches). Understanding these algorithms can prevent unnecessary infrastructure scaling.
- Round-Robin and Its Failure Modes
- Least-Connections Load Balancing
- Least-Response-Time Algorithm
- Consistent Hashing for Cache Affinity
- Session Affinity (Sticky Sessions)
- Health Checks and Load Balancer Integration
- L4 vs L7 Load Balancing
- Checklist
- Conclusion
Round-Robin and Its Failure Modes
Round-robin distributes requests equally, assuming all servers are equally capable. This assumption breaks in reality:
// Round-robin implementation
class RoundRobinLoadBalancer {
private servers: string[];
private currentIndex = 0;
constructor(servers: string[]) {
this.servers = servers;
}
selectServer(): string {
const server = this.servers[this.currentIndex];
this.currentIndex = (this.currentIndex + 1) % this.servers.length;
return server;
}
}
// Problem 1: Heterogeneous workloads
// Server 1: 2 CPU cores, 8GB RAM
// Server 2: 4 CPU cores, 16GB RAM
// Round-robin sends equal traffic to both
// Result: Server 1 overloaded, Server 2 underutilized
// Problem 2: Persistent connections (WebSockets, HTTP keep-alive)
// Client 1 connects; round-robin assigns it to Server 1
// Client 2 connects; round-robin assigns it to Server 2
// With HTTP keep-alive, a second request reuses the connection to Server 1 (correct)
// But if the client opens a new connection, it may land on Server 3 (cache miss, lost state)
// Problem 3: Long-running requests
// A heavy processing request lands on Server 1 and ties it up
// Round-robin keeps cycling through Servers 2-5 and back to Server 1
// Because the algorithm never looks at in-flight work, Server 1
// keeps receiving its full share of new requests while overloaded
const servers = ['api1.example.com', 'api2.example.com', 'api3.example.com'];
const rrLb = new RoundRobinLoadBalancer(servers);
// Requests cycle through indices: [0, 1, 2, 0, 1, 2, ...]
const requests = Array.from({ length: 9 }, () => rrLb.selectServer());
console.log(requests); // [api1, api2, api3, api1, api2, api3, api1, api2, api3]
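Problem 1 (heterogeneous capacity) is commonly mitigated with weighted round-robin: the cycle stays deterministic, but higher-capacity servers get proportionally more turns. A minimal sketch — the hostnames and weights are illustrative, and production implementations (e.g., nginx) use smooth weighted round-robin for more even interleaving:

```typescript
// Weighted round-robin: each server appears in the rotation `weight` times.
// This naive expansion clumps a server's turns together; smooth weighted
// round-robin spreads them out, but the long-run proportions are the same.
class WeightedRoundRobinLoadBalancer {
  private schedule: string[] = [];
  private currentIndex = 0;

  constructor(servers: { host: string; weight: number }[]) {
    // Expand each server into `weight` slots in the rotation
    for (const { host, weight } of servers) {
      for (let i = 0; i < weight; i++) {
        this.schedule.push(host);
      }
    }
  }

  selectServer(): string {
    const server = this.schedule[this.currentIndex];
    this.currentIndex = (this.currentIndex + 1) % this.schedule.length;
    return server;
  }
}

const wrr = new WeightedRoundRobinLoadBalancer([
  { host: 'small.example.com', weight: 1 }, // 2 CPU cores
  { host: 'large.example.com', weight: 2 }, // 4 CPU cores
]);
// Over 3 requests, large gets 2 turns and small gets 1
const picks = Array.from({ length: 3 }, () => wrr.selectServer());
console.log(picks); // ['small.example.com', 'large.example.com', 'large.example.com']
```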
Least-Connections Load Balancing
Least-connections respects actual server load:
interface ServerMetrics {
host: string;
activeConnections: number;
totalCapacity: number;
utilizationPercent: number;
}
class LeastConnectionsLoadBalancer {
private servers: Map<string, ServerMetrics>;
constructor(
serverHosts: string[],
capacities: Map<string, number> = new Map()
) {
this.servers = new Map(
serverHosts.map(host => [
host,
{
host,
activeConnections: 0,
totalCapacity: capacities.get(host) || 1000, // Default capacity
utilizationPercent: 0,
},
])
);
}
selectServer(): string {
// Pick the server with the lowest utilization (connections relative to
// capacity), so higher-capacity servers receive proportionally more traffic
let minServer: ServerMetrics | null = null;
for (const server of this.servers.values()) {
const utilization = server.activeConnections / server.totalCapacity;
if (!minServer || utilization < minServer.activeConnections / minServer.totalCapacity) {
minServer = server;
}
}
if (!minServer) throw new Error('No servers available');
minServer.activeConnections++;
minServer.utilizationPercent = (minServer.activeConnections / minServer.totalCapacity) * 100;
return minServer.host;
}
releaseConnection(server: string): void {
const metrics = this.servers.get(server);
if (metrics && metrics.activeConnections > 0) {
metrics.activeConnections--;
metrics.utilizationPercent = (metrics.activeConnections / metrics.totalCapacity) * 100;
}
}
getMetrics(): ServerMetrics[] {
return Array.from(this.servers.values());
}
}
// Example: 3 servers with different capacities
const capacities = new Map([
['api1.example.com', 100], // 2 CPU
['api2.example.com', 200], // 4 CPU
['api3.example.com', 100], // 2 CPU
]);
const lcLb = new LeastConnectionsLoadBalancer(
['api1.example.com', 'api2.example.com', 'api3.example.com'],
capacities
);
// 400 requests distributed in proportion to capacity
// api1: ~100, api2: ~200, api3: ~100
const distribution = new Map<string, number>();
for (let i = 0; i < 400; i++) {
const server = lcLb.selectServer();
distribution.set(server, (distribution.get(server) || 0) + 1);
}
console.log('Distribution:', Object.fromEntries(distribution));
// Output:
// api1: 100 (100% utilized)
// api2: 200 (100% utilized)
// api3: 100 (100% utilized)
Least-Response-Time Algorithm
Routes each request to the server with the lowest recent latency:
interface ServerHealth {
host: string;
responseTime: number; // milliseconds
activeConnections: number;
errorCount: number;
weightedScore: number;
}
class LeastResponseTimeLoadBalancer {
private servers: Map<string, ServerHealth>;
private windowSize = 100; // Track last N requests
constructor(serverHosts: string[]) {
this.servers = new Map(
serverHosts.map(host => [
host,
{
host,
responseTime: 0,
activeConnections: 0,
errorCount: 0,
weightedScore: 0,
},
])
);
}
selectServer(): string {
let bestServer: ServerHealth | null = null;
for (const server of this.servers.values()) {
// Weighted score: response time scaled up by connection count,
// penalizing servers that are slow or carrying many active connections
const score = server.responseTime * (1 + server.activeConnections / 100);
server.weightedScore = score;
if (!bestServer || score < bestServer.weightedScore) {
bestServer = server;
}
}
if (!bestServer) throw new Error('No servers available');
bestServer.activeConnections++;
return bestServer.host;
}
recordResponse(server: string, responseTime: number, success: boolean): void {
const metrics = this.servers.get(server);
if (!metrics) return;
// Exponential moving average for response time
metrics.responseTime = metrics.responseTime * 0.8 + responseTime * 0.2;
if (success) {
metrics.errorCount = Math.max(0, metrics.errorCount - 1);
} else {
metrics.errorCount++;
}
metrics.activeConnections = Math.max(0, metrics.activeConnections - 1);
}
getMetrics(): ServerHealth[] {
return Array.from(this.servers.values());
}
}
// Usage
const lrtLb = new LeastResponseTimeLoadBalancer([
'api1.example.com',
'api2.example.com',
'api3.example.com',
]);
// api1 is slow (200ms response time)
lrtLb.recordResponse('api1.example.com', 200, true);
// api2 is fast (50ms response time)
lrtLb.recordResponse('api2.example.com', 50, true);
// api3 is medium (100ms response time)
lrtLb.recordResponse('api3.example.com', 100, true);
// Next request routes to api2 (lowest response time)
const selected = lrtLb.selectServer();
console.log('Selected:', selected); // api2.example.com
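The exponential moving average in recordResponse (alpha = 0.2) matters: it smooths latency spikes so a single slow response doesn't immediately deprioritize a server. A standalone illustration of the smoothing behavior:

```typescript
// EMA with alpha = 0.2: each new sample moves the estimate 20% of the way
// toward the observed value, so spikes decay over several observations
function ema(previous: number, sample: number, alpha = 0.2): number {
  return previous * (1 - alpha) + sample * alpha;
}

let estimate = 100; // steady-state latency estimate in ms
estimate = ema(estimate, 500); // a single 500ms spike
console.log(estimate.toFixed(1)); // 180.0 — pulled up, but far below the raw spike
estimate = ema(estimate, 100);
estimate = ema(estimate, 100);
console.log(estimate.toFixed(1)); // 151.2 — decaying back toward 100
```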
Consistent Hashing for Cache Affinity
Distributes requests to same server based on client/key, preserving cache locality:
class ConsistentHashRing {
private ring = new Map<number, string>();
private sortedKeys: number[] = [];
private virtualNodes = 150; // Replicas per server
private servers = new Set<string>();
private hash(key: string): number {
// Simple 32-bit string hash for illustration; production rings typically
// use MurmurHash or similar for better distribution
let hash = 0;
for (let i = 0; i < key.length; i++) {
const char = key.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash; // Convert to 32-bit integer
}
return Math.abs(hash);
}
addServer(server: string): void {
this.servers.add(server);
// Add virtual nodes
for (let i = 0; i < this.virtualNodes; i++) {
const virtualKey = `${server}:${i}`;
const hash = this.hash(virtualKey);
this.ring.set(hash, server);
}
// Re-sort keys
this.sortedKeys = Array.from(this.ring.keys()).sort((a, b) => a - b);
}
removeServer(server: string): void {
this.servers.delete(server);
// Remove virtual nodes
for (let i = 0; i < this.virtualNodes; i++) {
const virtualKey = `${server}:${i}`;
const hash = this.hash(virtualKey);
this.ring.delete(hash);
}
this.sortedKeys = Array.from(this.ring.keys()).sort((a, b) => a - b);
}
getServer(key: string): string | null {
if (this.ring.size === 0) return null;
const hash = this.hash(key);
// Find next server clockwise from hash
for (const ringKey of this.sortedKeys) {
if (ringKey >= hash) {
return this.ring.get(ringKey) || null;
}
}
// Wrap around to first server
return this.ring.get(this.sortedKeys[0]) || null;
}
// When a server is added/removed, only a fraction of keys should remap;
// this helper measures how many sample keys map differently between two rings
static keysMoved(oldRing: ConsistentHashRing, newRing: ConsistentHashRing): number {
let moved = 0;
const testKeys = Array.from({ length: 1000 }, (_, i) => `key:${i}`);
for (const testKey of testKeys) {
if (oldRing.getServer(testKey) !== newRing.getServer(testKey)) {
moved++;
}
}
return moved;
}
}
// Example: Cache affinity for Redis cluster
const hashRing = new ConsistentHashRing();
hashRing.addServer('redis-1.example.com');
hashRing.addServer('redis-2.example.com');
hashRing.addServer('redis-3.example.com');
// User 123's data always maps to the same Redis node
const userCacheKey = `user:123`;
const cacheNode = hashRing.getServer(userCacheKey);
// Later, user 456's request maps independently
const userCacheKey2 = `user:456`;
const cacheNode2 = hashRing.getServer(userCacheKey2);
// Benefit: if redis-1 goes down, only keys that mapped to redis-1 rehash;
// keys on redis-2 and redis-3 are unaffected (locality preserved)
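To see why consistent hashing is worth the complexity, compare it with naive modulo placement (`hash(key) % serverCount`), where removing a server changes the modulus and remaps most keys. A standalone sketch; the hash function and key names are illustrative:

```typescript
// Naive modulo placement: server = hash(key) % serverCount.
// Removing a server changes the modulus, so most keys move.
function simpleHash(key: string): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash << 5) - hash + key.charCodeAt(i);
    hash = hash & hash; // keep as 32-bit integer
  }
  return Math.abs(hash);
}

function moduloServer(key: string, serverCount: number): number {
  return simpleHash(key) % serverCount;
}

const keys = Array.from({ length: 10000 }, (_, i) => `user:${i}`);
let moved = 0;
for (const key of keys) {
  // Going from 3 servers to 2 changes the placement of most keys
  if (moduloServer(key, 3) !== moduloServer(key, 2)) moved++;
}
console.log(`${((moved / keys.length) * 100).toFixed(1)}% of keys moved`);
// Roughly two-thirds of keys move under modulo hashing;
// consistent hashing would move only the failed server's share (~one-third)
```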
Session Affinity (Sticky Sessions)
WebSocket and other stateful connections require routing each client to the same server:
class SessionAffinityLoadBalancer {
private servers: string[];
private sessionToServer = new Map<string, string>();
private serverLoads = new Map<string, number>();
constructor(servers: string[]) {
this.servers = servers;
servers.forEach(s => this.serverLoads.set(s, 0));
}
selectServer(sessionId?: string): string {
// If session exists, return same server
if (sessionId && this.sessionToServer.has(sessionId)) {
return this.sessionToServer.get(sessionId)!;
}
// Otherwise, pick server with lowest load
let minServer = this.servers[0];
let minLoad = this.serverLoads.get(minServer) || 0;
for (const server of this.servers) {
const load = this.serverLoads.get(server) || 0;
if (load < minLoad) {
minServer = server;
minLoad = load;
}
}
// Store session affinity
if (sessionId) {
this.sessionToServer.set(sessionId, minServer);
}
// Update load
this.serverLoads.set(minServer, minLoad + 1);
return minServer;
}
releaseSession(sessionId: string): void {
const server = this.sessionToServer.get(sessionId);
if (server) {
const load = (this.serverLoads.get(server) || 1) - 1;
this.serverLoads.set(server, load);
this.sessionToServer.delete(sessionId);
}
}
handleServerDown(server: string): void {
// Reassign all sessions from failed server
const affectedSessions: string[] = [];
for (const [sessionId, assignedServer] of this.sessionToServer.entries()) {
if (assignedServer === server) {
affectedSessions.push(sessionId);
}
}
// Reassign to healthy servers
for (const sessionId of affectedSessions) {
this.sessionToServer.delete(sessionId);
// Clients will reconnect with new session
}
}
}
// WebSocket handshake example (assumes Express session middleware is configured)
import express from 'express';
const app = express();
const saLb = new SessionAffinityLoadBalancer([
'ws://server1:8080',
'ws://server2:8080',
'ws://server3:8080',
]);
app.get('/connect', (req, res) => {
// Client asks which WebSocket server to connect to
const sessionId = req.sessionID; // From session middleware
const wsServer = saLb.selectServer(sessionId);
// Cookie with routing hint for subsequent requests
res.setHeader('Set-Cookie', `X-Server=${wsServer}; Path=/`);
res.json({ wsUrl: wsServer, sessionId });
});
// Nginx example (cookie-based affinity via consistent hashing on a session cookie)
// upstream backend {
// hash $cookie_sessionid consistent;
// server api1.example.com;
// server api2.example.com;
// }
//
// server {
// location / {
// proxy_pass http://backend;
// }
// }
// (NGINX Plus additionally offers the `sticky cookie` directive)
Health Checks and Load Balancer Integration
Prevent routing to failed servers:
interface HealthCheckConfig {
interval: number; // milliseconds
timeout: number;
healthyThreshold: number; // consecutive successes before healthy
unhealthyThreshold: number; // consecutive failures before unhealthy
path: string;
expectedStatus: number;
}
class HealthCheckManager {
private servers: Map<string, { healthy: boolean; failureCount: number; successCount: number }>;
private config: HealthCheckConfig;
constructor(servers: string[], config: HealthCheckConfig) {
this.config = config;
this.servers = new Map(servers.map(s => [s, { healthy: true, failureCount: 0, successCount: 0 }]));
this.startHealthChecks();
}
private startHealthChecks(): void {
setInterval(() => this.checkAllServers(), this.config.interval);
}
private async checkAllServers(): Promise<void> {
for (const [server, status] of this.servers.entries()) {
const healthy = await this.checkServer(server);
if (healthy) {
status.failureCount = 0;
status.successCount++;
// Require consecutive successes before restoring traffic
if (!status.healthy && status.successCount >= this.config.healthyThreshold) {
status.healthy = true;
console.log(`Server ${server} recovered`);
}
} else {
status.successCount = 0;
status.failureCount++;
if (status.failureCount >= this.config.unhealthyThreshold && status.healthy) {
status.healthy = false;
console.warn(`Server ${server} marked unhealthy (${status.failureCount} failures)`);
}
}
}
}
private async checkServer(server: string): Promise<boolean> {
try {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), this.config.timeout);
const response = await fetch(`http://${server}${this.config.path}`, {
signal: controller.signal,
});
clearTimeout(timeout);
return response.status === this.config.expectedStatus;
} catch (error) {
console.error(`Health check failed for ${server}:`, error);
return false;
}
}
getHealthyServers(): string[] {
return Array.from(this.servers.entries())
.filter(([_, status]) => status.healthy)
.map(([server]) => server);
}
getUnhealthyServers(): string[] {
return Array.from(this.servers.entries())
.filter(([_, status]) => !status.healthy)
.map(([server]) => server);
}
}
// Usage
const healthManager = new HealthCheckManager(
['api1.example.com', 'api2.example.com', 'api3.example.com'],
{
interval: 10000,
timeout: 5000,
healthyThreshold: 2,
unhealthyThreshold: 3,
path: '/health',
expectedStatus: 200,
}
);
// Load balancer uses only healthy servers
function selectServer(): string {
const healthyServers = healthManager.getHealthyServers();
if (healthyServers.length === 0) {
throw new Error('No healthy servers available');
}
return healthyServers[Math.floor(Math.random() * healthyServers.length)];
}
L4 vs L7 Load Balancing
Different layers provide different benefits:
// L4 (Transport layer): IP + Port based routing
// - Fast (hardware accelerated)
// - Limited routing rules (can't see HTTP headers)
// - Good for: High throughput, simple routing
// L7 (Application layer): HTTP-based routing
// - Slower (software-based)
// - Rich routing (path, headers, host)
// - Good for: Complex rules, session awareness
// L7 example: Route based on path
class HTTPLoadBalancer {
private routes = new Map<string, string[]>();
addRoute(pathPattern: string, servers: string[]): void {
this.routes.set(pathPattern, servers);
}
selectServer(path: string, headers: Record<string, string>): string {
// Route /api/* to the api pool
if (path.startsWith('/api')) {
const apiServers = this.routes.get('/api/*') || [];
if (apiServers.length > 0) {
return apiServers[Math.floor(Math.random() * apiServers.length)];
}
}
// Route /static/* to the CDN
if (path.startsWith('/static')) {
return 'cdn.example.com'; // Actual CDN hostname, not an origin server
}
// Route WebSocket upgrades to dedicated servers
if (headers.upgrade === 'websocket') {
const wsServers = this.routes.get('/ws') || [];
if (wsServers.length > 0) {
return wsServers[Math.floor(Math.random() * wsServers.length)];
}
}
// Default routing
return this.routes.get('default')?.[0] || 'default.example.com';
}
}
Checklist
- Choose load balancing algorithm based on workload (round-robin → least-conn → LRT)
- Implement health checks to remove failed servers
- Use consistent hashing for cache affinity
- Enable session affinity for stateful connections
- Monitor load balancer metrics (error rate, latency)
- Test failover procedures (server down, slow response)
- Use L7 load balancing for path-based routing
- Avoid single point of failure (replicate load balancer)
- Implement circuit breaker to prevent cascading failures
- Review algorithm choice quarterly (scale changes workload patterns)
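The circuit-breaker item above can be sketched minimally: count consecutive failures, stop sending traffic once a threshold trips, and allow a trial request after a cool-down. The thresholds and the injectable clock below are illustrative choices, not from the original:

```typescript
// Minimal circuit breaker: closed (normal) -> open (reject) -> half-open (trial)
class CircuitBreaker {
  private failures = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  private openedAt = 0;

  constructor(
    private failureThreshold = 3,
    private resetTimeoutMs = 30_000,
    private now: () => number = Date.now // injectable clock for testing
  ) {}

  canRequest(): boolean {
    if (this.state === 'open') {
      // After the cool-down, let one trial request through (half-open)
      if (this.now() - this.openedAt >= this.resetTimeoutMs) {
        this.state = 'half-open';
        return true;
      }
      return false;
    }
    return true;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  recordFailure(): void {
    this.failures++;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }

  getState(): string {
    return this.state;
  }
}

// Usage with a manual clock for deterministic behavior
let now = 0;
const breaker = new CircuitBreaker(3, 1000, () => now);
breaker.recordFailure();
breaker.recordFailure();
breaker.recordFailure();
console.log(breaker.canRequest()); // false — circuit is open
now = 1000; // cool-down elapsed
console.log(breaker.canRequest()); // true — half-open trial request allowed
breaker.recordSuccess();
console.log(breaker.getState()); // closed
```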
Conclusion
Load balancers are critical infrastructure, but the algorithm matters as much as the hardware. Round-robin works only when servers are identical and requests are homogeneous — rare in production. Least-connections respects real workloads, response-time algorithms adapt to dynamic conditions, and consistent hashing preserves cache locality. Choosing the right algorithm can substantially increase effective capacity without adding servers.