Clock Skew Breaking Tokens — When Servers Disagree on What Time It Is

Introduction

Your load balancer distributes auth token validation across 5 servers. Server 3 has drifted 3 minutes behind. JWTs issued by the other servers look "issued in the future" to Server 3 — and get rejected as invalid. Users see intermittent 401s.

Clock skew is a fundamental challenge in distributed systems — and it's completely invisible until it breaks something.

The Clock Skew Problem

JWT issued by Server A at    14:00:00 (UTC)
Validated by Server B at     13:57:30 (UTC, 2.5 min behind)

Server B sees:
  iat (issued at):  14:00:00
  Current time:     13:57:30

  Token appears to have been issued 2.5 minutes IN THE FUTURE
Invalid! Rejected!

Result: user can't log in, intermittently
Security scenario (clock behind):
  Token expires at: 14:05:00
  Actual time:      14:04:00 (1 min left)
  Server C time:    13:58:00 (6 min behind)
  Server C thinks: token expires in 7 minutes (not 1)
Expired token accepted for an extra 6 minutes
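Both failure modes reduce to simple timestamp arithmetic. A minimal sketch in plain TypeScript (no JWT library — `validateWindow` and its parameter names are illustrative, not a real API):

```typescript
// All times are Unix seconds. `skewToleranceSec` is the grace window
// that Fix 2 configures in real JWT libraries.
function validateWindow(
  iat: number,        // issued-at claim from the token
  exp: number,        // expiry claim from the token
  serverNow: number,  // the validating server's (possibly skewed) clock
  skewToleranceSec = 0,
): 'valid' | 'issued-in-future' | 'expired' {
  if (iat > serverNow + skewToleranceSec) return 'issued-in-future'
  if (exp < serverNow - skewToleranceSec) return 'expired'
  return 'valid'
}

const iat = Date.UTC(2024, 0, 1, 14, 0, 0) / 1000  // issued 14:00:00
const exp = iat + 300                               // expires 14:05:00

// Scenario 1: validator 2.5 min behind — a fresh token looks future-issued
const serverB = iat - 150                               // clock shows 13:57:30
const strict = validateWindow(iat, exp, serverB)        // 'issued-in-future'
const lenient = validateWindow(iat, exp, serverB, 180)  // 'valid' with tolerance

// Scenario 2: validator 6 min behind — an expired token still looks live
const serverC = exp + 60 - 360   // real time 14:06:00, clock shows 14:00:00
const stale = validateWindow(iat, exp, serverC)         // 'valid' (wrongly)
```

The tolerance trades a small validity fudge for resilience to ordinary drift; the sections below apply the same idea with real libraries.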

Fix 1: NTP — Keep All Clocks in Sync

# The infrastructure fix: ensure all servers sync time via NTP

# Ubuntu/Debian
sudo apt install chrony
sudo systemctl enable chrony
sudo systemctl start chrony

# Check sync status
chronyc tracking
# Should show offset < 100ms

# AWS instances: Amazon Time Sync Service
# Automatically configured — but verify
chronyc sources -v | grep 169.254.169.123  # AWS time server

# Kubernetes: nodes inherit from host — ensure host has NTP
# Check clock offset across nodes:
kubectl get nodes -o wide
for node in $(kubectl get nodes -o name); do
  # $node is already prefixed with "node/", so pass it directly
  kubectl debug "$node" -it --image=busybox -- date
done

Fix 2: Clock Skew Tolerance in JWT Validation

import jwt from 'jsonwebtoken'

// ❌ Zero tolerance — rejects tokens if clocks differ even slightly
const payload = jwt.verify(token, secret)

// ✅ Allow clock skew tolerance
const payload = jwt.verify(token, secret, {
  clockTolerance: 30,  // 30 seconds tolerance for clock skew
  // Token valid even if issued up to 30s "in the future" from this server's perspective
})

// Manual JWT validation with explicit skew handling
import { jwtVerify } from 'jose'

async function verifyTokenWithSkewTolerance(token: string, secret: Uint8Array) {
  const { payload } = await jwtVerify(token, secret, {
    clockTolerance: '30s',  // jose library format
  })
  return payload
}

// For critical security contexts: use shorter tolerance
// For regular APIs: 30-60 seconds is safe
// Clock skew > 1 minute means your NTP is broken — fix the root cause

Fix 3: Don't Rely Solely on exp for Token Expiry

// Defense in depth: server-side token invalidation

// Store issued tokens in Redis with their expiry
// Allows instant revocation without clock dependency

import jwt from 'jsonwebtoken'
import crypto from 'node:crypto'
import Redis from 'ioredis'

class TokenService {
  constructor(private redis: Redis) {}

  async issue(userId: string, expiresInSeconds: number): Promise<string> {
    const jti = crypto.randomUUID()  // Unique token ID

    const token = jwt.sign(
      { sub: userId, jti },
      process.env.JWT_SECRET!,
      { expiresIn: expiresInSeconds }
    )

    // Store in Redis with same TTL
    await this.redis.setex(
      `token:${jti}`,
      expiresInSeconds + 60,  // +60s buffer for clock skew
      userId
    )

    return token
  }

  async verify(token: string): Promise<{ userId: string; jti: string }> {
    // Step 1: Cryptographic verification with skew tolerance
    const payload = jwt.verify(token, process.env.JWT_SECRET!, {
      clockTolerance: 60,  // 60 second tolerance
    }) as { sub: string; jti: string; exp: number }

    // Step 2: Check server-side validity (not revoked, not expired by server)
    const serverRecord = await this.redis.get(`token:${payload.jti}`)
    if (!serverRecord) {
      throw new Error('Token not found or revoked')
    }

    return { userId: payload.sub, jti: payload.jti }
  }

  async revoke(jti: string): Promise<void> {
    await this.redis.del(`token:${jti}`)
    // Token immediately invalid — no need to wait for JWT exp
  }
}
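The revocation flow can be exercised without Redis. A minimal in-memory stand-in for the same jti-store idea (the `MemoryTokenStore` class and its method names are illustrative, not part of the service above):

```typescript
// Maps a token's unique ID (jti) to its user plus a server-side deadline —
// the same shape as the Redis store, but backed by a Map.
class MemoryTokenStore {
  private entries = new Map<string, { userId: string; expiresAtMs: number }>()

  issue(jti: string, userId: string, ttlSeconds: number): void {
    // +60s buffer mirrors the Redis TTL buffer for clock skew
    this.entries.set(jti, {
      userId,
      expiresAtMs: Date.now() + (ttlSeconds + 60) * 1000,
    })
  }

  verify(jti: string): string {
    const entry = this.entries.get(jti)
    if (!entry || entry.expiresAtMs < Date.now()) {
      throw new Error('Token not found, expired, or revoked')
    }
    return entry.userId
  }

  revoke(jti: string): void {
    this.entries.delete(jti) // immediately invalid, regardless of JWT exp
  }
}
```

Same contract as the Redis-backed service: a valid jti resolves to a user, and revoking it makes the very next lookup fail.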

Fix 4: Relative Time Over Absolute Time

// When clock sync isn't reliable, use relative durations instead of absolute timestamps

// ❌ Hand-rolled absolute timestamp — easy to get wrong, and anything that
// compares it against a different clock inherits that clock's skew
const token = jwt.sign(
  { sub: userId, exp: Math.floor(Date.now() / 1000) + 3600 },
  secret
)

// ✅ Relative duration — no manual timestamp math; the library derives exp
// from the issuing server's clock
const token = jwt.sign(
  { sub: userId },
  secret,
  { expiresIn: '1h' }  // 1 hour from issue time — relative
)

// For OAuth: use `expires_in` (seconds) not `expires_at` (timestamp)
// Client calculates expiry: Date.now() + (expires_in * 1000)
// No cross-server clock dependency
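Client-side, the `expires_in` arithmetic from the comment above can be wrapped in two small helpers (a sketch — `localExpiry` and `shouldRefresh` are hypothetical names, not an OAuth library API):

```typescript
// Compute an absolute local deadline from the relative `expires_in` value,
// using only the client's own clock — no cross-server time comparison.
function localExpiry(expiresInSeconds: number, nowMs: number = Date.now()): number {
  return nowMs + expiresInSeconds * 1000
}

// Refresh a bit early so residual skew and request latency can't push the
// client past the real expiry.
function shouldRefresh(
  expiryMs: number,
  nowMs: number = Date.now(),
  bufferMs = 60_000,
): boolean {
  return nowMs >= expiryMs - bufferMs
}
```

With a one-hour token, `shouldRefresh` starts returning true a minute before the locally computed deadline, regardless of how far the server's clock has drifted.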

Fix 5: Detect Clock Skew in Production

// Monitor clock sync — alert before it breaks auth

// Response header: include server time in all responses
app.use((req, res, next) => {
  res.setHeader('X-Server-Time', new Date().toISOString())
  next()
})

// Client: compare server time to local time
async function detectClockSkew() {
  const clientBefore = Date.now()
  const response = await fetch('/api/health')
  const clientAfter = Date.now()

  const serverTime = new Date(response.headers.get('X-Server-Time')!).getTime()
  const clientMid = (clientBefore + clientAfter) / 2  // Rough correction for RTT
  const skewMs = serverTime - clientMid

  if (Math.abs(skewMs) > 30_000) {
    console.warn(`Clock skew detected: ${skewMs}ms`)
    // Adjust token refresh logic to account for skew
  }

  return skewMs
}

// Server-side: health endpoint exposes clock info
app.get('/health', (req, res) => {
  res.json({
    status: 'ok',
    serverTime: new Date().toISOString(),
    serverTimestamp: Date.now(),
    ntpOffset: getNTPOffset(),  // From chronyd/ntpd
  })
})

Fix 6: Distributed Tracing for Timing Issues

// Include timestamps in distributed traces to diagnose skew issues

import { trace, context } from '@opentelemetry/api'

async function validateToken(token: string) {
  const span = trace.getActiveSpan()

  try {
    const payload = await tokenService.verify(token)

    span?.setAttributes({
      'token.jti': payload.jti,
      'token.issued_at': payload.iat,
      'server.time': Date.now() / 1000,
      'clock.skew_seconds': Date.now() / 1000 - (payload.iat ?? 0),
    })

    return payload
  } catch (err) {
    span?.setAttributes({
      'token.error': err instanceof Error ? err.message : String(err),
      'server.time': Date.now() / 1000,
    })
    throw err
  }
}
// Traces show exactly what the clock difference was at the time of failure

Clock Skew Checklist

  • ✅ NTP/Chrony configured on all servers
  • ✅ Monitor NTP offset — alert if > 500ms
  • ✅ JWT validation with 30-60s clock tolerance
  • ✅ Redis-backed token store for instant revocation
  • ✅ Server time in API response headers for client sync detection
  • ✅ Use expiresIn (relative) not exp (absolute) in JWT
  • ✅ Distributed tracing includes clock skew metrics

Conclusion

Clock skew in distributed systems is inevitable — NTP reduces it but doesn't eliminate it. Your JWT validation must account for this reality: configure clockTolerance of 30-60 seconds to handle normal NTP drift. For the root cause, ensure Chrony is installed and syncing on all nodes, and alert when NTP offset exceeds 500ms. For stronger security guarantees, add a Redis-backed token store that lets you revoke tokens instantly and doesn't rely solely on the JWT exp claim. A 30-second clock tolerance doesn't meaningfully reduce security — it prevents production auth failures.