KEDA — Event-Driven Autoscaling for Kubernetes Workloads

Author: Sanjeev Sharma (@webcoderspeed1)

Introduction
The Horizontal Pod Autoscaler (HPA) scales on CPU and memory out of the box. But many workloads don't correlate with resource utilization—they correlate with queue depth, message lag, time of day, or custom business metrics. KEDA (Kubernetes Event-Driven Autoscaling) fills this gap, connecting Kubernetes to external event sources and scaling workloads based on their actual demand. This post covers KEDA architecture, common scalers (SQS, Kafka, cron), cold starts, combining KEDA with HPA, and production tuning.
- KEDA Architecture
- SQS Queue Depth Scaler
- Kafka Consumer Lag Scaler
- Cron-Based Scaler
- Scaling to Zero and Cold Start Implications
- Combining KEDA with HPA
- Prometheus Custom Metrics Scaler
- Production Tuning
- Checklist
- Conclusion
KEDA Architecture
KEDA defines two core custom resources: ScaledObject and ScaledJob. A ScaledObject targets a scalable workload (typically a Deployment) and drives its replica count; a ScaledJob creates Kubernetes Jobs that run to completion.
Architecture diagram (conceptual):
- KEDA Operator watches ScaledObjects and ScaledJobs
- Periodically queries scalers (SQS, Kafka, custom endpoints)
- Updates HPA (or scales directly) based on metric values
- Scalers execute on KEDA's schedule (polling interval)
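Conceptually, each polling pass reduces to "query the scaler, derive a replica count from the metric/threshold ratio, clamp it to the configured bounds." A minimal Python sketch of that cycle (illustrative only — the real operator feeds an HPA external metric rather than setting replicas directly):

```python
import math

def reconcile(trigger_value: float, threshold: float, lo: int, hi: int) -> int:
    """One conceptual KEDA reconcile pass: ratio-based desired replicas, clamped."""
    desired = math.ceil(trigger_value / threshold)  # e.g. queue depth / queueLength
    return max(lo, min(hi, desired))

def polling_loop(query_scaler, iterations: int, threshold=10, lo=2, hi=100):
    """Poll the scaler once per pollingInterval (sleep elided) and record decisions."""
    return [reconcile(query_scaler(), threshold, lo, hi) for _ in range(iterations)]

# A fake scaler whose backlog drains over three polls: 50 -> 15 -> 0 messages
backlog = iter([50, 15, 0])
print(polling_loop(lambda: next(backlog), 3))  # [5, 2, 2]
```

Note how the empty-queue pass still returns 2: minReplicaCount acts as the floor even when the metric says zero.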
ScaledObject for continuous workloads:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: job-processor
  namespace: processing
spec:
  scaleTargetRef:
    name: job-processor-deployment
    kind: Deployment
  minReplicaCount: 2
  maxReplicaCount: 100
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: "https://sqs.us-east-1.amazonaws.com/123456789/processing"
        queueLength: "10"
        awsRegion: "us-east-1"
        identityOwner: "operator"
      authenticationRef:
        name: aws-credentials
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 50
              periodSeconds: 60
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
            - type: Percent
              value: 100
              periodSeconds: 15
            - type: Pods
              value: 4
              periodSeconds: 15
          selectPolicy: Max
  fallback:
    failureThreshold: 3
    replicas: 5
```
ScaledJob for batch workloads:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: batch-processor
  namespace: batch
spec:
  jobTargetRef:
    backoffLimit: 3
    template:
      spec:
        containers:
          - name: processor
            image: batch-processor:v1.2.3
            env:
              - name: BATCH_SIZE
                value: "100"
              - name: TIMEOUT_SECONDS
                value: "1800"
            resources:
              requests:
                cpu: "1"
                memory: "512Mi"
              limits:
                cpu: "2"
                memory: "1Gi"
        restartPolicy: Never
  minReplicaCount: 0
  maxReplicaCount: 50
  pollingInterval: 30
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: "https://sqs.us-east-1.amazonaws.com/123456789/batch-jobs"
        queueLength: "20"
        awsRegion: "us-east-1"
        identityOwner: "operator"
```
SQS Queue Depth Scaler
The most common use case: scale workers based on queue depth.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: queue-worker
    kind: Deployment
  minReplicaCount: 3
  maxReplicaCount: 200
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: "https://sqs.us-east-1.amazonaws.com/123456789/tasks"
        queueLength: "5"
        awsRegion: "us-east-1"
        identityOwner: "operator"
      authenticationRef:
        name: keda-aws-credentials
```
How it works:
- KEDA calls SQS GetQueueAttributes and reads ApproximateNumberOfMessages
- If the backlog exceeds queueLength messages per pod (5 here), it scales up
- If the queue is empty, it scales down to minReplicaCount
The queueLength parameter is critical: it is the target backlog per replica, not a cap on queue size. If set to 5 and the queue has 50 messages, KEDA scales to 10 replicas (50 / 5).
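The arithmetic is simple ceiling division, which is easy to sanity-check before tuning (a sketch, not KEDA's actual code):

```python
import math

def desired_replicas(messages: int, queue_length: int) -> int:
    # KEDA targets `queueLength` messages per replica, rounding up
    return math.ceil(messages / queue_length)

print(desired_replicas(50, 5))  # 10
print(desired_replicas(52, 5))  # 11 — a partial backlog still gets a full replica
print(desired_replicas(0, 5))   # 0 — minReplicaCount (3 in this manifest) then applies
```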
Kafka Consumer Lag Scaler
For event streaming workloads, scale based on consumer group lag.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer
  namespace: streaming
spec:
  scaleTargetRef:
    name: kafka-processor
    kind: Deployment
  minReplicaCount: 2
  maxReplicaCount: 50
  pollingInterval: 30
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka-broker-1:9092,kafka-broker-2:9092,kafka-broker-3:9092
        consumerGroup: "order-processor"
        topic: "orders"
        lagThreshold: "100"
        offsetResetPolicy: "latest"
      authenticationRef:
        name: kafka-auth
```
Example Kafka pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kafka-processor
  namespace: streaming
spec:
  containers:
    - name: processor
      image: kafka-processor:v1.2.3
      env:
        - name: KAFKA_BROKERS
          value: "kafka-broker-1:9092,kafka-broker-2:9092"
        - name: KAFKA_CONSUMER_GROUP
          value: "order-processor"
        - name: KAFKA_TOPICS
          value: "orders"
        - name: BATCH_SIZE
          value: "100"
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "2"
          memory: "2Gi"
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10
```
KEDA computes lag per partition as (latest offset - committed offset), sums it across the consumer group, and scales so that lag per replica stays near lagThreshold.
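In sketch form, with hypothetical per-partition offsets (the real scaler reads these from the brokers):

```python
import math

def consumer_group_lag(latest: dict, committed: dict) -> int:
    """Total lag: sum over partitions of (log-end offset - committed offset)."""
    return sum(max(latest[p] - committed.get(p, 0), 0) for p in latest)

latest = {0: 1200, 1: 980, 2: 1500}      # hypothetical log-end offsets
committed = {0: 1000, 1: 980, 2: 1100}   # hypothetical committed offsets
lag = consumer_group_lag(latest, committed)  # 200 + 0 + 400 = 600
print(math.ceil(lag / 100))              # lagThreshold "100" -> 6 replicas
```

One Kafka-specific wrinkle: by default KEDA caps replicas at the topic's partition count, since consumers beyond that would sit idle in the group.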
Cron-Based Scaler
Scale workloads on a schedule (e.g., scale up before business hours).
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scheduled-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-server
    kind: Deployment
  minReplicaCount: 2
  maxReplicaCount: 50
  triggers:
    # Business hours: 9 AM - 6 PM weekdays
    - type: cron
      metadata:
        timezone: America/New_York
        start: "0 9 * * 1-5"
        end: "0 18 * * 1-5"
        desiredReplicas: "20"
    # Nights on weekdays
    - type: cron
      metadata:
        timezone: America/New_York
        start: "0 18 * * 1-5"
        end: "0 9 * * 1-5"
        desiredReplicas: "5"
    # Weekends
    - type: cron
      metadata:
        timezone: America/New_York
        start: "0 0 * * 0,6"
        end: "0 23 * * 0,6"
        desiredReplicas: "3"
```
At 9 AM ET weekdays, KEDA scales to 20 replicas. At 6 PM, it scales to 5. This saves cost during off-peak hours while maintaining responsiveness during business hours.
Scaling to Zero and Cold Start Implications
One of KEDA's powerful features: scale to zero. But cold starts have latency implications.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: eventless-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-server
    kind: Deployment
  minReplicaCount: 0 # Scale to zero
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: "..."
        queueLength: "1"
        awsRegion: "us-east-1"
```
When the queue is empty, minReplicaCount: 0 scales the workload down to zero pods. When the first message arrives, KEDA detects it on the next poll and scales up a pod, but pod startup (scheduling, image pull, application bootstrap) typically takes 10-30 seconds.
Mitigate cold starts:
Option 1: Keep warm pods:
```yaml
minReplicaCount: 2 # Always keep 2 pods warm
```
Option 2: Optimize startup time:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: api
      image: api:v1
      startupProbe:
        httpGet:
          path: /startup
          port: 8080
        failureThreshold: 30
        periodSeconds: 2
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
```
The startupProbe allows up to 60 seconds for the container to be ready. Optimize app initialization (lazy loading, async setup).
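That 60-second figure falls out of the probe arithmetic directly; a quick check of the budget (plain arithmetic, not a Kubernetes API):

```python
def max_startup_seconds(failure_threshold: int, period_seconds: int,
                        initial_delay: int = 0) -> int:
    # The kubelet tolerates up to failureThreshold consecutive failed probes,
    # period_seconds apart, before restarting the container.
    return initial_delay + failure_threshold * period_seconds

print(max_startup_seconds(30, 2))  # 60 — the budget granted by the manifest above
```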
Combining KEDA with HPA
HPA can work alongside KEDA. HPA scales on CPU/memory; KEDA scales on events.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dual-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-server
    kind: Deployment
  minReplicaCount: 3
  maxReplicaCount: 100
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: "..."
        queueLength: "5"
        awsRegion: "us-east-1"
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"
```
KEDA creates an HPA internally. Both triggers (SQS queue and CPU) scale independently. The maximum replica count from any trigger wins.
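A hypothetical snapshot makes the "maximum wins" rule concrete (illustrative numbers; the CPU proposal is scaled by current replicas the way the HPA's utilization ratio works):

```python
import math

current = 8  # current replicas; CPU utilization is an average across them

proposals = {
    # SQS: 40 queued messages at queueLength 5 -> 8 replicas
    "sqs": math.ceil(40 / 5),
    # CPU: 60% average utilization vs a 70% target -> ceil(8 * 60 / 70) = 7
    "cpu": math.ceil(current * 60 / 70),
}
print(max(proposals.values()))  # 8 — the SQS trigger wins this round
```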
Prometheus Custom Metrics Scaler
Scale based on custom Prometheus metrics.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-server
    kind: Deployment
  minReplicaCount: 2
  maxReplicaCount: 50
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: custom_metric
        query: |
          sum(rate(api_requests_total[5m]))
        threshold: "1000"
      authenticationRef:
        name: prometheus-auth
```
You can scale on any Prometheus query. The threshold is a per-replica target: if sum(rate(api_requests_total[5m])) returns 5000 against a threshold of 1000, KEDA scales toward 5 replicas. This enables sophisticated scaling on business metrics (orders/sec, revenue/sec, etc.).
Production Tuning
pollingInterval: How often KEDA queries scalers. Lower values = faster response to load changes, higher CPU. Default: 30 seconds. For SQS: 30 seconds is reasonable.
cooldownPeriod: How long to wait after scale-down before trying again. Prevents flapping. Default: 300 seconds (5 minutes).
fallback: the replica count to hold when the scaler itself fails (after failureThreshold consecutive errors), so a broken metrics endpoint doesn't strand the workload at its last-computed size.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: reliable-scaler
spec:
  scaleTargetRef:
    name: api-server
    kind: Deployment
  minReplicaCount: 2
  maxReplicaCount: 50
  pollingInterval: 15 # Faster response
  cooldownPeriod: 120 # Shorter cooldown for responsive scaling
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: "..."
        queueLength: "10"
  fallback:
    failureThreshold: 3 # After 3 consecutive scaler errors...
    replicas: 5         # ...hold 5 replicas
```
Advanced HPA behavior (scale-up speed, scale-down speed):
```yaml
advanced:
  horizontalPodAutoscalerConfig:
    behavior:
      scaleUp:
        stabilizationWindowSeconds: 0
        policies:
          - type: Percent
            value: 200 # Add up to 200% of current replicas per period
            periodSeconds: 15
          - type: Pods
            value: 10
            periodSeconds: 15
        selectPolicy: Max # Pick the policy that scales fastest
      scaleDown:
        stabilizationWindowSeconds: 300 # Wait 5 min before scaling down
        policies:
          - type: Percent
            value: 50 # Remove at most half the replicas per period
            periodSeconds: 60
```
This scales up aggressively (adding up to 200% of current replicas, or 10 pods, per 15 seconds, whichever allows more) but scales down conservatively (at most half the replicas per 60 seconds, after a 5-minute stabilization window). This prevents thrashing.
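The scale-up policies can be simulated in a few lines (a sketch of the HPA policy math, using the values from the snippet above):

```python
import math

def scale_up_limit(current: int, policies, select: str = "Max") -> int:
    """Upper bound on replicas after one period, per HPA scale-up policies."""
    limits = []
    for p in policies:
        if p["type"] == "Percent":
            limits.append(current + math.ceil(current * p["value"] / 100))
        elif p["type"] == "Pods":
            limits.append(current + p["value"])
    return max(limits) if select == "Max" else min(limits)

policies = [{"type": "Percent", "value": 200}, {"type": "Pods", "value": 10}]
print(scale_up_limit(3, policies))   # 13 — the Pods policy dominates at small counts
print(scale_up_limit(20, policies))  # 60 — the Percent policy dominates at larger counts
```

With selectPolicy: Min the slower policy would win instead, which is useful when you want a hard ceiling on burst size.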
Checklist
- KEDA operator installed and running
- ScaledObjects/ScaledJobs configured for all event-driven workloads
- SQS queue length tuned (test with known queue sizes)
- Kafka consumer lag scaler set with appropriate thresholds
- Cron scalers configured for predictable load patterns
- minReplicaCount balances cost vs cold start latency
- Fallback replicas set to handle scaler failures gracefully
- pollingInterval and cooldownPeriod tuned for stability
- HPA behavior (scale-up/down policies) prevents flapping
- Monitoring alerts on KEDA scaling decisions and failures
- Cold start latency measured and acceptable for SLAs
- Runbooks document how to manually override scaling
Conclusion
KEDA transforms event-driven workloads from static overprovisioning to dynamic, demand-responsive scaling. Queue-based scalers eliminate the guessing game of capacity planning. Combining KEDA with HPA ensures workloads scale on both event demand and resource utilization. Tune pollingInterval, cooldownPeriod, and HPA behavior carefully to balance responsiveness and stability. With KEDA, your clusters become leaner and more cost-efficient while maintaining SLAs.