Kubernetes Resource Management — Requests, Limits, and Why Your Pods Keep Getting OOMKilled

Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Resource management is one of the most misunderstood aspects of Kubernetes. You set requests and limits, assume your containers won't consume more than allocated, and then wake up at 2 AM to pages about pods being OOMKilled on nodes that appeared to have free memory. Understanding the difference between requests and limits, CPU throttling versus memory OOM, and how Kubernetes assigns Quality of Service classes is fundamental to running reliable workloads at scale.
- Requests vs Limits: The Core Distinction
- CPU Throttling vs Memory OOM
- LimitRange and Namespace Defaults
- Vertical Pod Autoscaler (VPA) Recommendation Mode
- HPA with Custom Metrics (KEDA)
- Quality of Service Classes
- ResourceQuota for Namespace Control
- Right-Sizing Workflow
- Checklist
- Conclusion
Requests vs Limits: The Core Distinction
Requests tell Kubernetes the minimum resources a pod needs. Limits act as a ceiling. The scheduler uses requests to bin-pack pods onto nodes; the kubelet uses limits to enforce resource boundaries.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: my-app:v1.2.3
    resources:
      requests:
        cpu: "500m"      # Scheduler reserves this on the node
        memory: "256Mi"  # Scheduler reserves this on the node
      limits:
        cpu: "1000m"     # Kubelet enforces via kernel cgroup (CFS quota)
        memory: "512Mi"  # Kubelet OOMKills if exceeded
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - my-app
          topologyKey: kubernetes.io/hostname
```
If this pod tries to allocate more than its 512Mi memory limit, the kubelet's cgroup enforcement kills it with SIGKILL. If it tries to use more than its 1000m CPU limit, the kernel throttles it instead: the app slows down but doesn't crash.
CPU Throttling vs Memory OOM
CPU is a compressible resource; memory is not. When a pod exceeds its CPU limit, the kernel's CFS scheduler throttles the process. Latency increases, throughput drops, but the pod stays running.
```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: throttled-cpu
spec:
  containers:
  - name: worker
    image: busybox:1.35
    resources:
      requests:
        cpu: "100m"
      limits:
        cpu: "200m"
    command:
    - /bin/sh
    - -c
    - |
      while true; do
        dd if=/dev/zero of=/dev/null bs=1M count=1024
      done
  restartPolicy: Never
```
This pod will be CPU-throttled but survive. Memory violations, by contrast, are fatal: when a container exceeds its memory limit, the kernel's OOM killer terminates a process inside the container's cgroup (usually the main process), and the kubelet reports the container as OOMKilled.
```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: oom-demo
spec:
  containers:
  - name: memory-hog
    image: python:3.11-slim
    resources:
      requests:
        memory: "100Mi"
      limits:
        memory: "100Mi"
    command:
    - python3
    - -c
    - |
      # Allocate memory until the 100Mi limit is exceeded;
      # the kernel OOMKills the container
      data = []
      while True:
          data.append('x' * (1024 * 1024))
```
LimitRange and Namespace Defaults
Rather than rely on developers to always set requests and limits, use LimitRange to enforce defaults per namespace.
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: app-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "4"
      memory: "4Gi"
    min:
      cpu: "10m"
      memory: "32Mi"
  - type: Pod
    max:
      cpu: "8"
      memory: "8Gi"
    min:
      cpu: "20m"
      memory: "64Mi"
```
Containers created in the production namespace without explicit resources inherit these defaults. The max field prevents over-allocation.
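To see the defaults in action, create a container without a resources block; the LimitRange admission plugin injects the values at creation time. A minimal sketch (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: defaults-demo   # illustrative name
  namespace: production
spec:
  containers:
  - name: app
    image: nginx:1.25
    # No resources block: admission injects requests of
    # 100m/128Mi and limits of 500m/512Mi from the LimitRange
```

Running kubectl describe pod defaults-demo -n production afterwards shows the injected values under Requests and Limits.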
Vertical Pod Autoscaler (VPA) Recommendation Mode
VPA analyzes actual resource usage and recommends appropriate requests and limits. In recommendation mode, it doesn't auto-update pods—you review and apply recommendations.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Recommendation mode (value is case-sensitive)
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: "50m"
        memory: "64Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"
      controlledResources:
      - cpu
      - memory
      controlledValues: RequestsAndLimits
```
VPA examines metrics over days and suggests values. Extract recommendations with:
```shell
kubectl describe vpa app-vpa -n production
```
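The recommendation lives in the object's status. An illustrative excerpt (the numbers are made up; the field names follow the VPA API's containerRecommendations structure):

```yaml
status:
  recommendation:
    containerRecommendations:
    - containerName: api
      lowerBound:   # safe minimum based on observed usage
        cpu: 120m
        memory: 210Mi
      target:       # suggested request values
        cpu: 250m
        memory: 380Mi
      upperBound:   # safe maximum
        cpu: 800m
        memory: 900Mi
```

Apply the target values as your new requests, then re-check after the next observation window.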
HPA with Custom Metrics (KEDA)
Horizontal Pod Autoscaler (HPA) scales replicas based on metrics. For simple CPU/memory, use Kubernetes HPA. For SQS queue depth, Kafka lag, or custom metrics, use KEDA.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-scaler
  namespace: processing
spec:
  scaleTargetRef:
    name: job-processor
    kind: Deployment
  minReplicaCount: 1
  maxReplicaCount: 50
  fallback:
    failureThreshold: 3  # consecutive scaler failures before falling back
    replicas: 10
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: "https://sqs.us-east-1.amazonaws.com/123456789/jobs"
      queueLength: "5"
      awsRegion: "us-east-1"
      identityOwner: "operator"
```
This scales the Deployment between 1 and 50 replicas based on SQS queue depth, targeting roughly 5 messages per replica; if the scaler itself fails repeatedly, KEDA falls back to 10 replicas.
Quality of Service Classes
Kubernetes assigns QoS classes based on requests and limits. Understanding these classes is critical for reliability.
Guaranteed: Pod has matching requests and limits for CPU and memory. Under eviction, these pods are killed last.
```yaml
spec:
  containers:
  - name: app
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
```
Burstable: Pod has requests less than limits, or has some but not all resources specified. Killed before Guaranteed, after BestEffort.
```yaml
spec:
  containers:
  - name: app
    resources:
      requests:
        cpu: "100m"
        memory: "128Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
```
BestEffort: No requests or limits. Killed first under memory pressure.
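The assignment rules above can be sketched as a small helper. This is an illustration of the classification logic, not a Kubernetes API; the function name and dict shape are made up:

```python
def qos_class(containers):
    """Classify a pod's QoS the way Kubernetes does, given a list of
    per-container dicts like {"requests": {...}, "limits": {...}}."""
    resources = ("cpu", "memory")
    # BestEffort: no container sets any request or limit
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"
    # Guaranteed: every container sets CPU and memory limits, and every
    # request equals its limit (an unset request defaults to the limit)
    for c in containers:
        limits = c.get("limits", {})
        requests = c.get("requests", {})
        for r in resources:
            if r not in limits:
                return "Burstable"
            if requests.get(r, limits[r]) != limits[r]:
                return "Burstable"
    return "Guaranteed"
```

Check a pod's actual class with kubectl get pod <name> -o jsonpath='{.status.qosClass}'.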
ResourceQuota for Namespace Control
ResourceQuota enforces aggregate resource consumption per namespace, preventing one team from monopolizing the cluster.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    pods: "100"
    services: "10"
    persistentvolumeclaims: "5"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["high", "medium"]
```
Right-Sizing Workflow
- Baseline: Set conservative requests based on app documentation (e.g., JVM needs 512Mi minimum).
- Monitor: Deploy with LimitRange defaults. Collect metrics over 1-2 weeks.
- Analyze: Use VPA recommendations or Prometheus queries to identify actual usage patterns.
- Adjust: Update requests to P95 usage; set limits to P99 + 20% headroom.
- Test: Verify latency and throughput with load tests under the new limits.
- Review: Quarterly re-evaluate as traffic patterns change.
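The Adjust step above is simple arithmetic over the collected usage samples. A minimal sketch, assuming per-minute memory samples in MiB (the helper names and synthetic data are made up for illustration):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of usage samples."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

def right_size(usage_mib, headroom=0.20):
    """Suggest (request, limit) in MiB from observed memory usage:
    request = P95, limit = P99 plus 20% headroom."""
    request = percentile(usage_mib, 95)
    limit = percentile(usage_mib, 99) * (1 + headroom)
    return request, round(limit)

# Synthetic week of samples, uniformly spread between 100 and 199 MiB
samples = list(range(100, 200))
req, lim = right_size(samples)  # request = P95, limit = P99 * 1.2
```

In practice the samples would come from a Prometheus query over container_memory_working_set_bytes rather than a synthetic list.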
Checklist
- All production pods have explicit CPU and memory requests
- CPU limit ≥ 2x request; memory limit ≥ 1.25x request (at minimum)
- LimitRange defined for all critical namespaces
- ResourceQuota prevents single team/app from consuming entire cluster
- VPA deployed in recommendation mode; reviewed monthly
- HPA configured for stateless workloads with appropriate target utilization (70-80%)
- QoS class verified for critical pods (aim for Guaranteed)
- Monitoring alerts on CPU throttling (container_cpu_cfs_throttled_seconds_total)
- Runbooks document how to respond to OOMKills and evictions
- Quarterly right-sizing review based on actual usage trends
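The throttling alert from the checklist can be expressed as a Prometheus rule. A sketch, assuming cAdvisor metrics are scraped and the 25% threshold and rule names are placeholders you should tune:

```yaml
groups:
- name: resource-alerts   # illustrative rule group name
  rules:
  - alert: HighCPUThrottling
    expr: |
      rate(container_cpu_cfs_throttled_periods_total[5m])
        / rate(container_cpu_cfs_periods_total[5m]) > 0.25
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Container {{ $labels.container }} throttled in >25% of CFS periods"
```

Sustained firing usually means the CPU limit is too tight relative to real demand; raise the limit or revisit the request.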
Conclusion
Kubernetes resource management is not a one-time configuration task—it's an ongoing practice of observation, measurement, and adjustment. Start with conservative requests, monitor actual usage, and use VPA to guide refinements. Combine QoS classes, LimitRange, and ResourceQuota to prevent noisy neighbors and cascading failures. In production, this discipline transforms a flaky platform into one where pods behave predictably and your on-call rotations aren't eaten alive by resource-related incidents.