eBPF for Production Observability — Zero-Instrumentation Performance Profiling

Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
eBPF (extended Berkeley Packet Filter) lets you observe production systems without adding instrumentation, recompiling code, or restarting processes. Unlike traditional APM agents, which typically add 5-15% overhead, eBPF captures performance data from inside the kernel itself. This post demystifies eBPF's architecture, shows working examples with Cilium and Parca, and explains when eBPF observability beats expensive application-level instrumentation.
- What eBPF Is and Isn't
- Cilium for Network Observability
- Parca for Continuous Profiling
- BCC Tools (execsnoop, tcptracer)
- eBPF vs Traditional APM
- Flame Graph Interpretation
- Production Safety Considerations
- eBPF Observability Checklist
- Conclusion
What eBPF Is and Isn't
eBPF is a safe, in-kernel virtual machine. Think of it as a controlled way to run code inside the Linux kernel without full privileged access:
What eBPF is:
- Event-driven kernel programming (when syscall happens, run program)
- Non-blocking performance profiling (captures data without stopping execution)
- Network packet introspection (without userspace context switch)
- Safe (kernel verifier prevents infinite loops, invalid memory access)
What eBPF isn't:
- A replacement for application metrics
- A way to modify production code without restarts
- Platform-independent (Linux 5.0+ required, specific architecture requirements)
- A magic performance cure (adds ~5% overhead vs APM's 5-15%)
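Since kernel support is the main prerequisite, it is worth scripting a quick preflight check before rolling anything out. A minimal sketch (the `ebpf_ready` helper is illustrative, not a standard tool; it assumes a typical `major.minor.patch-extra` kernel release string):

```shell
# Hypothetical helper: succeeds when a kernel release string is at
# least 5.0, the baseline assumed in this post.
ebpf_ready() {
  major="${1%%.*}"          # text before the first dot
  rest="${1#*.}"            # everything after the first dot
  minor="${rest%%[.-]*}"    # text up to the next dot or dash
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 0 ]; }
}

if ebpf_ready "$(uname -r)"; then
  echo "eBPF-ready kernel: $(uname -r)"
else
  echo "kernel $(uname -r) may lack modern eBPF features" >&2
fi
```

Adjust the version comparison if a tool you depend on needs a newer baseline (several features in this post assume 5.10+).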
eBPF architecture visualization:
┌─────────────────────────────────────────┐
│  Userspace Applications                 │
│  (Node.js, Python, Java)                │
└───────────────────┬─────────────────────┘
                    │ syscalls / events
┌───────────────────▼─────────────────────┐
│  Kernel (Linux 5.0+)                    │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │  eBPF Programs (Kernel VM)        │  │
│  │  - Tracepoints                    │  │
│  │  - kprobes/uprobes                │  │
│  │  - Network hooks                  │  │
│  └───────────────┬───────────────────┘  │
│                  │                      │
│  ┌───────────────▼───────────────────┐  │
│  │  eBPF Events → Ring Buffer        │  │
│  │  (Memory-efficient storage)       │  │
│  └───────────────┬───────────────────┘  │
└──────────────────┼──────────────────────┘
                   │ (mmap'd memory)
                   ▼
┌──────────────────────────────┐
│  Userspace Tools             │
│  - bpftool                   │
│  - tcpdump                   │
│  - Parca / Cilium            │
└──────────────────────────────┘
Cilium for Network Observability
Cilium uses eBPF to observe network traffic without sidecar proxies:
Installation in Kubernetes:
# Install Cilium with Hubble (observability layer)
helm repo add cilium https://helm.cilium.io
helm repo update
helm install cilium cilium/cilium \
  --namespace cilium \
  --create-namespace \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set prometheus.enabled=true
Hubble CLI for network flows:
# Install Hubble CLI (release archives are named hubble-linux-amd64.tar.gz)
curl -L --remote-name-all https://github.com/cilium/hubble/releases/download/v0.13.0/hubble-linux-amd64.tar.gz{,.sha256sum}
sha256sum --check hubble-linux-amd64.tar.gz.sha256sum
sudo tar xzvfC hubble-linux-amd64.tar.gz /usr/local/bin
# Port forward to Hubble Relay
kubectl port-forward -n cilium svc/hubble-relay 4245:4245
# Observe network flows in real-time, filtered by pod label
hubble observe --label k8s:app=frontend
# Output example:
# TIMESTAMP │ SOURCE │ DESTINATION │ VERDICT │ BYTES │ PACKETS
# 2026-03-15T10:02:01Z │ frontend:8000 (frontend)│ api:3000 (api) │ ALLOWED │ 1024 │ 4
# 2026-03-15T10:02:02Z │ api:3000 (api) │ postgres:5432 (postgres)│ ALLOWED │ 512 │ 2
Cilium Network Policy with observability:
# network-policy.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-ingress
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api
  # Allow only from frontend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "3000"
              protocol: TCP
# Traffic denied by this policy is visible via Hubble (eBPF-based):
#   hubble observe --verdict DROPPED
Monitor traffic patterns:
# High-level flow aggregation
hubble observe --last 1000 \
  --output json | \
  jq -r '.flow | [.source.pod_name, .destination.pod_name] | @csv' | \
  sort | uniq -c | sort -rn
# Identify top talkers
hubble observe --last 5000 \
  | grep "ALLOWED" \
  | awk '{print $3}' \
  | cut -d: -f1 \
  | sort | uniq -c | sort -rn
# Find dropped packets
hubble observe \
  --verdict DROPPED \
  --last 1000 \
  --output json | jq '.flow | {src: .source, dst: .destination, reason: .drop_reason_desc}'
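The top-talkers aggregation above can be sanity-checked offline against a saved export. A sketch with inlined sample data (the file path and flow pairs are made up for illustration):

```shell
# Hypothetical saved export: one "source,destination" pair per line.
cat > /tmp/flows.csv <<'EOF'
frontend,api
frontend,api
api,postgres
frontend,api
EOF

# Count occurrences of each pair, busiest first — the same
# sort | uniq -c | sort -rn shape as the live hubble pipeline.
sort /tmp/flows.csv | uniq -c | sort -rn
```

Running the aggregation on a static file makes it easy to verify the pipeline before pointing it at thousands of live flows.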
Parca for Continuous Profiling
Parca captures CPU and memory profiles using eBPF without code instrumentation:
Kubernetes deployment:
# parca-deployment.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: parca
  namespace: observability
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: parca
rules:
  - apiGroups: [""]
    resources: ["nodes", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: parca-agent
  namespace: observability
spec:
  selector:
    matchLabels:
      app: parca-agent
  template:
    metadata:
      labels:
        app: parca-agent
    spec:
      hostNetwork: true
      hostPID: true
      hostIPC: true
      serviceAccountName: parca
      containers:
        - name: parca-agent
          image: ghcr.io/parca-dev/parca-agent:latest
          securityContext:
            privileged: true
          env:
            - name: PARCA_AGENT_STORE_ADDRESS
              value: parca-server.observability:7070
            - name: PARCA_AGENT_NODE
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: debugfs
              mountPath: /sys/kernel/debug
            - name: procfs
              mountPath: /host/proc
      volumes:
        - name: debugfs
          hostPath:
            path: /sys/kernel/debug
        - name: procfs
          hostPath:
            path: /proc
Analyze profiles via PromQL-like syntax:
# Port forward to Parca UI
kubectl port-forward -n observability svc/parca-server 7070:7070
# Access at http://localhost:7070
# Parca UI allows filtering by:
# - Service name
# - CPU vs Memory
# - Time range
# - Pod labels
Sample profile selector (Parca uses PromQL-style label matchers; the exact profile-type names vary by version, so treat this as illustrative):
parca_samples_total{service="api", instance=~"api-.*", profile_type="cpu"}
BCC Tools (execsnoop, tcptracer)
BCC (BPF Compiler Collection) provides pre-built observability tools:
Installation:
# Ubuntu/Debian (BCC tools install with a -bpfcc suffix, e.g. execsnoop-bpfcc)
apt-get install -y bpfcc-tools bpftrace linux-headers-$(uname -r)
# CentOS/RHEL
yum install -y bcc-tools bpftrace kernel-devel
execsnoop: Trace process execution:
# See every process spawned on the system
sudo execsnoop
# Example output:
# PCOMM    PID    PPID   RET ARGS
# bash     12345  12344    0 /bin/sh -c npm start
# node     12346  12345    0 node /app/index.js
# npm      12347  12346    0 npm run build
# webpack  12348  12347    0 webpack --mode production
# Include a timestamp column
sudo execsnoop -T
# Only show commands matching a name
sudo execsnoop -n node
tcptracer: Monitor TCP connection lifecycle events (connect, accept, close):
# Track all TCP connections
sudo tcptracer
# Example output (T: C=connect, A=accept, X=close):
# T  PID   COMM      IP SADDR         DADDR          SPORT  DPORT
# C  1234  node      4  192.168.1.10  10.0.0.5       54321  5432
# C  9012  curl      4  192.168.1.10  93.184.216.34  54322  443
# X  1234  node      4  192.168.1.10  10.0.0.5       54321  5432
# Per-connection durations and byte counts, filtered by local port (tcplife)
sudo tcplife -L 3000
# Capture retransmissions
sudo tcpretrans
Write custom eBPF programs (bpftrace):
# Measure slow TCP connect() calls (latency > 100ms)
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_connect
{
  @start[tid] = nsecs;
}
tracepoint:syscalls:sys_exit_connect
/@start[tid]/
{
  $duration_ms = (nsecs - @start[tid]) / 1000000;
  if ($duration_ms > 100) {
    printf("Slow connection: %d ms\n", $duration_ms);
    @slow_connects++;
  }
  delete(@start[tid]);
}
END
{
  print(@slow_connects);
}
'
# Total bytes read per process (successful sys_read return values, summed)
sudo bpftrace -e '
tracepoint:syscalls:sys_exit_read
/args->ret > 0/
{
  @reads[comm] = sum(args->ret);
}
END
{
  print(@reads);
}
'
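bpftrace prints its maps as `@name[key]: value` lines, so raw byte counts often need one post-processing step to be readable. A sketch (the sample line is inlined for illustration; in practice you would pipe bpftrace output through the awk stage):

```shell
# Convert the byte count on a bpftrace map line to megabytes.
echo '@reads[node]: 52428800' \
  | awk -F': ' '{ printf "%s -> %.1f MB\n", $1, $2 / 1048576 }'
# → @reads[node] -> 50.0 MB
```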
eBPF vs Traditional APM
Comparison matrix:
Metric            | eBPF            | Traditional APM
──────────────────┼─────────────────┼──────────────────
Overhead          | ~5%             | 5-15%
Code changes      | None            | Required
Startup time      | Instant         | 100ms+
Language support  | All (kernel)    | Language-specific
Network coverage  | Excellent       | Partial
Security context  | Full kernel     | App sandbox
Cost at scale     | Per host        | Per host + per transaction
──────────────────┴─────────────────┴──────────────────
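To make the overhead row concrete, here is a back-of-envelope comparison using the table's figures (fleet size and per-host core counts are made-up inputs):

```shell
# Cores consumed by observability overhead across a fleet,
# using ~5% (eBPF) vs 15% (APM worst case) from the table.
hosts=1000
cores_per_host=16
echo "eBPF: $(( hosts * cores_per_host * 5 / 100 )) cores"    # → eBPF: 800 cores
echo "APM : $(( hosts * cores_per_host * 15 / 100 )) cores"   # → APM : 2400 cores
```

At this assumed scale the gap is 1600 cores, which is usually what drives teams toward the hybrid approach below.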
When to use eBPF:
- Profiling without code instrumentation
- Network observability (packet-level)
- System-wide performance analysis
- Compliance monitoring
When to use traditional APM:
- Business transaction tracing
- Error tracking with source context
- Custom business metrics
- Language-specific frameworks (Django, Rails, etc.)
Hybrid approach (production-recommended):
# Deployment observability stack
observability:
  apm:
    tool: datadog
    covers:
      - application tracing
      - error tracking
      - custom metrics
  ebpf:
    covers:
      - cilium (network flows)
      - parca (continuous profiling)
      - kernel metrics
  metrics:
    covers:
      - prometheus (infrastructure)
      - app instrumentation (custom)
Flame Graph Interpretation
Flame graphs visualize where CPU time is spent:
┌──────────────────────────────────────────────────────────────┐
│ CPU Flame Graph │
├──────────────────────────────────────────────────────────────┤
│ │
│ main (100% total time) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ process_request (60% — wide, CPU-heavy) │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ parse_json │ │ validate │ │ database │ │ │
│ │ │ (10%) │ │ (15%) │ │ query (35%) │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ io_wait (40% — context switches, not CPU) │ │
│ │ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ file read │ │ network wait │ │ │
│ │ │ (20%) │ │ (20%) │ │ │
│ │ └──────────────┘ └──────────────┘ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────┘
Reading:
- Width = CPU time
- Height = call stack depth
- Color = function or call path
- Wider = more time spent
Optimization targets (widest frames first):
1. database query (35%) — add indexes, optimize the query
2. io_wait (20% each) — parallelize, batch operations
3. validate (15%) — move to client-side if possible
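The payoff from attacking the widest frame first can be quantified with Amdahl's law: if the database query is 35% of total time and is made faster by a factor s, overall speedup is 1/((1-0.35) + 0.35/s). A quick check (the 2x improvement is an assumed figure):

```shell
# Amdahl's law: overall speedup from accelerating fraction p by factor s.
awk 'BEGIN { p = 0.35; s = 2.0; printf "overall speedup: %.2fx\n", 1 / ((1 - p) + p / s) }'
# → overall speedup: 1.21x
```

Even halving the single widest frame yields only ~21% overall, which is why narrow frames further down the list are rarely worth optimizing first.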
Generate flame graphs from Parca:
# Parca's web UI renders flame/icicle graphs directly at http://localhost:7070.
# For offline analysis, download a profile as pprof from the UI, then render
# it with the Go pprof tool, which includes a flame graph view:
go tool pprof -http=:8080 profile.pb.gz
Production Safety Considerations
eBPF runs in the kernel. Mistakes can crash systems:
Safety rules:
# 1. Kernel verifier catches many issues
# ✗ Cannot write arbitrary memory
# ✗ Cannot create infinite loops
# ✗ Cannot make unverified function calls
# ✓ These checks are mandatory
# 2. Use established tools (Cilium, Parca) not custom eBPF
# Risk: Low (heavily tested)
# Custom eBPF: Medium (requires kernel knowledge)
# 3. Test in staging first
sudo bpftrace -e 'profile:hz:100 { @[kstack] = count(); }' &
# Run workload, then Ctrl+C
# Verify it works safely
# 4. Monitor eBPF memory usage
bpftool prog show   # loaded programs; the memlock field shows pinned kernel memory
bpftool map show    # maps with their type and max_entries
# A few MB total across programs and maps is typical for
# observability tooling and is safe.
eBPF Deployment Checklist:
# safety/ebpf-readiness.yaml
prerequisites:
  - "✓ Linux kernel 5.10+ (check: uname -r)"
  - "✓ BPF syscall enabled (check: grep CONFIG_BPF_SYSCALL /boot/config-$(uname -r))"
  - "✓ Debugfs mounted (check: mount | grep debugfs)"
tooling:
  - "✓ bpftool installed"
  - "✓ bpftrace version matches kernel"
  - "✓ Cilium/Parca containers run privileged or with CAP_BPF/CAP_SYS_ADMIN"
monitoring:
  - "✓ Alert on eBPF program load failures"
  - "✓ Monitor ring buffer fullness"
  - "✓ Track kernel memory usage"
operations:
  - "✓ Runbook for eBPF tool unload"
  - "✓ Graceful shutdown documented"
  - "✓ Revert procedure tested"
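The prerequisite checks above can be scripted into a single preflight run. A minimal sketch (the `check` helper is illustrative; the paths assume a typical Linux distro, so adjust for yours):

```shell
# Run each prerequisite check and report OK/WARN without aborting.
check() {
  if "$@" >/dev/null 2>&1; then
    echo "OK  : $*"
  else
    echo "WARN: $*"
  fi
}

check test -d /sys/fs/bpf                                      # BPF filesystem
check test -d /sys/kernel/debug/tracing                        # debugfs/tracefs
check grep -q CONFIG_BPF_SYSCALL=y "/boot/config-$(uname -r)"  # BPF syscall
check command -v bpftool                                       # tooling present
```

Wiring this into CI or node bootstrap catches missing prerequisites before an agent rollout fails silently.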
eBPF Observability Checklist
# Deployment strategy
phase_1_network:
  - Deploy Cilium to cluster
  - Enable Hubble for network flows
  - Monitor for 1 week
  - Alert on policy violations
phase_2_profiling:
  - Deploy Parca agent as DaemonSet
  - Collect baselines for normal workloads
  - Identify hot paths
  - Correlate with metrics
phase_3_custom_tools:
  - Test bpftrace tools in staging
  - Deploy tcptracer for connection tracking
  - Create runbook for common issues
validation:
  - "✓ No kernel panics or warnings"
  - "✓ < 5% performance overhead"
  - "✓ Memory usage stable over 48 hours"
  - "✓ Integration with alerting/dashboards"
Conclusion
eBPF represents a paradigm shift in production observability. Unlike traditional APM which adds overhead and requires code changes, eBPF observes the kernel's view of system behavior directly. Cilium provides network visibility without proxy overhead. Parca captures continuous profiles without touching your code. BCC tools offer ad-hoc investigation capabilities. Combined with traditional APM, eBPF completes the observability picture: applications handle business logic tracing; eBPF handles system-level profiling and network analysis. Start with Cilium for network visibility, add Parca for profiling, then integrate BCC tools for specific investigations. The kernel's eye on your system is invaluable for understanding production behavior.