Container Security — From Dockerfile to Runtime Protection
- Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Containers concentrate risk. A single compromised image deployed to thousands of pods is a catastrophic failure. A root-running process with access to the entire filesystem is an attacker's dream.
Container security spans the full lifecycle: build-time (image composition), distribution (registry security), and runtime (pod policies, network rules). This guide covers production-grade practices that fit into your pipeline.
- Non-Root User in Dockerfile
- Distroless Base Images
- Multi-Stage Builds to Minimize Attack Surface
- Secrets NOT in Image Layers
- Read-Only Filesystem
- Seccomp Profiles
- Network Policies in Kubernetes
- Image Scanning With Trivy in CI
- SBOM Generation
- Container Security Checklist
- Conclusion
Non-Root User in Dockerfile
Never run containers as root. A compromised root process makes privilege escalation and container escape far easier, and running as root violates the principle of least privilege.
# ❌ BAD: Runs as root
FROM node:18
WORKDIR /app
COPY . .
RUN npm install --production
EXPOSE 3000
CMD ["node", "server.js"]
# ✓ GOOD: Runs as unprivileged user
FROM node:18-alpine
# Create dedicated user (UID >1000 avoids system user conflicts)
RUN addgroup -g 1001 app && \
adduser -D -u 1001 -G app app
WORKDIR /app
# Copy manifests first so the dependency layer is cached; files stay root-owned,
# so the app user can read but not modify them
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .
# Switch to unprivileged user
USER app
EXPOSE 3000
CMD ["node", "server.js"]
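A quick way to catch a missing `USER` directive before review is to parse the Dockerfile text. The sketch below is a minimal illustration, not a full Dockerfile parser: the function names are hypothetical, and it assumes every stage starts as root, which is not true for base images that already set a non-root user.

```python
def final_stage_user(dockerfile: str) -> str:
    """Return the user the last build stage runs as ('root' if USER is never set)."""
    user = "root"
    for raw in dockerfile.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        if keyword == "FROM":
            # Assumption: a new stage starts as root (the base image may differ).
            user = "root"
        elif keyword == "USER" and len(parts) == 2:
            user = parts[1].strip()
    return user


def runs_as_root(dockerfile: str) -> bool:
    """True when the final stage's user is root, by name or by UID 0."""
    name = final_stage_user(dockerfile).split(":", 1)[0]
    return name in ("root", "0")
```

Running `runs_as_root` over the two Dockerfiles above should flag the first and pass the second.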
Kubernetes enforcement:
# k8s-deployment.yaml
apiVersion: v1
kind: Pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
    runAsGroup: 1001
    fsGroup: 1001
  containers:
    - name: app
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
          add:
            - NET_BIND_SERVICE # Only if needed
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /app/.cache
  volumes:
    - name: tmp
      emptyDir: {}
    - name: cache
      emptyDir: {}
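The same settings can be checked mechanically. The following sketch audits a Pod spec that has already been loaded into a plain dict (for example from `kubectl get pod -o json`); `audit_pod_spec` is a hypothetical helper, and it only covers the four settings discussed here.

```python
def audit_pod_spec(spec: dict) -> list[str]:
    """Return human-readable findings for a Pod spec given as a plain dict."""
    findings = []
    pod_ctx = spec.get("securityContext", {})
    for container in spec.get("containers", []):
        ctx = container.get("securityContext", {})
        name = container.get("name", "<unnamed>")
        # runAsNonRoot may be set at pod level and inherited by containers.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            findings.append(f"{name}: runAsNonRoot is not true")
        # Treated as enabled unless explicitly set to false.
        if ctx.get("allowPrivilegeEscalation", True):
            findings.append(f"{name}: allowPrivilegeEscalation is not false")
        if not ctx.get("readOnlyRootFilesystem", False):
            findings.append(f"{name}: root filesystem is writable")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            findings.append(f"{name}: capabilities do not drop ALL")
    return findings
```

An empty result means the spec passes these four checks; it says nothing about settings the sketch does not inspect.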
Distroless Base Images
Minimize attack surface by using distroless images: only application + runtime, no package manager or shell.
# Traditional image: ~900MB, includes shell, package manager, build tools
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y nodejs npm
COPY . /app
CMD ["node", "/app/server.js"]
# Distroless image: ~150MB, only application + Node.js runtime
FROM node:18 AS builder
WORKDIR /app
COPY . .
RUN npm install --production
FROM gcr.io/distroless/nodejs18-debian11
COPY --from=builder /app /app
WORKDIR /app
# The distroless Node.js image already uses node as its entrypoint
CMD ["server.js"]
# Distroless Node image sizes:
# - node:18-alpine: 171MB
# - gcr.io/distroless/nodejs18: 97MB (includes Node but no shell)
# - scratch: 0MB (only application, for static binaries)
Distroless tradeoffs:
- Pro: No shell, no package manager, minimal attack surface
- Pro: Faster image pulls and rollouts (smaller image)
- Con: Harder to debug inside container (no /bin/bash)
- Con: Runtime dependency mismatches harder to diagnose
Recommended: use distroless for production, alpine or ubuntu for staging.
Multi-Stage Builds to Minimize Attack Surface
Build tools and intermediate artifacts that could contain vulnerabilities or secrets do not need to exist in the final image.
# Multi-stage Node.js + build tools build
FROM node:18-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install
COPY . .
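# Compile TypeScript, bundle with esbuild (--platform=node keeps Node built-ins external)
RUN npx tsc && npx esbuild dist/server.js --bundle --platform=node --outfile=dist/app.js
# Optional: run tests
RUN npm run test
# Drop devDependencies so build and test tooling is not copied into the final image
RUN npm prune --omit=dev
# Final stage: minimal distroless image
FROM gcr.io/distroless/nodejs18-debian11
COPY --from=builder /app/dist/app.js /app/
COPY --from=builder /app/node_modules /app/node_modules
WORKDIR /app
# The distroless Node.js image already uses node as its entrypoint
CMD ["app.js"]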
Multi-stage benefits:
- TypeScript compiler: not in production image
- Build dependencies: npm, node-gyp not included
- Test frameworks: jest, vitest removed
- Source code: .ts files excluded, only compiled .js remains
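Multi-stage builds only help if the final stage copies from an earlier stage rather than from the build context. As a rough sketch, the hypothetical `check_stages` helper below lists the stages in a Dockerfile and reports where the final stage's `COPY` lines come from. It is line-based and ignores line continuations, and a `--from` that names an external image is reported as an unknown source.

```python
import re

FROM_RE = re.compile(r"^FROM\s+(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE)
COPY_FROM_RE = re.compile(r"^COPY\s+.*--from=(\S+)", re.IGNORECASE)


def check_stages(dockerfile: str) -> dict:
    stages = []        # stage names, or the numeric index for unnamed stages
    final_copies = []  # --from targets seen in the current (last) stage
    plain_copies = 0   # COPY lines without --from in the current stage
    for raw in dockerfile.splitlines():
        line = raw.strip()
        from_match = FROM_RE.match(line)
        if from_match:
            stages.append(from_match.group(2) or str(len(stages)))
            final_copies, plain_copies = [], 0  # a new stage resets the tally
            continue
        if line.upper().startswith("COPY "):
            copy_match = COPY_FROM_RE.match(line)
            if copy_match:
                final_copies.append(copy_match.group(1))
            else:
                plain_copies += 1
    # Earlier stages can be referenced by name or by index.
    earlier = set(stages[:-1]) | {str(i) for i in range(len(stages) - 1)}
    return {
        "stages": stages,
        "copies_from": final_copies,
        "unknown_sources": [s for s in final_copies if s not in earlier],
        "copies_from_build_context": plain_copies,
    }
```

A non-zero `copies_from_build_context` in a distroless final stage usually means source files, not build output, are being shipped.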
Secrets NOT in Image Layers
Secrets baked into an image layer stay in the image history even if a later layer deletes them, and they ship with every copy of the image.
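# ❌ BAD: Secret in image layer (layer history is permanent!)
FROM node:18
ARG NPM_TOKEN
RUN npm config set //registry.npmjs.org/:_authToken=$NPM_TOKEN
COPY . .
RUN npm install
# ✓ GOOD: Use Docker BuildKit secrets
# Build command:
# DOCKER_BUILDKIT=1 docker build \
#   --secret id=npm_token,src=$HOME/.npmjs/token \
#   -t myapp .
FROM node:18
WORKDIR /app
COPY package.json package-lock.json ./
# The secret is mounted at /run/secrets/npm_token for this RUN only;
# delete the npm config entry so the token is not written into the layer
RUN --mount=type=secret,id=npm_token \
    npm config set //registry.npmjs.org/:_authToken=$(cat /run/secrets/npm_token) && \
    npm install && \
    npm config delete //registry.npmjs.org/:_authToken
COPY . .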
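Build arguments and `ENV` values are recorded in image metadata, so a simple lint can flag the risky pattern early. This is a heuristic sketch only: `secret_like_instructions` and the name pattern are illustrative choices, and it matches on variable names, not on actual secret values.

```python
import re

# Heuristic list of name fragments that usually indicate a credential.
SECRET_NAME = re.compile(
    r"(TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|PRIVATE_?KEY)", re.IGNORECASE
)


def secret_like_instructions(dockerfile: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for ARG/ENV lines whose variable name looks secret."""
    hits = []
    for number, raw in enumerate(dockerfile.splitlines(), start=1):
        line = raw.strip()
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[0].upper() not in ("ARG", "ENV"):
            continue
        # The variable name is the text before '=' or the first whitespace.
        name = re.split(r"[=\s]", parts[1], maxsplit=1)[0]
        if SECRET_NAME.search(name):
            hits.append((number, line))
    return hits
```

Lines that use `RUN --mount=type=secret` are deliberately not flagged, since that mount does not persist in a layer.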
Runtime secrets (environment variables, mounted secrets):
# Secrets come from Kubernetes Secrets or external vault
FROM gcr.io/distroless/nodejs18
COPY . /app
WORKDIR /app
# At runtime, Kubernetes injects:
#   env:
#     - name: DATABASE_URL
#       valueFrom:
#         secretKeyRef:
#           name: db-secret
#           key: url
#     - name: API_KEY
#       valueFrom:
#         secretKeyRef:
#           name: api-secret
#           key: key
# The distroless Node.js image already uses node as its entrypoint
CMD ["server.js"]
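On the application side, the lookup logic is small. The sketch below shows one possible pattern, written in Python for consistency with the other examples here: prefer a mounted file when a `<NAME>_FILE` variable points to one, otherwise fall back to the plain environment variable. The `_FILE` suffix is a convention assumed for illustration, not something Kubernetes sets for you, and `read_secret` is a hypothetical helper.

```python
import os
from pathlib import Path


def read_secret(name: str, env=None) -> str:
    """Read a secret from a mounted file (<NAME>_FILE) or from the environment."""
    env = os.environ if env is None else env
    file_path = env.get(f"{name}_FILE")
    if file_path:
        # Mounted secret files often end with a trailing newline.
        return Path(file_path).read_text(encoding="utf-8").strip()
    value = env.get(name)
    if not value:
        # Fail fast at startup instead of running with a missing credential.
        raise RuntimeError(f"secret {name} is not configured")
    return value
```

Failing at startup keeps a misconfigured pod out of rotation, because it never reaches a ready state.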
Read-Only Filesystem
Restrict write access to only directories that must be writable.
# k8s-deployment.yaml
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: app
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
      volumeMounts:
        # Writable temp directory for Node.js
        - name: tmp
          mountPath: /tmp
          readOnly: false
        # Writable cache directory
        - name: cache
          mountPath: /app/.cache
          readOnly: false
        # Everything else is read-only
  volumes:
    - name: tmp
      emptyDir: {}
    - name: cache
      emptyDir: {}
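To confirm that only the intended directories are writable, a small probe can be run inside the container (or in a debug copy of it). This sketch assumes a Python interpreter is available where it runs, which is not the case in a distroless Node.js image; the helper names are hypothetical.

```python
import os
import tempfile


def is_writable(directory: str) -> bool:
    """Try to create and remove a temp file; more reliable than checking mode bits."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers read-only filesystems, missing directories, and permission errors.
        return False


def unexpected_writable(paths, allowed):
    """Return paths that accept writes but are not on the allowlist."""
    return [p for p in paths if is_writable(p) and p not in allowed]
```

For the pod above, a call such as `unexpected_writable(["/", "/app", "/tmp", "/app/.cache"], allowed={"/tmp", "/app/.cache"})` would be expected to return an empty list.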
Seccomp Profiles
Restrict system calls available to containers, reducing kernel attack surface.
# k8s-deployment.yaml with seccomp
apiVersion: v1
kind: Pod
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault # Use container runtime's default
  containers:
    - name: app
      image: myapp:latest
      securityContext:
        seccompProfile:
          type: Localhost
          # Path is relative to the kubelet seccomp directory on the node
          # (/var/lib/kubelet/seccomp by default)
          localhostProfile: myapp-seccomp.json
---
# Custom seccomp profile. The ConfigMap only stores the JSON; the file must
# still be copied onto every node before a Localhost profile can load.
# Treat the allowlist as illustrative and trace your own workload before enforcing it.
apiVersion: v1
kind: ConfigMap
metadata:
  name: seccomp-profiles
data:
  myapp-seccomp.json: |
    {
      "defaultAction": "SCMP_ACT_ERRNO",
      "defaultErrnoRet": 1,
      "archMap": [
        {
          "architecture": "SCMP_ARCH_X86_64",
          "subArchitectures": ["SCMP_ARCH_X86", "SCMP_ARCH_X32"]
        }
      ],
      "syscalls": [
        {
          "names": [
            "accept4", "arch_prctl", "bind", "brk", "clone", "close",
            "connect", "dup", "dup2", "epoll_create1", "epoll_ctl", "epoll_wait",
            "exit", "exit_group", "fcntl", "fstat", "fstatfs", "futex",
            "getcwd", "getegid", "getgid", "getpeername", "getpid", "getrandom",
            "getrlimit", "getrusage", "getsockname", "getsockopt", "gettimeofday", "listen",
            "lseek", "madvise", "mmap", "mprotect", "munmap", "open",
            "openat", "poll", "pread64", "prlimit64", "pselect6", "read",
            "readlink", "readlinkat", "recvfrom", "recvmsg", "rt_sigaction", "rt_sigprocmask",
            "rt_sigreturn", "sched_getaffinity", "sched_yield", "select", "sendmsg", "sendto",
            "set_robust_list", "set_tid_address", "setgid", "setgroups", "setsockopt", "setuid",
            "sigaction", "sigaltstack", "sigprocmask", "sigreturn", "socket", "socketpair",
            "stat", "statfs", "statx", "write", "writev"
          ],
          "action": "SCMP_ACT_ALLOW"
        }
      ]
    }
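Before enforcing an allowlist profile, it helps to check it against the syscalls the workload is known to make (for example, a list gathered by tracing the process, which is an assumption about your setup). The sketch below evaluates a profile loaded as a dict. It is a simplification: it ignores per-argument conditions and architecture handling, and the helper names are hypothetical.

```python
def syscall_action(profile: dict, syscall: str) -> str:
    """Return the action the profile applies to a syscall name.

    Simplified model: the first rule listing the name wins, otherwise
    the profile's defaultAction applies.
    """
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    return profile["defaultAction"]


def blocked(profile: dict, required: list[str]) -> list[str]:
    """Return the required syscalls that the profile would not allow."""
    return [s for s in required if syscall_action(profile, s) != "SCMP_ACT_ALLOW"]
```

Anything returned by `blocked` is a syscall the process would see fail at runtime, so a non-empty result is worth resolving before rollout.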
Network Policies in Kubernetes
Restrict pod-to-pod communication by default.
# Default deny all ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow ingress from ingress controller
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Label set automatically on every namespace (Kubernetes 1.21+)
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 3000
---
# Allow specific egress (outbound)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-to-database
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
    - Egress
  egress:
    # Allow DNS (UDP, plus TCP for large responses)
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Allow to database
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    # Allow to external API: ipBlock matches addresses outside the cluster,
    # whereas a namespaceSelector only matches pods
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except: # Private ranges; adjust to your cluster and VPC CIDRs
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
      ports:
        - protocol: TCP
          port: 443
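NetworkPolicies are additive: a pod that no policy selects accepts all traffic, and once at least one policy selects it, a connection is allowed only if some rule in some selecting policy matches. The sketch below models that for ingress under loud simplifications: only `matchLabels` selectors, only `namespaceSelector` peers, and no protocol matching. All names are hypothetical and the policies are plain dicts.

```python
def labels_match(selector: dict, labels: dict) -> bool:
    """matchLabels-only selector; an empty selector matches everything."""
    wanted = selector.get("matchLabels", {})
    return all(labels.get(key) == value for key, value in wanted.items())


def ingress_allowed(policies, pod_labels, source_ns_labels, port) -> bool:
    selecting = [
        policy for policy in policies
        if "Ingress" in policy["spec"].get("policyTypes", ["Ingress"])
        and labels_match(policy["spec"].get("podSelector", {}), pod_labels)
    ]
    if not selecting:
        return True  # pods not selected by any policy accept all traffic
    for policy in selecting:
        for rule in policy["spec"].get("ingress", []):
            peers = rule.get("from") or []
            ports = rule.get("ports") or []
            # An empty or missing list means "any source" / "any port".
            peer_ok = not peers or any(
                "namespaceSelector" in peer
                and labels_match(peer["namespaceSelector"], source_ns_labels)
                for peer in peers
            )
            port_ok = not ports or any(entry.get("port") == port for entry in ports)
            if peer_ok and port_ok:
                return True
    return False
```

With a default-deny policy plus an allow rule for one namespace on port 3000, the function should return `True` only for that namespace and port, which mirrors the intent of the manifests above.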
Image Scanning With Trivy in CI
Scan images for known vulnerabilities before pushing to registry.
# .github/workflows/container-security.yaml
name: Container Security
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write # Required to upload SARIF results
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master # Pin to a release tag or commit SHA in production
        with:
          image-ref: myapp:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'
      - name: Upload Trivy results to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'
      - name: Fail on critical vulnerabilities
        # Trivy exits non-zero when it finds CRITICAL issues, which fails the step
        run: |
          docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
            aquasec/trivy:latest image --severity CRITICAL --exit-code 1 \
            myapp:${{ github.sha }}
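When the gate needs more nuance than an exit code, Trivy's JSON report (`--format json`) can be post-processed. The sketch below assumes the report has a top-level `Results` list whose entries carry a `Vulnerabilities` list with a `Severity` field, which matches Trivy's JSON output as far as I know; verify against the version you run. The function names and the blocking policy are illustrative.

```python
import json
from collections import Counter


def load_report(path: str) -> dict:
    """Load a Trivy JSON report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def severity_counts(report: dict) -> Counter:
    """Count findings per severity across all scan targets."""
    counts = Counter()
    for result in report.get("Results", []):
        # "Vulnerabilities" can be absent or null when a target has no findings.
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def gate(report: dict, blocking=("CRITICAL",)) -> bool:
    """Return True when the image may proceed, False when a blocking severity is present."""
    counts = severity_counts(report)
    return not any(counts[severity] for severity in blocking)
```

A CI step could call `gate(load_report("trivy.json"))` and exit non-zero on `False`; the file name here is only an example.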
SBOM Generation
Generate Software Bill of Materials for compliance and vulnerability tracking.
# .github/workflows/sbom.yaml
name: Generate SBOM
on:
  push:
    branches: [main]
jobs:
  sbom:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Generate SBOM with Syft
        uses: anchore/sbom-action@v0
        with:
          image: myapp:${{ github.sha }}
          format: spdx-json
          output-file: sbom-${{ github.sha }}.spdx.json
      - name: Upload SBOM
        uses: actions/upload-artifact@v4
        with:
          name: sbom
          path: sbom-${{ github.sha }}.spdx.json
      - name: Check for known vulnerabilities in SBOM
        run: |
          # Install grype (not preinstalled on GitHub-hosted runners)
          curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b "$HOME/bin"
          # Use grype to check SBOM against CVE database
          "$HOME/bin/grype" sbom:sbom-${{ github.sha }}.spdx.json --fail-on high
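Tracking SBOMs over time is most useful when you can see what changed between two builds. The sketch below compares two SPDX JSON documents by package name and version, assuming the SPDX 2.x layout with a top-level `packages` list and `name` / `versionInfo` fields. The helper names are hypothetical.

```python
def spdx_packages(sbom: dict) -> set[tuple[str, str]]:
    """Return the (name, version) pairs listed in an SPDX JSON document."""
    return {
        (package.get("name", ""), package.get("versionInfo", ""))
        for package in sbom.get("packages", [])
    }


def added_packages(old: dict, new: dict) -> list[tuple[str, str]]:
    """Packages (or versions) present in the new SBOM but not in the old one."""
    return sorted(spdx_packages(new) - spdx_packages(old))
```

A version bump shows up as an added pair, so the output doubles as a short dependency changelog for the image.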
Dockerfile for minimal SBOM:
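# A small final stage keeps the SBOM short: only what ships gets listed
FROM node:18 AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install --production
COPY . .
RUN npm run build
FROM gcr.io/distroless/nodejs18-debian11
# Informational label only; it does not change what Syft reports
LABEL "sbom.syft"="included"
COPY --from=builder /app/dist /app
COPY --from=builder /app/node_modules /app/node_modules
WORKDIR /app
# The distroless Node.js image already uses node as its entrypoint
CMD ["index.js"]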
Container Security Checklist
## Container Security Audit Checklist
### Dockerfile Build-Time
- [ ] Non-root user specified with USER directive
- [ ] Base image is minimal (distroless or alpine)
- [ ] Multi-stage build used to exclude build artifacts
- [ ] No secrets in image layers (using BuildKit secrets if needed)
- [ ] No unnecessary packages installed
- [ ] Image scanned with Trivy, no high/critical vulnerabilities
- [ ] SBOM generated and tracked
### Kubernetes Runtime
- [ ] securityContext.runAsNonRoot: true
- [ ] securityContext.readOnlyRootFilesystem: true
- [ ] securityContext.allowPrivilegeEscalation: false
- [ ] securityContext.capabilities.drop: ["ALL"]
- [ ] Necessary capabilities added explicitly (rare)
- [ ] seccompProfile set to RuntimeDefault or custom
- [ ] Resource limits set (memory, CPU)
- [ ] NetworkPolicy restricts ingress/egress by default
- [ ] Pod Security Standards enforced via Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25)
### Image Registry
- [ ] Registry requires authentication
- [ ] Images signed (cosign, Sigstore)
- [ ] Image push requires approval (only trusted CI/CD)
- [ ] Old images purged after retention period
- [ ] Registry scans images for vulnerabilities (on push or pull)
### Runtime Monitoring
- [ ] Kubernetes API server audit logging enabled (kubectl logs only shows container output)
- [ ] Syscall-level runtime monitoring enabled (for example auditd or Falco)
- [ ] Alerts for privilege escalation attempts
- [ ] Alerts for unsigned container images deployed
Conclusion
Container security is defense in depth. At build time, use distroless images, multi-stage builds, and scan for vulnerabilities. At runtime, enforce non-root users, read-only filesystems, seccomp profiles, and network policies.
No single layer guarantees security. Each layer blocks a different class of attack, so stack them: an attacker who gets past one control should run straight into the next.