ArgoCD in Production — GitOps Deployment Pipelines That Actually Work

Introduction

ArgoCD brings GitOps discipline to Kubernetes: a Git repository becomes your source of truth, and ArgoCD reconciles cluster state to match that repository. But GitOps at scale requires careful architecture. How do you manage multiple environments? How do you order deployments safely? What happens when ArgoCD itself fails? This post covers production patterns: App of Apps, ApplicationSet, sync waves, health checks, RBAC, Argo Rollouts for progressive delivery, and disaster recovery for ArgoCD itself.

App of Apps Pattern

The App of Apps pattern uses a root ArgoCD Application that manages child Applications. This scales to hundreds of applications and environments.

Repository structure:

flux-repo/
├── apps/
│   ├── api/
│   │   ├── kustomization.yaml
│   │   ├── deployment.yaml
│   │   └── service.yaml
│   ├── worker/
│   │   ├── kustomization.yaml
│   │   └── deployment.yaml
│   └── postgres/
│       ├── helm/
│       └── values.yaml
├── argocd/
│   ├── root-app.yaml
│   ├── api-app.yaml
│   ├── worker-app.yaml
│   └── postgres-app.yaml
└── environments/
    ├── dev/
    │   └── kustomization.yaml
    ├── staging/
    │   └── kustomization.yaml
    └── prod/
        └── kustomization.yaml

Root Application (root-app.yaml):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/flux-repo
    targetRevision: main
    path: argocd/
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true

Child Application (api-app.yaml):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api
  namespace: argocd
  finalizers:
  - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/org/flux-repo
    targetRevision: main
    path: apps/api
    # Kustomize is detected automatically from kustomization.yaml, so no plugin entry is needed
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
  revisionHistoryLimit: 10

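The root app's path (argocd/) holds the manifests of all child Applications. When you merge to main, ArgoCD syncs the root app, which creates or updates the child Applications, and each child then syncs its own resources.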
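
For orientation, here is a minimal sketch of what apps/api/kustomization.yaml could contain, assuming it simply lists the two manifests shown in the repository tree. ArgoCD renders a directory with Kustomize when it finds this file.

```yaml
# apps/api/kustomization.yaml (illustrative; file names follow the repository tree above)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
```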

ApplicationSet for Multi-Environment

ApplicationSet generates Applications dynamically. Manage dev, staging, and prod with a single template:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: api-environments
  namespace: argocd
spec:
  generators:
  - list:
      elements:
      - name: api
        env: dev
        cluster: dev-cluster
        namespace: default
      - name: api
        env: staging
        cluster: staging-cluster
        namespace: production
      - name: api
        env: prod
        cluster: prod-cluster
        namespace: production
  template:
    metadata:
      name: api-{{ env }}
      labels:
        app: api
        env: "{{ env }}"
    spec:
      project: default
      source:
        repoURL: https://github.com/org/flux-repo
        targetRevision: main
        # Per-environment Kustomize overlay (the kustomize block has no "overlays" field)
        path: environments/{{ env }}
      destination:
        server: https://{{ cluster }}.example.com:443
        namespace: "{{ namespace }}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - CreateNamespace=true

This creates three Applications (api-dev, api-staging, api-prod) from a single template. Updating the template updates all environments.
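
One way to give each environment its own settings is a Kustomize overlay per environment. The sketch below assumes environments/prod/kustomization.yaml from the repository tree pulls in the api base and raises the replica count; the count of 5 is an illustrative value.

```yaml
# environments/prod/kustomization.yaml (illustrative overlay)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../apps/api        # base manifests for the api service
replicas:
- name: api             # Deployment name from apps/api/deployment.yaml
  count: 5              # assumed production replica count
```

An Application whose path points at environments/prod would then render the base plus these overrides.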

Sync Waves and Hooks for Ordered Deployments

Sync waves ensure dependencies deploy in order. Within each sync phase (PreSync, Sync, PostSync), ArgoCD applies resources in ascending wave order and waits for one wave to become healthy before starting the next.

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-wave: "0"
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  backoffLimit: 3
  template:
    spec:
      serviceAccountName: db-migrator
      restartPolicy: Never
      containers:
      - name: migrate
        image: myapp:v1.2.3
        command: ["/app/migrate.sh"]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: myapp:v1.2.3
        ports:
        - containerPort: 8080
---
apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-test
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/sync-wave: "2"
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: test
        image: myapp:v1.2.3
        command: ["/app/smoke-test.sh"]

Execution order:

  1. PreSync phase (wave 0): Database migration runs
  2. Sync phase (wave 1): Deployment created
  3. PostSync phase (wave 2): Smoke tests run

Phases always run in this order; wave numbers order resources within a phase. Each step must succeed before the next begins.
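
Waves can also order ordinary (non-hook) resources inside the Sync phase. In this sketch the ConfigMap name and its contents are hypothetical: a negative wave applies the configuration first, and a higher wave delays the Service until lower waves are healthy. Resources without the annotation default to wave 0.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config                          # hypothetical name
  annotations:
    argocd.argoproj.io/sync-wave: "-1"      # applied before wave 0 resources
data:
  LOG_LEVEL: info
---
apiVersion: v1
kind: Service
metadata:
  name: api
  annotations:
    argocd.argoproj.io/sync-wave: "2"       # applied after lower waves are healthy
spec:
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080
```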

Health Checks and Custom Health

ArgoCD tracks resource health (Healthy, Progressing, Degraded, Suspended, Missing, Unknown). Define custom health rules in Lua:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  resource.customizations.health.cert-manager.io_Certificate: |
    hs = {}
    if obj.status ~= nil then
      if obj.status.conditions ~= nil then
        for i, condition in ipairs(obj.status.conditions) do
          if condition.type == "Ready" and condition.status == "False" then
            hs.status = "Degraded"
            hs.message = condition.message
            return hs
          end
          if condition.type == "Ready" and condition.status == "True" then
            hs.status = "Healthy"
            hs.message = condition.message
            return hs
          end
        end
      end
    end
    hs.status = "Progressing"
    hs.message = "Waiting for certificate to be ready"
    return hs

This teaches ArgoCD how to assess cert-manager Certificate health.

RBAC with SSO Integration

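Restrict access per team and environment. The object in each applications rule has the form <project>/<application>, so the rules below assume AppProjects named dev, staging, and prod: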

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.csv: |
    p, role:admin, applications, get, */*, allow
    p, role:admin, applications, sync, */*, allow
    p, role:admin, repositories, get, *, allow
    p, role:admin, repositories, create, *, allow

    p, role:dev-team, applications, get, dev/*, allow
    p, role:dev-team, applications, sync, dev/*, allow
    p, role:dev-team, repositories, get, *, allow

    p, role:staging-team, applications, get, staging/*, allow
    p, role:staging-team, applications, sync, staging/*, allow
    p, role:staging-team, repositories, get, *, allow

    p, role:prod-oncall, applications, get, prod/*, allow
    p, role:prod-oncall, applications, sync, prod/*, allow

    g, org:platform, role:admin
    g, org:data-eng, role:dev-team
    g, org:backend, role:staging-team
    g, oncall-team, role:prod-oncall
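
Policy objects such as prod/* only match Applications that belong to a project named prod. A minimal sketch of such an AppProject follows, assuming the repository URL and production cluster address used elsewhere in this post; an Application would opt in with project: prod.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: prod
  namespace: argocd
spec:
  description: Production applications
  sourceRepos:
  - https://github.com/org/flux-repo
  destinations:
  - server: https://prod-cluster.example.com:443   # assumed cluster address
    namespace: production
  clusterResourceWhitelist:
  - group: ""
    kind: Namespace        # assumed to be the only cluster-scoped kind needed
```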

Map OIDC groups from your identity provider:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  oidc.config: |
    name: Okta
    issuer: https://org.okta.com
    clientID: $OIDC_CLIENT_ID
    clientSecret: $OIDC_CLIENT_SECRET
    requestedScopes:
    - openid
    - profile
    - email
    - groups
    requestedIDTokenClaims:
      groups:
        essential: true
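
The group names from the token are matched against the g lines of the RBAC policy. As a sketch, these two extra keys in argocd-rbac-cm (merged into the same ConfigMap that holds policy.csv) make that matching explicit and deny access to users who match no group; the empty default is an assumption about the desired posture.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  # Token claims whose values are matched against "g, <group>, <role>" lines
  scopes: '[groups]'
  # Role for authenticated users who match no group; empty means no access
  policy.default: ""
```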

Progressive Delivery with Argo Rollouts

Argo Rollouts replaces Deployments with Rollout objects and supports canary, blue-green, and traffic-weighted deployments:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api
spec:
  replicas: 5
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: myapp:v1.2.3
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - api
              topologyKey: kubernetes.io/hostname
  strategy:
    canary:
      steps:
      - setWeight: 10
      - pause:
          duration: 5m
      - setWeight: 25
      - pause:
          duration: 5m
      - setWeight: 50
      - pause:
          duration: 5m
      - setWeight: 75
      - pause:
          duration: 5m
      analysis:
        templates:
        - templateName: success-rate
        startingStep: 1
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
  - name: success-rate
    interval: 30s
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus:9090
        query: |
          sum(rate(api_requests_total{status="200"}[5m])) / sum(rate(api_requests_total[5m]))
    successCondition: result[0] >= 0.95

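This canary shifts traffic to the new version in steps. If the measured success rate falls below 95% on more than three measurements (failureLimit: 3), the analysis fails and the Rollout aborts and returns to the stable version.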
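
Without a traffic router, Argo Rollouts approximates each weight by the ratio of canary to stable pods, so with 5 replicas a 10% step becomes one canary pod and noticeably more than 10% of requests. For exact percentages a traffic router can be attached. The fragment below is a sketch using the NGINX ingress integration; the Service and Ingress names are hypothetical and must already exist.

```yaml
# Fragment of the Rollout spec; service and ingress names are hypothetical
strategy:
  canary:
    canaryService: api-canary        # Service that will select only canary pods
    stableService: api-stable        # Service that will select only stable pods
    trafficRouting:
      nginx:
        stableIngress: api-ingress   # existing Ingress that routes to api-stable
    steps:
    - setWeight: 10
    - pause:
        duration: 5m
```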

Image Updater for Automated Image Promotion

Argo CD Image Updater, a separate component installed alongside ArgoCD, watches container registries and updates image tags automatically:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-image-updater-config
  namespace: argocd
data:
  registries.conf: |
    registries:
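    - name: Docker Hub
      prefix: docker.io
      api_url: https://registry-1.docker.io
      ping: yes
      credentials: secret:argocd/docker-hub#creds
    - name: ECR
      # ECR registries are per account and region; replace the placeholder
      prefix: <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com
      api_url: https://<aws-account-id>.dkr.ecr.us-east-1.amazonaws.com
      ping: no
      credentials: secret:argocd/ecr#creds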

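Annotate the Argo CD Application (Image Updater reads its annotations from Application resources, not from kustomization.yaml):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api
  namespace: argocd
  annotations:
    argocd-image-updater.argoproj.io/image-list: myapp=docker.io/org/myapp
    argocd-image-updater.argoproj.io/myapp.update-strategy: latest
    argocd-image-updater.argoproj.io/myapp.kustomize.image-name: myapp
    argocd-image-updater.argoproj.io/write-back-method: git:secret:argocd/image-updater-token
    # Write the new tag into kustomization.yaml instead of a .argocd-source file
    argocd-image-updater.argoproj.io/write-back-target: kustomization
    argocd-image-updater.argoproj.io/git-branch: main

The images entry in the application's kustomization.yaml is what gets rewritten:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
images:
- name: myapp
  newName: docker.io/org/myapp
  newTag: v1.2.3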

When a new image is pushed, Image Updater bumps the tag and commits to Git. ArgoCD syncs the change.
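
To keep Image Updater from promoting tags that were never meant for release, the candidate tags can be filtered. This annotation fragment is a sketch that assumes release tags look like v1.2.3 and that the image alias is myapp, as above; it belongs in the Application's metadata.

```yaml
metadata:
  annotations:
    # Consider only release tags such as v1.2.3; ignore branch or commit tags
    argocd-image-updater.argoproj.io/myapp.allow-tags: 'regexp:^v[0-9]+\.[0-9]+\.[0-9]+$'
```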

Disaster Recovery for ArgoCD

ArgoCD is mission-critical infrastructure. Plan for its failure:

Backup ArgoCD state:

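# Back up all ArgoCD data (argocd-util was renamed to "argocd admin"; uses your current kubeconfig context)
argocd admin export -n argocd > argocd-backup.yaml

# The export includes Secrets (repository and cluster credentials):
# encrypt it or use a private, access-controlled repository before committing
git add argocd-backup.yaml && git commit -m "ArgoCD backup"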

Recovery playbook:

# 1. Reinstall ArgoCD
helm repo add argo https://argoproj.github.io/argo-helm
helm install argocd argo/argo-cd \
  -n argocd \
  --create-namespace \
  -f values.yaml

# 2. Restore state
argocd admin import -n argocd - < argocd-backup.yaml

# 3. Re-sync Applications (the root app re-creates the children, which auto-sync)
argocd app sync root-app

# 4. Verify health
argocd app list
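
Backups are only useful if they run on a schedule. The CronJob below is a rough sketch, not a tested setup: the service account and PVC names are hypothetical, the RBAC that lets the service account read Argo CD resources and the volume permissions are left out, and a real setup would ship the file to storage outside the cluster.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: argocd-backup
  namespace: argocd
spec:
  schedule: "0 2 * * *"                       # daily at 02:00
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: argocd-backup   # hypothetical; needs read access to Argo CD resources
          restartPolicy: OnFailure
          containers:
          - name: export
            image: quay.io/argoproj/argocd:v2.9.0
            command: ["/bin/sh", "-c"]
            args:
            - argocd admin export -n argocd > /backup/argocd-$(date +%F).yaml
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: argocd-backup        # hypothetical PVC
```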

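High availability setup (argocd-server is stateless, so it runs as a Deployment with several replicas; the upstream HA manifests additionally provide a highly available Redis):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-server
  namespace: argocd
spec:
  replicas: 3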
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-server
  template:
    metadata:
      labels:
        app.kubernetes.io/name: argocd-server
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - argocd-server
            topologyKey: kubernetes.io/hostname
      containers:
      - name: argocd-server
        image: quay.io/argoproj/argocd:v2.9.0
        command:
        - argocd-server
        # TLS is assumed to terminate at the ingress; never pass --disable-auth in production
        - --insecure
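
Multiple replicas help only if voluntary disruptions such as node drains cannot take them all down at once. A PodDisruptionBudget is one way to enforce that; this sketch assumes three replicas carrying the label shown above.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: argocd-server
  namespace: argocd
spec:
  minAvailable: 2                    # keep at least two of the three replicas running
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-server
```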

Checklist

  • App of Apps pattern deployed with root-app managing all child Applications
  • ApplicationSet configured for multi-environment deployments (dev, staging, prod)
  • Sync waves defined for ordered resource deployment (migrations → apps → tests)
  • Health checks configured for custom resource types (CRDs, cert-manager, etc.)
  • RBAC policies restrict access by team and environment
  • SSO/OIDC integrated for centralized identity management
  • Argo Rollouts deployed for canary/blue-green progressive delivery
  • Automated image promotions via Image Updater
  • ArgoCD HA deployment with multiple replicas
  • Backup and disaster recovery procedures documented and tested
  • Git repository is single source of truth; no manual kubectl apply
  • Notifications configured for sync failures and drift detection
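
For the notifications item, Argo CD Notifications subscribes Applications to triggers through annotations. The fragment below is a sketch that assumes a Slack service and the on-sync-failed and on-health-degraded triggers from the notifications catalog are already configured in argocd-notifications-cm; the channel name is hypothetical.

```yaml
# Fragment of an Application manifest; only the annotations are shown
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api
  namespace: argocd
  annotations:
    notifications.argoproj.io/subscribe.on-sync-failed.slack: deploy-alerts       # hypothetical channel
    notifications.argoproj.io/subscribe.on-health-degraded.slack: deploy-alerts
```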

Conclusion

ArgoCD transforms deployment from imperative scripting to declarative GitOps. The App of Apps pattern scales to hundreds of applications. ApplicationSet eliminates environment duplication. Sync waves ensure safe, ordered rollouts. Progressive delivery with Argo Rollouts catches regressions before they affect all users. Finally, treat ArgoCD as critical infrastructure: back it up, run it with redundancy, and practice disaster recovery regularly. With this foundation, your deployments become auditable, reproducible, and automated.