Deployment Strategies

Shipping software is not flipping a switch—it is a sequence of reversible decisions. This chapter covers Kubernetes rollout primitives, feature flags, progressive delivery with metric gates, safe database migrations, release promotion across environments, and multi-layer rollback playbooks that keep change failure rate low.

Deployment Strategies

A deployment strategy answers one question: how do we replace running software with a new version without taking the service down—and without betting the business on a single atomic switch? Kubernetes native primitives (RollingUpdate, Recreate) cover basics; production teams layer blue/green, canary, and GitOps promotion on top.

Strategy comparison

| Strategy | Downtime | Rollback speed | Resource cost | Best for |
|---|---|---|---|---|
| Rolling update | None (if probes are correct) | Minutes (roll back image tag) | Low (a few surge pods) | Stateless APIs, default K8s path |
| Recreate | Yes (all pods terminated first) | Fast redeploy | Lowest | Dev/staging, jobs that cannot run two versions |
| Blue/green | None at switch | Instant (flip Service selector) | 2× capacity during cutover | Regulated releases, schema-compatible versions |
| Canary | None | Traffic shift or image revert | Partial duplicate stack | High-traffic services with metric gates |
| Feature flags | None | Toggle off (seconds) | SDK + flag service | Long-lived branches, A/B experiments |

Kubernetes RollingUpdate mechanics

A Deployment controller owns ReplicaSets: one active, older ones scaled to zero for rollback history. spec.strategy.type: RollingUpdate with maxUnavailable and maxSurge controls how aggressively pods are replaced. The kubelet does not know about rollouts; it only starts and stops the containers of pods that the ReplicaSet controller created and the scheduler bound to its node.
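To make the two knobs concrete, here is a minimal Python sketch of the arithmetic behind maxUnavailable and maxSurge. It assumes the documented rounding rules (percentages round up for surge and down for unavailable); the function names are hypothetical and this is not the controller's actual code.

```python
def resolve(value, replicas, round_up):
    """Resolve an absolute count or a percentage string such as "25%"."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer math avoids float rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_unavailable, max_surge):
    """Return (minimum available pods, maximum total pods) during a rollout."""
    surge = resolve(max_surge, replicas, round_up=True)               # rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # rounds down
    if surge == 0 and unavailable == 0:
        unavailable = 1  # otherwise the rollout could never make progress
    return replicas - unavailable, replicas + surge


print(rollout_bounds(6, 1, 2))          # 6 replicas, maxUnavailable 1, maxSurge 2
print(rollout_bounds(2, "50%", "25%"))  # a two-replica Deployment
```

With six replicas, maxUnavailable: 1 and maxSurge: 2, the rollout never drops below five available pods and never runs more than eight in total.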

yaml — production rolling update
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: production
  labels:
    app: payments-api
    track: stable
spec:
  replicas: 6
  revisionHistoryLimit: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 2
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
        version: "2.8.1"
    spec:
      terminationGracePeriodSeconds: 45
      containers:
        - name: api
          image: registry.example.com/payments-api@sha256:abc123...
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]
sequenceDiagram
  participant Dev as Developer
  participant CI as CI pipeline
  participant Reg as Container registry
  participant API as kube-apiserver
  participant RS as ReplicaSet controller
  participant KL as kubelet
  Dev->>CI: merge to main
  CI->>Reg: push image digest
  CI->>API: kubectl set image / GitOps sync
  API->>RS: new ReplicaSet created
  RS->>KL: create pod v2 (surge)
  KL-->>RS: readiness OK
  RS->>KL: terminate pod v1
  RS-->>Dev: rollout status complete
🔬 Under the Hood

The Deployment controller compares desired replicas from the new ReplicaSet template against observed ready replicas. If readiness probes lie—returning 200 before the app can serve traffic—RollingUpdate will drain healthy v1 pods while v2 pods accept traffic they cannot handle. Readiness gates rollout safety.
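A readiness endpoint that does not lie reports success only after every startup dependency has finished. The sketch below shows one way to track that in Python; the class, the dependency names, and the response bodies are hypothetical, and wiring it to an HTTP handler for /readyz is left out.

```python
import threading


class Readiness:
    """Tracks startup work (hypothetical examples: DB pool, config load)."""

    def __init__(self, required):
        self._pending = set(required)
        self._lock = threading.Lock()

    def mark_done(self, name):
        """Record that one dependency finished initializing."""
        with self._lock:
            self._pending.discard(name)

    def status(self):
        """Return the (HTTP status, body) a /readyz handler would send."""
        with self._lock:
            if self._pending:
                return 503, "waiting for: " + ", ".join(sorted(self._pending))
            return 200, "ok"


readiness = Readiness(["db_pool", "config_loaded"])
print(readiness.status())  # not ready: the rollout must not count this pod yet
readiness.mark_done("db_pool")
readiness.mark_done("config_loaded")
print(readiness.status())  # ready: safe to receive traffic
```

Until the pod answers 200, the rollout does not count it as ready, so old pods keep serving.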

⚠️ Pitfall

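Percentages for maxUnavailable round down, so maxUnavailable: 50% on a two-replica Deployment lets the rollout take one pod away, leaving a single pod to carry all traffic while a bad image fails to become ready. maxUnavailable: 100% would allow both pods to terminate at once. Keep surge and unavailable conservative, and add a PodDisruptionBudget to protect minimum availability during node drains, which rolling-update settings do not cover.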

Blue/green on Kubernetes

Run two Deployments—payments-api-blue and payments-api-green—behind one Service. Cutover is a label selector patch or Ingress weight change. Rollback is flipping back; old stack stays warm until you decommission it.
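Because cutover and rollback are the same operation with a different color, it helps to validate the color before it reaches kubectl. The Python sketch below builds the selector patch body and picks the rollback target; it assumes the pods carry a `color` label, and the helper names are hypothetical.

```python
import json

COLORS = ("blue", "green")


def selector_patch(color):
    """Build the Service patch body that points traffic at one color."""
    if color not in COLORS:
        raise ValueError(f"color must be one of {COLORS}, got {color!r}")
    return json.dumps({"spec": {"selector": {"color": color}}})


def rollback_color(live_color):
    """The rollback target is simply the other stack."""
    if live_color not in COLORS:
        raise ValueError(f"unknown live color {live_color!r}")
    return "green" if live_color == "blue" else "blue"


print(selector_patch("green"))
print(selector_patch(rollback_color("green")))  # patch that flips back to blue
```

Rejecting anything other than the two known colors also keeps free-form pipeline input out of the patch string.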

.github/workflows/blue-green.yml
name: Blue-green deploy
on:
  workflow_dispatch:
    inputs:
      color:
        description: Target stack (blue|green)
        required: true
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Deploy target color
        run: |
          kubectl set image deployment/payments-api-${{ inputs.color }} \
            api=registry.example.com/payments-api:${{ github.sha }} -n production
          kubectl rollout status deployment/payments-api-${{ inputs.color }} -n production
      - name: Switch Service selector
        run: |
          kubectl patch service payments-api -n production -p \
            '{"spec":{"selector":{"color":"${{ inputs.color }}"}}}'
.gitlab-ci.yml — blue-green stage
deploy-blue:
  stage: deploy
  environment: production
  script:
    - kubectl set image deployment/payments-api-blue api=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA -n production
    - kubectl rollout status deployment/payments-api-blue -n production
  when: manual

switch-to-blue:
  stage: deploy
  environment: production
  script:
    - kubectl patch service payments-api -n production -p '{"spec":{"selector":{"color":"blue"}}}'
  needs: [deploy-blue]
  when: manual

Feature Flags & Decoupled Release

Deploying code and releasing features are different events. Feature flags let you ship dark code to production, enable it for internal testers, then ramp percentage—without a second deployment. Flags complement canary traffic splitting when behavior is gated in application logic.

Flag types

| Type | Purpose | Rollback |
|---|---|---|
| Release toggle | Hide incomplete features until ready | Disable flag (no redeploy) |
| Ops kill switch | Disable expensive code path under load | Instant (SRE playbook) |
| Experiment | A/B test conversion metrics | Revert cohort assignment |
| Permission | Entitlement / plan gating | Config change in flag service |

Architecture

An SDK in the application polls or streams flag state from LaunchDarkly, Unleash, Flagsmith, or a ConfigMap-backed open-source server. Evaluate flags server-side for security-sensitive behavior; client-side flags leak intent in JavaScript bundles.

yaml — Unleash feature definition (GitOps)
apiVersion: v1
kind: ConfigMap
metadata:
  name: unleash-features
  namespace: platform
data:
  new-checkout-flow.json: |
    {
      "name": "new-checkout-flow",
      "description": "Stripe Elements v2 checkout",
      "type": "release",
      "enabled": false,
      "strategies": [
        {
          "name": "flexibleRollout",
          "parameters": { "rollout": "25", "stickiness": "userId" }
        }
      ]
    }

Flags vs canary traffic

Canary routes HTTP requests to different pod versions—whole binary changes. Feature flags route logic inside one binary. Use both: canary validates infra and latency; flags validate product behavior per user cohort.
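Routing logic inside one binary usually comes down to a sticky percentage check. The Python sketch below illustrates the idea with a SHA-256 bucket; this is an assumption for illustration and not the hashing scheme of Unleash or any other flag SDK, and the function names are hypothetical.

```python
import hashlib


def bucket(flag_name, user_id):
    """Map a (flag, user) pair to a stable bucket in 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag_name, user_id, enabled, rollout_percent):
    """Flag gate: global switch first, then sticky percentage rollout."""
    if not enabled:
        return False
    return bucket(flag_name, user_id) < rollout_percent


def checkout(user_id, enabled=True, rollout_percent=25):
    """Both code paths ship in the same binary; the flag picks one per user."""
    if is_enabled("new-checkout-flow", user_id, enabled, rollout_percent):
        return "new checkout"
    return "legacy checkout"


share = sum(checkout(f"user-{i}") == "new checkout" for i in range(10_000)) / 10_000
print(f"share of users on the new flow: {share:.1%}")
```

The same user always lands in the same bucket, so raising the percentage only adds users and never flips anyone back and forth.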

.github/workflows/flag-gated-deploy.yml
name: Deploy with flag default-off
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push
        run: |
          docker build -t registry.example.com/checkout:${{ github.sha }} .
          docker push registry.example.com/checkout:${{ github.sha }}
      - name: Sync deployment
        run: kubectl set image deployment/checkout checkout=registry.example.com/checkout:${{ github.sha }}
      - name: Ensure flag disabled
        env:
          UNLEASH_TOKEN: ${{ secrets.UNLEASH_ADMIN_TOKEN }}
          UNLEASH_URL: ${{ vars.UNLEASH_URL }}  # assumed repository variable; the step reads $UNLEASH_URL
        run: |
          curl -sf -X POST "$UNLEASH_URL/api/admin/projects/default/features/new-checkout-flow/environments/production/off" \
            -H "Authorization: $UNLEASH_TOKEN"
.gitlab-ci.yml — flag gate
deploy-checkout:
  stage: deploy
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - kubectl set image deployment/checkout checkout=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - |
      curl -sf -X POST "$UNLEASH_URL/api/admin/projects/default/features/new-checkout-flow/environments/production/off" \
        -H "Authorization: $UNLEASH_ADMIN_TOKEN"
  environment: production
💡 Pro Tip

Every flag needs an owner and expiry date. Permanent flags become undeletable conditional branches—exactly the complexity trunk-based development tries to eliminate.

⚖️ Trade-off

Managed flag services add cost and network dependency. ConfigMap-backed flags are free but lack real-time analytics and sticky cohort targeting at scale.

Progressive Delivery & Metric Gates

Progressive delivery extends continuous deployment with automated promotion gates—error rate, latency p99, business KPIs—before increasing traffic share. Argo Rollouts, Flagger, and service mesh traffic splitting implement this on Kubernetes.

Argo Rollouts canary

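Argo Rollouts replaces the Deployment with a Rollout resource that advances in fine-grained steps (10% → 30% → 60% → 100%), pausing between steps and running an AnalysisRun wherever a step requests one. A failed analysis aborts the rollout and shifts traffic back to the stable version.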
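The gate logic itself is simple enough to sketch in Python. The example below is a simplified model, assuming a 1% error-rate threshold and treating any bad or missing measurement as a failure; Argo Rollouts has its own handling of inconclusive results and failure limits that is not modeled here, and all names are hypothetical.

```python
def error_rate(errors_per_s, requests_per_s):
    """5xx rate divided by total rate; no traffic is inconclusive (None)."""
    if requests_per_s == 0:
        return None
    return errors_per_s / requests_per_s


def analysis_passes(measurements, threshold=0.01):
    """Require at least one measurement, all conclusive and below threshold."""
    return bool(measurements) and all(
        m is not None and m < threshold for m in measurements
    )


def run_canary(weights, measurements_per_step, threshold=0.01):
    """Walk the weight steps; stop and roll back at the first failed gate."""
    if len(weights) != len(measurements_per_step):
        raise ValueError("need one list of measurements per weight step")
    for weight, measurements in zip(weights, measurements_per_step):
        if not analysis_passes(measurements, threshold):
            return "rolled_back", weight
    return "promoted", 100


healthy = [[0.001] * 5, [0.002] * 5, [0.0] * 5]
spiking = [[0.001] * 5, [0.002, 0.05], [0.0] * 5]
print(run_canary([10, 30, 60], healthy))
print(run_canary([10, 30, 60], spiking))
```

In the second run the 5% error sample at the 30% step fails the gate, so the canary never reaches 60%.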

yaml — Rollout with Prometheus analysis
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payments-api
  namespace: production
spec:
  replicas: 10
  strategy:
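    canary:
      # Istio host-level routing needs one Service per track; both names are assumed here.
      canaryService: payments-api-canary
      stableService: payments-api-stable
      steps: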
        - setWeight: 10
        - pause: { duration: 5m }
        - analysis:
            templates:
              - templateName: success-rate
        - setWeight: 30
        - pause: { duration: 5m }
        - setWeight: 60
        - pause: { duration: 10m }
        - setWeight: 100
      trafficRouting:
        istio:
          virtualService:
            name: payments-api
            routes:
              - primary
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: api
          image: registry.example.com/payments-api:2.9.0
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: production
spec:
  metrics:
    - name: error-rate
      interval: 1m
      count: 5
      successCondition: result[0] < 0.01
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(rate(http_requests_total{app="payments-api",status=~"5.."}[2m]))
            / sum(rate(http_requests_total{app="payments-api"}[2m]))
flowchart LR
  CI["CI builds digest"] --> Sync["GitOps sync"]
  Sync --> R10["10% canary"]
  R10 --> A1{"Analysis OK?"}
  A1 -->|yes| R30["30% traffic"]
  A1 -->|no| RB["Auto rollback"]
  R30 --> A2{"Analysis OK?"}
  A2 -->|yes| R100["100% promote"]
  A2 -->|no| RB
.github/workflows/progressive.yml
name: Trigger progressive rollout
on:
  workflow_run:
    workflows: [CI]
    types: [completed]
jobs:
  promote:
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Update image in GitOps repo
        run: |
          yq -i '.spec.template.spec.containers[0].image = "registry.example.com/api:${{ github.event.workflow_run.head_sha }}"' \
            deploy/overlays/production/rollout.yaml
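          git config user.email "[email protected]"
          git config user.name "release-bot"  # hypothetical identity; git will not commit with an empty user.name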
          git commit -am "promote ${{ github.event.workflow_run.head_sha }}"
          git push
.gitlab-ci.yml — GitOps promote
promote-production:
  stage: deploy
  image: bitnami/git:latest
  script:
    - git clone $GITOPS_REPO gitops && cd gitops
    # strenv() keeps both values as strings; assumes yq v4 is available in the job image
    - yq -i '.spec.template.spec.containers[0].image = strenv(CI_REGISTRY_IMAGE) + ":" + strenv(CI_COMMIT_SHA)' deploy/production/rollout.yaml
    - git commit -am "promote $CI_COMMIT_SHA"
    - git push origin main
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  environment: production
🛡️ Security gate

Progressive delivery does not replace policy admission. Image must pass Cosign verification and Kyverno digest-pin policies before the Rollout controller creates canary pods. Security gates at deploy time; metric gates at traffic shift time.

🌍 Real World

Teams that skip analysis templates and manually promote after a 5-minute pause are doing timed canary, not progressive delivery. Wire at least error rate and saturation metrics—or you are guessing.

Database Migrations in Deploy Pipelines

Schema changes are the hardest part of zero-downtime deploys. Application rollouts assume expand/contract compatibility: new code must run against old schema, old code against new schema—during the overlap window.

Expand/contract pattern

  1. Expand — add nullable column or new table; deploy code that writes both old and new
  2. Migrate data — backfill job; dual-read validation
  3. Contract — remove old column after all code reads new path

Never drop a column in the same release that stops reading it. Production overlap can last hours during slow rollouts.
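The three phases can be walked through end to end with an in-memory SQLite database. This Python sketch uses a hypothetical `customers` table with an old `fullname` column and a new `display_name` column; it stops before the contract step, which belongs in a later release.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, fullname TEXT NOT NULL)")
db.execute("INSERT INTO customers (fullname) VALUES ('Ada Lovelace'), ('Alan Turing')")

# 1. Expand: additive and nullable, so old code that never mentions
#    the new column keeps working unchanged.
db.execute("ALTER TABLE customers ADD COLUMN display_name TEXT")


def save_customer(conn, name):
    """New code dual-writes so old and new readers both see the value."""
    conn.execute(
        "INSERT INTO customers (fullname, display_name) VALUES (?, ?)",
        (name, name),
    )


save_customer(db, "Grace Hopper")

# 2. Migrate data: backfill rows written before the expand step.
db.execute("UPDATE customers SET display_name = fullname WHERE display_name IS NULL")

# Dual-read validation: both paths must agree before anyone contracts.
mismatches = db.execute(
    "SELECT COUNT(*) FROM customers WHERE display_name IS NOT fullname"
).fetchone()[0]
print("rows where old and new columns disagree:", mismatches)

# 3. Contract (a later release): dropping fullname only becomes safe once
#    mismatches is 0 and no deployed version still reads the old column.
```

Every statement here is safe to run while old and new application versions overlap, which is the property the pattern exists to protect.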

Migration execution models

| Model | When | Risk |
|---|---|---|
| Pre-deploy Job | Before new pods start | Blocks deploy if migration fails (safe default) |
| Init container | Per pod startup | Pods race for the schema lock; use advisory locks |
| App startup | ORM auto-migrate | Forbidden in production (no audit trail) |
| Manual DBA window | Large index builds | Requires maintenance window communication |
yaml — Flyway pre-deploy Kubernetes Job
apiVersion: batch/v1
kind: Job
metadata:
  name: flyway-migrate-2-9-0
  namespace: production
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      serviceAccountName: migrate-runner
      containers:
        - name: flyway
          image: flyway/flyway:10
          args:
            - migrate
            - -url=jdbc:postgresql://payments-db:5432/payments
            - -user=$(DB_USER)
            - -password=$(DB_PASSWORD)
            - -locations=filesystem:/sql
          envFrom:
            - secretRef:
                name: payments-db-credentials
          volumeMounts:
            - name: sql
              mountPath: /sql
      volumes:
        - name: sql
          configMap:
            name: flyway-v2-9-0
.github/workflows/migrate-then-deploy.yml
name: Migrate then deploy
on:
  push:
    branches: [main]
jobs:
  migrate:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Run Flyway
        run: |
          kubectl delete job flyway-migrate-${{ github.sha }} -n production --ignore-not-found
          kubectl apply -f k8s/jobs/flyway-${{ github.sha }}.yaml
          kubectl wait --for=condition=complete job/flyway-migrate-${{ github.sha }} -n production --timeout=600s
  deploy:
    needs: migrate
    runs-on: ubuntu-latest
    steps:
      - name: Sync application
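        run: |
          # rollout restart alone would only recycle pods on the old image
          kubectl set image deployment/payments-api \
            api=registry.example.com/payments-api:${{ github.sha }} -n production
          kubectl rollout status deployment/payments-api -n production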
.gitlab-ci.yml — migration gate
db-migrate:
  stage: deploy
  environment: production
  script:
    - kubectl apply -f k8s/jobs/flyway-$CI_COMMIT_SHA.yaml
    - kubectl wait --for=condition=complete job/flyway-migrate-$CI_COMMIT_SHA -n production --timeout=600s
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

app-deploy:
  stage: deploy
  needs: [db-migrate]
  script:
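    # rollout restart alone would only recycle pods on the old image
    - kubectl set image deployment/payments-api api=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA -n production
    - kubectl rollout status deployment/payments-api -n production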
  environment: production
⚠️ Pitfall

Long-running migrations inside PreSync hooks block Argo CD sync—and every subsequent deploy. Split additive migrations (fast) from backfills (async Job) from destructive drops (separate release).

Release Management & Environments

Release management connects version semantics, change approval, environment promotion, and audit trails. GitOps makes the desired state in git the release artifact; tags and changelogs remain the human contract with stakeholders.

Promotion pipeline

Typical flow: dev (every merge) → staging (nightly or RC tag) → production (manual approval or automated after soak). Each hop is a Kustomize overlay or Helm values change—not a rebuilt image.

flowchart LR
  subgraph git["Git repository"]
    Base["base/"]
    Dev["overlays/dev"]
    Stg["overlays/staging"]
    Prd["overlays/production"]
  end
  CI["CI builds immutable digest"] --> Reg["Registry"]
  Reg --> Dev
  Dev -->|"auto sync"| DevCls["dev cluster"]
  Stg -->|"RC tag"| StgCls["staging cluster"]
  Prd -->|"approval + soak"| PrdCls["production cluster"]
yaml — Kustomize production overlay (image digest pin)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: production
resources:
  - ../../base
images:
  - name: registry.example.com/payments-api
    newName: registry.example.com/payments-api
    digest: sha256:abc123def456...
commonAnnotations:
  release.sharpbyte.dev/version: "2.9.0"
  release.sharpbyte.dev/changelog: "https://github.com/org/payments/releases/tag/v2.9.0"
patches:
  - patch: |-
      - op: replace
        path: /spec/replicas
        value: 12
    target:
      kind: Deployment
      name: payments-api
.github/workflows/release.yml
name: Release
on:
  push:
    tags: ['v*']
permissions:
  contents: write
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: softprops/action-gh-release@v2
        with:
          generate_release_notes: true
      - name: Promote digest to staging
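        run: |
          # Assumes a build job (not shown) that exposes `digest` as an output,
          # `needs: build` on this job, and the GitOps repo checked out at ./gitops.
          cd gitops/overlays/staging
          yq -i '.images[0].digest = "${{ needs.build.outputs.digest }}"' kustomization.yaml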
          git commit -am "staging: ${{ github.ref_name }}"
          git push
.gitlab-ci.yml — release tag
create-release:
  stage: release
  image: registry.gitlab.com/gitlab-org/release-cli:latest
  rules:
    - if: $CI_COMMIT_TAG
  script:
    # The release: keyword below creates the release; calling release-cli here as well would create it twice.
    - echo "Creating release $CI_COMMIT_TAG"
  release:
    tag_name: $CI_COMMIT_TAG
    description: "Release $CI_COMMIT_TAG"

promote-staging:
  stage: deploy
  rules:
    - if: $CI_COMMIT_TAG
  script:
    - yq -i '.images[0].digest = env(IMAGE_DIGEST)' gitops/overlays/staging/kustomization.yaml
    - git commit -am "staging $CI_COMMIT_TAG" && git push
💡 Pro Tip

Pin digest in production overlays, not floating tags. Tags are mutable; digests are the SLSA-provable identity of what ran.
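A promotion pipeline can enforce this with a small check before it commits an overlay change. The Python sketch below flags image references that are not pinned by a sha256 digest; the regular expression is a simplification of the full image reference grammar, and the function names and sample references are hypothetical.

```python
import re

# Anything without whitespace or "@", then "@sha256:" and 64 lowercase hex chars.
DIGEST_REF = re.compile(r"[^@\s]+@sha256:[0-9a-f]{64}")


def is_digest_pinned(image_ref):
    """True only for references of the form name@sha256:<64 hex chars>."""
    return DIGEST_REF.fullmatch(image_ref) is not None


def unpinned_images(image_refs):
    """Return the references that still rely on a mutable tag."""
    return [ref for ref in image_refs if not is_digest_pinned(ref)]


refs = [
    "registry.example.com/payments-api@sha256:" + "a" * 64,  # placeholder digest
    "registry.example.com/payments-api:2.9.0",
    "registry.example.com/payments-api:latest",
]
print(unpinned_images(refs))
```

Failing the pipeline when the returned list is non-empty keeps floating tags out of the production overlay.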

Rollback Strategies

Rollback means restoring known-good behavior within the time your MTTR objective allows. Kubernetes keeps ReplicaSet history, GitOps keeps git history, and feature flags give instant logic rollback; use the layer that matches the failure mode.

Rollback layers

| Layer | Command / action | Speed | Scope |
|---|---|---|---|
| Feature flag | Disable in flag console | Seconds | Logic only |
| K8s rollout undo | kubectl rollout undo | 1–5 min | Workload image |
| GitOps revert | git revert + sync | 2–10 min | Full manifest set |
| Traffic shift | Route 100% to stable color | Seconds | Ingress/mesh |
| DB rollback | Forward-fix migration | Hours | Schema (avoid reverse migrations) |
terminal — incident rollback
$ kubectl rollout history deployment/payments-api -n production
$ kubectl rollout undo deployment/payments-api -n production --to-revision=42
$ kubectl rollout status deployment/payments-api -n production
→ deployment "payments-api" successfully rolled out
$ oc rollout history deployment/payments-api -n production
$ oc rollout undo deployment/payments-api -n production --to-revision=42
$ oc rollout status deployment/payments-api -n production
.github/workflows/rollback.yml
name: Emergency GitOps rollback
on:
  workflow_dispatch:
    inputs:
      revert_sha:
        description: Commit SHA to revert (the change being undone)
        required: true
jobs:
  rollback:
    runs-on: ubuntu-latest
    environment: production-emergency
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Revert GitOps commit
        run: |
          git revert --no-edit ${{ inputs.revert_sha }}
          git push origin main
      - name: Wait for Argo sync
        run: argocd app wait payments-api --health --timeout 600
.gitlab-ci.yml — emergency rollback
emergency-rollback:
  stage: deploy
  when: manual
  environment: production-emergency
  variables:
    REVERT_SHA: ""
  script:
    - git revert --no-edit $REVERT_SHA
    - git push origin main
    - argocd app wait payments-api --health --timeout 600
🎯 Interview angle

When asked "how do you deploy safely?", structure the answer as: strategy (rolling/canary) → gates (tests, scans, policies) → observability (metrics during progressive delivery) → rollback (multi-layer, practiced in game days). Connect it to DORA change failure rate and MTTR.

⚖️ Trade-off

kubectl rollout undo restores the old ReplicaSet but not database state that the old code cannot read. Rollback runbooks must state which layers are safe to roll back together.
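One way to make a runbook explicit is to encode the rollback layers from this section as data. The Python sketch below orders the layers roughly from fastest to slowest and picks the fastest layer for each broken scope; the scope labels and the mapping from layer to scope are assumptions for illustration, not a fixed taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    action: str
    restores: frozenset  # scopes this layer puts back (assumed labels)


# Ordered roughly from fastest to slowest.
LAYERS = [
    Layer("feature flag", "disable in flag console", frozenset({"logic"})),
    Layer("traffic shift", "route 100% to stable color", frozenset({"routing"})),
    Layer("rollout undo", "kubectl rollout undo", frozenset({"image"})),
    Layer("gitops revert", "git revert + sync", frozenset({"image", "manifests"})),
    Layer("db forward-fix", "forward-fix migration", frozenset({"schema"})),
]


def plan(broken_scopes):
    """Return the fastest layer for each broken scope, in speed order."""
    chosen = []
    for scope in broken_scopes:
        match = next((l for l in LAYERS if scope in l.restores), None)
        if match is None:
            raise ValueError(f"no rollback layer restores {scope!r}")
        chosen.append(match)
    ordered = [l.name for l in LAYERS if l in chosen]  # dedupe, keep speed order
    return ordered


print(plan({"logic"}))
print(plan({"image", "schema"}))  # image rollback alone would not be enough
```

A bad image combined with an incompatible schema change yields two layers, which is exactly the case a runbook has to spell out in advance.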