Policy as Code

Spreadsheets cannot keep pace with Kubernetes. This chapter explains why policy belongs in git, how OPA and Rego work, how Gatekeeper and Kyverno enforce admission policies, how Conftest gates CI, how to automate compliance evidence, and how to tie policy metrics to DORA performance.

Why Policy as Code?

Manual compliance checklists do not survive 200 deploys per week. Policy as Code encodes organizational guardrails as executable rules—evaluated on every PR, Terraform plan, and Kubernetes admission request—before non-compliant resources reach production.

Policy evaluation points

| Stage | Engine | Blocks |
| --- | --- | --- |
| CI: Terraform plan | OPA / Conftest / Sentinel | Public S3, wildcard IAM |
| CI: Kubernetes manifest | Conftest / Kyverno CLI | Missing limits, latest tag |
| Admission: K8s API | Kyverno / Gatekeeper | Non-compliant live deploy |
| Runtime: service mesh | OPA Envoy plugin | Unauthorized east-west call |

Policies should be owned by platform security, versioned in git, tested with fixture inputs, and exempted via documented break-glass annotations—not Slack DMs to disable gates.

flowchart LR
  Dev["Developer PR"] --> CI["CI policy scan"]
  CI -->|pass| Merge["Merge"]
  CI -->|fail| Fix["Fix or request exception"]
  Merge --> GitOps["GitOps sync"]
  GitOps --> Admit["K8s admission policy"]
  Admit -->|pass| Live["Production workload"]
  Admit -->|fail| Reject["Request denied at admission"]
🔒 Security

Policy without metrics is blind compliance. Pair deny rules with dashboards: count of blocked deployments, top violating teams, mean time to policy fix.
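
The three dashboard numbers named above can be derived from a plain log of blocked deployments. The sketch below is illustrative only: the event shape (`team`, `policy`, `blocked_at`, `fixed_at`) and the sample records are assumptions, not a format that Kyverno or Gatekeeper emits.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical event log: one record per blocked deployment.
# "fixed_at" stays None while the violation is still open.
EVENTS = [
    {"team": "payments", "policy": "require-image-digest",
     "blocked_at": datetime(2024, 5, 1, 9, 0), "fixed_at": datetime(2024, 5, 1, 11, 0)},
    {"team": "payments", "policy": "disallow-latest-tag",
     "blocked_at": datetime(2024, 5, 2, 14, 0), "fixed_at": datetime(2024, 5, 2, 15, 0)},
    {"team": "search", "policy": "require-image-digest",
     "blocked_at": datetime(2024, 5, 3, 8, 0), "fixed_at": None},
]


def summarize(events):
    """Return blocked count, top violating teams, and mean time to policy fix."""
    by_team = Counter(e["team"] for e in events)
    # Only resolved violations contribute to the mean time to fix.
    fix_times = [e["fixed_at"] - e["blocked_at"] for e in events if e["fixed_at"]]
    mean_fix = (
        timedelta(seconds=mean(t.total_seconds() for t in fix_times))
        if fix_times
        else None
    )
    return {
        "blocked_deployments": len(events),
        "top_teams": by_team.most_common(3),
        "mean_time_to_fix": mean_fix,
    }


if __name__ == "__main__":
    print(summarize(EVENTS))
```

Open violations are excluded from the mean on purpose; a separate "oldest open violation" gauge is usually a better way to surface them.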

Policy maturity model

| Level | Behavior | Example |
| --- | --- | --- |
| 0: Ad hoc | Manual checklist at release | Spreadsheet SOC2 controls |
| 1: CI warn | Scanners report, do not block | Checkov soft-fail |
| 2: CI enforce | PR blocked on violation | Conftest deny on plan |
| 3: Admission enforce | Cluster rejects non-compliant | Kyverno Enforce mode |
| 4: Continuous audit | Background scan + metrics | Gatekeeper audit + PolicyReport |

Policy ownership RACI

  • Platform security — authors ClusterPolicy / ConstraintTemplate
  • App teams — fix violations in their namespaces
  • SRE — operates admission webhook HA and latency SLOs
  • Compliance — maps policies to framework controls

Open Policy Agent (OPA)

OPA is a general-purpose policy engine using Rego—a declarative query language over JSON. One OPA deployment evaluates Terraform plans, Kubernetes admission reviews, API authorization, and microservice sidecar policies.

rego — deny public S3 ACL
package terraform.s3

# rego.v1 syntax (contains/if) parses on both OPA 1.x and recent 0.x releases
import rego.v1

# Deny buckets whose planned ACL is public-read.
deny contains msg if {
    some resource in input.resource_changes
    resource.type == "aws_s3_bucket"
    resource.change.after.acl == "public-read"
    msg := sprintf("S3 bucket %s must not be public-read", [resource.address])
}

# Deny public access blocks that leave public ACLs enabled.
deny contains msg if {
    some resource in input.resource_changes
    resource.type == "aws_s3_bucket_public_access_block"
    not resource.change.after.block_public_acls
    msg := sprintf("S3 %s must block public ACLs", [resource.address])
}
rego — Deployment must have resource limits
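package kubernetes.deployments

# rego.v1 syntax (contains/if) parses on both OPA 1.x and recent 0.x releases
import rego.v1

# Every container in a Deployment needs a CPU limit.
deny contains msg if {
    input.kind == "Deployment"
    some container in input.spec.template.spec.containers
    not container.resources.limits.cpu
    msg := sprintf("container %s missing CPU limit", [container.name])
}

# Every container in a Deployment needs a memory limit.
deny contains msg if {
    input.kind == "Deployment"
    some container in input.spec.template.spec.containers
    not container.resources.limits.memory
    msg := sprintf("container %s missing memory limit", [container.name])
}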
.github/workflows/opa-conftest.yml
name: OPA Conftest
on: [pull_request]
jobs:
  conftest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: go install github.com/open-policy-agent/conftest@latest
      - name: Scan K8s manifests
        run: conftest test -p policy/kubernetes deploy/ --all-namespaces
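      - name: Scan Terraform plan
        # Assumes an earlier step ran terraform init and terraform plan -out=plan.tfplan
        run: |
          terraform show -json plan.tfplan > plan.json
          # --all-namespaces is needed because the rules live in package terraform.s3, not main
          conftest test -p policy/terraform plan.json --all-namespaces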
.gitlab-ci.yml
conftest-k8s:
  stage: test
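  image:
    name: openpolicyagent/conftest:latest
    # Clear the image entrypoint (assumed to be conftest itself) so GitLab can run the script in a shell
    entrypoint: [""]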
  script:
    - conftest test -p policy/kubernetes deploy/ --all-namespaces
  rules:
    - changes:
        - deploy/**/*
        - policy/**/*

conftest-tf:
  stage: test
  script:
    - terraform show -json plan.tfplan > plan.json
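    # --all-namespaces is needed because the rules live in package terraform.s3, not main
    - conftest test -p policy/terraform plan.json --all-namespaces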
  needs: [terraform-plan]
🔬 Under the Hood

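OPA evaluates input against rules that build deny sets. OPA itself only answers the query; the caller decides what a non-empty set means. Conftest treats any deny message as a failure, so if any deny rule matches, the decision is reject. Test with opa test policy/ -v.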
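
To make the deny-set convention concrete, here is a Python analogue of the two Deployment limit rules shown earlier. It is a teaching sketch, not OPA: the function names are invented for illustration, and Python truthiness only approximates Rego's undefined-or-false semantics.

```python
# Display names chosen to match the messages produced by the Rego rules.
LIMIT_LABELS = {"cpu": "CPU", "memory": "memory"}


def deny_missing_limits(manifest):
    """Collect one message per missing limit, like a Rego deny set."""
    messages = set()
    if manifest.get("kind") != "Deployment":
        return messages
    containers = (
        manifest.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    for container in containers:
        limits = container.get("resources", {}).get("limits", {})
        for key, label in LIMIT_LABELS.items():
            if not limits.get(key):
                messages.add(f"container {container.get('name')} missing {label} limit")
    return messages


def decide(manifest):
    """Reject when the deny set is non-empty; otherwise allow."""
    denials = deny_missing_limits(manifest)
    return ("reject", sorted(denials)) if denials else ("allow", [])


if __name__ == "__main__":
    pod_spec = {"containers": [{"name": "api", "resources": {"limits": {"memory": "512Mi"}}}]}
    print(decide({"kind": "Deployment", "spec": {"template": {"spec": pod_spec}}}))
```

The set mirrors Rego's behavior of de-duplicating identical messages: two rules that produce the same string count as one denial.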

OPA deployment patterns

| Pattern | Use case | Latency |
| --- | --- | --- |
| Sidecar | Envoy ext_authz per pod | Low (local) |
| Centralized service | API authorization | Medium (network hop) |
| Bundle + hot reload | K8s admission via Gatekeeper | Low (in-process) |
| CI-only (Conftest) | Shift-left without runtime OPA | N/A in prod |

Testing Rego

Every policy package needs *_test.rego with table-driven cases. Run opa test -v policy/ in CI on every policy PR—Rego bugs are logic bugs, not syntax typos.

rego — test case for deployment limits
package kubernetes.deployments

import rego.v1

# Only a memory limit is set, so exactly one denial (CPU) is expected.
test_deny_missing_cpu if {
    # Named "manifest" because a local variable must not shadow the input document
    manifest := {
        "kind": "Deployment",
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "api",
                        "resources": {"limits": {"memory": "512Mi"}}
                    }]
                }
            }
        }
    }
    count(deny) == 1 with input as manifest
}

# Both limits are set, so the deny set must be empty.
test_allow_with_limits if {
    manifest := {
        "kind": "Deployment",
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "api",
                        "resources": {
                            "limits": {"cpu": "500m", "memory": "512Mi"}
                        }
                    }]
                }
            }
        }
    }
    count(deny) == 0 with input as manifest
}

OPA Gatekeeper on Kubernetes

Gatekeeper is Kubernetes-native OPA: ConstraintTemplate CRDs define the Rego, and Constraint CRs parameterize and enable it. A validating admission webhook rejects non-compliant resources at the API server.

yaml — Gatekeeper ConstraintTemplate
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          required := input.parameters.labels
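          # The comprehension yields an empty set when the object has no labels at all;
          # a direct reference would be undefined and the rule would silently pass.
          provided := {label | input.review.object.metadata.labels[label]}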
          missing := required[_]
          not provided[missing]
          msg := sprintf("missing required label: %v", [missing])
        }
yaml — Gatekeeper Constraint
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: platform-mandatory-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment", "StatefulSet"]
    namespaces:
      - production
  parameters:
    labels:
      - app.kubernetes.io/name
      - app.kubernetes.io/version
      - owner.team
terminal — Gatekeeper violations
$ kubectl get constraints
$ kubectl get k8srequiredlabels.constraints.gatekeeper.sh platform-mandatory-labels -o yaml
→ status.totalViolations: 3
$ kubectl logs -n gatekeeper-system deploy/gatekeeper-audit
$ oc get constraints
$ oc describe k8srequiredlabels platform-mandatory-labels
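
The constraint status shown by the commands above can also be summarized programmatically. The sketch below assumes the JSON from `kubectl get <constraint> -o json` has been parsed into a dict; the sample object is made up for illustration, and `violations_by_kind` is a hypothetical helper name. Gatekeeper caps the number of entries it stores in `status.violations`, so the list may be shorter than `totalViolations`.

```python
import json
from collections import Counter

# Made-up sample shaped like a constraint's status block.
SAMPLE = json.loads("""
{
  "kind": "K8sRequiredLabels",
  "metadata": {"name": "platform-mandatory-labels"},
  "status": {
    "totalViolations": 3,
    "violations": [
      {"kind": "Deployment", "namespace": "production", "name": "payments-api",
       "message": "missing required label: owner.team"},
      {"kind": "Deployment", "namespace": "production", "name": "legacy-batch",
       "message": "missing required label: app.kubernetes.io/version"},
      {"kind": "StatefulSet", "namespace": "production", "name": "ledger-db",
       "message": "missing required label: owner.team"}
    ]
  }
}
""")


def violations_by_kind(constraint):
    """Return (totalViolations, {kind: count}) for one constraint object."""
    status = constraint.get("status", {})
    listed = status.get("violations") or []
    counts = Counter(v.get("kind", "unknown") for v in listed)
    return status.get("totalViolations", 0), dict(counts)


if __name__ == "__main__":
    total, per_kind = violations_by_kind(SAMPLE)
    print(total, per_kind)
```

Reporting both numbers makes truncation visible: if the total exceeds the sum of the per-kind counts, the stored list was capped.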

Gatekeeper vs Kyverno

| Aspect | Gatekeeper | Kyverno |
| --- | --- | --- |
| Policy language | Rego | YAML patterns + CEL |
| Mutate | Limited via mutation CRDs | Native mutate rules |
| Image verify | Via external data | Built-in verifyImages |
| Learning curve | Steeper | K8s-native |
⚖️ Trade-off

Gatekeeper Rego is powerful but steep for app teams. Platform owns templates; product teams only supply Constraint parameters—or migrate simple rules to Kyverno YAML.

Audit vs enforce

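Start new constraints in a non-blocking mode so that violations are logged without blocking deploys. In Gatekeeper that means enforcementAction: dryrun (or warn); the audit controller writes violations to the constraint's status. Export those to your SIEM before switching to deny.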

| Mode | Effect | When |
| --- | --- | --- |
| Audit | Log violation, allow create | Policy rollout week 1–2 |
| Dryrun | Webhook simulates without persisting | Testing template changes |
| Enforce | Request rejected at admission | After fix window + comms |

Kyverno Policies

Kyverno policies are Kubernetes resources—no Rego required. Validate, mutate, generate, and cleanup rules use YAML patterns familiar to platform engineers.

yaml — Kyverno: require digest, deny latest
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-image-digest
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: disallow-latest-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Using 'latest' tag is not allowed."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
    - name: require-digest
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must use digest pinning (@sha256:...)."
        pattern:
          spec:
            containers:
              - image: "*@sha256:*"
yaml — Kyverno: verify Cosign signature
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
    - name: verify-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"
          attestors:
            - count: 1
              entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE...
                      -----END PUBLIC KEY-----
.github/workflows/kyverno-ci.yml
name: Kyverno policy test
on:
  pull_request:
    paths: ['policy/kyverno/**', 'deploy/**']
jobs:
  kyverno:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
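      # The action reference was obscured in extraction; the name and tag below are an assumption, so verify before use
      - uses: kyverno/action-install-cli@v0.2.0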
      - run: kyverno apply policy/kyverno/ --resource deploy/ --policy-report
      - run: kyverno test policy/kyverno/tests/
.gitlab-ci.yml
kyverno-test:
  stage: test
  image: ghcr.io/kyverno/kyverno-cli:v1.12.0
  script:
    - kyverno apply policy/kyverno/ --resource deploy/ --policy-report
    - kyverno test policy/kyverno/tests/
  rules:
    - changes:
        - policy/kyverno/**/*
        - deploy/**/*
💡 Pro Tip

Run kyverno apply in CI against rendered manifests (Kustomize/Helm output)—catch violations before Argo CD sync surfaces opaque admission errors.
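
For a quick local check before the Kyverno CLI runs, the two image patterns from the require-image-digest policy can be approximated with shell-style wildcards. This is only a sketch: Python's `fnmatch` is a stand-in for Kyverno's own wildcard matching and may differ on unusual characters, and `check_image` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Messages copied from the two rules in the require-image-digest policy.
LATEST_MSG = "Using 'latest' tag is not allowed."
DIGEST_MSG = "Images must use digest pinning (@sha256:...)."


def check_image(image):
    """Return the policy messages an image reference would trigger."""
    failures = []
    # Kyverno pattern "!*:latest": fail when the image matches "*:latest".
    if fnmatchcase(image, "*:latest"):
        failures.append(LATEST_MSG)
    # Kyverno pattern "*@sha256:*": fail when the image does not match.
    if not fnmatchcase(image, "*@sha256:*"):
        failures.append(DIGEST_MSG)
    return failures


if __name__ == "__main__":
    for ref in ("nginx:latest", "nginx:1.27", "registry.example.com/api@sha256:abc123"):
        print(ref, check_image(ref))
```

A check like this belongs in a pre-commit hook at most; the CLI run against rendered manifests remains the authoritative gate.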

yaml — Kyverno mutate: add default resources
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-resources
spec:
  rules:
    - name: default-requests-limits
      match:
        any:
          - resources:
              kinds: [Deployment]
      mutate:
        patchStrategicMerge:
          spec:
            template:
              spec:
                containers:
                  - (name): "*"
                    resources:
                      requests:
                        +(cpu): "100m"
                        +(memory): "128Mi"
                      limits:
                        +(memory): "512Mi"
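
The `+(key)` anchors in the mutate rule above mean "add this value only if the key is absent". A rough Python equivalent, assuming the manifest has already been parsed into nested dicts, looks like this; `add_default_resources` is a hypothetical helper and does not reproduce Kyverno's full strategic-merge behavior.

```python
import copy

# Defaults taken from the add-default-resources policy.
DEFAULTS = {
    "requests": {"cpu": "100m", "memory": "128Mi"},
    "limits": {"memory": "512Mi"},
}


def add_default_resources(deployment):
    """Return a copy with defaults filled in, never overwriting set values."""
    patched = copy.deepcopy(deployment)
    for container in patched["spec"]["template"]["spec"]["containers"]:
        resources = container.setdefault("resources", {})
        for section, values in DEFAULTS.items():
            target = resources.setdefault(section, {})
            for key, value in values.items():
                # setdefault mirrors the +(key) add-if-absent anchor.
                target.setdefault(key, value)
    return patched


if __name__ == "__main__":
    deployment = {
        "kind": "Deployment",
        "spec": {"template": {"spec": {"containers": [
            {"name": "api", "resources": {"requests": {"cpu": "250m"}}}
        ]}}},
    }
    print(add_default_resources(deployment))
```

Running it on the sample keeps the explicit 250m CPU request and fills in only the missing memory request and memory limit.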

Policy exceptions

Kyverno PolicyException grants waivers for break-glass deploys. Require a security-approval annotation and an expiry on every exception so that waivers stay time-bound. Permanent exceptions belong in a policy revision, not in the exceptions CRD.
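
One way to keep exceptions time-bound is a scheduled check that flags any PolicyException lacking approval or past its expiry. The sketch below assumes two annotation keys, `policy.example.com/approved-by` and `policy.example.com/expires`; both are hypothetical conventions an organization would define, not fields that Kyverno itself interprets.

```python
from datetime import datetime, timezone

APPROVER_KEY = "policy.example.com/approved-by"  # hypothetical annotation key
EXPIRY_KEY = "policy.example.com/expires"        # hypothetical, RFC 3339 timestamp


def exception_problems(exception, now=None):
    """Return a list of reasons a PolicyException should be rejected or removed."""
    now = now or datetime.now(timezone.utc)
    annotations = exception.get("metadata", {}).get("annotations") or {}
    problems = []
    if not annotations.get(APPROVER_KEY):
        problems.append("missing security approval annotation")
    raw = annotations.get(EXPIRY_KEY)
    if not raw:
        problems.append("missing expiry annotation")
        return problems
    try:
        # fromisoformat on older Pythons does not accept a trailing "Z".
        expires = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError:
        problems.append(f"unparseable expiry: {raw}")
        return problems
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)  # treat naive values as UTC
    if expires <= now:
        problems.append(f"expired at {raw}")
    return problems


if __name__ == "__main__":
    sample = {"metadata": {"annotations": {
        APPROVER_KEY: "security-oncall", EXPIRY_KEY: "2024-06-15T00:00:00Z"}}}
    print(exception_problems(sample, now=datetime(2024, 6, 1, tzinfo=timezone.utc)))
```

The same function can run in CI against exception manifests in the policy repository, so an unapproved waiver never reaches the cluster.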

Performance considerations

  • verifyImages calls the registry, so set a webhookTimeoutSeconds that covers registry latency (Kubernetes caps admission webhook timeouts at 30 seconds)
  • Background scans run async—do not rely on them for deploy-time safety
  • Keep match rules narrow—cluster-wide Pod matches are expensive at scale

Conftest & CI Policy Gates

Conftest wraps OPA for CI-friendly testing of JSON/YAML/HCL plans. Write policies once in policy/; test with fixture files; fail builds before merge.

Policy repository layout

policy/
├── kubernetes/
│   ├── deployments.rego
│   └── tests/
│       └── missing_limits_test.rego
├── terraform/
│   ├── s3.rego
│   └── tests/
└── docker/
    └── dockerfile.rego

Testing policies

Each Rego package should have *_test.rego with positive and negative cases. CI runs conftest test and opa test on every policy change—policies are code and deserve the same review rigor as application logic.

rego — deny latest image tag
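package main

# rego.v1 syntax (contains/if) parses on both OPA 1.x and recent 0.x releases
import rego.v1

# Deny any Deployment container whose image reference ends in :latest.
deny contains msg if {
  input.kind == "Deployment"
  some container in input.spec.template.spec.containers
  endswith(container.image, ":latest")
  msg := sprintf("container %v uses :latest tag", [container.name])
}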
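
A suffix check like the rule above misses one case: an image reference with no tag at all also resolves to latest. The following Python sketch shows the fuller decision logic that a stricter rule would need; `uses_latest` is a hypothetical helper and it handles only the common reference shapes.

```python
def uses_latest(image):
    """Return True when an image reference floats on the latest tag."""
    # A digest-pinned reference is immutable regardless of any tag.
    if "@" in image:
        return False
    # Inspect only the final path segment so a registry port
    # (localhost:5000/app) is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True  # no tag given, so the runtime defaults to latest
    return last_segment.rsplit(":", 1)[1] == "latest"


if __name__ == "__main__":
    for ref in ("nginx", "nginx:latest", "localhost:5000/nginx:1.27"):
        print(ref, uses_latest(ref))
```
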
.github/workflows/conftest-sarif.yml
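name: Conftest SARIF
on: [pull_request]
# upload-sarif needs permission to write code scanning results
permissions:
  contents: read
  security-events: write
jobs:
  policy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: go install github.com/open-policy-agent/conftest@latest
      - name: Test policies
        run: conftest test -p policy/kubernetes deploy/ --all-namespaces -o sarif > conftest.sarif
      - uses: github/codeql-action/upload-sarif@v3
        # Run even when conftest fails, otherwise violations never reach the dashboard
        if: always()
        with:
          sarif_file: conftest.sarif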
.gitlab-ci.yml
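conftest:
  stage: test
  image:
    name: openpolicyagent/conftest:latest
    # Clear the image entrypoint (assumed to be conftest itself) so GitLab can run the script in a shell
    entrypoint: [""]
  script:
    - conftest test -p policy/kubernetes deploy/ --all-namespaces -o junit > conftest.xml
  artifacts:
    # Keep the report when the job fails; that is when it matters
    when: always
    reports:
      junit: conftest.xml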
📦 Real World

Upload Conftest SARIF to GitHub code scanning, or JUnit reports to GitLab, so that security champions see policy failures alongside SAST/SCA in one PR view.

Compliance Frameworks & Evidence

SOC 2, PCI-DSS, HIPAA, and ISO 27001 ask for demonstrable controls—not screenshots. Policy-as-code generates continuous evidence: every denied public bucket, every enforced digest pin, every audit log entry is a compliance datapoint.

Control mapping example

| Framework control | Automated policy | Evidence source |
| --- | --- | --- |
| SOC 2 CC6.1: logical access | Kyverno: require IRSA annotation | PolicyReport CRD export |
| PCI 1.2: network segmentation | Terraform OPA: private subnets only | Conftest plan output in CI artifacts |
| PCI 3.4: encryption at rest | Checkov CKV_AWS_19 | Checkov JUnit report |
| HIPAA: audit logging | Terraform: CloudTrail required | Terraform state + AWS Config |
yaml — Kyverno PolicyReport (audit evidence)
apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  name: ns-production-audit
  namespace: production
results:
  - policy: require-image-digest
    rule: require-digest
    result: pass
    scored: true
    # wgpolicyk8s.io/v1alpha2 results reference objects through a "resources" list
    resources:
      - kind: Deployment
        name: payments-api
        namespace: production
  - policy: require-image-digest
    rule: disallow-latest-tag
    result: fail
    message: "container sidecar uses :latest"
    resources:
      - kind: Deployment
        name: legacy-batch
        namespace: production
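
Reports like the one above are easy to roll up for evidence. A minimal sketch, assuming the output of `kubectl get policyreport -A -o json` has been loaded into a dict; the sample data mirrors the report above and `failures_by_policy` is a hypothetical helper name.

```python
from collections import Counter

# Sample shaped like a PolicyReport list, mirroring the report above.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "ns-production-audit", "namespace": "production"},
            "results": [
                {"policy": "require-image-digest", "rule": "require-digest", "result": "pass"},
                {"policy": "require-image-digest", "rule": "disallow-latest-tag", "result": "fail"},
            ],
        },
        # A report with no results yet; the key may be absent or null.
        {"metadata": {"name": "ns-staging-audit", "namespace": "staging"}},
    ]
}


def failures_by_policy(report_list):
    """Count failing results per (policy, rule) pair across all reports."""
    counts = Counter()
    for report in report_list.get("items", []):
        for result in report.get("results") or []:
            if result.get("result") == "fail":
                counts[(result.get("policy"), result.get("rule"))] += 1
    return counts


if __name__ == "__main__":
    for (policy, rule), n in failures_by_policy(SAMPLE).most_common():
        print(f"{policy}/{rule}: {n}")
```

Grouping by policy and rule, rather than taking a single total, shows which control is generating the findings.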

Evidence automation pipeline

  1. CI exports SARIF/JUnit from Conftest, Checkov, Kyverno CLI.
  2. Nightly job aggregates PolicyReport CRDs cluster-wide.
  3. Object-lock S3 bucket stores immutable monthly evidence bundles.
  4. GRC tool ingests via API—auditor queries, not spreadsheet hunts.
🔒 Security

Auditors trust automated continuous evidence over annual manual sampling. Export PolicyReport and Conftest artifacts to your GRC tool or S3 evidence bucket with object lock.
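
Before an evidence bundle is written to locked storage, it helps to record a checksum for every file so an auditor can confirm nothing changed afterwards. The sketch below builds such a manifest with the standard library only; the directory layout and the `build_manifest` name are assumptions, and the upload to S3 is deliberately left out.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(evidence_dir):
    """Return a manifest listing path, size, and SHA-256 for every file."""
    root = Path(evidence_dir)
    entries = []
    # Sort for a stable manifest, so two runs over the same files match.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        })
    return {"files": entries}


if __name__ == "__main__":
    import sys

    # Usage: python manifest.py <evidence-directory>
    if len(sys.argv) > 1:
        print(json.dumps(build_manifest(sys.argv[1]), indent=2))
```

Storing the manifest next to the bundle, and its own hash in the GRC record, gives auditors two independent ways to check integrity.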

Framework quick reference

| Framework | Policy focus | Automation priority |
| --- | --- | --- |
| SOC 2 Type II | Change management, access control | Git PR + admission logs |
| PCI-DSS 4.0 | Network segmentation, encryption | Terraform OPA + Kyverno NetworkPolicy |
| HIPAA | Audit trails, minimum necessary | CloudTrail + RBAC policies |
| ISO 27001 | Risk treatment, asset inventory | SBOM + policy reports |
| NIST 800-53 | Configuration management | Conftest + drift detection |

Evidence retention

  1. CI artifacts: SARIF/JUnit 90 days minimum
  2. PolicyReport exports: monthly snapshots to WORM storage
  3. Admission webhook audit logs: 1 year hot, 7 years cold
  4. Exception tickets linked to PolicyException CRD metadata

DORA Metrics & Policy Automation

Elite DORA performers deploy frequently with a low change failure rate because automation catches mistakes early, including policy violations. Policy gates add seconds to the pipeline; they subtract hours from MTTR and audit prep.

How policy automation moves DORA metrics

| DORA metric | Policy automation effect |
| --- | --- |
| Deployment frequency | Fast CI policy feedback enables trunk-based merges without fear |
| Lead time for changes | Shift-left deny in PR vs 2am admission failure in prod |
| Change failure rate | Blocks misconfigurations that cause outages (open SG, no limits) |
| MTTR | Known-good policy baseline + Git revert beats manual firefighting |
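
Two of these metrics are simple ratios once deployments are recorded. The sketch below computes change failure rate and mean time to restore from a list of deployment records; the record shape and the sample values are invented for illustration and do not come from any particular tool.

```python
from datetime import timedelta

# Made-up sample: "restore_time" is set only for failed deployments.
DEPLOYS = [
    {"id": 1, "failed": False, "restore_time": None},
    {"id": 2, "failed": True, "restore_time": timedelta(minutes=30)},
    {"id": 3, "failed": False, "restore_time": None},
    {"id": 4, "failed": True, "restore_time": timedelta(minutes=90)},
]


def change_failure_rate(deploys):
    """Fraction of deployments that caused a failure in production."""
    if not deploys:
        return 0.0
    return sum(1 for d in deploys if d["failed"]) / len(deploys)


def mean_time_to_restore(deploys):
    """Average restore time across failed deployments, or None if there are none."""
    times = [
        d["restore_time"]
        for d in deploys
        if d["failed"] and d["restore_time"] is not None
    ]
    if not times:
        return None
    return sum(times, timedelta()) / len(times)


if __name__ == "__main__":
    print(f"CFR: {change_failure_rate(DEPLOYS):.0%}")
    print(f"MTTR: {mean_time_to_restore(DEPLOYS)}")
```

Tagging each failed deployment with whether a policy would have caught it turns the same data into a case for the next gate.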

.github/workflows/policy-metrics.yml
name: Policy metrics export
on:
  schedule:
    - cron: '0 * * * *'
jobs:
  export:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
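      - name: Collect Kyverno policy reports
        # Assumes an earlier step has configured kubectl credentials for the cluster
        run: |
          kubectl get policyreport -A -o json | jq '[.items[].results[]? | select(.result=="fail")] | length' > fails.txt
      - name: Push metric
        env:
          DD_API_KEY: ${{ secrets.DD_API_KEY }}
        run: |
          # Build the payload with jq so the timestamp and count are interpolated as JSON numbers
          payload=$(jq -n --argjson ts "$(date +%s)" --argjson fails "$(cat fails.txt)" \
            '{series: [{metric: "kyverno.policy.violations", points: [[$ts, $fails]], type: "gauge"}]}')
          curl -sS -X POST "https://api.datadoghq.com/api/v1/series" \
            -H "Content-Type: application/json" \
            -H "DD-API-KEY: ${DD_API_KEY}" \
            -d "$payload"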
.gitlab-ci.yml
policy-violation-count:
  stage: report
  script:
    - kubectl get policyreport -A -o json > reports.json
    # The ? keeps jq from erroring on reports that have no results yet
    - FAILS=$(jq '[.items[].results[]? | select(.result=="fail")] | length' reports.json)
    - echo "KYVERNO_VIOLATIONS=$FAILS" >> report.env
  artifacts:
    reports:
      dotenv: report.env
🎯 Interview Tip

Explain how you reduced change failure rate: introduced Conftest on PR → Kyverno enforce in cluster → exported PolicyReport for compliance. Quantify the outcome with your own numbers, for example: CFR dropped from 28% to 9% over two quarters.

⚖️ Trade-off

Over-strict policies block legitimate emergencies. Use validationFailureAction: Audit shadow mode first, then Enforce after fix window—with documented exception process.

Policy gate latency budget

Admission webhooks add to API server latency. Target <100ms p99 for validate rules; verifyImages may need 5–30s timeout—run heavy verification in CI and keep admission to digest/signature checks only.

| Gate location | Typical added latency | Blocks |
| --- | --- | --- |
| Pre-commit | 1–5s | Secrets, format |
| PR Conftest | 10–30s | IaC + manifest policy |
| Kyverno validate | 50–200ms | Labels, limits, tags |
| verifyImages | 1–10s | Unsigned images |
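
Checking a gate against its latency budget comes down to a percentile calculation over observed webhook durations. The sketch below uses the nearest-rank method and the 100 ms p99 target mentioned above; the function names are illustrative, and in practice the samples would come from your metrics system rather than a Python list.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty sample list, pct in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def within_budget(samples_ms, budget_ms=100, pct=99):
    """True when the chosen percentile is strictly under the latency budget."""
    return percentile(samples_ms, pct) < budget_ms


if __name__ == "__main__":
    healthy = [50] * 99 + [250]        # a single slow outlier in 100 requests
    degraded = [50] * 98 + [250, 250]  # two slow requests push p99 over budget
    print(within_budget(healthy), within_budget(degraded))
```

With 100 samples the p99 is the 99th smallest value, so one outlier is tolerated and two are not.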

Building a policy platform roadmap

  1. Month 1: Conftest on PR for K8s manifests (warn mode)
  2. Month 2: Checkov on Terraform PR (block critical)
  3. Month 3: Kyverno audit for prod namespace violations
  4. Month 4: Enforce digest pin + disallow latest
  5. Month 5: verifyImages Cosign for production
  6. Month 6: Export PolicyReport to compliance dashboard
📦 Real World

Teams that jump straight to Enforce on day one get bypass culture—developers kubectl apply from laptops with cluster-admin. Shadow mode builds trust; enforce mode earns it.