Policy as Code

Why Policy as Code?

Manual compliance checklists do not survive 200 deploys per week. Policy as Code encodes organizational guardrails as executable rules—evaluated on every PR, Terraform plan, and Kubernetes admission request—before non-compliant resources reach production.

Policy evaluation points

Stage	Engine	Blocks
CI — Terraform plan	OPA / Conftest / Sentinel	Public S3, wildcard IAM
CI — Kubernetes manifest	Conftest / Kyverno CLI	Missing limits, latest tag
Admission — K8s API	Kyverno / Gatekeeper	Non-compliant live deploy
Runtime — service mesh	OPA Envoy plugin	Unauthorized east-west call

Policies should be owned by platform security, versioned in git, tested with fixture inputs, and exempted via documented break-glass annotations—not Slack DMs to disable gates.

flowchart LR
  Dev["Developer PR"] --> CI["CI policy scan"]
  CI -->|pass| Merge["Merge"]
  CI -->|fail| Fix["Fix or request exception"]
  Merge --> GitOps["GitOps sync"]
  GitOps --> Admit["K8s admission policy"]
  Admit -->|pass| Live["Production workload"]
  Admit -->|fail| Reject["Request denied by webhook"]

🔒 Security

Policy without metrics is blind compliance. Pair deny rules with dashboards: count of blocked deployments, top violating teams, mean time to policy fix.
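
The three dashboard numbers named above can be derived from a plain list of deny events. The following is a minimal sketch, not a prescribed schema: the `Violation` record and its `team`, `blocked_at` and `fixed_at` fields are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Violation:
    team: str                     # hypothetical owner label for the blocked change
    blocked_at: datetime          # when the gate denied the change
    fixed_at: Optional[datetime]  # None while the violation is still open


def summarize(violations: list[Violation]) -> dict:
    """Return blocked count, top violating teams, and mean time to policy fix."""
    fixed = [v for v in violations if v.fixed_at is not None]
    total_fix_time = sum((v.fixed_at - v.blocked_at for v in fixed), timedelta())
    return {
        "blocked": len(violations),
        "top_teams": Counter(v.team for v in violations).most_common(3),
        # Open violations are excluded so they do not skew the mean.
        "mean_time_to_fix": total_fix_time / len(fixed) if fixed else None,
    }
```

Open violations are left out of the mean on purpose; a separate "oldest open violation" gauge is usually the better signal for those.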

Policy maturity model

Level	Behavior	Example
0 — Ad hoc	Manual checklist at release	Spreadsheet SOC2 controls
1 — CI warn	Scanners report, do not block	Checkov soft-fail
2 — CI enforce	PR blocked on violation	Conftest deny on plan
3 — Admission enforce	Cluster rejects non-compliant	Kyverno Enforce mode
4 — Continuous audit	Background scan + metrics	Gatekeeper audit + PolicyReport

Policy ownership RACI

Platform security — authors ClusterPolicy / ConstraintTemplate
App teams — fix violations in their namespaces
SRE — operates admission webhook HA and latency SLOs
Compliance — maps policies to framework controls

Open Policy Agent (OPA)

OPA is a general-purpose policy engine using Rego, a declarative query language over JSON. The same engine and language can evaluate Terraform plans, Kubernetes admission reviews, API authorization, and microservice sidecar policies.

package terraform.s3

import future.keywords.in

deny[msg] {
    some resource in input.resource_changes
    resource.type == "aws_s3_bucket"
    resource.change.after.acl == "public-read"
    msg := sprintf("S3 bucket %s must not be public-read", [resource.address])
}

deny[msg] {
    some resource in input.resource_changes
    resource.type == "aws_s3_bucket_public_access_block"
    not resource.change.after.block_public_acls
    msg := sprintf("S3 %s must block public ACLs", [resource.address])
}

package kubernetes.deployments

deny[msg] {
    input.kind == "Deployment"
    container := input.spec.template.spec.containers[_]
    not container.resources.limits.cpu
    msg := sprintf("container %s missing CPU limit", [container.name])
}

deny[msg] {
    input.kind == "Deployment"
    container := input.spec.template.spec.containers[_]
    not container.resources.limits.memory
    msg := sprintf("container %s missing memory limit", [container.name])
}

name: OPA Conftest
on: [pull_request]
jobs:
  conftest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: go install github.com/open-policy-agent/conftest@latest
      - name: Scan K8s manifests
        run: conftest test -p policy/kubernetes deploy/ --all-namespaces
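      - name: Scan Terraform plan
        # Assumes an earlier step ran `terraform plan -out=plan.tfplan`.
        run: |
          terraform show -json plan.tfplan > plan.json
          # --all-namespaces: the policy package is terraform.s3, not Conftest's default "main"
          conftest test -p policy/terraform plan.json --all-namespaces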

conftest-k8s:
  stage: test
  image: openpolicyagent/conftest:latest
  script:
    - conftest test -p policy/kubernetes deploy/ --all-namespaces
  rules:
    - changes:
        - deploy/**/*
        - policy/**/*

conftest-tf:
  stage: test
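  script:
    - terraform show -json plan.tfplan > plan.json
    # --all-namespaces: the policy package is terraform.s3, not Conftest's default "main"
    - conftest test -p policy/terraform plan.json --all-namespaces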
  needs: [terraform-plan]

🔬 Under the Hood

OPA evaluates input against rules that build a deny set. If any deny rule matches, the set is non-empty and the caller (Conftest or an admission webhook) rejects the change. Test with opa test policy/ -v.
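
To make the deny-set idea concrete, here is a rough Python model of the two terraform.s3 rules shown earlier. It is only a sketch of the evaluation semantics, not how OPA is implemented; the function names are invented for illustration, and the input shape follows the `resource_changes` list that the Rego rules read.

```python
from typing import Callable

Rule = Callable[[dict], set]


def _after(resource_change: dict) -> dict:
    """Planned attributes of a resource, or {} when they are missing or null."""
    return (resource_change.get("change") or {}).get("after") or {}


def deny_public_read(plan: dict) -> set:
    """Mirror of the first rule: flag buckets whose ACL is public-read."""
    return {
        f"S3 bucket {rc.get('address')} must not be public-read"
        for rc in plan.get("resource_changes", [])
        if rc.get("type") == "aws_s3_bucket" and _after(rc).get("acl") == "public-read"
    }


def deny_unblocked_acls(plan: dict) -> set:
    """Mirror of the second rule: block_public_acls must be set and true."""
    return {
        f"S3 {rc.get('address')} must block public ACLs"
        for rc in plan.get("resource_changes", [])
        if rc.get("type") == "aws_s3_bucket_public_access_block"
        and not _after(rc).get("block_public_acls")
    }


def decide(plan: dict, rules: list) -> tuple:
    """Union every rule's messages; an empty deny set means the plan is allowed."""
    denies = set().union(*(rule(plan) for rule in rules))
    return (not denies, denies)
```

As in Rego, each rule only contributes messages; the allow or reject decision is taken by whoever inspects the combined set.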

OPA deployment patterns

Pattern	Use case	Latency
Sidecar	Envoy ext_authz per pod	Low — local
Centralized service	API authorization	Medium — network hop
Bundle + hot reload	K8s admission via Gatekeeper	Low — in-process
CI-only (Conftest)	Shift-left without runtime OPA	N/A in prod

Testing Rego

Every policy package needs *_test.rego with table-driven cases. Run opa test -v policy/ in CI on every policy PR—Rego bugs are logic bugs, not syntax typos.

package kubernetes.deployments

# Bind each fixture to its own name and inject it with `with input as ...`.
# Naming a local variable `input` shadows the global document and is rejected
# by OPA strict mode.
test_deny_missing_cpu {
    deployment := {
        "kind": "Deployment",
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "api",
                        "resources": {"limits": {"memory": "512Mi"}}
                    }]
                }
            }
        }
    }
    count(deny) == 1 with input as deployment
}

test_allow_with_limits {
    deployment := {
        "kind": "Deployment",
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "api",
                        "resources": {
                            "limits": {"cpu": "500m", "memory": "512Mi"}
                        }
                    }]
                }
            }
        }
    }
    count(deny) == 0 with input as deployment
}

OPA Gatekeeper on Kubernetes

Gatekeeper is Kubernetes-native OPA: ConstraintTemplate CRDs define the Rego, and Constraint CRs parameterize and enable it. Its validating admission webhook rejects non-compliant resources at the API server.

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          required := input.parameters.labels
          # Default to {} so objects with no labels at all are still reported.
          provided := object.get(input.review.object.metadata, "labels", {})
          missing := required[_]
          not provided[missing]
          msg := sprintf("missing required label: %v", [missing])
        }

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: platform-mandatory-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment", "StatefulSet"]
    namespaces:
      - production
  parameters:
    labels:
      - app.kubernetes.io/name
      - app.kubernetes.io/version
      - owner.team
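
The required-labels check is easy to get subtly wrong when an object carries no labels at all. The Python sketch below models the intended behaviour under that edge case; the function name is hypothetical, and the object shape is the usual Kubernetes `metadata.labels` map.

```python
def missing_label_messages(obj: dict, required: list) -> list:
    """Return one message per required label that the object does not carry."""
    # A missing or null labels map is treated as empty, so every required
    # label is reported instead of the check silently passing.
    provided = (obj.get("metadata") or {}).get("labels") or {}
    return [
        f"missing required label: {label}"
        for label in required
        if label not in provided
    ]
```

For the Constraint above, a Deployment with only `app.kubernetes.io/name` set would yield two messages, and one with no labels would yield three.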

$ kubectl get constraints
$ kubectl get k8srequiredlabels.constraints.gatekeeper.sh platform-mandatory-labels -o yaml
→ status.totalViolations: 3
$ kubectl logs -n gatekeeper-system deploy/gatekeeper-audit

$ oc get constraints
$ oc describe k8srequiredlabels platform-mandatory-labels

Gatekeeper vs Kyverno

Aspect	Gatekeeper	Kyverno
Policy language	Rego	YAML patterns + CEL
Mutate	Limited via mutation CRDs	Native mutate rules
Image verify	Via external data	Built-in verifyImages
Learning curve	Steeper	K8s-native

⚖️ Trade-off

Gatekeeper Rego is powerful but steep for app teams. Platform owns templates; product teams only supply Constraint parameters—or migrate simple rules to Kyverno YAML.

Audit vs enforce

Start new constraints with enforcementAction: dryrun (or warn), so violations are logged without blocking deploys. The Gatekeeper audit controller writes violations to the constraint's status; export them to your SIEM before flipping to deny.

Mode	Effect	When
dryrun	Violation recorded, request allowed	Policy rollout week 1–2; testing template changes
warn	Request allowed, warning returned to the client	Fix window, while teams remediate
deny	Request rejected by the admission webhook	After fix window + comms
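
The rollout sequence can be pictured as a mapping from a constraint's enforcement action to what the API client experiences. This is a simplified sketch assuming Gatekeeper's `dryrun`, `warn` and `deny` values for `enforcementAction`; the `Decision` type is invented here and does not correspond to a Gatekeeper API object.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    allowed: bool
    warnings: list = field(default_factory=list)  # shown to the client
    recorded: list = field(default_factory=list)  # kept for audit and export


def apply_action(action: str, violations: list) -> Decision:
    """Translate violations into a client-facing decision for one constraint."""
    if action not in {"dryrun", "warn", "deny"}:
        raise ValueError(f"unknown enforcement action: {action}")
    if not violations:
        return Decision(allowed=True)
    if action == "deny":
        return Decision(allowed=False, recorded=list(violations))
    if action == "warn":
        return Decision(allowed=True, warnings=list(violations), recorded=list(violations))
    # dryrun: nothing changes for the client, but the violation is still recorded
    return Decision(allowed=True, recorded=list(violations))
```

In every mode the violation is recorded, which is what makes the audit data usable as a readiness signal before switching to `deny`.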

Kyverno Policies

Kyverno policies are Kubernetes resources—no Rego required. Validate, mutate, generate, and cleanup rules use YAML patterns familiar to platform engineers.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-image-digest
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: disallow-latest-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Using 'latest' tag is not allowed."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
    - name: require-digest
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must use digest pinning (@sha256:...)."
        pattern:
          spec:
            containers:
              - image: "*@sha256:*"
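
The two image patterns complement each other, which a small Python model makes visible. This is an approximation using `fnmatch` for the `*` wildcard and a leading `!` for negation; it is not Kyverno's matcher, and the helper names are made up for the example.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, value: str) -> bool:
    """Approximate a validate pattern: '*' is a wildcard, leading '!' negates."""
    if pattern.startswith("!"):
        return not fnmatchcase(value, pattern[1:])
    return fnmatchcase(value, pattern)


def check_image(image: str) -> list:
    """Return the messages of the rules that this image reference fails."""
    failures = []
    if not matches("!*:latest", image):
        failures.append("Using 'latest' tag is not allowed.")
    if not matches("*@sha256:*", image):
        failures.append("Images must use digest pinning (@sha256:...).")
    return failures
```

An untagged reference such as `nginx` passes the first pattern even though it resolves to `latest` at pull time; only the digest rule catches it, which is a reason to keep both rules.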

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
    - name: verify-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"
          attestors:
            - count: 1
              entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE...
                      -----END PUBLIC KEY-----

name: Kyverno policy test
on:
  pull_request:
    paths: ['policy/kyverno/**', 'deploy/**']
jobs:
  kyverno:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: kyverno/action-install-cli@<release-tag>  # pin to a published tag of the install action
      - run: kyverno apply policy/kyverno/ --resource deploy/ --policy-report
      - run: kyverno test policy/kyverno/tests/

kyverno-test:
  stage: test
  image: ghcr.io/kyverno/kyverno-cli:v1.12.0
  script:
    - kyverno apply policy/kyverno/ --resource deploy/ --policy-report
    - kyverno test policy/kyverno/tests/
  rules:
    - changes:
        - policy/kyverno/**/*
        - deploy/**/*

💡 Pro Tip

Run kyverno apply in CI against rendered manifests (Kustomize/Helm output)—catch violations before Argo CD sync surfaces opaque admission errors.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-resources
spec:
  rules:
    - name: default-requests-limits
      match:
        any:
          - resources:
              kinds: [Deployment]
      mutate:
        patchStrategicMerge:
          spec:
            template:
              spec:
                containers:
                  - (name): "*"
                    resources:
                      requests:
                        +(cpu): "100m"
                        +(memory): "128Mi"
                      limits:
                        +(memory): "512Mi"
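
The `+(key)` anchors in the mutate rule mean "add this value only if the key is absent". A minimal Python sketch of that behaviour follows; it assumes missing parent maps (`resources`, `requests`, `limits`) are created as needed, and the `DEFAULTS` table simply restates the values from the policy above.

```python
import copy

# Same defaults as the ClusterPolicy above.
DEFAULTS = {
    "requests": {"cpu": "100m", "memory": "128Mi"},
    "limits": {"memory": "512Mi"},
}


def apply_defaults(container: dict) -> dict:
    """Return a copy of the container with absent resource values filled in."""
    patched = copy.deepcopy(container)
    resources = patched.get("resources") or {}
    patched["resources"] = resources
    for section, values in DEFAULTS.items():
        target = resources.get(section) or {}
        resources[section] = target
        for key, value in values.items():
            # Add-if-absent: values set by the app team always win.
            target.setdefault(key, value)
    return patched
```

Because existing values are never overwritten, the mutation is safe to apply repeatedly and does not fight with teams that set their own requests.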

Policy exceptions

Kyverno PolicyException grants waivers for break-glass deploys. Require a security approval annotation and an expiry on each one so waivers stay time-bound. Permanent exceptions belong in a policy revision, not in the exceptions CRD.
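
One way to keep exceptions time-bound is a scheduled check that flags any exception lacking approval or past its expiry. The sketch below assumes two annotation keys, `security.example.com/approved-by` and `security.example.com/expires`; both are hypothetical names chosen for this example, not Kyverno conventions.

```python
from datetime import datetime, timezone

APPROVER_KEY = "security.example.com/approved-by"  # hypothetical annotation
EXPIRY_KEY = "security.example.com/expires"        # hypothetical, ISO 8601 with UTC offset


def exception_is_valid(exception: dict, now: datetime) -> bool:
    """True only if the exception is approved and its expiry is in the future."""
    annotations = (exception.get("metadata") or {}).get("annotations") or {}
    if not annotations.get(APPROVER_KEY):
        return False
    raw_expiry = annotations.get(EXPIRY_KEY)
    if not raw_expiry:
        return False
    try:
        expires = datetime.fromisoformat(raw_expiry)
    except ValueError:
        return False
    if expires.tzinfo is None:
        return False  # refuse ambiguous local timestamps
    return now < expires
```

Called with `datetime.now(timezone.utc)`, this treats a missing, malformed or timezone-less expiry as invalid, so a sloppy exception fails closed rather than living forever.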

Performance considerations

verifyImages calls registry—set reasonable webhookTimeoutSeconds
Background scans run async—do not rely on them for deploy-time safety
Keep match rules narrow—cluster-wide Pod matches are expensive at scale

Conftest & CI Policy Gates

Conftest wraps OPA for CI-friendly testing of JSON/YAML/HCL plans. Write policies once in policy/; test with fixture files; fail builds before merge.

Policy repository layout

policy/
├── kubernetes/
│   ├── deployments.rego
│   └── tests/
│       └── missing_limits_test.rego
├── terraform/
│   ├── s3.rego
│   └── tests/
└── docker/
    └── dockerfile.rego

Testing policies

Each Rego package should have *_test.rego with positive and negative cases. CI runs conftest test and opa test on every policy change—policies are code and deserve the same review rigor as application logic.

package main

deny[msg] {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  endswith(container.image, ":latest")
  msg := sprintf("container %v uses :latest tag", [container.name])
}
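
The rule above matches only an explicit `:latest` suffix. An image reference with no tag also resolves to `latest`, so a stricter check needs to look at the reference structure. The following Python sketch shows one way to do that; the function name is hypothetical and the parsing is deliberately simple.

```python
def uses_latest(image: str) -> bool:
    """True if the reference resolves to the 'latest' tag, explicitly or by default."""
    if "@" in image:
        return False  # pinned by digest, the tag is irrelevant
    # Only the last path segment can carry a tag; a colon earlier in the
    # reference is a registry port (for example localhost:5000/app).
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True  # no tag given, so the default tag applies
    return last_segment.rsplit(":", 1)[1] == "latest"
```

The same cases make good fixtures for the Rego tests: an untagged image, a registry with a port, and a digest-pinned reference.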

name: Conftest SARIF
on: [pull_request]
jobs:
  policy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write   # required by upload-sarif
    steps:
      - uses: actions/checkout@v4
      - run: go install github.com/open-policy-agent/conftest@latest
      - name: Test policies
        run: conftest test -p policy/kubernetes deploy/ -o sarif > conftest.sarif
      - uses: github/codeql-action/upload-sarif@v3
        if: always()             # upload findings even when the test step fails
        with:
          sarif_file: conftest.sarif

conftest:
  stage: test
  image: openpolicyagent/conftest:latest
  script:
    - conftest test -p policy/kubernetes deploy/ -o junit > conftest.xml
  artifacts:
    reports:
      junit: conftest.xml

📦 Real World

Upload Conftest SARIF to GitHub code scanning, or JUnit to GitLab test reports as in the job above, so security champions see policy failures alongside SAST/SCA in one PR view.

Compliance Frameworks & Evidence

SOC 2, PCI-DSS, HIPAA, and ISO 27001 ask for demonstrable controls—not screenshots. Policy-as-code generates continuous evidence: every denied public bucket, every enforced digest pin, every audit log entry is a compliance datapoint.

Control mapping example

Framework control	Automated policy	Evidence source
SOC 2 CC6.1 — logical access	Kyverno: require IRSA annotation	PolicyReport CRD export
PCI 1.2 — network segmentation	Terraform OPA: private subnets only	Conftest plan output in CI artifacts
PCI 3.4 — encryption at rest	Checkov CKV_AWS_19	Checkov JUnit report
HIPAA — audit logging	Terraform: CloudTrail required	Terraform state + AWS Config

apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  name: ns-production-audit
  namespace: production
results:
  - policy: require-image-digest
    rule: require-digest
    result: pass
    scored: true
    # v1alpha2 results reference objects through a `resources` list
    resources:
      - kind: Deployment
        name: payments-api
        namespace: production
  - policy: require-image-digest
    rule: disallow-latest-tag
    result: fail
    message: "container sidecar uses :latest"
    resources:
      - kind: Deployment
        name: legacy-batch
        namespace: production

Evidence automation pipeline

CI exports SARIF/JUnit from Conftest, Checkov, Kyverno CLI.
Nightly job aggregates PolicyReport CRDs cluster-wide.
Object-lock S3 bucket stores immutable monthly evidence bundles.
GRC tool ingests via API—auditor queries, not spreadsheet hunts.
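
The nightly aggregation step can be sketched as a function that folds PolicyReport objects into one bundle and attaches a digest, so a later reader can confirm the bundle was not altered. This is an illustrative outline only: the bundle layout and field selection are assumptions, and the input is the parsed `items` list from a PolicyReport JSON export.

```python
import hashlib
import json


def build_bundle(reports: list) -> dict:
    """Collect failing results from PolicyReports and fingerprint the bundle."""
    failures = []
    for report in reports:
        namespace = (report.get("metadata") or {}).get("namespace")
        # Reports without results are valid; treat them as empty.
        for result in report.get("results") or []:
            if result.get("result") == "fail":
                failures.append({
                    "namespace": namespace,
                    "policy": result.get("policy"),
                    "rule": result.get("rule"),
                    "message": result.get("message"),
                })
    body = {"report_count": len(reports), "failures": failures}
    # Canonical JSON (sorted keys, fixed separators) keeps the digest stable.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {"body": body, "sha256": hashlib.sha256(canonical).hexdigest()}
```

Storing the digest next to the bundle in the object-locked bucket gives auditors a cheap integrity check without needing access to the cluster.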

🔒 Security

Auditors trust automated continuous evidence over annual manual sampling. Export PolicyReport and Conftest artifacts to your GRC tool or S3 evidence bucket with object lock.

Framework quick reference

Framework	Policy focus	Automation priority
SOC 2 Type II	Change management, access control	Git PR + admission logs
PCI-DSS 4.0	Network segmentation, encryption	Terraform OPA + Kyverno NetworkPolicy
HIPAA	Audit trails, minimum necessary	CloudTrail + RBAC policies
ISO 27001	Risk treatment, asset inventory	SBOM + policy reports
NIST 800-53	Configuration management	Conftest + drift detection

Evidence retention

CI artifacts: SARIF/JUnit 90 days minimum
PolicyReport exports: monthly snapshots to WORM storage
Admission webhook audit logs: 1 year hot, 7 years cold
Exception tickets linked to PolicyException CRD metadata

DORA Metrics & Policy Automation

Elite DORA performers deploy frequently with low change failure rate because automation catches mistakes early—including policy violations. Policy gates add seconds to pipeline; they subtract hours from MTTR and audit prep.

How policy automation moves DORA metrics

DORA metric	Policy automation effect
Deployment frequency	Fast CI policy feedback enables trunk-based merges without fear
Lead time for changes	Shift-left deny in PR vs 2am admission failure in prod
Change failure rate	Blocks misconfigurations that cause outages (open SG, no limits)
MTTR	Known-good policy baseline + Git revert beats manual firefighting
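
Two of these metrics are simple ratios over deployment records, which makes it straightforward to track them before and after a policy gate is introduced. The sketch below uses a hypothetical `Deploy` record; how a failed change is detected and how restore time is measured are left to your incident tooling.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Deploy:
    failed: bool                                 # change caused a production failure
    time_to_restore: Optional[timedelta] = None  # set once service is restored


def change_failure_rate(deploys: list) -> float:
    """Share of deployments that caused a failure, between 0.0 and 1.0."""
    if not deploys:
        return 0.0
    return sum(1 for d in deploys if d.failed) / len(deploys)


def mean_time_to_restore(deploys: list) -> Optional[timedelta]:
    """Average restore time over failed deployments that have been resolved."""
    restores = [d.time_to_restore for d in deploys if d.failed and d.time_to_restore is not None]
    if not restores:
        return None
    return sum(restores, timedelta()) / len(restores)
```

Computing both over a fixed window, such as a quarter, gives a like-for-like comparison across rollout phases.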

name: Policy metrics export
on:
  schedule:
    - cron: '0 * * * *'
jobs:
  export:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
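      - name: Collect Kyverno policy reports
        run: |
          # `.results[]?` skips reports that have no results yet
          kubectl get policyreport -A -o json | jq '[.items[].results[]? | select(.result=="fail")] | length' > fails.txt
      - name: Push metric
        run: |
          # Build the payload with jq so both values are expanded and sent as JSON numbers
          payload=$(jq -n --argjson ts "$(date +%s)" --argjson fails "$(cat fails.txt)" \
            '{series: [{metric: "kyverno.policy.violations", points: [[$ts, $fails]], type: "gauge"}]}')
          curl -X POST "https://api.datadoghq.com/api/v1/series" \
            -H "Content-Type: application/json" \
            -H "DD-API-KEY: ${{ secrets.DD_API_KEY }}" \
            -d "$payload"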

policy-violation-count:
  stage: report
  script:
    - kubectl get policyreport -A -o json > reports.json
    - FAILS=$(jq '[.items[].results[]? | select(.result=="fail")] | length' reports.json)
    - echo "KYVERNO_VIOLATIONS=$FAILS" >> report.env
  artifacts:
    reports:
      dotenv: report.env

🎯 Interview Tip

Explain how you reduced change failure rate: introduced Conftest on PR → Kyverno enforce in cluster → exported PolicyReport for compliance. Quantify the result with your own numbers, for example: CFR dropped from 28% to 9% over two quarters.

⚖️ Trade-off

Over-strict policies block legitimate emergencies. Use validationFailureAction: Audit shadow mode first, then Enforce after fix window—with documented exception process.

Policy gate latency budget

Admission webhooks add to API server latency. Target <100ms p99 for validate rules; verifyImages may need 5–30s timeout—run heavy verification in CI and keep admission to digest/signature checks only.

Gate location	Typical added latency	Blocks
Pre-commit	1–5s	Secrets, format
PR Conftest	10–30s	IaC + manifest policy
Kyverno validate	50–200ms	Labels, limits, tags
verifyImages	1–10s	Unsigned images
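
Checking a gate against its latency budget comes down to a percentile over observed durations. Here is a small sketch using the nearest-rank method; the 100 ms default mirrors the p99 target stated above, while the sample source (for example webhook duration metrics) is left as an assumption.

```python
import math


def percentile(samples_ms: list, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of latencies in milliseconds."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def within_budget(samples_ms: list, budget_ms: float = 100.0) -> bool:
    """True if the p99 latency stays under the budget."""
    return percentile(samples_ms, 99) < budget_ms
```

With 100 samples, two slow calls are enough to break a p99 budget, which is why slow rules such as image verification are better tracked separately from fast validate rules.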

Building a policy platform roadmap

Month 1: Conftest on PR for K8s manifests (warn mode)
Month 2: Checkov on Terraform PR (block critical)
Month 3: Kyverno audit for prod namespace violations
Month 4: Enforce digest pin + disallow latest
Month 5: verifyImages Cosign for production
Month 6: Export PolicyReport to compliance dashboard

📦 Real World

Teams that jump straight to Enforce on day one get bypass culture—developers kubectl apply from laptops with cluster-admin. Shadow mode builds trust; enforce mode earns it.