VPC & Networking

Most AWS compute and database resources live inside a VPC, and even "serverless" Lambda functions attach to one when they need to reach private databases. Get networking wrong and you get public RDS endpoints, NAT bills that dwarf compute spend, or security groups that accidentally expose your app tier to the entire internet. This chapter covers how to design a production three-tier VPC, route traffic safely between tiers and AZs, and keep AWS API calls off expensive NAT paths.

developer · devops · architect · Regional service · NAT Gateway costs apply

VPC architecture & CIDR planning

A VPC is your private network boundary in AWS. Subnets carve that space into tiers; route tables decide where packets go. Plan CIDR blocks before you deploy anything — renumbering a live VPC is painful.

VPC, subnets, and CIDR blocks

A VPC is a logically isolated network in a single AWS region. You assign one or more IPv4 CIDR blocks (e.g. 10.0.0.0/16) and optionally IPv6. Subnets are subdivisions of the VPC CIDR, each mapped to exactly one Availability Zone. Every ENI (EC2, RDS, Lambda in VPC, ALB nodes) gets an IP from its subnet's range.

RFC 1918 private ranges are standard: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. Avoid overlapping CIDRs with on-premises networks or other VPCs you will peer — overlapping ranges block peering.
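
To catch overlaps before requesting a peering connection, the check can be sketched in TypeScript. This is a minimal illustration, assuming IPv4 only; the names parseCidr and cidrsOverlap are hypothetical helpers, not part of any AWS SDK.

```typescript
interface CidrBlock {
  start: number; // first address of the block as an unsigned 32-bit value
  end: number;   // last address of the block
}

// Parse "a.b.c.d/n" into the numeric range it covers.
function parseCidr(cidr: string): CidrBlock {
  const [address = "", prefixText = ""] = cidr.split("/");
  const parts = address.split(".");
  const prefix = Number(prefixText);
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255) &&
    /^\d{1,2}$/.test(prefixText) &&
    prefix <= 32;
  if (!valid) {
    throw new Error(`Not an IPv4 CIDR: ${cidr}`);
  }
  // Fold the four octets into one number, then snap to the block boundary.
  const base = parts.reduce((acc, p) => acc * 256 + Number(p), 0);
  const size = 2 ** (32 - prefix);
  const start = base - (base % size);
  return { start, end: start + size - 1 };
}

// Two blocks overlap when their address ranges intersect.
function cidrsOverlap(a: string, b: string): boolean {
  const blockA = parseCidr(a);
  const blockB = parseCidr(b);
  return blockA.start <= blockB.end && blockB.start <= blockA.end;
}

// cidrsOverlap("10.0.0.0/16", "10.0.128.0/17") is true  (peering would be rejected)
// cidrsOverlap("10.0.0.0/16", "10.1.0.0/16")   is false (safe to peer)
```

Running a check like this against every VPC and on-premises range in an address plan is cheap insurance compared with renumbering later.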

CIDR sizing rules of thumb

| Subnet tier | Typical size | Why |
|---|---|---|
| Public | /24 per AZ (251 usable) | ALB, NAT Gateway, bastion: few IPs, but room for future public-facing services |
| Private (app) | /20 or larger per AZ | ECS tasks, EKS nodes, and Lambda ENIs scale with pod/task count; undersizing causes IP exhaustion |
| Isolated (data) | /24 per AZ | RDS, ElastiCache, OpenSearch: one primary plus replicas per AZ; no internet route needed |
| VPC overall | /16 minimum for multi-AZ prod | Leaves headroom for TGW attachments, future micro-segmentation, and secondary CIDR blocks |

⚠️ Pitfall

AWS reserves 5 IP addresses per subnet — not available for your ENIs. For a /24 (256 addresses), you get 251 usable IPs. The reserved addresses are: network address, VPC router, DNS server, future use, and broadcast. Always subtract 5 when calculating capacity; EKS with prefix delegation can consume IPs faster than expected.
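
The subtract-five rule is easy to encode. The following TypeScript sketch is illustrative only: usableSubnetIps and the constant name are made up for this example, and the /16 to /28 bounds reflect the subnet sizes AWS accepts for IPv4.

```typescript
// AWS reserves 5 addresses in every subnet:
// network address, VPC router, DNS server, future use, and broadcast.
const AWS_RESERVED_PER_SUBNET = 5;

// Number of addresses left for ENIs in an IPv4 subnet of the given prefix length.
function usableSubnetIps(prefixLength: number): number {
  if (!Number.isInteger(prefixLength) || prefixLength < 16 || prefixLength > 28) {
    throw new RangeError("AWS IPv4 subnets must be between /16 and /28");
  }
  return 2 ** (32 - prefixLength) - AWS_RESERVED_PER_SUBNET;
}

// usableSubnetIps(24) === 251
// usableSubnetIps(20) === 4091
// usableSubnetIps(28) === 11
```

Note how little room a /28 leaves: 11 addresses disappear quickly once a load balancer and a few tasks share the subnet.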

Three-tier network pattern

Production web apps use a three-tier layout across at least two AZs for high availability:

  1. Public tier — Application Load Balancer (ALB) faces the internet via an Internet Gateway (IGW). Only load balancers and NAT Gateways live here; never place application servers directly in public subnets.
  2. Private app tier — ECS/Fargate tasks, EC2 app servers, Lambda functions (when VPC-attached). Outbound internet (package updates, third-party APIs) routes through NAT Gateway in the public tier. Inbound user traffic arrives only via ALB.
  3. Isolated data tier — RDS, Aurora, ElastiCache, Amazon MQ. No route to IGW or NAT — only app-tier security groups can reach database ports.
flowchart TB
  subgraph Internet["Internet"]
    USERS["Users / Clients"]
  end

  subgraph VPC["VPC 10.0.0.0/16"]
    IGW["Internet Gateway"]

    subgraph AZa["Availability Zone A"]
      PUBa["Public Subnet 10.0.1.0/24\nALB + NAT GW"]
      APPa["Private App 10.0.16.0/20\nECS / Lambda"]
      DBa["Isolated DB 10.0.100.0/24\nRDS Primary"]
    end

    subgraph AZb["Availability Zone B"]
      PUBb["Public Subnet 10.0.2.0/24\nALB + NAT GW"]
      APPb["Private App 10.0.32.0/20\nECS / Lambda"]
      DBb["Isolated DB 10.0.101.0/24\nRDS Standby"]
    end
  end

  USERS --> IGW
  IGW --> PUBa
  IGW --> PUBb
  PUBa --> APPa
  PUBb --> APPb
  APPa --> DBa
  APPb --> DBb
  APPa -.->|"outbound via NAT"| PUBa
  APPb -.->|"outbound via NAT"| PUBb

Route table design

| Subnet type | Default route | Associated resources |
|---|---|---|
| Public | 0.0.0.0/0 → igw-xxx | ALB, NAT Gateway, optional bastion (prefer SSM Session Manager instead) |
| Private app | 0.0.0.0/0 → nat-xxx (same AZ) | ECS tasks, EKS nodes, Lambda ENIs, internal NLBs |
| Isolated data | No default route to internet | RDS, ElastiCache: local VPC routes only |

🔬 Under the Hood

The VPC router is a managed, highly available component. You never see it, but it evaluates every subnet's route table. The Amazon-provided DNS server sits at the VPC CIDR base + 2 (e.g. 10.0.0.2) and resolves AWS internal hostnames such as those under ec2.internal. Instances receive DNS hostnames of their own when enableDnsHostnames is true, which in turn requires enableDnsSupport.
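
The "base + 2" rule can be computed directly. This is a small TypeScript sketch, assuming the input is the VPC's primary IPv4 CIDR written with its network address; vpcDnsResolver is a hypothetical helper name.

```typescript
// Derive the Amazon-provided DNS resolver address (network base + 2)
// from a VPC's primary IPv4 CIDR, e.g. "10.0.0.0/16".
function vpcDnsResolver(vpcCidr: string): string {
  const [address = ""] = vpcCidr.split("/");
  const parts = address.split(".");
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    throw new Error(`Not an IPv4 CIDR: ${vpcCidr}`);
  }
  // Fold octets into one number, add two, then split back into octets.
  const base = parts.reduce((acc, p) => acc * 256 + Number(p), 0);
  const resolver = base + 2;
  return [24, 16, 8, 0]
    .map((shift) => Math.floor(resolver / 2 ** shift) % 256)
    .join(".");
}

// vpcDnsResolver("10.0.0.0/16")   === "10.0.0.2"
// vpcDnsResolver("172.31.0.0/16") === "172.31.0.2"
```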

Create a three-tier VPC

bash
# Create VPC with DNS support
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
  --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=prod-vpc}]' \
  --query Vpc.VpcId --output text)

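# Each attribute takes an explicit value and must be set in its own call
aws ec2 modify-vpc-attribute --vpc-id "$VPC_ID" --enable-dns-support "{\"Value\":true}"
aws ec2 modify-vpc-attribute --vpc-id "$VPC_ID" --enable-dns-hostnames "{\"Value\":true}"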

IGW_ID=$(aws ec2 create-internet-gateway --query InternetGateway.InternetGatewayId --output text)
aws ec2 attach-internet-gateway --internet-gateway-id "$IGW_ID" --vpc-id "$VPC_ID"

# Public subnet AZ-a + route to IGW
PUB_A=$(aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block 10.0.1.0/24 \
  --availability-zone eu-west-1a --query Subnet.SubnetId --output text)
RT_PUB=$(aws ec2 create-route-table --vpc-id "$VPC_ID" --query RouteTable.RouteTableId --output text)
aws ec2 create-route --route-table-id "$RT_PUB" --destination-cidr-block 0.0.0.0/0 --gateway-id "$IGW_ID"
aws ec2 associate-route-table --route-table-id "$RT_PUB" --subnet-id "$PUB_A"

# Private app subnet AZ-a + NAT (repeat per AZ for HA)
PRIV_A=$(aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block 10.0.16.0/20 \
  --availability-zone eu-west-1a --query Subnet.SubnetId --output text)
EIP=$(aws ec2 allocate-address --domain vpc --query AllocationId --output text)
NAT_A=$(aws ec2 create-nat-gateway --subnet-id "$PUB_A" --allocation-id "$EIP" \
  --query NatGateway.NatGatewayId --output text)
# NAT Gateway creation is asynchronous; wait until it is available before routing to it
aws ec2 wait nat-gateway-available --nat-gateway-ids "$NAT_A"
RT_PRIV=$(aws ec2 create-route-table --vpc-id "$VPC_ID" --query RouteTable.RouteTableId --output text)
aws ec2 create-route --route-table-id "$RT_PRIV" --destination-cidr-block 0.0.0.0/0 --nat-gateway-id "$NAT_A"
aws ec2 associate-route-table --route-table-id "$RT_PRIV" --subnet-id "$PRIV_A"

# Isolated DB subnet (no 0.0.0.0/0 route)
DB_A=$(aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block 10.0.100.0/24 \
  --availability-zone eu-west-1a --query Subnet.SubnetId --output text)
hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "prod-vpc"
  cidr = "10.0.0.0/16"
  azs  = ["eu-west-1a", "eu-west-1b"]

  public_subnets   = ["10.0.1.0/24",  "10.0.2.0/24"]
  private_subnets  = ["10.0.16.0/20", "10.0.32.0/20"]
  database_subnets = ["10.0.100.0/24", "10.0.101.0/24"]

  enable_nat_gateway     = true
  single_nat_gateway     = false   # one NAT per AZ for HA
  one_nat_gateway_per_az = true

  enable_dns_hostnames = true
  enable_dns_support   = true

  create_database_subnet_route_table = true
  # database subnets have no internet route by default in this module
}

output "vpc_id" { value = module.vpc.vpc_id }
output "private_subnets" { value = module.vpc.private_subnets }
typescript
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';

const vpc = new ec2.Vpc(this, 'ProdVpc', {
  ipAddresses: ec2.IpAddresses.cidr('10.0.0.0/16'),
  maxAzs: 2,
  natGateways: 2, // one per AZ
  subnetConfiguration: [
    { name: 'Public',   subnetType: ec2.SubnetType.PUBLIC,       cidrMask: 24 },
    { name: 'App',      subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS, cidrMask: 20 },
    { name: 'Database', subnetType: ec2.SubnetType.PRIVATE_ISOLATED,    cidrMask: 24 },
  ],
});

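// ECS service in app tier. cluster and taskDefinition are defined elsewhere;
// the cluster must be created in this vpc, because the service inherits its VPC from it.
new ecs.FargateService(this, 'ApiService', {
  cluster,
  taskDefinition,
  vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
});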
🎯 Exam Tip

When the exam asks for highly available outbound internet from private subnets, the answer is one NAT Gateway per AZ (not one shared NAT). When it asks where to place an ALB for internet-facing traffic, answer public subnets in at least two AZs. RDS in public subnets is almost never the right answer — use isolated/database subnets with security group restrictions.

Internet Gateway, NAT Gateway & outbound routing

Public subnets reach the internet directly; private subnets need NAT for outbound-only access. NAT Gateway is managed and expensive; NAT instances are legacy. Know the cost model before you deploy.

Internet Gateway vs NAT

| Component | Direction | Subnet | Notes |
|---|---|---|---|
| Internet Gateway (IGW) | Bidirectional | Public only | Resources need a public IP or Elastic IP; attached at VPC level |
| NAT Gateway | Outbound only | Public subnet; serves private subnets | Managed, scales to 100 Gbps, AZ-scoped: if the NAT's AZ fails, the private subnets routed through it lose outbound access |
| NAT instance | Outbound only | Public subnet (self-managed EC2) | Legacy: you patch, scale, and HA it yourself; use NAT Gateway instead |
| Egress-Only IGW | IPv6 outbound only | IPv6-enabled subnets | IPv6 addresses are public by default; Egress-Only IGW blocks inbound initiation |

NAT Gateway pricing (us-east-1, check your region): $0.045/hour per NAT Gateway (~$32.85/month per AZ) plus $0.045/GB processed. A single NAT in one AZ is cheaper but creates a cross-AZ failure domain and cross-AZ data charges when private subnets in AZ-b route through NAT in AZ-a.
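
The pricing model above reduces to a two-term formula. Here is a minimal TypeScript sketch using the us-east-1 rates quoted in this section; the 730-hour month is an assumption, and natMonthlyCost is an illustrative name rather than any AWS API.

```typescript
const NAT_HOURLY_USD = 0.045;  // per NAT Gateway per hour (us-east-1 rate quoted above)
const NAT_PER_GB_USD = 0.045;  // per GB processed
const HOURS_PER_MONTH = 730;   // assumption: average month length

// Estimated monthly NAT bill in USD, rounded to cents.
function natMonthlyCost(gateways: number, gbProcessed: number): number {
  const hourly = gateways * NAT_HOURLY_USD * HOURS_PER_MONTH;
  const dataProcessing = gbProcessed * NAT_PER_GB_USD;
  return Math.round((hourly + dataProcessing) * 100) / 100;
}

// natMonthlyCost(1, 0)   is 32.85 (one idle gateway)
// natMonthlyCost(2, 300) is 79.20 (two gateways, 300 GB processed)
```

Cross-AZ data transfer is deliberately left out of this sketch; add it separately when private subnets route through a NAT Gateway in another AZ.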

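💰 NAT Gateway cost example

Hourly charge plus data processing. Gateway endpoints for S3/DynamoDB bypass NAT (free).

Hourly (existence): $65.70
Data processing: $13.50
Estimated total: $79.20/mo
Saved via endpoints: $9.00/mo

At the us-east-1 rates above, these figures are consistent with two NAT Gateways (2 × $32.85) and 300 GB processed (300 × $0.045); the saving is consistent with 200 GB moved onto gateway endpoints (200 × $0.045).
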
💰 Cost

At the us-east-1 rates above, a three-AZ VPC with three NAT Gateways processing 2 TB/month costs roughly $190/month on NAT alone (3 × $32.85 hourly plus about 2,000 GB × $0.045), before compute, and the bill passes $500/month at around 9 TB. Mitigations: gateway endpoints for S3/DynamoDB (free), interface endpoints only where needed, ECR API/DKR interface endpoints to pull images without NAT, and private hosted zones (with enableDnsSupport on) so internal service names resolve to private addresses.

High availability pattern

Deploy one NAT Gateway per AZ in the public subnet of each AZ. Private subnet route tables in AZ-a point to NAT in AZ-a — never cross-AZ NAT routing in production. If cost is the constraint in dev/staging, a single NAT Gateway is acceptable with the understanding that AZ failure takes outbound internet with it for all private subnets.

⚖️ Trade-off

One NAT vs NAT per AZ: a single NAT saves ~$33/month per omitted NAT Gateway but creates cross-AZ data transfer charges (~$0.01/GB each direction) and a single point of failure. Production workloads choose NAT per AZ; dev sandboxes often use one NAT or NAT-less setups with VPC endpoints only.

IPv6 and Egress-Only Internet Gateway

IPv6 addresses assigned to instances are globally routable. An Egress-Only Internet Gateway allows outbound IPv6 (software updates, external APIs) while blocking inbound connections initiated from the internet — the IPv6 equivalent of NAT for outbound-only, without address translation. Pair with security groups for fine-grained control.

💡 Pro Tip

Run aws ec2 describe-nat-gateways and check VPC Flow Logs filtered on dstport = 443 to find NAT-heavy workloads. Often 30–60% of NAT traffic is S3 (including ECR image layers) and DynamoDB traffic that gateway endpoints eliminate for free. Enable flow logs to an S3 bucket with Athena queries for ongoing cost attribution per subnet.

Security groups & network ACLs

Security groups are your primary firewall — stateful, attached to ENIs, and referenced by ID. NACLs are a secondary, stateless subnet-level filter. Most teams rely almost entirely on security groups.

Security groups: stateful firewall

A security group (SG) acts as a virtual firewall for ENIs. Rules are allow-only (no deny rules). Stateful means if you allow inbound TCP 443, the return traffic is automatically permitted — you don't need an outbound rule for the response. Default: all outbound allowed, all inbound denied.

ALB → App → DB reference pattern

Production three-tier apps chain security groups by reference instead of CIDR:

  1. ALB SG — inbound 443 from 0.0.0.0/0 (or CloudFront prefix list); outbound to App SG on app port
  2. App SG — inbound app port only from ALB SG; outbound to DB SG on 5432 (PostgreSQL) or 3306 (MySQL)
  3. DB SG — inbound database port only from App SG; no outbound to internet
json
{
  "SecurityGroupRules": [
    {
      "Description": "App tier accepts traffic only from ALB",
      "IpProtocol": "tcp",
      "FromPort": 8080,
      "ToPort": 8080,
      "ReferencedGroupId": "sg-alb123456"
    }
  ]
}
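
To make the chaining logic concrete, here is a simplified TypeScript model of group-reference rules. It is a sketch under stated assumptions: only TCP ports and SG-to-SG references are modelled, and the type names, the allowsFromGroup helper, and the sg-alb/sg-app/sg-db IDs are all hypothetical.

```typescript
interface IngressRule {
  port: number;
  fromGroup?: string; // source security group ID (SG reference)
  fromCidr?: string;  // source CIDR (ignored by the check below)
}

interface SecurityGroup {
  id: string;
  ingress: IngressRule[];
}

// True when an ENI that belongs to sourceGroupId may open a connection
// to the given port on a resource protected by target.
function allowsFromGroup(target: SecurityGroup, sourceGroupId: string, port: number): boolean {
  return target.ingress.some((rule) => rule.port === port && rule.fromGroup === sourceGroupId);
}

const albSg: SecurityGroup = { id: "sg-alb", ingress: [{ port: 443, fromCidr: "0.0.0.0/0" }] };
const appSg: SecurityGroup = { id: "sg-app", ingress: [{ port: 8080, fromGroup: albSg.id }] };
const dbSg: SecurityGroup = { id: "sg-db", ingress: [{ port: 5432, fromGroup: appSg.id }] };

// allowsFromGroup(appSg, albSg.id, 8080) is true:  ALB reaches the app tier
// allowsFromGroup(dbSg, appSg.id, 5432)  is true:  app tier reaches the database
// allowsFromGroup(dbSg, albSg.id, 5432)  is false: ALB cannot skip a tier
```

Because membership in a group is what grants access, replacing a task or scaling out never requires touching the rules.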

Network ACLs: stateless subnet filter

NACLs apply at the subnet boundary. They are stateless — allow inbound AND allow outbound (ephemeral ports) must both be configured explicitly. Rules are numbered (lowest first) and include explicit deny. Default NACL allows all; custom NACLs deny all until you add rules.
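
The first-match evaluation and the need for an explicit return path can be sketched in TypeScript. This is a deliberately reduced model, assuming TCP only and matching on destination port alone; NaclRule and evaluateNacl are illustrative names.

```typescript
type NaclAction = "allow" | "deny";

interface NaclRule {
  ruleNumber: number;
  fromPort: number;
  toPort: number;
  action: NaclAction;
}

// Rules are checked in ascending rule-number order and the first match decides.
// If no rule matches, the implicit final rule denies the packet.
function evaluateNacl(rules: NaclRule[], destinationPort: number): NaclAction {
  const ordered = rules.slice().sort((a, b) => a.ruleNumber - b.ruleNumber);
  const hit = ordered.find((r) => destinationPort >= r.fromPort && destinationPort <= r.toPort);
  return hit === undefined ? "deny" : hit.action;
}

// Inbound HTTPS is allowed...
const inboundRules: NaclRule[] = [
  { ruleNumber: 100, fromPort: 443, toPort: 443, action: "allow" },
];
// ...but the response travels to the client's ephemeral port (e.g. 50000),
// so the outbound list needs its own rule.
const outboundWithoutEphemeral: NaclRule[] = [];
const outboundWithEphemeral: NaclRule[] = [
  { ruleNumber: 100, fromPort: 1024, toPort: 65535, action: "allow" },
];

// evaluateNacl(inboundRules, 443)               is "allow"
// evaluateNacl(outboundWithoutEphemeral, 50000) is "deny" (responses are dropped)
// evaluateNacl(outboundWithEphemeral, 50000)    is "allow"
```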

| Feature | Security Group | Network ACL |
|---|---|---|
| Scope | ENI / instance level | Subnet level |
| State | Stateful: return traffic auto-allowed | Stateless: must allow both directions |
| Rules | Allow only; all rules are evaluated | Allow and deny; numbered order, first match wins |
| Default | Deny inbound, allow outbound | Default NACL: allow all; custom NACL: deny all |
| Primary use | Application firewall, 99% of your rules | Subnet-level guardrails, blocking IP ranges, DMZ separation |
| Reference by SG | Yes, via ReferencedGroupId | No, CIDR blocks only |

🔒 Security

Never open database security groups to 0.0.0.0/0. Use SG-to-SG references so IP changes (autoscaling, Fargate task replacement) don't break connectivity. Add a NACL deny rule for known malicious CIDRs as defense-in-depth, but don't rely on NACLs for application-level auth — that's what security groups and IAM are for.

⚠️ Pitfall

Changing a NACL rule affects every instance in the subnet immediately. A misnumbered deny rule can take down an entire AZ. Security group changes are safer — test NACL edits in non-prod first. Common mistake: allowing inbound TCP 443 on a NACL but forgetting outbound ephemeral ports (1024-65535) for return traffic.

📦 Real World

Many fintech platforms use security group referencing exclusively for tier-to-tier traffic, leaving NACLs at default allow with optional explicit denies for compliance subnets. A common micro-segmentation approach is one SG per service type, referenced by upstream load balancers and service mesh sidecars rather than CIDR-based rules.

VPC endpoints — keep traffic private

VPC endpoints let resources in private subnets reach AWS services without traversing the public internet or NAT Gateway. Gateway endpoints are free; interface endpoints cost per AZ but unlock PrivateLink patterns.

Gateway vs interface endpoints

| Type | Services | How it works | Cost |
|---|---|---|---|
| Gateway | S3, DynamoDB | Route table entry (pl-xxx prefix list), no ENI | Free |
| Interface (PrivateLink) | Most other AWS services + partner SaaS | ENI with private IP in your subnet; DNS resolves to endpoint | ~$0.01/hr/AZ + $0.01/GB processed |
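
Interface endpoint charges add up per endpoint and per AZ, so it helps to estimate them before replacing NAT. The TypeScript sketch below uses the approximate rates from the table; the 730-hour month is an assumption and interfaceEndpointMonthlyCost is a hypothetical helper name.

```typescript
const ENDPOINT_HOURLY_USD = 0.01;      // per interface endpoint, per AZ, per hour
const ENDPOINT_PER_GB_USD = 0.01;      // per GB processed
const ENDPOINT_HOURS_PER_MONTH = 730;  // assumption: average month length

// Estimated monthly cost in USD for a set of interface endpoints, rounded to cents.
function interfaceEndpointMonthlyCost(endpoints: number, azs: number, gbProcessed: number): number {
  const hourly = endpoints * azs * ENDPOINT_HOURLY_USD * ENDPOINT_HOURS_PER_MONTH;
  const dataProcessing = gbProcessed * ENDPOINT_PER_GB_USD;
  return Math.round((hourly + dataProcessing) * 100) / 100;
}

// interfaceEndpointMonthlyCost(1, 1, 0)   is 7.30
// interfaceEndpointMonthlyCost(4, 2, 500) is 63.40 (four endpoints across two AZs, 500 GB)
```

Four endpoints in two AZs already cost about as much per month as two idle NAT Gateways, which is why interface endpoints are worth adding only where traffic volume or isolation requirements justify them.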

Endpoint policies

Both gateway and interface endpoints support endpoint policies — IAM-style JSON that restricts which API actions and resources can be accessed through the endpoint. Default policy allows full access; tighten to specific buckets, tables, or accounts for defense in depth:

json
{
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::my-app-artifacts/*",
    "Condition": {
      "StringEquals": { "aws:PrincipalAccount": "123456789012" }
    }
  }]
}

PrivateLink

AWS PrivateLink (interface endpoints) also exposes your own services to consumers in other VPCs or accounts via endpoint services (NLB-backed). Consumers create an interface endpoint in their VPC; traffic stays on the AWS network. Use for shared platform APIs, SaaS integrations, and cross-account service consumption without VPC peering.

Gateway endpoint for S3

bash
# Create S3 gateway endpoint and attach to private route tables
VPCE_ID=$(aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc123 \
  --service-name com.amazonaws.eu-west-1.s3 \
  --route-table-ids rtb-priv-a rtb-priv-b \
  --query VpcEndpoint.VpcEndpointId --output text)

# Optional: restrict with endpoint policy
aws ec2 modify-vpc-endpoint --vpc-endpoint-id "$VPCE_ID" \
  --policy-document file://s3-endpoint-policy.json
hcl
resource "aws_vpc_endpoint" "s3" {
  vpc_id       = module.vpc.vpc_id
  service_name = "com.amazonaws.${var.region}.s3"
  route_table_ids = concat(
    module.vpc.private_route_table_ids,
    module.vpc.database_route_table_ids,
  )

  policy = jsonencode({
    Statement = [{
      Effect    = "Allow"
      Principal = "*"
      Action    = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
      Resource  = [
        aws_s3_bucket.artifacts.arn,
        "${aws_s3_bucket.artifacts.arn}/*",
      ]
    }]
  })
}

resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = module.vpc.vpc_id
  service_name        = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = module.vpc.private_subnets
  security_group_ids  = [aws_security_group.vpce.id]
  private_dns_enabled = true
}
typescript
import * as ec2 from 'aws-cdk-lib/aws-ec2';

// Gateway endpoint — free, route table based
vpc.addGatewayEndpoint('S3Endpoint', {
  service: ec2.GatewayVpcEndpointAwsService.S3,
  subnets: [{ subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS }],
});

// Interface endpoint — ECR pull without NAT
vpc.addInterfaceEndpoint('EcrApi', {
  service: ec2.InterfaceVpcEndpointAwsService.ECR,
  subnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
});
vpc.addInterfaceEndpoint('EcrDkr', {
  service: ec2.InterfaceVpcEndpointAwsService.ECR_DOCKER,
  subnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
});
🎯 Exam Tip

When the exam asks how to access S3 from a private subnet without NAT, answer gateway VPC endpoint (route table association). For accessing your own NLB service from another VPC without peering, answer PrivateLink / interface endpoint. Gateway endpoints do not use security groups; interface endpoints do.

💡 Pro Tip

Minimum VPC endpoint set for NAT-less private ECS/EKS: S3 gateway (free), ECR API + ECR DKR + CloudWatch Logs interface endpoints, and SSM interface endpoints if you use Session Manager instead of bastion hosts. Add a Secrets Manager endpoint if tasks fetch secrets at startup; otherwise those calls need a NAT path and fail in a NAT-less VPC.

VPC peering & Transit Gateway

Connecting VPCs requires understanding transitivity limits. VPC peering is simple and non-transitive; Transit Gateway scales to hub-and-spoke topologies across accounts and regions.

VPC peering

A VPC peering connection links two VPCs with private IP routing. Peering is non-transitive: if VPC-A peers with VPC-B and VPC-B peers with VPC-C, VPC-A cannot reach VPC-C through B. You must create a direct peering connection or use Transit Gateway.

  • Same or different accounts, same or different regions (inter-region peering supported)
  • CIDR blocks must not overlap
  • No transitive routing — no "hub" VPC that forwards for others
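  • DNS resolution can be enabled on the peering connection so that public DNS hostnames in the peer VPC resolve to private IPs; private hosted zones must still be associated with each VPC that needs them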
  • Update route tables on both sides manually
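
Non-transitivity is simple to state in code: reachability requires a direct edge. The TypeScript sketch below is illustrative; the Peering type, the canReachViaPeering helper, and the vpc-a/vpc-b/vpc-c names are hypothetical.

```typescript
// One peering connection between two VPCs (direction does not matter).
type Peering = [string, string];

// Peering is non-transitive: two VPCs can exchange traffic only when a
// peering connection links them directly. Intermediate hops are never used.
function canReachViaPeering(peerings: Peering[], from: string, to: string): boolean {
  return peerings.some(([a, b]) => (a === from && b === to) || (a === to && b === from));
}

const examplePeerings: Peering[] = [
  ["vpc-a", "vpc-b"],
  ["vpc-b", "vpc-c"],
];

// canReachViaPeering(examplePeerings, "vpc-a", "vpc-b") is true
// canReachViaPeering(examplePeerings, "vpc-a", "vpc-c") is false (no path through vpc-b)
```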

Transit Gateway (TGW)

Transit Gateway is a regional hub that connects VPCs, VPNs, and Direct Connect. Attachments propagate routes; you control route tables per attachment. Supports hub-and-spoke at scale — dozens of VPCs, cross-account via RAM (Resource Access Manager), and inter-region peering between TGWs.

| Scenario | VPC Peering | Transit Gateway |
|---|---|---|
| Two VPCs, simple link | ✅ Cheapest, lowest complexity | Overkill |
| 3+ VPCs mesh | ❌ N×(N-1)/2 peerings, unmanageable at scale | ✅ Star topology via TGW |
| Shared services VPC | ❌ Non-transitive: each spoke needs its own peering to the shared VPC | ✅ One attachment per VPC to TGW |
| On-premises via VPN/DX | ❌ Cannot attach VPN to peering | ✅ Single VPN/DX attachment to TGW |
| Cross-account at scale | Possible but manual per peering | ✅ RAM sharing + route table isolation |
| Cost sensitivity | No hourly charge; data transfer only | ~$0.05/hr per attachment + $0.02/GB processed |

flowchart TB
  TGW["Transit Gateway\n(hub)"]
  SHARED["Shared Services VPC\nDNS, logging, CI"]
  PROD["Production VPC"]
  STAGE["Staging VPC"]
  DEV["Development VPC"]
  VPN["Site-to-Site VPN\nOn-premises"]

  PROD --> TGW
  STAGE --> TGW
  DEV --> TGW
  SHARED --> TGW
  VPN --> TGW
⚖️ Trade-off

Peering mesh vs TGW: three VPCs need three peering connections; ten VPCs need 45 peerings. TGW adds ~$36/month per attachment (at ~$0.05/hr) plus per-GB processing, but simplifies routing and on-prem connectivity. For two VPCs in one account, peering wins. For an organization with shared-services, prod, staging, and data lake VPCs, TGW is standard.
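
The growth difference between the two topologies is easy to verify with a short TypeScript sketch. Both function names are illustrative.

```typescript
// Connections needed for every VPC to reach every other VPC via peering:
// one per unordered pair, N x (N - 1) / 2.
function fullMeshPeerings(vpcCount: number): number {
  return (vpcCount * (vpcCount - 1)) / 2;
}

// With a Transit Gateway hub, each VPC needs exactly one attachment.
function tgwAttachments(vpcCount: number): number {
  return vpcCount;
}

// fullMeshPeerings(3)  === 3   vs tgwAttachments(3)  === 3
// fullMeshPeerings(10) === 45  vs tgwAttachments(10) === 10
// fullMeshPeerings(20) === 190 vs tgwAttachments(20) === 20
```

Peering grows quadratically while attachments grow linearly, and every peering also means route table entries to maintain on both sides.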

🔬 Under the Hood

TGW maintains separate route tables per attachment domain — e.g. isolate prod VPCs from dev VPCs even though both attach to the same TGW. Route propagation can be automatic (VPC CIDR advertised) or static. Blackhole routes drop traffic silently — always verify with VPC Reachability Analyzer after TGW changes.

Route53 & DNS

Route53 is AWS's authoritative DNS. It routes users to endpoints globally, resolves private names inside your VPC, and bridges on-premises DNS with cloud resources via Route53 Resolver.

Routing policies

| Policy | Behavior | Use case |
|---|---|---|
| Simple | One record, one target (or multiple with same value) | Single-region app, internal service names |
| Weighted | Split traffic by weight (e.g. 90/10) | Blue/green deployments, canary releases, gradual migration |
| Latency | Route to lowest-latency region for the user | Multi-region active-active APIs, global user base |
| Failover | Primary + secondary; health check on primary | DR: automatic failover when primary ALB fails health check |
| Geolocation | Route by user's geographic location (country/continent) | GDPR data residency, locale-specific content, regulatory boundaries |
| Geoproximity | Route by geographic distance with bias controls | Fine-tune traffic distribution across regions |
| Multivalue | Return up to 8 healthy records (not true load balancing) | Simple HA: client picks from healthy IPs |
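
For weighted routing, each record's share of responses is its weight divided by the sum of all weights in the group. The TypeScript sketch below illustrates that arithmetic; the type names and trafficShares are hypothetical, and real traffic will deviate from these proportions because of DNS caching.

```typescript
interface WeightedRecord {
  id: string;
  weight: number;
}

interface TrafficShare {
  id: string;
  share: number; // fraction of DNS responses, between 0 and 1
}

// Expected fraction of responses per record: weight / sum of weights.
function trafficShares(records: WeightedRecord[]): TrafficShare[] {
  const total = records.reduce((sum, r) => sum + r.weight, 0);
  return records.map((r) => ({
    id: r.id,
    // When every weight is 0, Route 53 treats all records as equally likely.
    share: total === 0 ? 1 / records.length : r.weight / total,
  }));
}

// trafficShares([{ id: "blue", weight: 90 }, { id: "green", weight: 10 }])
// gives blue 0.9 and green 0.1, i.e. a 90/10 canary split.
```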

Private hosted zones

A private hosted zone is associated with one or more VPCs and resolves DNS names only inside those VPCs, e.g. api.internal.mycompany.com → internal ALB. Each VPC needs both enableDnsSupport and enableDnsHostnames set to true, and every VPC that should resolve the zone must be associated with it; peering or TGW provides the network path but does not share the zone by itself. Split-horizon DNS uses the same domain publicly and privately with different records.

Route53 Resolver

For hybrid cloud, Route53 Resolver provides inbound and outbound endpoints:

  • Inbound endpoint — on-premises DNS forwards queries for cloud domains to Resolver ENIs in your VPC
  • Outbound endpoint — forward VPC queries for on-prem domains (e.g. corp.local) to on-premises DNS servers via forwarding rules
  • Resolver rules — conditional forwarding: match domain suffix, forward to target IPs
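
Conditional forwarding comes down to suffix matching, with the most specific rule winning when several match. The TypeScript sketch below models that selection; ForwardRule, selectForwardRule, the domains, and the target IPs are all made up for illustration.

```typescript
interface ForwardRule {
  domain: string;      // domain suffix the rule applies to
  targetIps: string[]; // on-premises DNS servers to forward to
}

// Lower-case and strip a trailing dot so "Corp.Local." equals "corp.local".
function normalizeName(value: string): string {
  return value.toLowerCase().replace(/\.$/, "");
}

// Pick the rule whose domain is the longest suffix of the query name.
// Returns undefined when no rule matches (the query is resolved normally).
function selectForwardRule(rules: ForwardRule[], queryName: string): ForwardRule | undefined {
  const query = normalizeName(queryName);
  const matches = rules.filter((rule) => {
    const domain = normalizeName(rule.domain);
    return query === domain || query.endsWith("." + domain);
  });
  matches.sort((a, b) => normalizeName(b.domain).length - normalizeName(a.domain).length);
  return matches[0];
}

const exampleRules: ForwardRule[] = [
  { domain: "corp.local", targetIps: ["192.168.1.10"] },
  { domain: "eng.corp.local", targetIps: ["192.168.2.10"] },
];

// selectForwardRule(exampleRules, "build.eng.corp.local") picks the eng.corp.local rule
// selectForwardRule(exampleRules, "mail.corp.local")      picks the corp.local rule
// selectForwardRule(exampleRules, "api.example.com")      returns undefined
```
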
yaml
# CloudFormation / example failover record structure
Resources:
  PrimaryRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref HostedZone
      Name: api.example.com
      Type: A
      SetIdentifier: primary
      Failover: PRIMARY
      AliasTarget:
        DNSName: !GetAtt PrimaryAlb.DNSName
        HostedZoneId: !GetAtt PrimaryAlb.CanonicalHostedZoneID
      HealthCheckId: !Ref PrimaryHealthCheck

  SecondaryRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref HostedZone
      Name: api.example.com
      Type: A
      SetIdentifier: secondary
      Failover: SECONDARY
      AliasTarget:
        DNSName: !GetAtt SecondaryAlb.DNSName
        HostedZoneId: !GetAtt SecondaryAlb.CanonicalHostedZoneID
📦 Real World

A common pattern for global API endpoints is Route53 latency-based routing, with failover records tied to Route53 health checks on regional ALBs. Internal microservices resolve via private hosted zones or Cloud Map, while customer-facing domains use public hosted zones with weighted routing for deployments.

🎯 Exam Tip

Failover routing requires a health check on the primary record (or, for alias records, Evaluate Target Health). Weighted routing does not require health checks, but without one (or an alias to an ELB with Evaluate Target Health enabled) unhealthy targets still receive traffic. For "route users to the region with the lowest latency," answer latency routing. For "block EU users from US-only data," answer geolocation with a default record for unmatched regions.

🔒 Security

Enable DNSSEC signing on public hosted zones for domains you control end-to-end. Use private hosted zones for internal service discovery — never expose internal ALB DNS names publicly. Restrict Route53 API access with IAM; use route53:ChangeResourceRecordSets scoped to specific hosted zone ARNs in CI/CD deploy roles.