VPC & Networking
Most networked AWS resources live inside a VPC, and even "serverless" Lambda functions attach to one when they need to reach private databases. Get networking wrong and you get public RDS endpoints, NAT bills that dwarf compute spend, or security groups that accidentally expose your app tier to the entire internet. This chapter covers how to design a production three-tier VPC, route traffic safely between tiers and AZs, and keep AWS API calls off expensive NAT paths.
VPC architecture & CIDR planning
A VPC is your private network boundary in AWS. Subnets carve that space into tiers; route tables decide where packets go. Plan CIDR blocks before you deploy anything — renumbering a live VPC is painful.
VPC, subnets, and CIDR blocks
A VPC is a logically isolated network in a single AWS region. You assign one or more IPv4 CIDR blocks (e.g. 10.0.0.0/16) and optionally IPv6. Subnets are subdivisions of the VPC CIDR, each mapped to exactly one Availability Zone. Every ENI (EC2, RDS, Lambda in VPC, ALB nodes) gets an IP from its subnet's range.
RFC 1918 private ranges are standard: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. Avoid overlapping CIDRs with on-premises networks or other VPCs you will peer — overlapping ranges block peering.
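The overlap rule is easy to check before you commit to a range. The sketch below is a minimal, dependency-free check; the function names (`ipToNumber`, `cidrBounds`, `cidrsOverlap`) are illustrative and not part of any AWS SDK, and it assumes well-formed IPv4 CIDR strings.

```typescript
// Convert a dotted-quad IPv4 address to a number (plain arithmetic avoids
// the signed 32-bit results that bitwise operators would give).
function ipToNumber(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// First and last address covered by a CIDR block.
function cidrBounds(cidr: string): [number, number] {
  const parts = cidr.split("/");
  const size = Math.pow(2, 32 - Number(parts[1]));
  const start = Math.floor(ipToNumber(parts[0]) / size) * size; // align to the block boundary
  return [start, start + size - 1];
}

// Two blocks overlap when their address ranges intersect.
function cidrsOverlap(a: string, b: string): boolean {
  const [aStart, aEnd] = cidrBounds(a);
  const [bStart, bEnd] = cidrBounds(b);
  return aStart <= bEnd && bStart <= aEnd;
}

console.log(cidrsOverlap("10.0.0.0/16", "10.0.128.0/20")); // true
console.log(cidrsOverlap("10.0.0.0/16", "10.1.0.0/16"));   // false
```

Running a check like this against every on-premises range and every VPC you might peer with is cheap compared with renumbering later.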
CIDR sizing rules of thumb
| Subnet tier | Typical size | Why |
|---|---|---|
| Public | /24 per AZ (251 usable) | ALB, NAT Gateway, bastion — few IPs, but room for future public-facing services |
| Private (app) | /20 or larger per AZ | ECS tasks, EKS nodes, Lambda ENIs scale with pod/task count — undersizing causes IP exhaustion |
| Isolated (data) | /24 per AZ | RDS, ElastiCache, OpenSearch — one primary + replicas per AZ; no internet route needed |
| VPC overall | /16 for multi-AZ prod (the largest block a VPC CIDR allows) | Leaves headroom for TGW attachments, future micro-segmentation, and secondary CIDR blocks |
AWS reserves 5 IP addresses per subnet — not available for your ENIs. For a /24 (256 addresses), you get 251 usable IPs. The reserved addresses are: network address, VPC router, DNS server, future use, and broadcast. Always subtract 5 when calculating capacity; EKS with prefix delegation can consume IPs faster than expected.
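The capacity arithmetic can be sketched as follows. The helper names are hypothetical; the only facts assumed are the five reserved addresses described above and the /28 to /16 range that AWS allows for subnets.

```typescript
// Addresses AWS keeps per subnet: network, VPC router, DNS, future use, broadcast.
const AWS_RESERVED_PER_SUBNET = 5;

// Usable ENI addresses in a subnet of the given prefix length.
function usableIps(prefixLength: number): number {
  return Math.pow(2, 32 - prefixLength) - AWS_RESERVED_PER_SUBNET;
}

// Smallest subnet (largest prefix length) that still fits `needed` addresses.
function smallestPrefixFor(needed: number): number {
  for (let prefix = 28; prefix >= 16; prefix--) {
    if (usableIps(prefix) >= needed) {
      return prefix;
    }
  }
  throw new Error("no subnet between /28 and /16 fits " + needed + " addresses");
}

console.log(usableIps(24));          // 251
console.log(usableIps(20));          // 4091
console.log(smallestPrefixFor(300)); // 23
```

Note that 300 addresses do not fit in a /24: the next size up, a /23 with 507 usable addresses, is the smallest that works.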
Three-tier network pattern
Production web apps use a three-tier layout across at least two AZs for high availability:
- Public tier — Application Load Balancer (ALB) faces the internet via an Internet Gateway (IGW). Only load balancers and NAT Gateways live here; never place application servers directly in public subnets.
- Private app tier — ECS/Fargate tasks, EC2 app servers, Lambda functions (when VPC-attached). Outbound internet (package updates, third-party APIs) routes through NAT Gateway in the public tier. Inbound user traffic arrives only via ALB.
- Isolated data tier — RDS, Aurora, ElastiCache, Amazon MQ. No route to IGW or NAT — only app-tier security groups can reach database ports.
flowchart TB
subgraph Internet["Internet"]
USERS["Users / Clients"]
end
subgraph VPC["VPC 10.0.0.0/16"]
IGW["Internet Gateway"]
subgraph AZa["Availability Zone A"]
PUBa["Public Subnet 10.0.1.0/24\nALB + NAT GW"]
APPa["Private App 10.0.16.0/20\nECS / Lambda"]
DBa["Isolated DB 10.0.100.0/24\nRDS Primary"]
end
subgraph AZb["Availability Zone B"]
PUBb["Public Subnet 10.0.2.0/24\nALB + NAT GW"]
APPb["Private App 10.0.32.0/20\nECS / Lambda"]
DBb["Isolated DB 10.0.101.0/24\nRDS Standby"]
end
end
USERS --> IGW
IGW --> PUBa
IGW --> PUBb
PUBa --> APPa
PUBb --> APPb
APPa --> DBa
APPb --> DBb
APPa -.->|"outbound via NAT"| PUBa
APPb -.->|"outbound via NAT"| PUBb
Route table design
| Subnet type | Default route | Associated resources |
|---|---|---|
| Public | 0.0.0.0/0 → igw-xxx | ALB, NAT Gateway, optional bastion (prefer SSM Session Manager instead) |
| Private app | 0.0.0.0/0 → nat-xxx (same AZ) | ECS tasks, EKS nodes, Lambda ENIs, internal NLBs |
| Isolated data | No default route to internet | RDS, ElastiCache — local VPC routes only |
The VPC router is a managed, highly available component. You never see it, but it implements every route table in the VPC. The Amazon-provided DNS resolver sits at the VPC CIDR base + 2 (e.g. 10.0.0.2) and answers queries when enableDnsSupport is true; it resolves AWS internal hostnames such as those under ec2.internal. Setting enableDnsHostnames to true additionally gives instances with public IPs a public DNS hostname.
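Route tables pick the most specific matching route (longest prefix match), which is why the local VPC route always wins over the 0.0.0.0/0 default. A minimal sketch of that lookup, with hypothetical type and function names and the placeholder target `nat-xxx` from the table above:

```typescript
type Route = { destination: string; target: string };

function ipv4ToNumber(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// True when `ip` falls inside the route's destination CIDR.
function routeMatches(destination: string, ip: string): boolean {
  const parts = destination.split("/");
  const blockSize = Math.pow(2, 32 - Number(parts[1]));
  return Math.floor(ipv4ToNumber(ip) / blockSize) === Math.floor(ipv4ToNumber(parts[0]) / blockSize);
}

// Longest prefix match: among matching routes, the most specific one wins.
function selectRoute(routes: Route[], ip: string): Route | undefined {
  let best: Route | undefined;
  let bestPrefix = -1;
  for (const route of routes) {
    const prefix = Number(route.destination.split("/")[1]);
    if (routeMatches(route.destination, ip) && prefix > bestPrefix) {
      best = route;
      bestPrefix = prefix;
    }
  }
  return best;
}

const privateAppRoutes: Route[] = [
  { destination: "10.0.0.0/16", target: "local" },
  { destination: "0.0.0.0/0", target: "nat-xxx" },
];
const isolatedRoutes: Route[] = [{ destination: "10.0.0.0/16", target: "local" }];
```

With these tables, `selectRoute(privateAppRoutes, "10.0.100.15")` resolves to the local route, an internet address resolves to the NAT route, and the same internet address returns `undefined` for `isolatedRoutes`, which is what "no default route" means in practice.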
Create a three-tier VPC
# Create VPC with DNS support
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
--tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=prod-vpc}]' \
--query Vpc.VpcId --output text)
aws ec2 modify-vpc-attribute --vpc-id "$VPC_ID" --enable-dns-hostnames
aws ec2 modify-vpc-attribute --vpc-id "$VPC_ID" --enable-dns-support
IGW_ID=$(aws ec2 create-internet-gateway --query InternetGateway.InternetGatewayId --output text)
aws ec2 attach-internet-gateway --internet-gateway-id "$IGW_ID" --vpc-id "$VPC_ID"
# Public subnet AZ-a + route to IGW
PUB_A=$(aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block 10.0.1.0/24 \
--availability-zone eu-west-1a --query Subnet.SubnetId --output text)
RT_PUB=$(aws ec2 create-route-table --vpc-id "$VPC_ID" --query RouteTable.RouteTableId --output text)
aws ec2 create-route --route-table-id "$RT_PUB" --destination-cidr-block 0.0.0.0/0 --gateway-id "$IGW_ID"
aws ec2 associate-route-table --route-table-id "$RT_PUB" --subnet-id "$PUB_A"
# Private app subnet AZ-a + NAT (repeat per AZ for HA)
# A /20 must start on a /20 boundary (10.0.16.0, 10.0.32.0, ...); 10.0.10.0/20 is rejected
PRIV_A=$(aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block 10.0.16.0/20 \
--availability-zone eu-west-1a --query Subnet.SubnetId --output text)
EIP=$(aws ec2 allocate-address --domain vpc --query AllocationId --output text)
NAT_A=$(aws ec2 create-nat-gateway --subnet-id "$PUB_A" --allocation-id "$EIP" \
--query NatGateway.NatGatewayId --output text)
# The NAT Gateway starts in "pending"; wait until it is available before routing to it
aws ec2 wait nat-gateway-available --nat-gateway-ids "$NAT_A"
RT_PRIV=$(aws ec2 create-route-table --vpc-id "$VPC_ID" --query RouteTable.RouteTableId --output text)
aws ec2 create-route --route-table-id "$RT_PRIV" --destination-cidr-block 0.0.0.0/0 --nat-gateway-id "$NAT_A"
aws ec2 associate-route-table --route-table-id "$RT_PRIV" --subnet-id "$PRIV_A"
# Isolated DB subnet (no 0.0.0.0/0 route)
DB_A=$(aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block 10.0.100.0/24 \
--availability-zone eu-west-1a --query Subnet.SubnetId --output text)
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 5.0"
name = "prod-vpc"
cidr = "10.0.0.0/16"
azs = ["eu-west-1a", "eu-west-1b"]
public_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
private_subnets = ["10.0.16.0/20", "10.0.32.0/20"] # /20 blocks must start on a /20 boundary
database_subnets = ["10.0.100.0/24", "10.0.101.0/24"]
enable_nat_gateway = true
single_nat_gateway = false # one NAT per AZ for HA
one_nat_gateway_per_az = true
enable_dns_hostnames = true
enable_dns_support = true
create_database_subnet_route_table = true
# database subnets have no internet route by default in this module
}
output "vpc_id" { value = module.vpc.vpc_id }
output "private_subnets" { value = module.vpc.private_subnets }
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
const vpc = new ec2.Vpc(this, 'ProdVpc', {
ipAddresses: ec2.IpAddresses.cidr('10.0.0.0/16'),
maxAzs: 2,
natGateways: 2, // one per AZ
subnetConfiguration: [
{ name: 'Public', subnetType: ec2.SubnetType.PUBLIC, cidrMask: 24 },
{ name: 'App', subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS, cidrMask: 20 },
{ name: 'Database', subnetType: ec2.SubnetType.PRIVATE_ISOLATED, cidrMask: 24 },
],
});
// ECS service in app tier. cluster and taskDefinition are defined elsewhere;
// the service takes its VPC from the cluster, so create the cluster with { vpc }.
new ecs.FargateService(this, 'ApiService', {
cluster,
taskDefinition,
vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
});
When the exam asks for highly available outbound internet from private subnets, the answer is one NAT Gateway per AZ (not one shared NAT). When it asks where to place an ALB for internet-facing traffic, answer public subnets in at least two AZs. RDS in public subnets is almost never the right answer — use isolated/database subnets with security group restrictions.
Internet Gateway, NAT Gateway & outbound routing
Public subnets reach the internet directly; private subnets need NAT for outbound-only access. NAT Gateway is managed and expensive; NAT instances are legacy. Know the cost model before you deploy.
Internet Gateway vs NAT
| Component | Direction | Subnet | Notes |
|---|---|---|---|
| Internet Gateway (IGW) | Bidirectional | Public only | Resources need public IP or Elastic IP; attached at VPC level |
| NAT Gateway | Outbound only | Public subnet; serves private subnets | Managed, scales automatically up to 100 Gbps, AZ-scoped: if the NAT's AZ fails, private subnets routing through it lose outbound |
| NAT instance | Outbound only | Public subnet (self-managed EC2) | Legacy — you patch, scale, and HA it yourself; use NAT Gateway instead |
| Egress-Only IGW | IPv6 outbound only | IPv6-enabled subnets | IPv6 addresses are public by default; Egress-Only IGW blocks inbound initiation |
NAT Gateway pricing (us-east-1, check your region): $0.045/hour per NAT Gateway (~$32.85/month per AZ) plus $0.045/GB processed. A single NAT in one AZ is cheaper but creates a cross-AZ failure domain and cross-AZ data charges when private subnets in AZ-b route through NAT in AZ-a.
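At the rates above, a three-AZ VPC with three NAT Gateways processing 2 TB/month in total costs roughly $190/month on NAT alone (about $99 in hourly charges plus about $90 in data processing), and the bill passes $500/month at around 9 TB/month, all before compute. Mitigations: gateway endpoints for S3/DynamoDB (free), interface endpoints only where needed, interface endpoints for ECR API/DKR to pull images without NAT, and private hosted zones (with enableDnsSupport) so internal service calls resolve to private addresses instead of leaving through NAT.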
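The cost model is simple enough to sketch: an hourly charge per gateway plus a per-GB processing charge. The rates below are the us-east-1 figures quoted in this section and should be treated as assumptions to re-check for your region; the function names and the 730-hour month are illustrative.

```typescript
// Assumed rates (us-east-1 figures quoted above); verify against current pricing.
const NAT_HOURLY_USD = 0.045;
const NAT_PER_GB_USD = 0.045;
const HOURS_PER_MONTH = 730;

// Monthly NAT bill: hourly charge per gateway plus data processed across all gateways.
function natMonthlyCost(gateways: number, totalGbProcessed: number): number {
  const hourly = gateways * NAT_HOURLY_USD * HOURS_PER_MONTH;
  const data = totalGbProcessed * NAT_PER_GB_USD;
  return hourly + data;
}

// Traffic moved onto S3/DynamoDB gateway endpoints is no longer processed by NAT.
function natCostAfterOffload(gateways: number, totalGb: number, offloadedFraction: number): number {
  return natMonthlyCost(gateways, totalGb * (1 - offloadedFraction));
}

console.log(natMonthlyCost(3, 2000).toFixed(2));           // "188.55"
console.log(natCostAfterOffload(3, 2000, 0.5).toFixed(2)); // "143.55"
```

The hourly part is fixed by the number of AZs, so the lever you actually control is the data term, which is where gateway endpoints pay off.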
High availability pattern
Deploy one NAT Gateway per AZ, in that AZ's public subnet. Private subnet route tables in AZ-a point to the NAT in AZ-a; avoid routing across AZs to a NAT Gateway in production. If cost is the constraint in dev/staging, a single NAT Gateway is acceptable, with the understanding that losing its AZ takes outbound internet away from every private subnet.
One NAT vs NAT per AZ: a single NAT saves ~$33/month for each NAT Gateway you omit (~$66/month in a three-AZ VPC) but adds cross-AZ data transfer charges (~$0.01/GB each direction) and a single point of failure. Production workloads choose NAT per AZ; dev sandboxes often use one NAT or NAT-less setups with VPC endpoints only.
IPv6 and Egress-Only Internet Gateway
IPv6 addresses assigned to instances are globally routable. An Egress-Only Internet Gateway allows outbound IPv6 (software updates, external APIs) while blocking inbound connections initiated from the internet — the IPv6 equivalent of NAT for outbound-only, without address translation. Pair with security groups for fine-grained control.
Run aws ec2 describe-nat-gateways and check VPC Flow Logs filtered on dstport=443 to find NAT-heavy workloads. Often 30–60% of NAT traffic is S3, DynamoDB, or ECR traffic: gateway endpoints remove the S3 and DynamoDB share for free, and ECR interface endpoints (together with the S3 gateway endpoint for image layers) take image pulls off NAT. Enable flow logs to an S3 bucket with Athena queries for ongoing cost attribution per subnet.
Security groups & network ACLs
Security groups are your primary firewall — stateful, attached to ENIs, and referenced by ID. NACLs are a secondary, stateless subnet-level filter. Most teams rely almost entirely on security groups.
Security groups: stateful firewall
A security group (SG) acts as a virtual firewall for ENIs. Rules are allow-only (no deny rules). Stateful means if you allow inbound TCP 443, the return traffic is automatically permitted — you don't need an outbound rule for the response. Default: all outbound allowed, all inbound denied.
ALB → App → DB reference pattern
Production three-tier apps chain security groups by reference instead of CIDR:
- ALB SG — inbound 443 from 0.0.0.0/0 (or CloudFront prefix list); outbound to App SG on app port
- App SG — inbound app port only from ALB SG; outbound to DB SG on 5432 (PostgreSQL) or 3306 (MySQL)
- DB SG — inbound database port only from App SG; no outbound to internet
{
"SecurityGroupRules": [
{
"Description": "App tier accepts traffic only from ALB",
"IpProtocol": "tcp",
"FromPort": 8080,
"ToPort": 8080,
"ReferencedGroupId": "sg-alb123456"
}
]
}
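The chain above can be modelled in a few lines to see why referencing groups by ID isolates each tier. This is a simplified sketch rather than how AWS evaluates rules internally: the group IDs (`sg-alb`, `sg-app`, `sg-db`) are hypothetical, only inbound TCP ports are modelled, and a source is either another group's ID or 0.0.0.0/0.

```typescript
// A rule source is either another security group's ID or the open internet.
type SgInboundRule = { port: number; source: string };

const ANYWHERE = "0.0.0.0/0";

// Hypothetical group IDs mirroring the ALB -> App -> DB chain.
const sgRules: { [groupId: string]: SgInboundRule[] } = {
  "sg-alb": [{ port: 443, source: ANYWHERE }],
  "sg-app": [{ port: 8080, source: "sg-alb" }],
  "sg-db": [{ port: 5432, source: "sg-app" }],
};

// callerGroups: groups attached to the caller's ENI ([] for an internet client).
// Security groups are allow-only, so any single matching rule permits the connection.
function sgAllowsInbound(targetGroup: string, port: number, callerGroups: string[]): boolean {
  const rules = sgRules[targetGroup] || [];
  return rules.some(
    (rule) =>
      rule.port === port &&
      (rule.source === ANYWHERE || callerGroups.indexOf(rule.source) !== -1)
  );
}

console.log(sgAllowsInbound("sg-alb", 443, []));          // true: internet reaches the ALB
console.log(sgAllowsInbound("sg-app", 8080, []));         // false: internet cannot reach the app
console.log(sgAllowsInbound("sg-db", 5432, ["sg-app"]));  // true: app tier reaches the database
```

Because membership in `sg-app` is what grants database access, a replaced Fargate task with a new IP keeps working without any rule change.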
Network ACLs: stateless subnet filter
NACLs apply at the subnet boundary. They are stateless — allow inbound AND allow outbound (ephemeral ports) must both be configured explicitly. Rules are numbered (lowest first) and include explicit deny. Default NACL allows all; custom NACLs deny all until you add rules.
| Feature | Security Group | Network ACL |
|---|---|---|
| Scope | ENI / instance level | Subnet level |
| State | Stateful — return traffic auto-allowed | Stateless — must allow both directions |
| Rules | Allow only; all rules are evaluated | Allow and deny; numbered order, first match wins |
| Default | Deny inbound, allow outbound | Default NACL: allow all; custom NACL: deny all |
| Primary use | Application firewall — 99% of your rules | Subnet-level guardrails, blocking IP ranges, DMZ separation |
| Reference by SG | Yes — ReferencedGroupId | No — CIDR blocks only |
Never open database security groups to 0.0.0.0/0. Use SG-to-SG references so IP changes (autoscaling, Fargate task replacement) don't break connectivity. Add a NACL deny rule for known malicious CIDRs as defense-in-depth, but don't rely on NACLs for application-level auth — that's what security groups and IAM are for.
Changing a NACL rule affects every instance in the subnet immediately. A misnumbered deny rule can take down an entire AZ. Security group changes are safer — test NACL edits in non-prod first. Common mistake: allowing inbound TCP 443 on a NACL but forgetting outbound ephemeral ports (1024-65535) for return traffic.
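The ephemeral-port mistake is easier to see in code. The sketch below models only port ranges (real NACL rules also match protocol and CIDR), and the type and function names are illustrative.

```typescript
type NaclRule = { ruleNumber: number; action: "allow" | "deny"; fromPort: number; toPort: number };

// Rules are checked in ascending rule-number order and the first match decides.
// If nothing matches, the implicit final rule denies the packet.
function naclDecision(rules: NaclRule[], port: number): "allow" | "deny" {
  const ordered = rules.slice().sort((a, b) => a.ruleNumber - b.ruleNumber);
  for (const rule of ordered) {
    if (port >= rule.fromPort && port <= rule.toPort) {
      return rule.action;
    }
  }
  return "deny";
}

// Stateless: the request is checked inbound on the service port, and the
// response is checked outbound on the client's ephemeral port.
function connectionWorks(
  inbound: NaclRule[],
  outbound: NaclRule[],
  servicePort: number,
  clientEphemeralPort: number
): boolean {
  return (
    naclDecision(inbound, servicePort) === "allow" &&
    naclDecision(outbound, clientEphemeralPort) === "allow"
  );
}

const inboundHttps: NaclRule[] = [{ ruleNumber: 100, action: "allow", fromPort: 443, toPort: 443 }];
const outboundEphemeral: NaclRule[] = [{ ruleNumber: 100, action: "allow", fromPort: 1024, toPort: 65535 }];

console.log(connectionWorks(inboundHttps, [], 443, 50000));                // false
console.log(connectionWorks(inboundHttps, outboundEphemeral, 443, 50000)); // true
```

The first call fails even though inbound 443 is allowed, because nothing lets the response leave the subnet.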
Stripe and most fintech platforms use security group referencing exclusively for tier-to-tier traffic — NACLs stay at default allow with optional explicit denies for compliance subnets. Shopify micro-segments services with one SG per service type, referenced by upstream load balancers and service mesh sidecars rather than CIDR-based rules.
VPC endpoints — keep traffic private
VPC endpoints let resources in private subnets reach AWS services without traversing the public internet or NAT Gateway. Gateway endpoints are free; interface endpoints cost per AZ but unlock PrivateLink patterns.
Gateway vs interface endpoints
| Type | Services | How it works | Cost |
|---|---|---|---|
| Gateway | S3, DynamoDB | Route table entry (pl-xxx prefix list) — no ENI | Free |
| Interface (PrivateLink) | Most other AWS services + partner SaaS | ENI with private IP in your subnet; DNS resolves to endpoint | ~$0.01/hr/AZ + $0.01/GB processed |
Endpoint policies
Both gateway and interface endpoints support endpoint policies — IAM-style JSON that restricts which API actions and resources can be accessed through the endpoint. Default policy allows full access; tighten to specific buckets, tables, or accounts for defense in depth:
{
"Statement": [{
"Effect": "Allow",
"Principal": "*",
"Action": ["s3:GetObject", "s3:PutObject"],
"Resource": "arn:aws:s3:::my-app-artifacts/*",
"Condition": {
"StringEquals": { "aws:PrincipalAccount": "123456789012" }
}
}]
}
PrivateLink
AWS PrivateLink (interface endpoints) also exposes your own services to consumers in other VPCs or accounts via endpoint services (NLB-backed). Consumers create an interface endpoint in their VPC; traffic stays on the AWS network. Use for shared platform APIs, SaaS integrations, and cross-account service consumption without VPC peering.
Gateway endpoint for S3
# Create S3 gateway endpoint and attach to private route tables
VPCE_ID=$(aws ec2 create-vpc-endpoint \
--vpc-id vpc-0abc123 \
--service-name com.amazonaws.eu-west-1.s3 \
--route-table-ids rtb-priv-a rtb-priv-b \
--query VpcEndpoint.VpcEndpointId --output text)
# Optional: restrict with endpoint policy
aws ec2 modify-vpc-endpoint --vpc-endpoint-id "$VPCE_ID" \
--policy-document file://s3-endpoint-policy.json
resource "aws_vpc_endpoint" "s3" {
vpc_id = module.vpc.vpc_id
service_name = "com.amazonaws.${var.region}.s3"
route_table_ids = concat(
module.vpc.private_route_table_ids,
module.vpc.database_route_table_ids,
)
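# This policy also governs ECR image-layer downloads, which come from an AWS-owned
# S3 bucket (arn:aws:s3:::prod-<region>-starport-layer-bucket/*); allow s3:GetObject
# on it if tasks pull images through this endpoint.
policy = jsonencode({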
Statement = [{
Effect = "Allow"
Principal = "*"
Action = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
Resource = [
aws_s3_bucket.artifacts.arn,
"${aws_s3_bucket.artifacts.arn}/*",
]
}]
})
}
resource "aws_vpc_endpoint" "ecr_api" {
vpc_id = module.vpc.vpc_id
service_name = "com.amazonaws.${var.region}.ecr.api"
vpc_endpoint_type = "Interface"
subnet_ids = module.vpc.private_subnets
security_group_ids = [aws_security_group.vpce.id]
private_dns_enabled = true
}
import * as ec2 from 'aws-cdk-lib/aws-ec2';
// Gateway endpoint — free, route table based
vpc.addGatewayEndpoint('S3Endpoint', {
service: ec2.GatewayVpcEndpointAwsService.S3,
subnets: [{ subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS }],
});
// Interface endpoint — ECR pull without NAT
vpc.addInterfaceEndpoint('EcrApi', {
service: ec2.InterfaceVpcEndpointAwsService.ECR,
subnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
});
vpc.addInterfaceEndpoint('EcrDkr', {
service: ec2.InterfaceVpcEndpointAwsService.ECR_DOCKER,
subnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
});
When the exam asks how to access S3 from a private subnet without NAT, answer gateway VPC endpoint (route table association). For accessing your own NLB service from another VPC without peering, answer PrivateLink / interface endpoint. Gateway endpoints do not use security groups; interface endpoints do.
Minimum VPC endpoint set for NAT-less private ECS/EKS: S3 gateway (free), ECR API + ECR DKR + CloudWatch Logs interface endpoints, and SSM interface endpoints if you use Session Manager instead of bastion hosts. Add Secrets Manager endpoint if tasks fetch secrets at startup — otherwise those calls go through NAT.
VPC peering & Transit Gateway
Connecting VPCs requires understanding transitivity limits. VPC peering is simple and non-transitive; Transit Gateway scales to hub-and-spoke topologies across accounts and regions.
VPC peering
A VPC peering connection links two VPCs with private IP routing. Peering is non-transitive: if VPC-A peers with VPC-B and VPC-B peers with VPC-C, VPC-A cannot reach VPC-C through B. You must create a direct peering connection or use Transit Gateway.
- Same or different accounts, same or different regions (inter-region peering supported)
- CIDR blocks must not overlap
- No transitive routing — no "hub" VPC that forwards for others
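- DNS resolution can be enabled on the peering connection so that public DNS hostnames of instances in the peer VPC resolve to private IPs; private hosted zone names resolve only in VPCs the zone is associated with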
- Update route tables on both sides manually
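Non-transitivity means reachability is a direct-edge lookup, not a path search. A minimal sketch, with hypothetical VPC names and an illustrative function name:

```typescript
type Peering = [string, string];

// Two VPCs can exchange traffic only if a peering connects them directly;
// an intermediate VPC never forwards on their behalf.
function canReachViaPeering(peerings: Peering[], from: string, to: string): boolean {
  return peerings.some(
    (pair) => (pair[0] === from && pair[1] === to) || (pair[0] === to && pair[1] === from)
  );
}

const peerings: Peering[] = [
  ["vpc-a", "vpc-b"],
  ["vpc-b", "vpc-c"],
];

console.log(canReachViaPeering(peerings, "vpc-a", "vpc-b")); // true
console.log(canReachViaPeering(peerings, "vpc-a", "vpc-c")); // false, even though both peer with vpc-b
```

Adding `["vpc-a", "vpc-c"]` to the list is the only way to make the second call return true with peering alone.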
Transit Gateway (TGW)
Transit Gateway is a regional hub that connects VPCs, VPNs, and Direct Connect. Attachments propagate routes; you control route tables per attachment. Supports hub-and-spoke at scale — dozens of VPCs, cross-account via RAM (Resource Access Manager), and inter-region peering between TGWs.
| Scenario | VPC Peering | Transit Gateway |
|---|---|---|
| Two VPCs, simple link | ✅ Cheapest, lowest complexity | Overkill |
| 3+ VPCs mesh | ❌ N×(N-1)/2 peerings — unmanageable | ✅ Star topology via TGW |
| Shared services VPC | ❌ Non-transitive — each spoke needs peering to shared | ✅ One attachment per VPC to TGW |
| On-premises via VPN/DX | ❌ Cannot attach VPN to peering | ✅ Single VPN/DX attachment to TGW |
| Cross-account at scale | Possible but manual per peering | ✅ RAM sharing + route table isolation |
| Cost sensitivity | No hourly charge; data transfer only | ~$0.05/hr per attachment + $0.02/GB processed |
flowchart TB
TGW["Transit Gateway\n(hub)"]
SHARED["Shared Services VPC\nDNS, logging, CI"]
PROD["Production VPC"]
STAGE["Staging VPC"]
DEV["Development VPC"]
VPN["Site-to-Site VPN\nOn-premises"]
PROD --> TGW
STAGE --> TGW
DEV --> TGW
SHARED --> TGW
VPN --> TGW
Peering mesh vs TGW: three VPCs need three peering connections; ten VPCs need 45 peerings. TGW adds ~$36/month per attachment (each attached VPC or VPN) but simplifies routing and on-prem connectivity. For two VPCs in one account, peering wins. For an organization with shared-services, prod, staging, and data lake VPCs, TGW is standard.
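The trade-off comes down to quadratic growth in peerings versus a linear hourly charge for TGW attachments. A small sketch of both sides; the $0.05 per attachment-hour rate and the 730-hour month are assumptions to check against current pricing, and the function names are illustrative.

```typescript
// Peering connections needed for a full mesh of N VPCs: N * (N - 1) / 2.
function fullMeshPeerings(vpcCount: number): number {
  return (vpcCount * (vpcCount - 1)) / 2;
}

// Fixed monthly TGW cost, assuming about $0.05 per attachment-hour and 730 hours/month.
// Data processing charges come on top of this.
function tgwMonthlyBaseCost(attachments: number, hourlyPerAttachment: number = 0.05): number {
  return attachments * hourlyPerAttachment * 730;
}

console.log(fullMeshPeerings(3));    // 3
console.log(fullMeshPeerings(10));   // 45
console.log(tgwMonthlyBaseCost(4));  // 146
```

At four VPCs the mesh is still only six peerings, so the TGW decision is usually driven by on-premises connectivity and route-table isolation rather than the peering count alone.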
TGW maintains separate route tables per attachment domain — e.g. isolate prod VPCs from dev VPCs even though both attach to the same TGW. Route propagation can be automatic (VPC CIDR advertised) or static. Blackhole routes drop traffic silently — always verify with VPC Reachability Analyzer after TGW changes.
Route53 & DNS
Route53 is AWS's authoritative DNS. It routes users to endpoints globally, resolves private names inside your VPC, and bridges on-premises DNS with cloud resources via Route53 Resolver.
Routing policies
| Policy | Behavior | Use case |
|---|---|---|
| Simple | One record with one or more values; multiple values are returned in random order | Single-region app, internal service names |
| Weighted | Split traffic by weight (e.g. 90/10) | Blue/green deployments, canary releases, gradual migration |
| Latency | Route to lowest-latency region for the user | Multi-region active-active APIs, global user base |
| Failover | Primary + secondary; health check on primary | DR — automatic failover when primary ALB fails health check |
| Geolocation | Route by user's geographic location (country/continent) | GDPR data residency, locale-specific content, regulatory boundaries |
| Geoproximity | Route by geographic distance with bias controls | Fine-tune traffic distribution across regions |
| Multivalue | Return up to 8 healthy records (not true load balancing) | Simple HA — client picks from healthy IPs |
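Weighted routing is the policy most often misread, so here is a sketch of the selection rule: each record is chosen with probability weight / sum of weights. This illustrates the arithmetic only, not Route53's implementation; the record names are hypothetical, and the all-zero-weights case is simply rejected here for brevity.

```typescript
type WeightedRecord = { target: string; weight: number };

// `roll` is a number in [0, 1); pass Math.random() in real use.
// A record with weight w is picked with probability w / (sum of all weights).
function pickWeighted(records: WeightedRecord[], roll: number): string {
  const total = records.reduce((sum, record) => sum + record.weight, 0);
  if (total <= 0) {
    throw new Error("at least one record needs a positive weight");
  }
  let threshold = roll * total;
  for (const record of records) {
    if (threshold < record.weight) {
      return record.target;
    }
    threshold -= record.weight;
  }
  return records[records.length - 1].target; // guard against rounding at the upper edge
}

const canary: WeightedRecord[] = [
  { target: "blue-alb", weight: 90 },
  { target: "green-alb", weight: 10 },
];

console.log(pickWeighted(canary, 0.5));  // "blue-alb"
console.log(pickWeighted(canary, 0.95)); // "green-alb"
```

Weights are relative, so 9/1 and 90/10 produce the same split; setting a record's weight to 0 takes it out of rotation while other records have positive weights.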
Private hosted zones
A private hosted zone is associated with one or more VPCs and resolves DNS names only inside those VPCs, e.g. api.internal.mycompany.com → internal ALB. Each associated VPC needs enableDnsSupport and enableDnsHostnames set to true. For cross-VPC resolution, associate the zone with every VPC that should resolve it; peering or TGW supplies the network path to the resolved addresses, not the name resolution itself. Split-horizon DNS uses the same domain publicly and privately with different records.
Route53 Resolver
For hybrid cloud, Route53 Resolver provides inbound and outbound endpoints:
- Inbound endpoint — on-premises DNS forwards queries for cloud domains to Resolver ENIs in your VPC
- Outbound endpoint — forward VPC queries for on-prem domains (e.g. corp.local) to on-premises DNS servers via forwarding rules
- Resolver rules — conditional forwarding: match domain suffix, forward to target IPs
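Conditional forwarding can be sketched as a suffix match where the most specific domain wins. The rule shape, function names, and target IPs below are hypothetical and only illustrate the matching logic; a query that matches no rule falls through to the default VPC resolver, represented here by `undefined`.

```typescript
type ForwardRule = { domain: string; targetIps: string[] };

// True when `query` equals the rule domain or is a subdomain of it.
function matchesDomain(query: string, domain: string): boolean {
  const q = query.toLowerCase().replace(/\.$/, "");
  const d = domain.toLowerCase().replace(/\.$/, "");
  return q === d || q.slice(-(d.length + 1)) === "." + d;
}

// When several rules match, the most specific (longest) domain wins.
function selectForwardRule(rules: ForwardRule[], query: string): ForwardRule | undefined {
  let best: ForwardRule | undefined;
  for (const rule of rules) {
    if (matchesDomain(query, rule.domain) && (best === undefined || rule.domain.length > best.domain.length)) {
      best = rule;
    }
  }
  return best;
}

// Hypothetical on-premises DNS servers.
const forwardRules: ForwardRule[] = [
  { domain: "corp.local", targetIps: ["10.10.0.2"] },
  { domain: "eu.corp.local", targetIps: ["10.20.0.2"] },
];
```

Here `db.corp.local` is forwarded to 10.10.0.2, `app.eu.corp.local` to 10.20.0.2, and `example.com` matches nothing, so it stays with the VPC resolver.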
# CloudFormation / example failover record structure
Resources:
  PrimaryRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref HostedZone
      Name: api.example.com
      Type: A
      SetIdentifier: primary
      Failover: PRIMARY
      AliasTarget:
        DNSName: !GetAtt PrimaryAlb.DNSName
        HostedZoneId: !GetAtt PrimaryAlb.CanonicalHostedZoneID
      HealthCheckId: !Ref PrimaryHealthCheck
  SecondaryRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref HostedZone
      Name: api.example.com
      Type: A
      SetIdentifier: secondary
      Failover: SECONDARY
      AliasTarget:
        DNSName: !GetAtt SecondaryAlb.DNSName
        HostedZoneId: !GetAtt SecondaryAlb.CanonicalHostedZoneID
Slack and Lyft use Route53 latency-based routing for global API endpoints, with failover records tied to Route53 health checks on regional ALBs. Internal microservices resolve via private hosted zones (*.service.consul or Cloud Map) while customer-facing domains use public hosted zones with weighted routing for deployments.
Failover routing requires a health check on the primary record. Weighted routing does not require health checks (unhealthy targets still receive traffic unless you use alias with ELB). For "route users to the closest region," answer latency routing. For "block EU users from US-only data," answer geolocation with a default record for unmatched regions.
Enable DNSSEC signing on public hosted zones for domains you control end-to-end. Use private hosted zones for internal service discovery — never expose internal ALB DNS names publicly. Restrict Route53 API access with IAM; use route53:ChangeResourceRecordSets scoped to specific hosted zone ARNs in CI/CD deploy roles.