Cheat Sheets

Three dense quick references by persona: daily AWS CLI, platform IaC patterns, and architecture decision matrices.

Developer Cheat Sheet

CLI for daily work, IAM snippets, messaging, Logs Insights, common API errors. Chapters: Compute, Storage, Databases, Messaging.

AWS CLI — profile & region

Task	Command	Notes
SSO login	`aws sso login --profile dev`	Preferred over long-lived keys
Set profile	`export AWS_PROFILE=dev`	Or `--profile dev` per command
Caller identity	`aws sts get-caller-identity`	Verify account + role ARN
Default region	`aws configure set region eu-west-1`	Override: `--region us-east-1`
Assume role	`aws sts assume-role --role-arn arn:aws:iam::123:role/Dev --role-session-name local`	Export temp creds from JSON
JSON query	`aws ec2 describe-instances --query 'Reservations[].Instances[].InstanceId' --output text`	JMESPath on any API
Paginate all	`aws s3api list-objects-v2 --bucket b --query 'Contents[].Key' --output text`	Add `--page-size` / `--starting-token`
Dry run	`aws ec2 terminate-instances --instance-ids i-abc --dry-run`	Validates IAM without action
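
The "Export temp creds from JSON" note on the assume-role row can be sketched as a small helper. This is a minimal illustration, assuming the standard `Credentials` object that `aws sts assume-role` prints; the function name `toExportLines` is hypothetical.

```typescript
// Relevant part of the JSON printed by `aws sts assume-role`.
interface AssumeRoleOutput {
  Credentials: {
    AccessKeyId: string;
    SecretAccessKey: string;
    SessionToken: string;
    Expiration: string;
  };
}

// Hypothetical helper: turn the CLI's JSON output into shell `export` lines.
function toExportLines(json: string): string[] {
  const c = (JSON.parse(json) as AssumeRoleOutput).Credentials;
  return [
    `export AWS_ACCESS_KEY_ID='${c.AccessKeyId}'`,
    `export AWS_SECRET_ACCESS_KEY='${c.SecretAccessKey}'`,
    `export AWS_SESSION_TOKEN='${c.SessionToken}'`,
  ];
}
```

The exported values last only until `Expiration`; after that, re-run the assume-role call.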

EC2

Task	Command
List instances	`aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" "Name=tag:Environment,Values=prod"`
Launch (exam pattern)	`aws ec2 run-instances --image-id ami-0abc --instance-type t3.micro --subnet-id subnet-xxx --security-group-ids sg-xxx --iam-instance-profile Name=AppRole --count 1`
SSM shell (no SSH key)	`aws ssm start-session --target i-0123456789abcdef0`
User data logs	`sudo tail -f /var/log/cloud-init-output.log`
Stop / start	`aws ec2 stop-instances --instance-ids i-abc` · `aws ec2 start-instances --instance-ids i-abc`
AMI from instance	`aws ec2 create-image --instance-id i-abc --name "app-golden-$(date +%F)" --no-reboot`
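Associate EIP	`aws ec2 associate-address --instance-id i-abc --allocation-id eipalloc-xxx`
Spot interruption (IMDSv2)	`TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token -H "X-aws-ec2-metadata-token-ttl-seconds: 60"); curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/spot/instance-action`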

S3

Task	Command
Sync directory	`aws s3 sync ./dist s3://my-bucket/app/ --delete --sse AES256`
Presigned GET (CLI)	`aws s3 presign s3://my-bucket/obj.pdf --expires-in 3600`
Block public access	`aws s3api put-public-access-block --bucket my-bucket --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true`
Versioning	`aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled`
Lifecycle	`aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json`
Encryption default	`aws s3api put-bucket-encryption --bucket my-bucket --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"alias/my-key"}}]}'`
Object ACL / ownership	`aws s3api put-bucket-ownership-controls --bucket my-bucket --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerEnforced}]'`
Restore Glacier	`aws s3api restore-object --bucket b --key archive.zip --restore-request 'Days=7,GlacierJobParameters={Tier=Standard}'`

ECS

Task	Command
List clusters	`aws ecs list-clusters`
Run one-off task	`aws ecs run-task --cluster prod --task-definition myapp:42 --launch-type FARGATE --network-configuration "awsvpcConfiguration={subnets=[subnet-a],securityGroups=[sg-app],assignPublicIp=DISABLED}"`
Force new deployment	`aws ecs update-service --cluster prod --service web --force-new-deployment`
Scale service	`aws ecs update-service --cluster prod --service web --desired-count 4`
Task logs	`aws logs tail /ecs/web --follow --since 1h`
Exec into container	`aws ecs execute-command --cluster prod --task TASK_ARN --container app --interactive --command "/bin/sh"`
Register task def	`aws ecs register-task-definition --cli-input-json file://taskdef.json`
Stop task	`aws ecs stop-task --cluster prod --task TASK_ARN --reason "debug"`

RDS

Task	Command
Describe instances	`aws rds describe-db-instances --query "DBInstances[?DBInstanceStatus=='available'].[DBInstanceIdentifier,Endpoint.Address]" --output table`
Create snapshot	`aws rds create-db-snapshot --db-instance-identifier prod-db --db-snapshot-identifier prod-db-manual-$(date +%F)`
Restore to point in time	`aws rds restore-db-instance-to-point-in-time --source-db-instance-identifier prod-db --target-db-instance-identifier prod-db-restore --restore-time 2026-06-01T12:00:00Z`
Modify instance	`aws rds modify-db-instance --db-instance-identifier prod-db --allocated-storage 200 --apply-immediately`
Failover (Multi-AZ)	`aws rds reboot-db-instance --db-instance-identifier prod-db --force-failover`
Parameter group	`aws rds modify-db-parameter-group --db-parameter-group-name pg15-custom --parameters ParameterName=max_connections,ParameterValue=500,ApplyMethod=pending-reboot`
Read DB credentials (Secrets Manager)	`aws secretsmanager get-secret-value --secret-id prod/db/credentials --query SecretString --output text`

Lambda

Task	Command
Invoke sync	`aws lambda invoke --function-name ProcessOrder --cli-binary-format raw-in-base64-out --payload file://event.json out.json && cat out.json`
Invoke async	`aws lambda invoke --function-name ProcessOrder --invocation-type Event --cli-binary-format raw-in-base64-out --payload file://event.json /dev/null`
Update code (zip)	`aws lambda update-function-code --function-name ProcessOrder --zip-file fileb://function.zip`
Update config	`aws lambda update-function-configuration --function-name ProcessOrder --memory-size 1024 --timeout 30 --environment Variables={ENV=prod}`
Publish version	`aws lambda publish-version --function-name ProcessOrder`
Alias traffic	`aws lambda update-alias --function-name ProcessOrder --name live --routing-config AdditionalVersionWeights={"2"=0.1}`
Tail logs	`aws logs tail /aws/lambda/ProcessOrder --follow`
Concurrency limit	`aws lambda put-function-concurrency --function-name ProcessOrder --reserved-concurrent-executions 50`

IAM policy snippets

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": [
      "arn:aws:s3:::my-app-uploads",
      "arn:aws:s3:::my-app-uploads/*"
    ],
    "Condition": {
      "StringEquals": { "aws:RequestedRegion": "eu-west-1" }
    }
  }]
}

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["sqs:SendMessage", "sqs:GetQueueUrl"],
    "Resource": "arn:aws:sqs:eu-west-1:123456789012:orders.fifo",
    "Condition": {
      "Bool": { "aws:SecureTransport": "true" }
    }
  }]
}

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "dynamodb:GetItem", "dynamodb:PutItem",
      "dynamodb:UpdateItem", "dynamodb:Query"
    ],
    "Resource": [
      "arn:aws:dynamodb:eu-west-1:123456789012:table/Orders",
      "arn:aws:dynamodb:eu-west-1:123456789012:table/Orders/index/*"
    ]
  }]
}

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
    "Resource": "arn:aws:kms:eu-west-1:123456789012:key/abcd-1234",
    "Condition": {
      "StringEquals": {
        "kms:ViaService": "s3.eu-west-1.amazonaws.com"
      }
    }
  }]
}

Pattern	Key elements
Least privilege S3	Separate `ListBucket` on bucket ARN vs `GetObject` on `/*`
Tag-based access	`"Condition": { "StringEquals": { "aws:ResourceTag/Environment": "dev" } }`
Deny non-TLS	`aws:SecureTransport` = false → Deny
Permission boundary	Attach managed policy as boundary; role policies cannot exceed it
Trust policy (ECS task)	`Principal: { "Service": "ecs-tasks.amazonaws.com" }` + `sts:AssumeRole`
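
The "Deny non-TLS" row can be spelled out as a full bucket policy. The sketch below builds the commonly used statement as a plain object; the function name and `Sid` are illustrative assumptions, and the bucket name is a placeholder.

```typescript
interface Statement {
  Sid: string;
  Effect: "Allow" | "Deny";
  Principal: string;
  Action: string;
  Resource: string[];
  Condition: { Bool: { "aws:SecureTransport": string } };
}

interface PolicyDocument {
  Version: string;
  Statement: Statement[];
}

// Hypothetical helper: bucket policy that rejects any request made without TLS.
function denyNonTlsBucketPolicy(bucket: string): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "DenyInsecureTransport",
        Effect: "Deny",
        Principal: "*",
        Action: "s3:*",
        // Both the bucket ARN and the object ARN pattern must be listed.
        Resource: [`arn:aws:s3:::${bucket}`, `arn:aws:s3:::${bucket}/*`],
        Condition: { Bool: { "aws:SecureTransport": "false" } },
      },
    ],
  };
}
```

`JSON.stringify(denyNonTlsBucketPolicy("my-app-uploads"), null, 2)` yields a document that could be passed to `aws s3api put-bucket-policy`.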

SQS & SNS

Task	Command
Send SQS message	`aws sqs send-message --queue-url https://sqs.eu-west-1.amazonaws.com/123/orders --message-body '{"orderId":"42"}'`
FIFO send	`aws sqs send-message --queue-url .../orders.fifo --message-body '{}' --message-group-id customer-99 --message-deduplication-id $(uuidgen)`
Receive / delete	`aws sqs receive-message --queue-url URL --max-number-of-messages 10 --wait-time-seconds 20` · `aws sqs delete-message --queue-url URL --receipt-handle HANDLE`
Purge queue	`aws sqs purge-queue --queue-url URL`
DLQ redrive	`aws sqs start-message-move-task --source-arn arn:aws:sqs:...:orders-dlq --destination-arn arn:aws:sqs:...:orders`
SNS publish	`aws sns publish --topic-arn arn:aws:sns:eu-west-1:123:alerts --message "CPU high" --subject "Alert"`
SNS → SQS sub	`aws sns subscribe --topic-arn TOPIC --protocol sqs --notification-endpoint arn:aws:sqs:...:worker --attributes RawMessageDelivery=true`
Filter policy	`aws sns subscribe ... --attributes FilterPolicy='{"eventType":["order.created"]}'`

CloudWatch Logs Insights

Use case	Query
Lambda errors (5 min)	`fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 50`
Lambda duration p99	`filter @type = "REPORT" | stats pct(@duration, 99) as p99, avg(@duration) as avg_ms by bin(5m)`
ALB 5xx by target	`parse @message '"target_status_code":*,' as code | filter code >= 500 | stats count() by target_group_arn`
ECS OOM	`fields @timestamp, @message | filter @message like /OutOfMemoryError/ or @message like /OOMKilled/`
API Gateway 429	`filter status = 429 | stats count() as throttled by bin(1m)`
Trace by request ID	`fields @timestamp, @message | filter @requestId = "abc-123-def"`

# Run query from CLI (last 1 hour)
# Note: `date -v-1H` is BSD/macOS syntax; on GNU/Linux use $(date -u -d '1 hour ago' +%s)
aws logs start-query \
  --log-group-name /aws/lambda/ProcessOrder \
  --start-time $(date -u -v-1H +%s) \
  --end-time $(date -u +%s) \
  --query-string 'fields @timestamp, @message | filter @message like /ERROR/ | limit 20'

# Poll results
aws logs get-query-results --query-id QUERY_ID
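
`start-query` takes its start and end times as epoch seconds, and the `date` flags for computing them differ between macOS and Linux. A platform-independent way to compute the same window, as a minimal sketch (the helper name `lastHoursWindow` is hypothetical):

```typescript
interface QueryWindow {
  startTime: number; // epoch seconds, for --start-time
  endTime: number;   // epoch seconds, for --end-time
}

// Hypothetical helper: the window covering the last `hours` hours.
// `nowMs` is injectable so the function can be checked deterministically.
function lastHoursWindow(hours: number, nowMs: number = Date.now()): QueryWindow {
  if (!(hours > 0)) {
    throw new RangeError("hours must be positive");
  }
  const endTime = Math.floor(nowMs / 1000);
  return { startTime: endTime - Math.round(hours * 3600), endTime };
}
```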

Common API errors

Error	Typical cause	Fix
`AccessDenied`	Missing IAM action, SCP deny, or resource policy	CloudTrail `errorMessage`; simulate with IAM Policy Simulator
`AccessDeniedException` (KMS)	Key policy or grant missing for role	Add `kms:Decrypt` on key; check `kms:ViaService`
`ThrottlingException`	API TPS exceeded (DynamoDB, Lambda, etc.)	Exponential backoff + jitter; raise limits; use SQS buffer
`ProvisionedThroughputExceededException`	DynamoDB RCU/WCU hot partition	On-demand mode; better partition key; DAX
`InvalidParameterValueException`	Bad subnet/SG combo, env var size, Lambda layer	Validate VPC config; shrink env; check layer compat
`ResourceNotFoundException`	Wrong region, deleted resource, typo in ARN	`aws sts get-caller-identity`; verify region
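`BucketAlreadyExists`	Global S3 namespace collision (name taken by another account)	Pick unique bucket name
`BucketAlreadyOwnedByYou`	You already own a bucket with this name	Reuse the existing bucket; treat the create call as idempotent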
`SignatureDoesNotMatch`	Clock skew, wrong secret, region mismatch	Sync NTP; verify cred chain; check SigV4 region
`ExpiredToken`	STS session expired	Refresh SSO / re-assume role
`ServiceUnavailable`	Transient AZ or service issue	Retry with backoff; check Health Dashboard
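
"Exponential backoff + jitter" in the `ThrottlingException` row can be sketched as follows. This is an illustration of the full-jitter variant with assumed defaults (100 ms base, 20 s cap, 5 attempts); the AWS SDKs already ship their own retry logic, so a hand-rolled loop like this is mainly useful around non-SDK calls.

```typescript
// Full jitter: pick a random delay in [0, min(cap, base * 2^attempt)).
// `random` is injectable so the calculation can be checked deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 100,
  capMs: number = 20000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retry `op` until it succeeds, the error is not retryable, or attempts run out.
async function withRetry<T>(
  op: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts: number = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) {
        throw err;
      }
      const delay = backoffDelayMs(attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A caller would pass a predicate such as "error name is `ThrottlingException` or `ServiceUnavailable`" as `isRetryable`; errors like `AccessDenied` should not be retried.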

Java — S3 presigned URL (AWS SDK v2)

import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.presigner.S3Presigner;
import software.amazon.awssdk.services.s3.presigner.model.GetObjectPresignRequest;
import java.time.Duration;

public class PresignExample {

    // Returns a time-limited GET URL for a private object.
    static String presignGet(String bucket, String key) {
        // S3Presigner.create() uses the default credentials and region chains
        // (ECS task role in AWS, SSO profile locally)
        try (S3Presigner presigner = S3Presigner.create()) {
            GetObjectRequest getReq = GetObjectRequest.builder()
                .bucket(bucket)
                .key(key)
                .build();

            GetObjectPresignRequest presignReq = GetObjectPresignRequest.builder()
                .signatureDuration(Duration.ofMinutes(15))
                .getObjectRequest(getReq)
                .build();

            // Return the URL to the browser; no bucket credentials are exposed
            return presigner.presignGetObject(presignReq).url().toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(presignGet("my-app-uploads", "reports/q2.pdf"));
    }
}

Tip	Detail
PUT upload	`presignPutObject(PutObjectPresignRequest)` — set `Content-Type` in signed request
KMS bucket	Presigner role needs `s3:GetObject` + `kms:Decrypt` on CMK
Expiry	Max 7 days (SigV4); a URL signed with temporary credentials stops working when that session expires; prefer 5–15 min for downloads
Spring	Inject `S3Presigner` bean; never embed access keys in `application.yml`

DevOps Cheat Sheet

Network layout, SG tiers, IaC quick refs, cost controls, ECS ops. Chapters: VPC, Compute, Observability.

VPC CIDR planning

Tier	Recommended CIDR	Usable IPs / subnet	Notes
VPC (prod)	`10.0.0.0/16`	65,536 total	One /16 per account/env; avoid overlap for peering/TGW
Public subnet (AZ-a)	`10.0.0.0/20`	~4,091	ALB, NAT Gateway, bastion (avoid if possible)
Public subnet (AZ-b)	`10.0.16.0/20`	~4,091	Multi-AZ ALB requires 2+ public subnets
Private app (AZ-a)	`10.0.32.0/20`	~4,091	ECS tasks, Lambda (VPC), EC2 app tier
Private app (AZ-b)	`10.0.48.0/20`	~4,091	Spread workloads evenly across AZs
Private data (AZ-a)	`10.0.64.0/24`	~251	RDS, ElastiCache — no internet route
Private data (AZ-b)	`10.0.65.0/24`	~251	RDS subnet group spans both
Spare / TGW	`10.0.128.0/17`	—	Future expansion, VPN, hybrid
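
The "Usable IPs / subnet" column follows from the prefix length: AWS reserves five addresses in every subnet (the first four and the last). A minimal sketch of that arithmetic, with hypothetical helper names:

```typescript
// AWS reserves the first four addresses and the last address of each subnet.
const AWS_RESERVED_ADDRESSES = 5;

// Usable addresses in a subnet of the given prefix length (/16 to /28 allowed).
function usableIps(prefixLength: number): number {
  if (!Number.isInteger(prefixLength) || prefixLength < 16 || prefixLength > 28) {
    throw new RangeError("VPC subnets must be between /16 and /28");
  }
  return 2 ** (32 - prefixLength) - AWS_RESERVED_ADDRESSES;
}

// How many equally sized subnets fit into a VPC CIDR.
function subnetsThatFit(vpcPrefix: number, subnetPrefix: number): number {
  if (subnetPrefix < vpcPrefix) {
    throw new RangeError("subnet cannot be larger than the VPC");
  }
  return 2 ** (subnetPrefix - vpcPrefix);
}
```

For the layout above, `usableIps(20)` gives 4,091 and `usableIps(24)` gives 251, and a /16 VPC holds sixteen /20 subnets.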

Rule	Why
Reserve 3+ AZs worth of /20 app subnets	ECS/EKS pod density grows; /24 fills fast with ENIs
Never use `172.31.0.0/16` default VPC CIDR in prod design	Exam trap; the default VPC has only public subnets (no tiering), and its CIDR is the same in every account, so it overlaps when peering
Document RFC1918 overlap before TGW peering	Overlapping CIDR = no peering path
Enable VPC Flow Logs to S3 (Parquet) for cost at scale	CloudWatch Logs expensive for high-volume VPCs

Security group patterns — ALB → App → DB

SG	Inbound	Outbound
`sg-alb`	443 from `0.0.0.0/0` (or CloudFront prefix list)	8080 → `sg-app` only
`sg-app`	8080 from `sg-alb`	5432 → `sg-db`; 443 → `0.0.0.0/0` (AWS APIs via NAT)
`sg-db`	5432 from `sg-app` only	None required (stateful return traffic)
`sg-bastion` (if used)	22/443 from corp IP range	22 → `sg-app` (prefer SSM instead)

Anti-pattern	Fix
App SG allows `0.0.0.0/0:8080`	Reference ALB SG ID as source
DB SG allows entire VPC CIDR	Reference app SG only
Internet-facing ALB in private subnets	Internet-facing ALB needs public subnets (route to IGW); use an internal ALB when clients arrive over VPN, Direct Connect, or peering
NACL duplicates SG rules	Default NACL allow-all; enforce at SG layer
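
The first two anti-patterns can be caught with a simple check over ingress rules. The sketch below uses a simplified, hypothetical rule shape (`port` plus a `source` that is either a CIDR or a security group ID), not the actual EC2 API response format.

```typescript
// Simplified rule shape for illustration; not the EC2 API structure.
interface IngressRule {
  port: number;
  source: string; // CIDR block or security group ID (e.g. "sg-alb")
}

const OPEN_SOURCES = ["0.0.0.0/0", "::/0"];

// Flag rules open to the whole internet on ports that are not meant to be public.
function findOpenIngress(rules: IngressRule[], publicPorts: number[] = [443]): IngressRule[] {
  return rules.filter(
    (rule) => OPEN_SOURCES.indexOf(rule.source) !== -1 && publicPorts.indexOf(rule.port) === -1,
  );
}
```

Applied to the three tiers above, only the ALB's 443 rule may use `0.0.0.0/0`; anything this function returns should reference a security group ID instead.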

Terraform resource quick ref

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "prod"
  cidr = "10.0.0.0/16"
  azs  = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]

  private_subnets = ["10.0.32.0/20", "10.0.48.0/20", "10.0.64.0/20"]
  public_subnets  = ["10.0.0.0/20",  "10.0.16.0/20", "10.0.80.0/20"]

  enable_nat_gateway = true
  single_nat_gateway = false  # one NAT per AZ = HA, higher cost
  enable_flow_log    = true
}

resource "aws_ecs_service" "web" {
  name            = "web"
  cluster         = aws_ecs_cluster.prod.id
  task_definition = aws_ecs_task_definition.web.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = module.vpc.private_subnets
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.web.arn
    container_name   = "app"
    container_port   = 8080
  }

  lifecycle { ignore_changes = [desired_count] } # allow autoscaling
}

resource "aws_db_instance" "prod" {
  identifier        = "prod-db"
  engine            = "postgres"
  engine_version    = "15"
  instance_class    = "db.r6g.large"
  allocated_storage = 100
  storage_encrypted = true
  kms_key_id        = aws_kms_key.rds.arn

  username                    = "appadmin" # placeholder; a new instance needs a master user
  manage_master_user_password = true       # RDS keeps the password in Secrets Manager (recent provider versions)

  db_subnet_group_name      = aws_db_subnet_group.prod.name
  vpc_security_group_ids    = [aws_security_group.db.id]
  multi_az                  = true
  backup_retention_period   = 7
  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "prod-db-final"
}

Resource	Terraform type	Critical attributes
ALB	`aws_lb` + `aws_lb_target_group`	`internal`, `subnets`, health check path
IAM role	`aws_iam_role` + `aws_iam_role_policy_attachment`	Trust policy JSON in `assume_role_policy`
S3 bucket	`aws_s3_bucket` + separate encryption/public access resources	AWS provider v4+ splits bucket config
Lambda	`aws_lambda_function`	`role`, `vpc_config`, `environment`
SQS	`aws_sqs_queue` + `aws_sqs_queue` (DLQ)	`redrive_policy`, `visibility_timeout_seconds`
Secrets	`aws_secretsmanager_secret`	Reference from ECS task def `secrets` block

CDK construct quick ref (TypeScript)

Need	CDK construct	Package
VPC + subnets	`ec2.Vpc`	`aws-cdk-lib/aws-ec2`
Fargate service + ALB	`ecs_patterns.ApplicationLoadBalancedFargateService`	`aws-cdk-lib/aws-ecs-patterns`
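Lambda + API	`apigateway.LambdaRestApi` or `apigatewayv2.HttpApi` + `HttpLambdaIntegration`	`aws-cdk-lib/aws-apigateway`, or `aws-cdk-lib/aws-apigatewayv2` + `aws-cdk-lib/aws-apigatewayv2-integrations`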
RDS Postgres	`rds.DatabaseInstance` or `ServerlessCluster` (Aurora)	`aws-cdk-lib/aws-rds`
SQS queue	`sqs.Queue` with `deadLetterQueue`	`aws-cdk-lib/aws-sqs`
Event-driven Lambda	`lambda.Function` + `SqsEventSource`	`aws-cdk-lib/aws-lambda` + `aws-cdk-lib/aws-lambda-event-sources`
Secrets in task	`ecs.Secret.fromSecretsManager`	`aws-cdk-lib/aws-ecs`
WAF on ALB	`wafv2.CfnWebACL` + association	`aws-cdk-lib/aws-wafv2`

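import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecsPatterns from 'aws-cdk-lib/aws-ecs-patterns';
import { Construct } from 'constructs';

// Minimal Fargate + ALB pattern
export class WebStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    const vpc = new ec2.Vpc(this, 'Vpc', { maxAzs: 2, natGateways: 1 });

    new ecsPatterns.ApplicationLoadBalancedFargateService(this, 'Service', {
      vpc,
      taskImageOptions: {
        image: ecs.ContainerImage.fromRegistry('nginx'),
        containerPort: 80, // stock nginx listens on 80; use 8080 with your own app image
        environment: { ENV: 'prod' },
      },
      publicLoadBalancer: true,
      desiredCount: 2,
    });
  }
}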

NAT Gateway cost formula

Component	Rate (approx.)	Formula
Hourly charge	~$0.045/hr per NAT	`hours × $0.045 × NAT_count`
Data processing	~$0.045/GB	`GB_processed × $0.045`
Monthly baseline (1 NAT)	~$33/mo idle	730 × 0.045 = 32.85
3 NATs (HA per AZ)	~$99/mo idle	3 × 730 × 0.045 = 98.55; before any data processing

Total NAT cost ≈ (NAT_count × 730 × hourly_rate) + (GB_processed × data_processing_rate)

Cost reduction	Trade-off
Single NAT Gateway (`single_nat_gateway = true`)	AZ failure loses outbound for all private subnets
VPC endpoints (S3, DynamoDB gateway — free)	Eliminates NAT GB for those services
Interface endpoints (ECR, Secrets Manager, SSM)	~$7/mo/AZ/endpoint; cheaper at high NAT volume
NAT instance (self-managed)	Ops burden; not exam default
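
The NAT cost formula above translates directly into code. The rates are the approximate figures from the table, not authoritative pricing, and the function name is hypothetical.

```typescript
interface NatRates {
  hourlyUsd: number; // per NAT Gateway per hour
  perGbUsd: number;  // per GB processed
}

// Approximate rates from the table; actual pricing varies by region.
const APPROX_NAT_RATES: NatRates = { hourlyUsd: 0.045, perGbUsd: 0.045 };
const HOURS_PER_MONTH = 730;

// Monthly cost = fixed hourly charge per gateway + data processing charge.
function natMonthlyCostUsd(
  natCount: number,
  gbProcessed: number,
  rates: NatRates = APPROX_NAT_RATES,
): number {
  return natCount * HOURS_PER_MONTH * rates.hourlyUsd + gbProcessed * rates.perGbUsd;
}
```

For example, three gateways processing 1,000 GB in a month come to roughly 98.55 + 45.00 = 143.55 USD at these rates, which shows why gateway endpoints for S3 and DynamoDB pay off quickly for data-heavy workloads.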

Cost optimization checklist

#	Action	Tool / service
1	Right-size EC2/RDS with 2 weeks of metrics	Compute Optimizer, Cost Explorer
2	Buy Savings Plans / RIs for steady baseline	1-yr compute SP for Fargate + Lambda
3	S3 lifecycle → IA → Glacier → expire	Lifecycle rules on versioned buckets
4	Delete idle EIPs, unattached EBS, old snapshots	Trusted Advisor, AWS Config rules
5	Enable S3 Intelligent-Tiering for unknown access	Monitoring fee vs storage savings
6	Lambda: tune memory (affects CPU + cost)	Power Tuning tool
7	CloudWatch log retention 7–30 days prod	Never infinite retention
8	Use Graviton (ARM) where supported	~20% cheaper t4g/r6g/m6g
9	Tag everything: `Environment`, `Team`, `CostCenter`	Cost allocation tags + budgets
10	Schedule dev/staging off-hours	Instance Scheduler, ASG desired=0

Multi-AZ production checklist

Layer	Requirement	Verify
VPC	≥2 AZs, subnets per AZ	`aws ec2 describe-subnets --filters Name=vpc-id,Values=vpc-xxx`
ALB	Subnets in ≥2 AZs	Targets healthy in each AZ
ECS/EKS	`spread` or `distinctInstance` placement	Tasks/pods across AZs
RDS	`multi_az = true`	Sync replica in second AZ
NAT	1 per AZ (strict HA) or 1 shared (cost)	Route tables per private subnet → local NAT
Route 53	Health checks + failover/weighted records	DNS TTL ≤ 60s for failover
S3	Cross-region replication for critical data	CRR rule + RTC if needed
Backups	Automated snapshots + cross-region copy	AWS Backup plan
Secrets	Replicated secrets (multi-region)	Secrets Manager replication
Runbook	AZ failure game day documented	Simulate AZ isolate in staging

ECS task definition essentials

Field	Value / pattern	Why
`requiresCompatibilities`	`["FARGATE"]`	Serverless; no EC2 capacity management
`networkMode`	`awsvpc`	Required for Fargate; each task gets ENI
`cpu` / `memory`	Valid Fargate pairs (e.g. 512/1024, 1024/2048)	Invalid combo fails registration
`taskRoleArn`	App runtime permissions (S3, DynamoDB)	SDK uses this — not execution role
`executionRoleArn`	`AmazonECSTaskExecutionRolePolicy` + ECR pull	Pull image, write logs, read secrets
`logConfiguration`	`awslogs` driver → `/ecs/service-name`	Centralized in CloudWatch
`healthCheck`	`CMD-SHELL curl -f http://localhost:8080/actuator/health || exit 1`	Separate from ALB target health; needed for `dependsOn` with condition `HEALTHY` (requires `curl` in the image)
`secrets`	`valueFrom` Secrets Manager ARN	Never plain env for passwords
`readonlyRootFilesystem`	`true` + tmp volume	Security hardening

{
  "family": "web",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "taskRoleArn": "arn:aws:iam::123:role/WebTaskRole",
  "executionRoleArn": "arn:aws:iam::123:role/WebExecutionRole",
  "containerDefinitions": [{
    "name": "app",
    "image": "123.dkr.ecr.eu-west-1.amazonaws.com/web:1.2.3",
    "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
    "essential": true,
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/web",
        "awslogs-region": "eu-west-1",
        "awslogs-stream-prefix": "app"
      }
    },
    "secrets": [
      { "name": "DB_PASSWORD", "valueFrom": "arn:aws:secretsmanager:eu-west-1:123:secret:prod/db:password::" }
    ]
  }]
}
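
Because an invalid `cpu` / `memory` pair fails at registration, it can be worth checking the pair before calling `register-task-definition`. The sketch below encodes the Fargate size table as an assumption drawn from the Fargate documentation; it may lag behind newly added sizes, so verify against the current docs. All names are hypothetical.

```typescript
// Inclusive list of values from `min` to `max` in `step` increments.
function mibRange(min: number, max: number, step: number): number[] {
  const values: number[] = [];
  for (let v = min; v <= max; v += step) {
    values.push(v);
  }
  return values;
}

// Assumed Fargate sizes: CPU units -> allowed memory values in MiB.
const FARGATE_MEMORY_MIB: Record<number, number[]> = {
  256: [512, 1024, 2048],
  512: mibRange(1024, 4096, 1024),
  1024: mibRange(2048, 8192, 1024),
  2048: mibRange(4096, 16384, 1024),
  4096: mibRange(8192, 30720, 1024),
  8192: mibRange(16384, 61440, 4096),
  16384: mibRange(32768, 122880, 8192),
};

// True when the cpu/memory pair appears in the assumed size table.
function isValidFargateSize(cpu: number, memoryMiB: number): boolean {
  const allowed = FARGATE_MEMORY_MIB[cpu];
  return allowed !== undefined && allowed.indexOf(memoryMiB) !== -1;
}
```

The task definition above (`"cpu": "512"`, `"memory": "1024"`) passes this check after converting the strings with `Number(...)`.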

Architect Cheat Sheet

Service selection matrices, DR targets, Well-Architected pillars, exam limits. Chapters: AWS Core Home, Security, Databases.

Compute selection — ECS vs EKS vs Lambda vs EC2

Criterion	ECS Fargate	EKS	Lambda	EC2
Ops burden	Low	High (control plane + nodes)	Lowest	Highest
Max duration	Unlimited	Unlimited	15 min	Unlimited
Cold start	Task pull ~30s	Pod schedule ~30s	ms–seconds	Always warm if running
Scaling	Service auto scaling	HPA + CA + Karpenter	Automatic concurrency	ASG manual/custom
Portability	Docker only	K8s manifests (multi-cloud)	Function per trigger	Any workload
Best for	AWS-native microservices, Java/Spring	Existing K8s teams, operators, service mesh	Event-driven, API thin layer, cron	Legacy, GPU, custom kernel, license-bound
Cost at steady load	Medium	Medium–high	High at sustained TPS	Lowest with RIs/SP
Exam default	“Containers without managing servers”	“Kubernetes on AWS”	“Event-driven, no server management”	“Full OS control”

Database selection — RDS vs Aurora vs DynamoDB

Criterion	RDS	Aurora	DynamoDB
Model	Managed relational (PG, MySQL, etc.)	MySQL/PostgreSQL compatible, distributed storage	NoSQL key-value / document
Scale up	Vertical (instance class)	Up to 128 TB storage auto-grow	Horizontal (on-demand unlimited)
Read scale	Read replicas (async)	Up to 15 low-lag replicas	DAX for microsecond cache
Multi-AZ	Synchronous standby	6 copies across 3 AZs storage layer	Multi-AZ by default
Consistency	Strong (single instance)	Strong primary; eventual on replicas	Eventually consistent reads by default; strongly consistent reads optional
Joins / SQL	Full SQL	Full SQL	No joins; design access patterns first
Best for	Standard OLTP, lift-and-shift	High-throughput OLTP, global databases	Key-value, sessions, gaming, IoT, single-digit ms
Exam trigger	“Managed PostgreSQL”	“5× throughput”, “global database”	“Massive scale”, “millisecond latency”, “serverless”

Messaging selection — SQS vs SNS vs EventBridge vs Kinesis

Criterion	SQS	SNS	EventBridge	Kinesis
Pattern	Queue (pull, 1 consumer group)	Pub/sub fan-out (push)	Event bus (routing rules)	Stream (ordered shards)
Ordering	FIFO queues only	No order guarantee (FIFO topic = order)	No strict order	Per partition key
Retention	1 min – 14 days	No retention (transient)	Replay archive optional	1–365 days
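Throughput	Standard: nearly unlimited; FIFO: 300 API calls/s per action (3,000 msg/s with batching; more in high-throughput mode)	Very high publish	High; cross-account	Shard-limited; on-demand mode scales automatically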
Consumers	Workers, Lambda pollers	HTTP, Lambda, SQS, email, SMS	Lambda, Step Functions, SaaS targets	Analytics, Lambda, Firehose
Best for	Decouple, buffer, DLQ retry	Alerts, multi-subscriber notify	Event-driven architecture, schema registry	Clickstream, logs, real-time analytics
Exam trigger	“Decouple microservices”	“Fan-out notifications”	“Route events from AWS services”	“Real-time processing ordered records”

Disaster recovery — RTO / RPO strategies

Strategy	RPO	RTO	AWS pattern	Cost
Backup & restore	Hours (backup interval)	Hours–days	AMI/EBS/RDS snapshots → restore in DR region	$
Pilot light	Minutes–hours	Hours	Minimal DR: data replicated continuously (cross-region RDS read replica stays running), app tier switched off, AMIs copied, Route 53 health failover	$$
Warm standby	Minutes	Minutes–hours	Scaled-down full stack in DR; ASG min=1	$$$
Active-active	Near-zero	Near-zero	Multi-region ALB + Aurora Global + DynamoDB Global Tables	$$$$
S3 CRR + RTC	Minutes	Minutes	Cross-Region Replication; Replication Time Control SLA	Per-GB replication
RDS cross-region read replica	Seconds–minutes (async)	Promote replica ~minutes	`promote-read-replica` in DR region	Replica instance cost

Well-Architected Framework — six pillars

Pillar	Focus	Key AWS services / practices
Operational Excellence	Run and monitor systems	CloudWatch, Systems Manager, runbooks, IaC, CI/CD
Security	Protect data and systems	IAM, KMS, GuardDuty, SCPs, encryption, least privilege
Reliability	Recover from failure	Multi-AZ, auto scaling, backups, chaos testing
Performance Efficiency	Use resources efficiently	Right-sizing, caching (ElastiCache, CloudFront), Graviton
Cost Optimization	Avoid unnecessary spend	Cost Explorer, SP/RIs, lifecycle, tagging, delete idle
Sustainability	Minimize environmental impact	Graviton efficiency, serverless, region selection, density

Review question	Pillar
“How do you detect unauthorized API calls?”	Security — CloudTrail + GuardDuty
“How do you survive AZ failure?”	Reliability — Multi-AZ + health checks
“How do you reduce NAT data charges?”	Cost — VPC endpoints
“How do you deploy without downtime?”	Operational Excellence — blue/green, CodeDeploy

Exam service limits (memorize)

Service	Limit	Exam note
Lambda timeout	15 minutes max	Long jobs → Step Functions or ECS
Lambda /tmp storage	512 MB – 10,240 MB	Ephemeral; not durable
Lambda env vars	4 KB total	Large config → SSM/Secrets Manager
Lambda concurrent executions	1,000 default/account/region	Request increase; use SQS buffer
API Gateway REST timeout	29 seconds	Long poll → async Lambda + webhook
ALB idle timeout	Default 60s (max 4000s)	WebSockets need higher idle
NLB	Layer 4, static IP, ultra-low latency	Not HTTP routing — use ALB for path/host
SQS visibility timeout	0 – 12 hours	Must exceed consumer processing time; for Lambda consumers AWS recommends at least 6× the function timeout
SQS message size	256 KB	Larger payloads → S3 pointer in body
SNS message size	256 KB	Same S3 extended client pattern
EBS volume max	64 TiB (gp3)	Grow in place with Elastic Volumes; shrinking requires a new, smaller volume
RDS backup retention	1–35 days	Automated backups for PITR
VPC per region	Default 5 (soft limit)	Request increase via support
Security groups per ENI	5 (can increase to 16)	Prefer SG referencing over CIDR sprawl
IAM policy size	6,144 chars managed; 2,048 inline user	Split policies; use permission sets
CloudFront origin	Any HTTP origin + S3	Edge caching; signed URLs/cookies
Route 53 health check	HTTP/HTTPS/TCP/CloudWatch alarm	Failover routing policy
DynamoDB item size	400 KB max	Large docs → S3 + pointer attribute
Kinesis shard	1 MB/s or 1,000 records/s ingest	Hot shard → better partition key
ECS Fargate task ENI	One ENI per task (awsvpc)	IP exhaustion → larger subnets

Networking cost comparison

Traffic path	Charged?	Typical rate	Optimization
Internet → CloudFront → S3	CloudFront egress only	~$0.085/GB (varies by region)	OAI/OAC; cache hit ratio
Internet → ALB → EC2	ALB LCU + EC2 egress	LCU ~$0.008/hr + data	CloudFront in front of static
AZ-a → AZ-b (same region)	Yes — cross-AZ	~$0.01/GB each direction	Keep chatter in same AZ where possible
Private subnet → internet (via NAT)	NAT hourly + $/GB	~$0.045/GB processed	VPC endpoints for AWS APIs
S3 gateway endpoint	Free	$0	Route S3/DynamoDB without NAT
Interface endpoint (ECR, SM)	Hourly + data	~$0.01/hr/AZ + $0.01/GB	Break-even vs NAT data processing at ~210 GB/mo per endpoint per AZ at these rates
Region → region (S3 CRR)	Replication + storage	~$0.02/GB inter-region	CRR only for DR-critical buckets
On-prem → AWS (Direct Connect)	Port hours + data	1 Gbps ~$0.30/hr + transfer	DX + VPN backup; no NAT for hybrid
VPC peering	Same-region data transfer	~$0.01/GB	Transitive routing needs TGW
Transit Gateway	Attachment hourly + data	~$0.05/hr/attach + $0.02/GB	Hub for many VPCs; replaces full mesh peering
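
The interface endpoint break-even against NAT can be worked out from the rates in the table. The sketch assumes the NAT Gateway stays in place (so only its per-GB charge is saved) and uses the table's approximate rates as defaults; the function name is hypothetical.

```typescript
// GB per month at which one interface endpoint (in one AZ) costs the same as
// sending that traffic through an existing NAT Gateway.
function interfaceEndpointBreakEvenGb(
  endpointHourlyUsd: number = 0.01,
  endpointPerGbUsd: number = 0.01,
  natPerGbUsd: number = 0.045,
  hoursPerMonth: number = 730,
): number {
  const fixedMonthlyUsd = endpointHourlyUsd * hoursPerMonth;
  const savingPerGbUsd = natPerGbUsd - endpointPerGbUsd;
  if (savingPerGbUsd <= 0) {
    return Infinity; // the endpoint never pays for itself on data charges alone
  }
  return fixedMonthlyUsd / savingPerGbUsd;
}
```

With the default rates this is 7.30 / 0.035, or roughly 209 GB per month per endpoint per AZ. Spreading an endpoint across more AZs raises the fixed cost proportionally, and removing the NAT Gateway entirely changes the comparison, since its hourly charge is then saved as well.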

Architecture decision checklist

#	Question	If yes → consider
1	Spiky unpredictable traffic?	Lambda + API GW, or Fargate with aggressive scaling
2	Existing Kubernetes investment?	EKS (or ECS if team prefers simpler AWS-native)
3	Need global low-latency reads?	Aurora Global, DynamoDB Global Tables, CloudFront
4	Strict ordering required?	SQS FIFO, Kinesis partition key, or single-thread consumer
5	Audit every API call?	CloudTrail org trail → S3 + Athena; GuardDuty enabled
6	PCI/HIPAA data?	KMS CMK, encryption in transit, private subnets, no public S3
7	Batch / ETL heavy?	Step Functions + Glue, or EMR, not Lambda chains
8	Multi-account governance?	Organizations + SCPs + Control Tower + IAM Identity Center
9	Zero-downtime deploys?	CodeDeploy blue/green (ECS/ASG), weighted Route 53
10	Cost ceiling for startup?	Serverless-first; single NAT; gp3 EBS; 7-day log retention

🎯 Exam Tip

When the question says “decouple” → SQS. “Fan-out” → SNS (+ SQS subs). “Route AWS service events” → EventBridge. “Real-time analytics stream” → Kinesis. “No server management” → Lambda or Fargate. “Lowest ops containers” → ECS Fargate over EKS unless K8s is explicit.