Cheat Sheets

Three dense quick references by persona: daily AWS CLI, platform IaC patterns, and architecture decision matrices.

developer devops architect CLI v2 SAA / DVA / SOA

Developer Cheat Sheet

CLI for daily work, IAM snippets, messaging, Logs Insights, common API errors. Chapters: Compute, Storage, Databases, Messaging.

AWS CLI — profile & region

Task | Command | Notes
SSO login | aws sso login --profile dev | Preferred over long-lived keys
Set profile | export AWS_PROFILE=dev | Or --profile dev per command
Caller identity | aws sts get-caller-identity | Verify account + role ARN
Default region | aws configure set region eu-west-1 | Override: --region us-east-1
Assume role | aws sts assume-role --role-arn arn:aws:iam::123:role/Dev --role-session-name local | Export temp creds from JSON
JSON query | aws ec2 describe-instances --query 'Reservations[].Instances[].InstanceId' --output text | JMESPath on any API
Paginate all | aws s3api list-objects-v2 --bucket b --query 'Contents[].Key' --output text | Add --page-size / --starting-token
Dry run | aws ec2 terminate-instances --instance-ids i-abc --dry-run | Validates IAM without action

EC2

Task | Command
List instances | aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" "Name=tag:Environment,Values=prod"
Launch (exam pattern) | aws ec2 run-instances --image-id ami-0abc --instance-type t3.micro --subnet-id subnet-xxx --security-group-ids sg-xxx --iam-instance-profile Name=AppRole --count 1
SSM shell (no SSH key) | aws ssm start-session --target i-0123456789abcdef0
User data logs | sudo tail -f /var/log/cloud-init-output.log
Stop / start | aws ec2 stop-instances --instance-ids i-abc · aws ec2 start-instances --instance-ids i-abc
AMI from instance | aws ec2 create-image --instance-id i-abc --name "app-golden-$(date +%F)" --no-reboot
Associate Elastic IP (EIP) | aws ec2 associate-address --instance-id i-abc --allocation-id eipalloc-xxx
Spot interruption (IMDSv1 form; with IMDSv2 enforced, send a session token header) | curl -s http://169.254.169.254/latest/meta-data/spot/termination-time

S3

Task | Command
Sync directory | aws s3 sync ./dist s3://my-bucket/app/ --delete --sse AES256
Presigned GET (CLI) | aws s3 presign s3://my-bucket/obj.pdf --expires-in 3600
Block public access | aws s3api put-public-access-block --bucket my-bucket --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
Versioning | aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled
Lifecycle | aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json
Encryption default | aws s3api put-bucket-encryption --bucket my-bucket --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"alias/my-key"}}]}'
Object ACL / ownership | aws s3api put-bucket-ownership-controls --bucket my-bucket --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerEnforced}]'
Restore Glacier | aws s3api restore-object --bucket b --key archive.zip --restore-request 'Days=7,GlacierJobParameters={Tier=Standard}'

ECS

Task | Command
List clusters | aws ecs list-clusters
Run one-off task | aws ecs run-task --cluster prod --task-definition myapp:42 --launch-type FARGATE --network-configuration "awsvpcConfiguration={subnets=[subnet-a],securityGroups=[sg-app],assignPublicIp=DISABLED}"
Force new deployment | aws ecs update-service --cluster prod --service web --force-new-deployment
Scale service | aws ecs update-service --cluster prod --service web --desired-count 4
Task logs | aws logs tail /ecs/web --follow --since 1h
Exec into container | aws ecs execute-command --cluster prod --task TASK_ARN --container app --interactive --command "/bin/sh"
Register task def | aws ecs register-task-definition --cli-input-json file://taskdef.json
Stop task | aws ecs stop-task --cluster prod --task TASK_ARN --reason "debug"

RDS

Task | Command
Describe instances | aws rds describe-db-instances --query 'DBInstances[?DBInstanceStatus==`available`].[DBInstanceIdentifier,Endpoint.Address]' --output table
Create snapshot | aws rds create-db-snapshot --db-instance-identifier prod-db --db-snapshot-identifier prod-db-manual-$(date +%F)
Restore to point in time | aws rds restore-db-instance-to-point-in-time --source-db-instance-identifier prod-db --target-db-instance-identifier prod-db-restore --restore-time 2026-06-01T12:00:00Z
Modify instance | aws rds modify-db-instance --db-instance-identifier prod-db --allocated-storage 200 --apply-immediately
Failover (Multi-AZ) | aws rds reboot-db-instance --db-instance-identifier prod-db --force-failover
Parameter group | aws rds modify-db-parameter-group --db-parameter-group-name pg15-custom --parameters ParameterName=max_connections,ParameterValue=500,ApplyMethod=pending-reboot
Read DB credentials (Secrets Manager) | aws secretsmanager get-secret-value --secret-id prod/db/credentials --query SecretString --output text

Lambda

Task | Command
Invoke sync | aws lambda invoke --function-name ProcessOrder --cli-binary-format raw-in-base64-out --payload file://event.json out.json && cat out.json
Invoke async | aws lambda invoke --function-name ProcessOrder --invocation-type Event --cli-binary-format raw-in-base64-out --payload file://event.json /dev/null
Update code (zip) | aws lambda update-function-code --function-name ProcessOrder --zip-file fileb://function.zip
Update config | aws lambda update-function-configuration --function-name ProcessOrder --memory-size 1024 --timeout 30 --environment "Variables={ENV=prod}"
Publish version | aws lambda publish-version --function-name ProcessOrder
Alias traffic | aws lambda update-alias --function-name ProcessOrder --name live --routing-config AdditionalVersionWeights={"2"=0.1}
Tail logs | aws logs tail /aws/lambda/ProcessOrder --follow
Concurrency limit | aws lambda put-function-concurrency --function-name ProcessOrder --reserved-concurrent-executions 50

IAM policy snippets

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": [
      "arn:aws:s3:::my-app-uploads",
      "arn:aws:s3:::my-app-uploads/*"
    ],
    "Condition": {
      "StringEquals": { "aws:RequestedRegion": "eu-west-1" }
    }
  }]
}
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["sqs:SendMessage", "sqs:GetQueueUrl"],
    "Resource": "arn:aws:sqs:eu-west-1:123456789012:orders.fifo",
    "Condition": {
      "Bool": { "aws:SecureTransport": "true" }
    }
  }]
}
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "dynamodb:GetItem", "dynamodb:PutItem",
      "dynamodb:UpdateItem", "dynamodb:Query"
    ],
    "Resource": [
      "arn:aws:dynamodb:eu-west-1:123456789012:table/Orders",
      "arn:aws:dynamodb:eu-west-1:123456789012:table/Orders/index/*"
    ]
  }]
}
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
    "Resource": "arn:aws:kms:eu-west-1:123456789012:key/abcd-1234",
    "Condition": {
      "StringEquals": {
        "kms:ViaService": "s3.eu-west-1.amazonaws.com"
      }
    }
  }]
}
Pattern | Key elements
Least privilege S3 | Separate ListBucket on bucket ARN vs GetObject on /*
Tag-based access | "Condition": { "StringEquals": { "aws:ResourceTag/Environment": "dev" } }
Deny non-TLS | aws:SecureTransport = false → Deny
Permission boundary | Attach managed policy as boundary; role policies cannot exceed it
Trust policy (ECS task) | Principal: { "Service": "ecs-tasks.amazonaws.com" } + sts:AssumeRole

SQS & SNS

Task | Command
Send SQS message | aws sqs send-message --queue-url https://sqs.eu-west-1.amazonaws.com/123/orders --message-body '{"orderId":"42"}'
FIFO send | aws sqs send-message --queue-url .../orders.fifo --message-body '{}' --message-group-id customer-99 --message-deduplication-id $(uuidgen)
Receive / delete | aws sqs receive-message --queue-url URL --max-number-of-messages 10 --wait-time-seconds 20 · aws sqs delete-message --queue-url URL --receipt-handle HANDLE
Purge queue | aws sqs purge-queue --queue-url URL
DLQ redrive | aws sqs start-message-move-task --source-arn arn:aws:sqs:...:orders-dlq --destination-arn arn:aws:sqs:...:orders
SNS publish | aws sns publish --topic-arn arn:aws:sns:eu-west-1:123:alerts --message "CPU high" --subject "Alert"
SNS → SQS sub | aws sns subscribe --topic-arn TOPIC --protocol sqs --notification-endpoint arn:aws:sqs:...:worker --attributes RawMessageDelivery=true
Filter policy | aws sns subscribe ... --attributes FilterPolicy='{"eventType":["order.created"]}'

CloudWatch Logs Insights

Use case: query
Lambda errors (5 min): fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 50
Lambda duration p99: filter @type = "REPORT" | stats pct(@duration, 99) as p99, avg(@duration) as avg_ms by bin(5m)
ALB 5xx by target: parse @message '"target_status_code":*,' as code | filter code >= 500 | stats count() by target_group_arn
ECS OOM: fields @timestamp, @message | filter @message like /OutOfMemoryError/ or @message like /OOMKilled/
API Gateway 429: filter status = 429 | stats count() as throttled by bin(1m)
Trace by request ID: fields @timestamp, @message | filter @requestId = "abc-123-def"

# Run query from CLI (last 1 hour)
# `date -v-1H` is BSD/macOS syntax; on GNU/Linux use: $(date -u -d '1 hour ago' +%s)
aws logs start-query \
  --log-group-name /aws/lambda/ProcessOrder \
  --start-time $(date -u -v-1H +%s) \
  --end-time $(date -u +%s) \
  --query-string 'fields @timestamp, @message | filter @message like /ERROR/ | limit 20'

# Poll results
aws logs get-query-results --query-id QUERY_ID

Common API errors

Error | Typical cause | Fix
AccessDenied | Missing IAM action, SCP deny, or resource policy | CloudTrail errorMessage; simulate with IAM Policy Simulator
AccessDeniedException (KMS) | Key policy or grant missing for role | Add kms:Decrypt on key; check kms:ViaService
ThrottlingException | API TPS exceeded (DynamoDB, Lambda, etc.) | Exponential backoff + jitter; raise limits; use SQS buffer
ProvisionedThroughputExceededException | DynamoDB RCU/WCU hot partition | On-demand mode; better partition key; DAX
InvalidParameterValueException | Bad subnet/SG combo, env var size, Lambda layer | Validate VPC config; shrink env; check layer compat
ResourceNotFoundException | Wrong region, deleted resource, typo in ARN | aws sts get-caller-identity; verify region
BucketAlreadyExists | Global S3 namespace collision (name owned by another account) | Pick unique bucket name
BucketAlreadyOwnedByYou | You already own a bucket with this name | Reuse the existing bucket or pick another name
SignatureDoesNotMatch | Clock skew, wrong secret, region mismatch | Sync NTP; verify cred chain; check SigV4 region
ExpiredToken | STS session expired | Refresh SSO / re-assume role
ServiceUnavailable | Transient AZ or service issue | Retry with backoff; check Health Dashboard
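
The "exponential backoff + jitter" fix above can be sketched in TypeScript as follows. This is a minimal illustration, not the AWS SDK's own retry logic (the SDKs already retry throttled calls by default). The names `backoffDelayMs`, `withRetry`, and `BackoffOptions`, and the default delays, are assumptions made for this example.

```typescript
// Sketch: exponential backoff with "full jitter" for throttled calls.
// All names and default values are illustrative assumptions.

interface BackoffOptions {
  baseMs: number;      // delay ceiling for the first retry
  capMs: number;       // upper bound on any single delay
  maxAttempts: number; // total tries, including the first call
}

// Delay before retry `attempt` (0-based): uniform in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(opts.capMs, opts.baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Error names treated as retryable, taken from the table above.
const RETRYABLE_ERRORS = [
  "ThrottlingException",
  "ProvisionedThroughputExceededException",
  "ServiceUnavailable",
];

async function withRetry<T>(
  call: () => Promise<T>,
  opts: BackoffOptions = { baseMs: 100, capMs: 5000, maxAttempts: 5 },
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const name = (err as { name?: string }).name ?? "";
      const lastTry = attempt >= opts.maxAttempts - 1;
      // Non-retryable errors (e.g. AccessDenied) and the final attempt propagate.
      if (RETRYABLE_ERRORS.indexOf(name) === -1 || lastTry) throw err;
      const delay = backoffDelayMs(attempt, opts);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Jitter spreads retries from many clients over the whole delay window, so a throttled API is not hit again by all of them at the same instant. Errors such as AccessDenied or ExpiredToken are deliberately not retried, since waiting does not fix them.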

Java — S3 presigned URL (AWS SDK v2)

import software.amazon.awssdk.services.s3.presigner.S3Presigner;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.presigner.model.GetObjectPresignRequest;
import java.time.Duration;

// Use DefaultCredentialsProvider (ECS task role, SSO profile locally)
try (S3Presigner presigner = S3Presigner.create()) {
    GetObjectRequest getReq = GetObjectRequest.builder()
        .bucket("my-app-uploads")
        .key("reports/q2.pdf")
        .build();

    GetObjectPresignRequest presignReq = GetObjectPresignRequest.builder()
        .signatureDuration(Duration.ofMinutes(15))
        .getObjectRequest(getReq)
        .build();

    String url = presigner.presignGetObject(presignReq).url().toString();
    // Return url to browser; no bucket credentials exposed
}
Tip | Detail
PUT upload | presignPutObject(PutObjectPresignRequest); set Content-Type in signed request
KMS bucket | Presigner role needs s3:GetObject + kms:Decrypt on CMK
Expiry | Max 7 days (SigV4); prefer 5–15 min for downloads
Spring | Inject S3Presigner bean; never embed access keys in application.yml

DevOps Cheat Sheet

Network layout, SG tiers, IaC quick refs, cost controls, ECS ops. Chapters: VPC, Compute, Observability.

VPC CIDR planning

Tier | Recommended CIDR | Usable IPs / subnet | Notes
VPC (prod) | 10.0.0.0/16 | 65,536 total | One /16 per account/env; avoid overlap for peering/TGW
Public subnet (AZ-a) | 10.0.0.0/20 | ~4,091 | ALB, NAT Gateway, bastion (avoid if possible)
Public subnet (AZ-b) | 10.0.16.0/20 | ~4,091 | Multi-AZ ALB requires 2+ public subnets
Private app (AZ-a) | 10.0.32.0/20 | ~4,091 | ECS tasks, Lambda (VPC), EC2 app tier
Private app (AZ-b) | 10.0.48.0/20 | ~4,091 | Spread workloads evenly across AZs
Private data (AZ-a) | 10.0.64.0/24 | ~251 | RDS, ElastiCache; no internet route
Private data (AZ-b) | 10.0.65.0/24 | ~251 | RDS subnet group spans both
Spare / TGW | 10.0.128.0/17 | | Future expansion, VPN, hybrid

Rule | Why
Reserve 3+ AZs worth of /20 app subnets | ECS/EKS pod density grows; /24 fills fast with ENIs
Never use 172.31.0.0/16 default VPC CIDR in prod design | Exam trap; default VPC is not HA or segmented
Document RFC1918 overlap before TGW peering | Overlapping CIDR = no peering path
Enable VPC Flow Logs to S3 (Parquet) for cost at scale | CloudWatch Logs expensive for high-volume VPCs
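
The usable-IP figures and the "no overlap" rule above follow from simple arithmetic: a subnet with prefix length p holds 2^(32-p) addresses, of which AWS reserves 5. A minimal TypeScript sketch of that calculation and an overlap check is below. The function names are hypothetical, and the parser accepts any IPv4 prefix rather than enforcing the /16 to /28 range that VPC subnets allow.

```typescript
// Sketch: subnet sizing and overlap checks for a CIDR plan.
// AWS reserves 5 addresses per subnet (the first four and the last one).

interface Cidr {
  base: number;   // network address as an unsigned 32-bit integer
  prefix: number; // prefix length, 0-32
}

function parseCidr(cidr: string): Cidr {
  const [ip, prefixText] = cidr.split("/");
  const prefix = Number(prefixText);
  const octets = ip.split(".").map(Number);
  if (octets.length !== 4 || octets.some((o) => !Number.isInteger(o) || o < 0 || o > 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new Error(`invalid prefix length: ${prefixText}`);
  }
  const addr = octets.reduce((acc, o) => acc * 256 + o, 0);
  const size = Math.pow(2, 32 - prefix);
  return { base: addr - (addr % size), prefix }; // align to the network boundary
}

function totalAddresses(cidr: string): number {
  return Math.pow(2, 32 - parseCidr(cidr).prefix);
}

function usableInSubnet(cidr: string): number {
  return Math.max(0, totalAddresses(cidr) - 5);
}

// Two ranges overlap when each one starts at or before the other's last address.
function overlaps(a: string, b: string): boolean {
  const x = parseCidr(a);
  const y = parseCidr(b);
  const xEnd = x.base + Math.pow(2, 32 - x.prefix) - 1;
  const yEnd = y.base + Math.pow(2, 32 - y.prefix) - 1;
  return x.base <= yEnd && y.base <= xEnd;
}

console.log(usableInSubnet("10.0.0.0/20"));               // 4091
console.log(usableInSubnet("10.0.64.0/24"));              // 251
console.log(overlaps("10.0.0.0/20", "10.0.16.0/20"));     // false
console.log(overlaps("10.0.64.0/20", "10.0.65.0/24"));    // true
```

The last call shows why a /20 app subnet starting at 10.0.64.0 cannot coexist with the /24 data subnets from the plan: the /20 range already covers 10.0.64.0 through 10.0.79.255.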

Security group patterns — ALB → App → DB

SG | Inbound | Outbound
sg-alb | 443 from 0.0.0.0/0 (or CloudFront prefix list) | 8080 → sg-app only
sg-app | 8080 from sg-alb | 5432 → sg-db; 443 → 0.0.0.0/0 (AWS APIs via NAT)
sg-db | 5432 from sg-app only | None required (stateful return traffic)
sg-bastion (if used) | 22/443 from corp IP range | 22 → sg-app (prefer SSM instead)

Anti-pattern | Fix
App SG allows 0.0.0.0/0:8080 | Reference ALB SG ID as source
DB SG allows entire VPC CIDR | Reference app SG only
Internet-facing ALB placed in private subnets | Internet-facing ALB needs public subnets; otherwise use an internal ALB reached over VPN / Direct Connect
NACL duplicates SG rules | Default NACL allow-all; enforce at SG layer
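
The first two anti-patterns above can be caught mechanically. The sketch below uses a simplified, hypothetical rule model (`IngressRule`, `Source`) rather than the real EC2 API shapes, and the caller supplies which groups are meant to be internet-facing.

```typescript
// Sketch: flag CIDR-based ingress where a security group reference is expected.
// The rule model here is a hypothetical simplification, not the EC2 API shape.

type Source =
  | { kind: "cidr"; cidr: string }
  | { kind: "sg"; groupId: string };

interface IngressRule {
  group: string; // security group the rule belongs to, e.g. "sg-app"
  port: number;
  source: Source;
}

function findAntiPatterns(
  rules: IngressRule[],
  internetFacingGroups: string[], // groups allowed to accept 0.0.0.0/0, e.g. the ALB
  vpcCidr: string,
): string[] {
  const findings: string[] = [];
  for (const rule of rules) {
    const src = rule.source;
    if (src.kind !== "cidr") continue; // SG-to-SG references are the recommended pattern
    const openToWorld = src.cidr === "0.0.0.0/0";
    if (openToWorld && internetFacingGroups.indexOf(rule.group) === -1) {
      findings.push(`${rule.group}: port ${rule.port} open to 0.0.0.0/0; reference the upstream SG instead`);
    } else if (src.cidr === vpcCidr) {
      findings.push(`${rule.group}: port ${rule.port} open to the whole VPC CIDR; reference the calling tier's SG`);
    }
  }
  return findings;
}

const exampleRules: IngressRule[] = [
  { group: "sg-alb", port: 443, source: { kind: "cidr", cidr: "0.0.0.0/0" } },
  { group: "sg-app", port: 8080, source: { kind: "cidr", cidr: "0.0.0.0/0" } },
  { group: "sg-db", port: 5432, source: { kind: "cidr", cidr: "10.0.0.0/16" } },
  { group: "sg-db", port: 5432, source: { kind: "sg", groupId: "sg-app" } },
];

// Flags the sg-app and sg-db CIDR rules; the ALB rule and the SG reference pass.
console.log(findAntiPatterns(exampleRules, ["sg-alb"], "10.0.0.0/16"));
```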

Terraform resource quick ref

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "prod"
  cidr = "10.0.0.0/16"
  azs  = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]

  private_subnets = ["10.0.32.0/20", "10.0.48.0/20", "10.0.64.0/20"]
  public_subnets  = ["10.0.0.0/20",  "10.0.16.0/20", "10.0.80.0/20"]

  enable_nat_gateway = true
  single_nat_gateway = false  # one NAT per AZ = HA, higher cost
  enable_flow_log    = true
}
resource "aws_ecs_service" "web" {
  name            = "web"
  cluster         = aws_ecs_cluster.prod.id
  task_definition = aws_ecs_task_definition.web.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = module.vpc.private_subnets
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.web.arn
    container_name   = "app"
    container_port   = 8080
  }

  lifecycle { ignore_changes = [desired_count] } # allow autoscaling
}
resource "aws_db_instance" "prod" {
  identifier     = "prod-db"
  engine         = "postgres"
  engine_version = "15"
  instance_class = "db.r6g.large"
  allocated_storage = 100
  storage_encrypted = true
  kms_key_id        = aws_kms_key.rds.arn

  db_subnet_group_name   = aws_db_subnet_group.prod.name
  vpc_security_group_ids = [aws_security_group.db.id]
  multi_az               = true
  backup_retention_period = 7
  deletion_protection    = true
  skip_final_snapshot    = false
  final_snapshot_identifier = "prod-db-final"
}
Resource | Terraform type | Critical attributes
ALB | aws_lb + aws_lb_target_group | internal, subnets, health check path
IAM role | aws_iam_role + aws_iam_role_policy_attachment | Trust policy JSON in assume_role_policy
S3 bucket | aws_s3_bucket + separate encryption/public access resources | AWS provider v4+ splits bucket config
Lambda | aws_lambda_function | role, vpc_config, environment
SQS | aws_sqs_queue + aws_sqs_queue (DLQ) | redrive_policy, visibility_timeout_seconds
Secrets | aws_secretsmanager_secret | Reference from ECS task def secrets block

CDK construct quick ref (TypeScript)

Need | CDK construct | Package
VPC + subnets | ec2.Vpc | aws-cdk-lib/aws-ec2
Fargate service + ALB | ecs_patterns.ApplicationLoadBalancedFargateService | aws-cdk-lib/aws-ecs-patterns
Lambda + API | apigw.LambdaRestApi or HttpApi | aws-cdk-lib/aws-apigateway (LambdaRestApi); aws-cdk-lib/aws-apigatewayv2 + aws-apigatewayv2-integrations (HttpApi)
RDS Postgres | rds.DatabaseInstance or ServerlessCluster (Aurora) | aws-cdk-lib/aws-rds
SQS queue | sqs.Queue with deadLetterQueue | aws-cdk-lib/aws-sqs
Event-driven Lambda | lambda.Function + SqsEventSource | aws-cdk-lib/aws-lambda-event-sources
Secrets in task | ecs.Secret.fromSecretsManager | aws-cdk-lib/aws-ecs
WAF on ALB | wafv2.CfnWebACL + association | aws-cdk-lib/aws-wafv2

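// Minimal Fargate + ALB pattern (body goes inside a Stack constructor, where `this` is the stack)
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecsPatterns from 'aws-cdk-lib/aws-ecs-patterns';

const vpc = new ec2.Vpc(this, 'Vpc', { maxAzs: 2, natGateways: 1 });

new ecsPatterns.ApplicationLoadBalancedFargateService(this, 'Service', {
  vpc,
  taskImageOptions: {
    image: ecs.ContainerImage.fromRegistry('nginx'),
    containerPort: 80, // the stock nginx image listens on 80; use 8080 with your own app image
    environment: { ENV: 'prod' },
  },
  publicLoadBalancer: true,
  desiredCount: 2,
});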

NAT Gateway cost formula

Component | Rate (approx.) | Formula
Hourly charge | ~$0.045/hr per NAT | hours × $0.045 × NAT_count
Data processing | ~$0.045/GB | GB_processed × $0.045
Monthly baseline (1 NAT) | ~$33/mo idle | 730 × 0.045 ≈ 32.85
3 NATs (HA per AZ) | ~$99/mo idle | 3 × 730 × 0.045 ≈ 98.55; before any data transfer

Total NAT cost ≈ (NAT_count × 730 × hourly_rate) + (GB_processed × data_processing_rate)
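
The formula above, worked in TypeScript as a rough estimator. The rates are the approximate figures from the table, not live pricing, and the data term is treated as GB processed by the gateway, matching the "Data processing" row. The names `natMonthlyCostUsd` and `NatRates` are assumptions for this sketch.

```typescript
// Sketch: monthly NAT Gateway cost from the formula above.
// Rates are approximate table values; check the price list for your region.

interface NatRates {
  hourlyUsd: number;         // per NAT Gateway hour
  perGbProcessedUsd: number; // per GB passing through the gateway
}

const APPROX_NAT_RATES: NatRates = { hourlyUsd: 0.045, perGbProcessedUsd: 0.045 };
const HOURS_PER_MONTH = 730;

function natMonthlyCostUsd(
  natCount: number,
  gbProcessed: number,
  rates: NatRates = APPROX_NAT_RATES,
): number {
  const hourlyCharge = natCount * HOURS_PER_MONTH * rates.hourlyUsd;
  const dataCharge = gbProcessed * rates.perGbProcessedUsd;
  return hourlyCharge + dataCharge;
}

console.log(natMonthlyCostUsd(1, 0).toFixed(2));    // "32.85" (one idle NAT)
console.log(natMonthlyCostUsd(3, 0).toFixed(2));    // "98.55" (one NAT per AZ, idle)
console.log(natMonthlyCostUsd(3, 2000).toFixed(2)); // "188.55" (3 NATs, 2 TB processed)
```

At low traffic the fixed hourly charge dominates, which is why the single-NAT trade-off matters most for small environments; at high traffic the per-GB term dominates and endpoints become the bigger lever.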

Cost reduction | Trade-off
Single NAT Gateway (single_nat_gateway = true) | AZ failure loses outbound for all private subnets
Gateway VPC endpoints (S3, DynamoDB; free) | Eliminates NAT GB for those services
Interface endpoints (ECR, Secrets Manager, SSM) | ~$7/mo/AZ/endpoint; cheaper at high NAT volume
NAT instance (self-managed) | Ops burden; not exam default

Cost optimization checklist

# | Action | Tool / service
1 | Right-size EC2/RDS with 2 weeks of metrics | Compute Optimizer, Cost Explorer
2 | Buy Savings Plans / RIs for steady baseline | 1-yr compute SP for Fargate + Lambda
3 | S3 lifecycle → IA → Glacier → expire | Lifecycle rules on versioned buckets
4 | Delete idle EIPs, unattached EBS, old snapshots | Trusted Advisor, AWS Config rules
5 | Enable S3 Intelligent-Tiering for unknown access | Monitoring fee vs storage savings
6 | Lambda: tune memory (affects CPU + cost) | Power Tuning tool
7 | CloudWatch log retention 7–30 days prod | Never infinite retention
8 | Use Graviton (ARM) where supported | ~20% cheaper t4g/r6g/m6g
9 | Tag everything: Environment, Team, CostCenter | Cost allocation tags + budgets
10 | Schedule dev/staging off-hours | Instance Scheduler, ASG desired=0

Multi-AZ production checklist

Layer | Requirement | Verify
VPC | ≥2 AZs, subnets per AZ | aws ec2 describe-subnets --filters Name=vpc-id,Values=vpc-xxx
ALB | Subnets in ≥2 AZs | Targets healthy in each AZ
ECS/EKS | spread or distinctInstance placement | Tasks/pods across AZs
RDS | multi_az = true | Sync replica in second AZ
NAT | 1 per AZ (strict HA) or 1 shared (cost) | Route tables per private subnet → local NAT
Route 53 | Health checks + failover/weighted records | DNS TTL ≤ 60s for failover
S3 | Cross-region replication for critical data | CRR rule + RTC if needed
Backups | Automated snapshots + cross-region copy | AWS Backup plan
Secrets | Replicated secrets (multi-region) | Secrets Manager replication
Runbook | AZ failure game day documented | Simulate AZ isolate in staging

ECS task definition essentials

Field | Value / pattern | Why
requiresCompatibilities | ["FARGATE"] | Serverless; no EC2 capacity management
networkMode | awsvpc | Required for Fargate; each task gets ENI
cpu / memory | Valid Fargate pairs (e.g. 512/1024, 1024/2048) | Invalid combo fails registration
taskRoleArn | App runtime permissions (S3, DynamoDB) | SDK uses this, not the execution role
executionRoleArn | AmazonECSTaskExecutionRolePolicy + ECR pull | Pull image, write logs, read secrets
logConfiguration | awslogs driver → /ecs/service-name | Centralized in CloudWatch
healthCheck | ["CMD-SHELL", "curl -f http://localhost:8080/actuator/health || exit 1"] | ALB health ≠ container health for dependency wait
secrets | valueFrom Secrets Manager ARN | Never plain env for passwords
readonlyRootFilesystem | true + tmp volume | Security hardening

{
  "family": "web",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "taskRoleArn": "arn:aws:iam::123:role/WebTaskRole",
  "executionRoleArn": "arn:aws:iam::123:role/WebExecutionRole",
  "containerDefinitions": [{
    "name": "app",
    "image": "123.dkr.ecr.eu-west-1.amazonaws.com/web:1.2.3",
    "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
    "essential": true,
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/web",
        "awslogs-region": "eu-west-1",
        "awslogs-stream-prefix": "app"
      }
    },
    "secrets": [
      { "name": "DB_PASSWORD", "valueFrom": "arn:aws:secretsmanager:eu-west-1:123:secret:prod/db:password::" }
    ]
  }]
}
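
Since an invalid cpu/memory pair fails registration, it can help to check the pair before calling register-task-definition. The sketch below encodes the Fargate size table as I understand it (memory in MiB per CPU-unit value). Treat the ranges as an assumption to confirm against the current ECS documentation; `isValidFargateSize` is a hypothetical helper name.

```typescript
// Sketch: validate a Fargate cpu/memory pair from a task definition.
// The size table below is an assumption; confirm against current ECS docs.

function memoryRange(minMiB: number, maxMiB: number, stepMiB: number): number[] {
  const values: number[] = [];
  for (let m = minMiB; m <= maxMiB; m += stepMiB) values.push(m);
  return values;
}

// CPU units → allowed memory values in MiB.
const FARGATE_MEMORY_BY_CPU: { [cpu: number]: number[] } = {
  256: [512, 1024, 2048],
  512: memoryRange(1024, 4096, 1024),
  1024: memoryRange(2048, 8192, 1024),
  2048: memoryRange(4096, 16384, 1024),
  4096: memoryRange(8192, 30720, 1024),
  8192: memoryRange(16384, 61440, 4096),
  16384: memoryRange(32768, 122880, 8192),
};

// Task definitions carry cpu and memory as strings, e.g. "512" and "1024".
function isValidFargateSize(cpu: string, memory: string): boolean {
  const allowed = FARGATE_MEMORY_BY_CPU[Number(cpu)];
  return allowed !== undefined && allowed.indexOf(Number(memory)) !== -1;
}

console.log(isValidFargateSize("512", "1024"));  // true (the pair used above)
console.log(isValidFargateSize("1024", "2048")); // true
console.log(isValidFargateSize("512", "8192"));  // false (too much memory for 0.5 vCPU)
```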

Architect Cheat Sheet

Service selection matrices, DR targets, Well-Architected pillars, exam limits. Chapters: AWS Core Home, Security, Databases.

Compute selection — ECS vs EKS vs Lambda vs EC2

Criterion | ECS Fargate | EKS | Lambda | EC2
Ops burden | Low | High (control plane + nodes) | Lowest | Highest
Max duration | Unlimited | Unlimited | 15 min | Unlimited
Cold start | Task pull ~30s | Pod schedule ~30s | ms–seconds | Always warm if running
Scaling | Service auto scaling | HPA + CA + Karpenter | Automatic concurrency | ASG manual/custom
Portability | Docker only | K8s manifests (multi-cloud) | Function per trigger | Any workload
Best for | AWS-native microservices, Java/Spring | Existing K8s teams, operators, service mesh | Event-driven, API thin layer, cron | Legacy, GPU, custom kernel, license-bound
Cost at steady load | Medium | Medium–high | High at sustained TPS | Lowest with RIs/SP
Exam default | “Containers without managing servers” | “Kubernetes on AWS” | “Event-driven, no server management” | “Full OS control”

Database selection — RDS vs Aurora vs DynamoDB

Criterion | RDS | Aurora | DynamoDB
Model | Managed relational (PG, MySQL, etc.) | MySQL/PostgreSQL compatible, distributed storage | NoSQL key-value / document
Scale up | Vertical (instance class) | Up to 128 TB storage auto-grow | Horizontal (on-demand unlimited)
Read scale | Read replicas (async) | Up to 15 low-lag replicas | DAX for microsecond cache
Multi-AZ | Synchronous standby | 6 copies across 3 AZs storage layer | Multi-AZ by default
Consistency | Strong (single instance) | Strong primary; eventual on replicas | Eventually consistent reads by default; strongly consistent reads optional
Joins / SQL | Full SQL | Full SQL | No joins; design access patterns first
Best for | Standard OLTP, lift-and-shift | High-throughput OLTP, global databases | Key-value, sessions, gaming, IoT, single-digit ms
Exam trigger | “Managed PostgreSQL” | “5× throughput”, “global database” | “Massive scale”, “millisecond latency”, “serverless”

Messaging selection — SQS vs SNS vs EventBridge vs Kinesis

Criterion | SQS | SNS | EventBridge | Kinesis
Pattern | Queue (pull, 1 consumer group) | Pub/sub fan-out (push) | Event bus (routing rules) | Stream (ordered shards)
Ordering | FIFO queues only | No order guarantee (FIFO topic = order) | No strict order | Per partition key
Retention | 1 min – 14 days | No retention (transient) | Replay archive optional | 1–365 days
Throughput | Standard: nearly unlimited; FIFO: 300 TPS per API action (3,000 msg/s with batching; more in high-throughput mode) | Very high publish | High; cross-account | Shard-limited; auto scaling
Consumers | Workers, Lambda pollers | HTTP, Lambda, SQS, email, SMS | Lambda, Step Functions, SaaS targets | Analytics, Lambda, Firehose
Best for | Decouple, buffer, DLQ retry | Alerts, multi-subscriber notify | Event-driven architecture, schema registry | Clickstream, logs, real-time analytics
Exam trigger | “Decouple microservices” | “Fan-out notifications” | “Route events from AWS services” | “Real-time processing ordered records”

Disaster recovery — RTO / RPO strategies

Strategy | RPO | RTO | AWS pattern | Cost
Backup & restore | Hours (backup interval) | Hours–days | AMI/EBS/RDS snapshots → restore in DR region | $
Pilot light | Minutes–hours | Hours | Minimal DR: cross-region RDS read replica kept running (replicas cannot be stopped), app tier off, AMIs copied, Route 53 health failover | $$
Warm standby | Minutes | Minutes–hours | Scaled-down full stack in DR; ASG min=1 | $$$
Active-active | Near-zero | Near-zero | ALB per region behind Route 53 or Global Accelerator + Aurora Global + DynamoDB Global Tables | $$$$
S3 CRR + RTC | Minutes | Minutes | Cross-Region Replication; Replication Time Control SLA | Per-GB replication
RDS cross-region read replica | Seconds–minutes (async) | Promote replica ~minutes | promote-read-replica in DR region | Replica instance cost

Well-Architected Framework — six pillars

Pillar | Focus | Key AWS services / practices
Operational Excellence | Run and monitor systems | CloudWatch, Systems Manager, runbooks, IaC, CI/CD
Security | Protect data and systems | IAM, KMS, GuardDuty, SCPs, encryption, least privilege
Reliability | Recover from failure | Multi-AZ, auto scaling, backups, chaos testing
Performance Efficiency | Use resources efficiently | Right-sizing, caching (ElastiCache, CloudFront), Graviton
Cost Optimization | Avoid unnecessary spend | Cost Explorer, SP/RIs, lifecycle, tagging, delete idle
Sustainability | Minimize environmental impact | Graviton efficiency, serverless, region selection, density

Review question | Pillar
“How do you detect unauthorized API calls?” | Security: CloudTrail + GuardDuty
“How do you survive AZ failure?” | Reliability: Multi-AZ + health checks
“How do you reduce NAT data charges?” | Cost: VPC endpoints
“How do you deploy without downtime?” | Operational Excellence: blue/green, CodeDeploy

Exam service limits (memorize)

Service | Limit | Exam note
Lambda timeout | 15 minutes max | Long jobs → Step Functions or ECS
Lambda /tmp storage | 512 MB – 10,240 MB | Ephemeral; not durable
Lambda env vars | 4 KB total | Large config → SSM/Secrets Manager
Lambda concurrent executions | 1,000 default/account/region | Request increase; use SQS buffer
API Gateway REST timeout | 29 seconds | Long poll → async Lambda + webhook
ALB idle timeout | Default 60s (max 4000s) | WebSockets need higher idle
NLB | Layer 4, static IP, ultra-low latency | Not HTTP routing; use ALB for path/host
SQS visibility timeout | 0 – 12 hours | Must exceed Lambda max processing time
SQS message size | 256 KB | Larger payloads → S3 pointer in body
SNS message size | 256 KB | Same S3 extended client pattern
EBS volume max | 16 TiB (gp2/gp3); 64 TiB (io2 Block Express) | Elastic Volumes grows a volume in place; shrinking needs a new volume from snapshot
RDS backup retention | 1–35 days | Automated backups for PITR
VPC per region | Default 5 (soft limit) | Request increase via support
Security groups per ENI | 5 (can increase to 16) | Prefer SG referencing over CIDR sprawl
IAM policy size | 6,144 chars managed; 2,048 inline user | Split policies; use permission sets
CloudFront origin | Any HTTP origin + S3 | Edge caching; signed URLs/cookies
Route 53 health check | HTTP/HTTPS/TCP/CloudWatch alarm | Failover routing policy
DynamoDB item size | 400 KB max | Large docs → S3 + pointer attribute
Kinesis shard | 1 MB/s or 1,000 records/s ingest | Hot shard → better partition key
ECS Fargate task ENI | One ENI per task (awsvpc) | IP exhaustion → larger subnets

Networking cost comparison

Traffic path | Charged? | Typical rate | Optimization
Internet → CloudFront → S3 | CloudFront egress only | ~$0.085/GB (varies by region) | OAI/OAC; cache hit ratio
Internet → ALB → EC2 | ALB LCU + EC2 egress | LCU ~$0.008/hr + data | CloudFront in front of static
AZ-a → AZ-b (same region) | Yes, cross-AZ | ~$0.01/GB each direction | Keep chatter in same AZ where possible
Private subnet → internet (via NAT) | NAT hourly + $/GB | ~$0.045/GB processed | VPC endpoints for AWS APIs
S3 gateway endpoint | Free | $0 | Route S3/DynamoDB without NAT
Interface endpoint (ECR, SM) | Hourly + data | ~$0.01/hr/AZ + $0.01/GB | Break-even vs NAT at roughly 210 GB/mo per endpoint per AZ at these rates
Region → region (S3 CRR) | Replication + storage | ~$0.02/GB inter-region | CRR only for DR-critical buckets
On-prem → AWS (Direct Connect) | Port hours + data | 1 Gbps ~$0.30/hr + transfer | DX + VPN backup; no NAT for hybrid
VPC peering | Same-region data transfer | ~$0.01/GB | Transitive routing needs TGW
Transit Gateway | Attachment hourly + data | ~$0.05/hr/attach + $0.02/GB | Hub for many VPCs; replaces full mesh peering
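
The interface-endpoint break-even against NAT depends on the rates and on how many AZs the endpoint spans, so it is worth computing rather than memorizing. The sketch below uses the approximate rates from the table and assumes the NAT Gateway stays in place either way, so only its per-GB charge is saved. `breakEvenGbPerMonth` and `EndpointRates` are hypothetical names.

```typescript
// Sketch: monthly GB at which one interface endpoint becomes cheaper than
// sending the same traffic through an existing NAT Gateway.
// Assumption: the NAT stays, so only its per-GB processing charge is avoided.

interface EndpointRates {
  endpointHourlyUsd: number; // per endpoint per AZ
  endpointPerGbUsd: number;  // endpoint data processing
  natPerGbUsd: number;       // NAT data processing
}

const APPROX_ENDPOINT_RATES: EndpointRates = {
  endpointHourlyUsd: 0.01,
  endpointPerGbUsd: 0.01,
  natPerGbUsd: 0.045,
};

function breakEvenGbPerMonth(
  azCount: number,
  rates: EndpointRates = APPROX_ENDPOINT_RATES,
  hoursPerMonth: number = 730,
): number {
  const fixedMonthlyUsd = azCount * hoursPerMonth * rates.endpointHourlyUsd;
  const savedPerGbUsd = rates.natPerGbUsd - rates.endpointPerGbUsd;
  if (savedPerGbUsd <= 0) return Infinity; // endpoint never pays off on data alone
  return fixedMonthlyUsd / savedPerGbUsd;
}

console.log(Math.round(breakEvenGbPerMonth(1))); // 209 GB/month for one AZ
console.log(Math.round(breakEvenGbPerMonth(2))); // 417 GB/month across two AZs
```

Below the break-even volume the endpoint's fixed hourly charge outweighs the per-GB saving. The security benefit of keeping traffic off the internet path is a separate consideration that this arithmetic ignores.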

Architecture decision checklist

# | Question | If yes → consider
1 | Spiky unpredictable traffic? | Lambda + API GW, or Fargate with aggressive scaling
2 | Existing Kubernetes investment? | EKS (or ECS if team prefers simpler AWS-native)
3 | Need global low-latency reads? | Aurora Global, DynamoDB Global Tables, CloudFront
4 | Strict ordering required? | SQS FIFO, Kinesis partition key, or single-thread consumer
5 | Audit every API call? | CloudTrail org trail → S3 + Athena; GuardDuty enabled
6 | PCI/HIPAA data? | KMS CMK, encryption in transit, private subnets, no public S3
7 | Batch / ETL heavy? | Step Functions + Glue, or EMR, not Lambda chains
8 | Multi-account governance? | Organizations + SCPs + Control Tower + IAM Identity Center
9 | Zero-downtime deploys? | CodeDeploy blue/green (ECS/ASG), weighted Route 53
10 | Cost ceiling for startup? | Serverless-first; single NAT; GP3 EBS; 7-day log retention

🎯 Exam Tip

When the question says “decouple” → SQS. “Fan-out” → SNS (+ SQS subs). “Route AWS service events” → EventBridge. “Real-time analytics stream” → Kinesis. “No server management” → Lambda or Fargate. “Lowest ops containers” → ECS Fargate over EKS unless K8s is explicit.