Compute: EC2, ECS, EKS & Lambda

EC2 deep dive

EC2 is still the foundation of AWS compute — even when you run containers or Lambda, something underneath is often an EC2 instance. Understanding instance families, storage, and bootstrap patterns prevents over-provisioning and security gaps like open IMDSv1 metadata endpoints.

Instance families — the letter tells you the workload

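AWS names instance types as <family><generation><attributes>.<size>, e.g. m7g.large. The family letter is the primary workload signal; the number is the generation (higher = newer, usually better price/performance); trailing letters flag the processor or extras (g = Graviton, i = Intel, a = AMD).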
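To make the naming convention concrete, here is a minimal TypeScript sketch that splits a type name into its parts. The `parseInstanceType` helper and its regular expression are illustrative assumptions, not an AWS API, and a few special families do not follow this pattern.

```typescript
// Parsed pieces of an EC2 instance type name such as "m7g.large".
interface InstanceTypeParts {
  family: string;     // workload letters: "m", "c", "inf", ...
  generation: number; // higher is newer
  attributes: string; // e.g. "g" = Graviton, "i" = Intel, "a" = AMD; "" if none
  size: string;       // e.g. "large", "2xlarge"
}

// Returns null when the name does not match the common pattern.
function parseInstanceType(name: string): InstanceTypeParts | null {
  const match = /^([a-z]+)(\d+)([a-z-]*)\.([a-z0-9-]+)$/.exec(name);
  if (match === null) return null;
  const [, family = "", generation = "", attributes = "", size = ""] = match;
  return { family, generation: Number(generation), attributes, size };
}

console.log(parseInstanceType("m7g.large"));
console.log(parseInstanceType("i3en.2xlarge"));
```

A helper like this can be handy in cost reports, for example to group a fleet by family or to flag instances from old generations.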

Family	Optimized for	Examples	Typical use
t (burstable)	Baseline CPU with burst credits	t4g.micro, t3.medium	Dev/staging, low-traffic APIs, bastion hosts — not sustained CPU
m (general)	Balanced CPU, memory, network	m7i.large, m7g.xlarge	Spring Boot services, app servers, small databases
c (compute)	High CPU ratio	c7i.2xlarge, c7g.4xlarge	Batch processing, transcode, high-throughput APIs
r (memory)	High RAM ratio	r7i.xlarge, r6g.2xlarge	In-memory caches, JVM heaps, analytics
i (storage I/O)	High local NVMe IOPS	i4i.xlarge, i3en.2xlarge	NoSQL, time-series DBs, high-write workloads
p / g (GPU)	NVIDIA GPUs	p4d.24xlarge, g5.xlarge	ML training/inference, rendering
inf (Inferentia)	AWS ML inference chips	inf2.xlarge	Cost-efficient model serving at scale

Graviton (ARM) for Java and Spring

Graviton instances (*g suffix — e.g. m7g, c7g, r7g) use AWS-designed ARM Neoverse cores. For Java/Spring workloads on Amazon Corretto 17+ or GraalVM native images, Graviton often delivers 20–40% better price/performance vs equivalent x86 (*i Intel, *a AMD).

Use multi-arch images, verify JNI/native deps, and load-test JVM GC on ARM before prod cutover. Graviton Spot and Savings Plans stack well for batch workloads.

💡 Pro Tip

Start new Spring Boot projects on m7g.large in dev. If p99 latency and GC pause metrics match x86 baselines after a week of load testing, promote Graviton to staging and prod — don't assume ARM incompatibility without measuring.

Purchasing options

Option	Commitment	Discount	Best for
On-Demand	None	Baseline price	Spiky/unpredictable load, short-lived environments, prod baseline you can't interrupt
Reserved Instances (RI)	1 or 3 years; specific instance family/region	Up to ~72% vs On-Demand	Steady-state baseline — RDS-style always-on app servers
Savings Plans	$/hr commit (Compute or EC2 Instance SP)	Similar to RI; more flexible	Mixed instance types/regions — preferred over RI for most teams now
Spot	None; can be interrupted with 2-min notice	Up to ~90% off	Fault-tolerant batch, CI runners, Karpenter/EKS, ASG with mixed instances
Dedicated Hosts / Instances	Physical server isolation	Premium pricing	License compliance (BYOL), regulatory isolation — rarely needed otherwise

💰 Cost

Savings Plans beat On-Demand immediately for any baseline fleet running 24/7. Layer Spot on top for interruptible capacity — never run stateful primary databases on Spot. Use aws compute-optimizer and Cost Explorer "Savings Plans recommendations" before committing to a 3-year term.
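The layering argument is easy to check with arithmetic. The sketch below compares an all On-Demand fleet with a Savings Plan baseline plus Spot burst capacity. Every price and discount in it is a hypothetical placeholder, not a quoted AWS rate; substitute figures from Cost Explorer.

```typescript
interface FleetMix {
  onDemandHourly: number;      // list price per instance-hour (hypothetical)
  baselineInstances: number;   // always-on, covered by a Savings Plan
  spotInstances: number;       // interruptible burst capacity
  savingsPlanDiscount: number; // fraction between 0 and 1
  spotDiscount: number;        // fraction between 0 and 1
}

// Returns hourly cost if everything ran On-Demand, the blended cost,
// and the fraction saved by the blended approach.
function hourlyCost(mix: FleetMix): { allOnDemand: number; blended: number; savedFraction: number } {
  const total = mix.baselineInstances + mix.spotInstances;
  const allOnDemand = total * mix.onDemandHourly;
  const blended =
    mix.baselineInstances * mix.onDemandHourly * (1 - mix.savingsPlanDiscount) +
    mix.spotInstances * mix.onDemandHourly * (1 - mix.spotDiscount);
  const savedFraction = allOnDemand === 0 ? 0 : 1 - blended / allOnDemand;
  return { allOnDemand, blended, savedFraction };
}

// Assumed inputs: $0.10/hr list price, 40% Savings Plan discount, 70% Spot discount.
console.log(hourlyCost({
  onDemandHourly: 0.1,
  baselineInstances: 10,
  spotInstances: 10,
  savingsPlanDiscount: 0.4,
  spotDiscount: 0.7,
}));
```

With these assumed inputs the blended fleet costs a little under half of the all On-Demand figure; the real ratio depends entirely on your actual discounts and how much of the fleet can tolerate interruption.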

EBS volume types

Type	Use case	IOPS / throughput	Notes
gp3	Default for most workloads	3,000 IOPS / 125 MB/s baseline; independently scalable	Replace gp2 — cheaper, decouple IOPS from size
io2	Mission-critical databases	Up to 256,000 IOPS; 99.999% durability (io2 Block Express)	Provisioned IOPS; pay for what you need
st1	Throughput-heavy sequential reads	Low cost per GB; HDD-backed	Big data, log processing — not boot volumes
sc1	Cold throughput storage	Lowest $/GB HDD	Infrequently accessed bulk data
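The gp2 versus gp3 difference is clearest in numbers. gp2 ties baseline IOPS to volume size (3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000), while gp3 starts at 3,000 IOPS regardless of size. The helper names below are illustrative, not an AWS API.

```typescript
const GP3_BASELINE_IOPS = 3000;

// gp2 baseline performance scales with size: 3 IOPS/GiB, min 100, max 16,000.
function gp2BaselineIops(sizeGiB: number): number {
  return Math.min(16000, Math.max(100, 3 * sizeGiB));
}

// Smallest gp2 volume that reaches a target baseline (gp2 cannot exceed 16,000 IOPS).
function gp2SizeForIops(targetIops: number): number {
  return Math.ceil(targetIops / 3);
}

// Extra IOPS you would provision on gp3 above its included baseline.
function gp3ExtraIops(targetIops: number): number {
  return Math.max(0, targetIops - GP3_BASELINE_IOPS);
}

console.log(gp2BaselineIops(100)); // a 100 GiB gp2 volume: 300 IOPS baseline
console.log(gp2SizeForIops(6000)); // gp2 needs 2000 GiB to reach 6,000 IOPS
console.log(gp3ExtraIops(6000));   // gp3 needs 3,000 provisioned on top of baseline
```

The point of the comparison: on gp2 you buy capacity you may not need just to get IOPS, whereas on gp3 the two are priced separately.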

AMI golden image workflow

A golden AMI is a hardened, tested base image baked by CI, not assembled through manual console clicks. A typical pipeline: Packer/Ansible builds the image on AL2023 → CIS hardening and IMDSv2-only defaults → vulnerability scan → register the AMI with an Approved=true tag → the launch template references the latest approved version. App deployments swap AMIs or launch template versions rather than SSHing into running instances. Expire stale AMIs via DLM after 90 days.

UserData runs once at first boot via cloud-init — agent install, volume mount, cluster join. Never embed secrets; fetch from SSM or Secrets Manager. Keep scripts idempotent and log to CloudWatch.

IMDSv2 — metadata service hardening

The Instance Metadata Service (IMDS) at 169.254.169.254 exposes instance identity and IAM role credentials. IMDSv1 uses simple HTTP GET — vulnerable to SSRF attacks that steal credentials. IMDSv2 requires a session token via PUT first.

Enforce HttpTokens: required in launch templates (IMDSv2-only)
Set HttpPutResponseHopLimit: 1 by default; the token response then cannot cross an extra network hop, so containers on a bridge network cannot reach IMDS
Raise the hop limit to 2 only when Docker containers on EC2 genuinely need IMDS; on ECS, prefer task roles instead
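The two-step IMDSv2 exchange looks like this in code: a PUT to obtain a session token, then a GET that presents it. This is a minimal sketch; `readMetadata` and the `HttpCall` type are hypothetical names introduced here so the flow can be exercised with a fake client. On Node 18+ the built-in `fetch` should satisfy `HttpCall`.

```typescript
// Minimal HTTP shape used by the sketch (assumption, not an AWS SDK type).
type HttpResponse = { ok: boolean; status: number; text(): Promise<string> };
type HttpCall = (
  url: string,
  init: { method: string; headers: Record<string, string> }
) => Promise<HttpResponse>;

const IMDS_BASE = "http://169.254.169.254";

async function readMetadata(http: HttpCall, path: string, ttlSeconds = 21600): Promise<string> {
  // Step 1: PUT to request a session token with a TTL (maximum 21600 seconds).
  const tokenRes = await http(`${IMDS_BASE}/latest/api/token`, {
    method: "PUT",
    headers: { "X-aws-ec2-metadata-token-ttl-seconds": String(ttlSeconds) },
  });
  if (!tokenRes.ok) throw new Error(`token request failed: ${tokenRes.status}`);
  const token = await tokenRes.text();

  // Step 2: GET the metadata path, presenting the token in a header.
  const res = await http(`${IMDS_BASE}/latest/meta-data/${path}`, {
    method: "GET",
    headers: { "X-aws-ec2-metadata-token": token },
  });
  if (!res.ok) throw new Error(`metadata request failed: ${res.status}`);
  return res.text();
}
```

On an instance you would call `readMetadata(fetch, "instance-id")`. A plain SSRF that can only issue GET requests cannot complete step 1, which is why enforcing tokens closes the IMDSv1 hole.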

🔒 Security

Capital One's 2019 breach exploited SSRF → IMDSv1 → IAM credentials. Account-level setting: require IMDSv2 on all new instances. Audit with Config rule ec2-imdsv2-check and remediate non-compliant launch templates.

🎯 Exam Tip

gp3 vs gp2: gp3 lets you provision IOPS independently of volume size — exam favorite. Spot interruption: 2-minute warning via instance metadata and EventBridge — design for graceful shutdown. Placement groups: cluster (low latency HPC), spread (max isolation), partition (large distributed systems).

Auto Scaling Groups

An Auto Scaling Group (ASG) maintains a desired count of EC2 instances across Availability Zones. Launch templates (replacing launch configurations) define what gets launched; scaling policies decide when. Production ASGs always pair with health checks, lifecycle hooks, and scale-in protection for stateful nodes.

ASG core concepts

Setting	Purpose	Production guidance
Min / Desired / Max	Capacity bounds	Min ≥ 2 across AZs for HA; max caps runaway scaling bills
Launch template	AMI, instance type, SG, user data, IMDS config	Version every change; use $Latest or explicit version in ASG
Health check	EC2 status vs ELB target health	Use ELB health for app-aware replacement — EC2-only misses app failures
Warm pool	Pre-initialized stopped instances	Faster scale-out for slow-boot JVM apps

Scaling policy types

Policy	Trigger	When to use
Target tracking	Maintain metric at target (e.g. CPU 60%)	Default choice — simplest, self-tuning
Step scaling	Metric thresholds → add/remove N instances	Non-linear response — aggressive scale-out, conservative scale-in
Scheduled	Cron-like min/desired changes	Known traffic patterns — Black Friday, business hours batch
Predictive scaling	ML forecast from historical metrics	Regular daily/weekly cycles — pre-warm before traffic spike

🔬 Under the Hood

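Target tracking creates and manages the CloudWatch alarms for you and changes capacity roughly in proportion to how far the metric sits from the target; it scales out promptly and scales in more conservatively to avoid flapping. The 300-second default cooldown applies to simple scaling policies; target tracking and step scaling rely on instance warmup instead. Lifecycle hooks pause instance launch/terminate so your app can warm up before taking traffic or drain connections before the instance is terminated.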
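As a rough mental model (a simplification, not AWS's actual control algorithm), the capacity a target tracking policy converges toward can be estimated proportionally from the current metric. The function name and the clamping to min/max are assumptions made for this sketch.

```typescript
// Estimate the capacity needed to bring an average metric back to its target,
// assuming load spreads evenly and the metric scales inversely with instance count.
function estimateDesiredCapacity(
  currentCapacity: number,
  currentMetric: number,
  targetMetric: number,
  minSize: number,
  maxSize: number
): number {
  if (targetMetric <= 0) throw new RangeError("targetMetric must be positive");
  const raw = Math.ceil((currentCapacity * currentMetric) / targetMetric);
  // The ASG never goes outside its configured bounds.
  return Math.min(maxSize, Math.max(minSize, raw));
}

console.log(estimateDesiredCapacity(4, 90, 60, 2, 20)); // 4 instances at 90% CPU, target 60%
console.log(estimateDesiredCapacity(4, 30, 60, 2, 20)); // under target: shrinks toward the minimum
```

This is useful for sanity-checking max size: if a plausible traffic spike implies more instances than the maximum allows, the policy will pin at max and the metric will stay above target.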

Lifecycle hooks

Hooks fire on autoscaling:EC2_INSTANCE_LAUNCHING and autoscaling:EC2_INSTANCE_TERMINATING. While in Pending:Wait or Terminating:Wait, the ASG waits (up to the heartbeat timeout, default 3600s) for a Lambda function, an SQS consumer, or a manual complete-lifecycle-action call to finish the action.

Launch hook: run config management, register with service mesh, warm JVM before traffic
Terminate hook: drain queue, flush logs, deregister from Consul/Eureka
Always set the heartbeat timeout longer than your max drain time plus a buffer; otherwise the hook times out and its default result is applied mid-drain
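A hook handler typically receives the lifecycle event, does its work, and then reports a result. The sketch below only builds the parameters for the CompleteLifecycleAction call from the EventBridge event detail; it does not call AWS. The interface and function names are illustrative, and the detail fields shown are a subset of what Auto Scaling emits.

```typescript
// Subset of the EventBridge event detail for an Auto Scaling lifecycle action.
interface LifecycleEventDetail {
  LifecycleActionToken: string;
  AutoScalingGroupName: string;
  LifecycleHookName: string;
  EC2InstanceId: string;
  LifecycleTransition: string; // e.g. "autoscaling:EC2_INSTANCE_TERMINATING"
}

// Parameters accepted by the CompleteLifecycleAction API.
interface CompleteLifecycleActionParams {
  AutoScalingGroupName: string;
  LifecycleHookName: string;
  LifecycleActionToken: string;
  InstanceId: string;
  LifecycleActionResult: "CONTINUE" | "ABANDON";
}

// CONTINUE lets the transition proceed; ABANDON on launch terminates the instance.
function buildCompletion(detail: LifecycleEventDetail, succeeded: boolean): CompleteLifecycleActionParams {
  return {
    AutoScalingGroupName: detail.AutoScalingGroupName,
    LifecycleHookName: detail.LifecycleHookName,
    LifecycleActionToken: detail.LifecycleActionToken,
    InstanceId: detail.EC2InstanceId,
    LifecycleActionResult: succeeded ? "CONTINUE" : "ABANDON",
  };
}
```

In a real Lambda you would pass the returned object to the Auto Scaling client of your SDK after the drain or warm-up work finishes.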

Scale-in protection

SetInstanceProtection marks instances as protected from scale-in (not from manual termination or Spot interruption). Use for long-running batch jobs on shared ASGs — e.g. a 6-hour ETL worker shouldn't disappear mid-run because CPU dropped fleet-wide.

Launch template with IMDSv2 and target tracking

AWS CLI

aws ec2 create-launch-template --launch-template-name app-v1 \
  --launch-template-data '{
    "ImageId": "ami-0abcdef1234567890",
    "InstanceType": "m7g.large",
    "IamInstanceProfile": { "Name": "app-ec2-role" },
    "MetadataOptions": {
      "HttpTokens": "required",
      "HttpPutResponseHopLimit": 1,
      "HttpEndpoint": "enabled"
    },
    "UserData": "'$(echo '#!/bin/bash' | base64)'"
  }'

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name app-asg \
  --launch-template LaunchTemplateName=app-v1,Version='$Latest' \
  --min-size 2 --max-size 20 --desired-capacity 4 \
  --vpc-zone-identifier "subnet-aaa,subnet-bbb" \
  --health-check-type ELB --health-check-grace-period 300

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name app-asg \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 60.0
  }'

Terraform

resource "aws_launch_template" "app" {
  name          = "app-v1"
  image_id      = data.aws_ami.golden.id
  instance_type = "m7g.large"

  iam_instance_profile { name = aws_iam_instance_profile.app.name }

  metadata_options {
    http_tokens                 = "required"
    http_put_response_hop_limit = 1
  }

  user_data = base64encode(file("${path.module}/userdata.sh"))
}

resource "aws_autoscaling_group" "app" {
  name                = "app-asg"
  vpc_zone_identifier = var.private_subnet_ids
  min_size            = 2
  max_size            = 20
  desired_capacity    = 4
  health_check_type   = "ELB"

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }

}

resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"
  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 60.0
  }
}

AWS CDK (TypeScript)

import * as cdk from 'aws-cdk-lib';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
import * as ec2 from 'aws-cdk-lib/aws-ec2';

const asg = new autoscaling.AutoScalingGroup(this, 'AppAsg', {
  vpc,
  vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
  minCapacity: 2,
  maxCapacity: 20,
  desiredCapacity: 4,
  healthCheck: autoscaling.HealthCheck.elb({ grace: cdk.Duration.seconds(300) }),
  launchTemplate: new ec2.LaunchTemplate(this, 'LaunchTpl', {
    machineImage: ec2.MachineImage.lookup({ name: 'golden-app-*' }),
    instanceType: ec2.InstanceType.of(ec2.InstanceClass.M7G, ec2.InstanceSize.LARGE),
    role: appInstanceRole,
    requireImdsv2: true,
  }),
});

asg.scaleOnCpuUtilization('CpuScaling', { targetUtilizationPercent: 60 });

⚠️ Pitfall

Scaling on CPU alone for JVM apps — heap fills, GC thrashes, but CPU looks fine until OOM kill. Add custom CloudWatch metrics (request latency, queue depth, active threads) or scale on ALB RequestCountPerTarget instead of raw CPU.

ECS & Fargate

Amazon ECS runs Docker containers without you managing Kubernetes. Fargate is serverless containers — no EC2 to patch. For most Spring Boot microservice teams, ECS Fargate is the fastest path from Dockerfile to production with sane defaults.

Task definition anatomy

A task definition is the blueprint: container image, CPU/memory, port mappings, environment, secrets, logging, and IAM roles. A service runs N copies of a task definition with load balancer registration and rolling deployments.

Field	Purpose
family	Logical name; each revision increments
cpu / memory	Fargate requires valid pairs (e.g. 1024 CPU = 1 vCPU)
taskRoleArn	Runtime permissions — S3, DynamoDB, SQS (your app)
executionRoleArn	ECS agent permissions — ECR pull, CloudWatch Logs, secrets injection
containerDefinitions	Image, ports, healthCheck, logConfiguration
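Because Fargate rejects CPU/memory combinations outside its supported pairs, a quick validation step in CI can catch mistakes before deployment. The table below reflects the documented Fargate combinations as understood at the time of writing; treat it as an assumption and check the current limits. The helper names are hypothetical.

```typescript
// Build a list of memory values in MiB from a GiB range.
function gibRange(minGiB: number, maxGiB: number, stepGiB: number): number[] {
  const values: number[] = [];
  for (let g = minGiB; g <= maxGiB; g += stepGiB) values.push(g * 1024);
  return values;
}

// CPU units (1024 = 1 vCPU) mapped to the memory sizes (MiB) allowed with them.
const FARGATE_MEMORY_MIB: Record<number, number[]> = {
  256: [512, 1024, 2048],
  512: gibRange(1, 4, 1),
  1024: gibRange(2, 8, 1),
  2048: gibRange(4, 16, 1),
  4096: gibRange(8, 30, 1),
  8192: gibRange(16, 60, 4),
  16384: gibRange(32, 120, 8),
};

function isValidFargateSize(cpuUnits: number, memoryMiB: number): boolean {
  const allowed = FARGATE_MEMORY_MIB[cpuUnits];
  return allowed !== undefined && allowed.indexOf(memoryMiB) !== -1;
}

console.log(isValidFargateSize(1024, 2048)); // 1 vCPU with 2 GiB
console.log(isValidFargateSize(256, 4096));  // 0.25 vCPU with 4 GiB
```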

Task role vs execution role

Task role
Credentials injected into the container, used by your Spring app via the default credential chain
Least-privilege per service: s3:GetObject on one prefix, not the entire account
Trust: ecs-tasks.amazonaws.com

Execution role
Used by ECS/Fargate to pull the image from ECR and write logs; not visible to app code
Needs AmazonECSTaskExecutionRolePolicy plus Secrets Manager/SSM read access if the task definition references secrets
One shared execution role per account is OK; task roles must be per-service
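The role split is easier to see as policy documents. The sketch below builds the trust policy that lets ECS tasks assume a role, plus a least-privilege permissions policy scoped to one S3 prefix. The function names, bucket, and prefix are hypothetical.

```typescript
// Trust policy: allows the ECS tasks service principal to assume the role.
function ecsTaskTrustPolicy() {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { Service: "ecs-tasks.amazonaws.com" },
        Action: "sts:AssumeRole",
      },
    ],
  };
}

// Permissions policy for a task role: read-only access to a single prefix.
function prefixReadPolicy(bucket: string, prefix: string) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: "s3:GetObject",
        Resource: `arn:aws:s3:::${bucket}/${prefix}*`,
      },
    ],
  };
}

console.log(JSON.stringify(ecsTaskTrustPolicy(), null, 2));
console.log(JSON.stringify(prefixReadPolicy("orders-data", "invoices/"), null, 2));
```

Keeping the permissions document per service is what makes the blast radius small: a compromised order service can read its own prefix and nothing else.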

Fargate vs EC2 launch type

Dimension	Fargate	EC2 launch type
Ops burden	No instances to manage	Patch AMIs, scale EC2 ASG, capacity providers
Cost	Premium per vCPU/GB; predictable	Cheaper at scale with Spot/RIs; you optimize packing
Networking	Each task gets ENI (IP per task in awsvpc mode)	Same awsvpc mode; density limited by ENIs per instance
When to pick	Most microservices, variable load, small platform team	High density, GPU, custom kernel, heavy Spot savings

Service discovery

ECS integrates with AWS Cloud Map for DNS-based discovery: orders.svc.local resolves to task IPs. Alternatives: an ALB for HTTP services (preferred for external traffic), or ECS Service Connect for service-to-service traffic (AWS has announced end of support for App Mesh, so avoid it for new designs). Cloud Map + ECS service registry auto-registers healthy tasks and deregisters them on stop.

Deployment circuit breaker

Enable deploymentCircuitBreaker with rollback — if new tasks fail health checks repeatedly, ECS stops the deployment and rolls back to the last stable revision. Without it, a bad image can flap indefinitely, draining capacity.

ECS task definition + Fargate service

AWS CLI

aws ecs register-task-definition --family order-service \
  --requires-compatibilities FARGATE --network-mode awsvpc \
  --cpu 1024 --memory 2048 \
  --task-role-arn arn:aws:iam::123456789012:role/order-service-task \
  --execution-role-arn arn:aws:iam::123456789012:role/ecsTaskExecutionRole \
  --container-definitions '[{
    "name": "app",
    "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/order-service:latest",
    "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/order-service",
        "awslogs-region": "eu-west-1",
        "awslogs-stream-prefix": "app"
      }
    }
  }]'

aws ecs create-service --cluster prod --service-name order-service \
  --task-definition order-service \
  --desired-count 3 --launch-type FARGATE \
  --network-configuration 'awsvpcConfiguration={
    subnets=[subnet-aaa,subnet-bbb],
    securityGroups=[sg-app],
    assignPublicIp=DISABLED
  }' \
  --load-balancers 'targetGroupArn=arn:aws:elasticloadbalancing:...,containerName=app,containerPort=8080' \
  --deployment-configuration 'maximumPercent=200,minimumHealthyPercent=100,deploymentCircuitBreaker={enable=true,rollback=true}'

Terraform

resource "aws_ecs_task_definition" "order_service" {
  family                   = "order-service"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 1024
  memory                   = 2048
  task_role_arn            = aws_iam_role.task.arn
  execution_role_arn       = aws_iam_role.execution.arn

  container_definitions = jsonencode([{
    name  = "app"
    image = "${aws_ecr_repository.order_service.repository_url}:latest"
    portMappings = [{ containerPort = 8080 }]
    logConfiguration = {
      logDriver = "awslogs"
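      options = {
        awslogs-group         = aws_cloudwatch_log_group.ecs.name
        awslogs-region        = "eu-west-1" # assumed; matches the CLI example
        awslogs-stream-prefix = "app"       # the awslogs driver needs a prefix on Fargate
      }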
    }
  }])
}

resource "aws_ecs_service" "order_service" {
  name            = "order-service"
  cluster         = aws_ecs_cluster.prod.id
  task_definition = aws_ecs_task_definition.order_service.arn
  desired_count   = 3
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 8080
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }
}

AWS CDK (TypeScript)

import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';

const taskDef = new ecs.FargateTaskDefinition(this, 'OrderTask', {
  memoryLimitMiB: 2048,
  cpu: 1024,
  taskRole,
  executionRole: ecsTaskExecutionRole,
});

const container = taskDef.addContainer('app', {
  image: ecs.ContainerImage.fromEcrRepository(orderRepo, 'latest'),
  logging: ecs.LogDrivers.awsLogs({ streamPrefix: 'app', logGroup }),
});
container.addPortMappings({ containerPort: 8080 });

const service = new ecs.FargateService(this, 'OrderService', {
  cluster,
  taskDefinition: taskDef,
  desiredCount: 3,
  vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
  circuitBreaker: { rollback: true },
});
service.attachToApplicationTargetGroup(targetGroup);

📦 Real World

Figma and many fintech startups run internal services on ECS Fargate with GitHub Actions → ECR → ECS deploy. Platform teams prefer Fargate until container density economics force EC2 capacity providers — typically around hundreds of vCPUs sustained.

EKS & Kubernetes

Amazon EKS runs upstream-compatible Kubernetes. You pay for the control plane (~$0.10/hr per cluster) plus worker compute. Choose EKS when you need the Kubernetes ecosystem (Operators, Helm charts, multi-cloud portability) or already have K8s expertise — not because "Kubernetes is industry standard" alone.

Control plane vs data plane

Control plane (AWS managed): kube-apiserver, etcd, controller-manager, scheduler — multi-AZ, not SSH-accessible
Data plane (your responsibility): worker nodes running pods — managed node groups, Karpenter-provisioned EC2, or Fargate profiles
Workers need IAM, VPC CNI, and kubelet — you own patching unless Karpenter churns nodes automatically

Managed node groups vs Karpenter

Approach	How it scales	Pros	Cons
Managed node groups	ASG behind the scenes; you pick instance types	Simple, AWS-native, predictable	Slower bin-packing; manual instance type choices
Karpenter	NodePool CRD (named Provisioner in older releases) launches right-sized nodes for pending pods	Fast scale-out, Spot consolidation, optimal instance selection	Extra controller to operate; learning curve
EKS Fargate profiles	Serverless pods — no nodes	Zero node ops	No DaemonSets, limited instance control, higher cost

IRSA — IAM Roles for Service Accounts

IRSA maps a Kubernetes service account to an IAM role via OIDC trust on the EKS cluster. Pods get temporary AWS credentials scoped to that role — the K8s-native equivalent of ECS task roles. Annotate the service account with eks.amazonaws.com/role-arn; use the AWS SDK default credential chain in your app — no access keys in Secrets.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: order-service
  namespace: prod
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/order-service-irsa
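The IAM side of IRSA is a trust policy that federates the cluster's OIDC provider and pins the role to one service account. A minimal sketch of that document follows; the function name and the OIDC issuer ID are placeholders, so use the issuer shown for your own cluster.

```typescript
// Builds the trust policy for an IAM role assumed by one Kubernetes service account.
// oidcIssuer is the cluster's OIDC issuer host and path, without "https://".
function irsaTrustPolicy(accountId: string, oidcIssuer: string, namespace: string, serviceAccount: string) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { Federated: `arn:aws:iam::${accountId}:oidc-provider/${oidcIssuer}` },
        Action: "sts:AssumeRoleWithWebIdentity",
        Condition: {
          StringEquals: {
            // Only this namespace/service account pair may assume the role.
            [`${oidcIssuer}:sub`]: `system:serviceaccount:${namespace}:${serviceAccount}`,
            [`${oidcIssuer}:aud`]: "sts.amazonaws.com",
          },
        },
      },
    ],
  };
}

// Placeholder issuer ID for illustration only.
const exampleIssuer = "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE";
console.log(JSON.stringify(irsaTrustPolicy("123456789012", exampleIssuer, "prod", "order-service"), null, 2));
```

The `sub` condition is the important part: without it, any service account in the cluster could assume the role.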

When EKS vs ECS

Choose ECS when…	Choose EKS when…
Small platform team, AWS-only, Docker Compose → ECS path	Existing K8s manifests, Helm charts, Operators (Prometheus, Strimzi)
Fargate-first, minimal control plane ops	Multi-cloud portability requirement (same YAML on GCP/Azure)
Simpler IAM (task roles) without OIDC setup	Advanced scheduling (affinity, taints, GPU sharing, service mesh at scale)
Lower control plane cost (no $0.10/hr/cluster)	Large platform org with dedicated K8s SRE team

⚖️ Trade-off

EKS is not free complexity. A three-person team running 8 microservices on ECS Fargate will move slower on EKS for months while learning ingress controllers, CNI, and upgrade cadence. Adopt EKS when K8s-specific capabilities unblock the product — not for resume-driven infrastructure.

🎯 Exam Tip

IRSA is the secure pod credential pattern — not mounting instance profile creds. Pod Identity (newer) simplifies IRSA further — know both exist. EKS control plane logs go to CloudWatch — enable audit logs for compliance questions.

Lambda & serverless

Lambda runs code without provisioning servers — pay per invocation and GB-second. Perfect for event-driven glue, APIs with spiky traffic, and edge logic. Java on Lambda has historically been synonymous with cold starts — SnapStart and GraalVM native change that equation.

Cold starts — the Java problem and fixes

A cold start happens when Lambda creates a new execution environment: download layer, start JVM, run static init, then your handler. Java cold starts of 3–10 seconds were common on large Spring Boot JARs.

Mitigation	How it works	Trade-off
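Java SnapStart	Firecracker snapshot taken after init and restored on later cold starts (Java 11+ managed runtimes)	Not for native images; limited to supported runtimes; cannot be combined with provisioned concurrency; values created during init (random seeds, unique IDs, open connections) are shared by every restored environment, so regenerate them after restore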
GraalVM / Quarkus native	AOT compile to native binary — sub-second cold starts	Build complexity; reflection config; longer compile times in CI
Smaller deployment package	Trim dependencies; avoid fat JAR Spring if possible	May need architectural split — Lambda for thin handlers only
Provisioned concurrency	Pre-warmed environments always ready	Cost — you pay even when idle

🔬 Under the Hood

Lambda reuses execution environments (warm starts) when traffic is steady — same container, new invoke. SnapStart takes a memory snapshot after Init phase completes; restore skips JVM bootstrap. GraalVM native images skip JVM entirely — the binary is the handler process.

Concurrency model

Account concurrency limit — default 1000 per region; request increase via support
Reserved concurrency: guarantees capacity for a function and also caps its maximum; other functions cannot borrow the reserved share
Provisioned concurrency — pre-initialized environments; eliminates cold start for that count
Throttling — when concurrency exhausted, synchronous invokes return 429; async retries with backoff

For APIs behind API Gateway: set reserved concurrency on critical functions so a runaway batch job can't starve payment webhooks. Use CloudWatch alarm on ConcurrentExecutions approaching account limit.
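To size reserved concurrency or an alarm threshold, a common estimate is concurrency ≈ requests per second × average duration in seconds. The sketch below applies that estimate; the function names are illustrative and the 1000 default is the regional account limit mentioned above.

```typescript
// Estimated concurrent executions for a steady request rate.
function estimateConcurrency(requestsPerSecond: number, avgDurationMs: number): number {
  return Math.ceil((requestsPerSecond * avgDurationMs) / 1000);
}

// Compare the estimate with a concurrency limit (account-wide or reserved).
function concurrencyHeadroom(requestsPerSecond: number, avgDurationMs: number, limit = 1000) {
  const needed = estimateConcurrency(requestsPerSecond, avgDurationMs);
  return { needed, utilization: needed / limit, throttlingRisk: needed > limit };
}

console.log(concurrencyHeadroom(200, 250));  // modest API traffic
console.log(concurrencyHeadroom(5000, 300)); // would exceed the default limit
```

The estimate assumes steady traffic; bursts and cold starts push real concurrency higher, so leave margin when setting limits.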

Lambda@Edge

Lambda@Edge runs functions at CloudFront edge locations — rewrite URLs, A/B headers, JWT validation at the CDN, bot detection. Limitations: shorter timeout (5s viewer / 30s origin), smaller deployment package, no VPC. For heavy logic, use CloudFront → regional Lambda or CloudFront Functions for ultra-light transforms (< 1ms).
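A URL rewrite is the classic viewer-request use case. The sketch below appends index.html to directory-style URIs. The types are a hand-written subset of the CloudFront event (an assumption for this example), and a deployed function would export the handler.

```typescript
// Hand-written subset of the Lambda@Edge viewer-request event shape.
interface CloudFrontRequest {
  uri: string;
  headers: Record<string, { key?: string; value: string }[]>;
}
interface ViewerRequestEvent {
  Records: { cf: { request: CloudFrontRequest } }[];
}

// Rewrites "/docs/" to "/docs/index.html" and passes other URIs through.
async function handler(event: ViewerRequestEvent): Promise<CloudFrontRequest> {
  const record = event.Records[0];
  if (record === undefined) throw new Error("event contains no records");
  const request = record.cf.request;
  if (request.uri.endsWith("/")) {
    request.uri += "index.html";
  }
  // Returning the request tells CloudFront to continue processing it.
  return request;
}
```

Logic this small also fits CloudFront Functions, which is usually the cheaper choice when no network calls or request body access are needed.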

Lambda function with SnapStart-ready Java

AWS CLI

aws lambda create-function \
  --function-name order-webhook-handler \
  --runtime java17 \
  --role arn:aws:iam::123456789012:role/lambda-order-handler \
  --handler com.example.OrderHandler::handleRequest \
  --code S3Bucket=artifacts,S3Key=order-handler.zip \
  --memory-size 1024 \
  --timeout 30 \
  --snap-start ApplyOn=PublishedVersions \
  --environment Variables={SPRING_PROFILES_ACTIVE=lambda}

aws lambda publish-version --function-name order-webhook-handler

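# SnapStart and provisioned concurrency cannot be enabled on the same function
# version. To use provisioned concurrency instead, create the function without
# --snap-start, publish a version, then run:
# aws lambda put-provisioned-concurrency-config \
#   --function-name order-webhook-handler \
#   --qualifier 1 \
#   --provisioned-concurrent-executions 10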

Terraform

resource "aws_lambda_function" "order_handler" {
  function_name = "order-webhook-handler"
  role          = aws_iam_role.lambda.arn
  handler       = "com.example.OrderHandler::handleRequest"
  runtime       = "java17"
  memory_size   = 1024
  timeout       = 30
  s3_bucket     = aws_s3_bucket.artifacts.id
  s3_key        = "order-handler.zip"
  publish       = true # SnapStart only takes effect on published versions

  snap_start {
    apply_on = "PublishedVersions"
  }

  environment {
    variables = { SPRING_PROFILES_ACTIVE = "lambda" }
  }
}

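# SnapStart and provisioned concurrency cannot be combined on the same version.
# To use provisioned concurrency instead, remove the snap_start block above and
# uncomment this resource:
# resource "aws_lambda_provisioned_concurrency_config" "order_handler" {
#   function_name                     = aws_lambda_function.order_handler.function_name
#   qualifier                         = aws_lambda_function.order_handler.version
#   provisioned_concurrent_executions = 10
# }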

AWS CDK (TypeScript)

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as s3 from 'aws-cdk-lib/aws-s3';

const fn = new lambda.Function(this, 'OrderHandler', {
  runtime: lambda.Runtime.JAVA_17,
  handler: 'com.example.OrderHandler::handleRequest',
  code: lambda.Code.fromBucket(s3.Bucket.fromBucketName(this, 'Artifacts', 'artifacts'), 'order-handler.zip'),
  memorySize: 1024,
  timeout: cdk.Duration.seconds(30),
  role: lambdaRole,
  snapStart: lambda.SnapStartConf.ON_PUBLISHED_VERSIONS,
  environment: { SPRING_PROFILES_ACTIVE: 'lambda' },
});

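const version = fn.currentVersion;
new lambda.Alias(this, 'Live', {
  aliasName: 'live',
  version,
  // provisionedConcurrentExecutions is omitted: it cannot be combined with
  // SnapStart on the same version. Drop snapStart above if you need it instead.
});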

⚠️ Pitfall

Deploying full Spring Boot MVC to Lambda without SnapStart or native compile — API Gateway timeouts before the first response. Split: API on ECS/EKS, async events to Lambda handlers, or use Quarkus/Micronaut with GraalVM native for sub-second cold starts.

Compute selection matrix

No single compute service wins every workload. Use this matrix in architecture reviews and interviews — the right answer always starts with workload shape, team skills, and operational budget.

Workload signal	EC2 / ASG	ECS Fargate	EKS	Lambda
Always-on HTTP API (Spring Boot)	✓ Classic; full control	✓✓ Sweet spot	✓ If K8s already standard	△ Spiky only; cold start risk
Long-running batch / GPU	✓✓ Spot + ASG	△ No GPUs; no fixed runtime cap, but pricey for sustained batch	✓ Jobs/CronJob + Karpenter	✗ 15-min max timeout
Event-driven (S3, SQS, EventBridge)	△ Worker ASG polling	✓ Container workers	✓ K8s consumers	✓✓ Native fit
Traffic pattern	Steady or predictable	Variable microservices	Complex scheduling needs	Sporadic / bursty
Ops team size	Needs EC2/AMI expertise	Minimal — AWS manages nodes	Needs K8s SRE capacity	Minimal — function-level
Startup / scale speed	Minutes (AMI boot)	~60s task start	Minutes (node + pod)	Milliseconds (warm) / seconds (cold start)
Cost at low traffic	△ Min ASG size cost	△ Per-task hourly	△ Control plane + nodes	✓✓ Pay per invoke
Cost at high sustained load	✓✓ RI/Spot optimized	✓ Good mid-scale	✓✓ Density + Spot	✗ Expensive at volume
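A few rows of the matrix can be encoded as a first-pass heuristic for architecture reviews. This is a deliberately simplified sketch: the `Workload` fields, the order of the checks, and the function name are assumptions, and only the 15-minute Lambda timeout is a hard limit taken from the table.

```typescript
type Compute = "EC2 / ASG" | "ECS Fargate" | "EKS" | "Lambda";

interface Workload {
  longestTaskMinutes: number;         // longest single unit of work
  needsGpu: boolean;
  eventDriven: boolean;               // triggered by S3, SQS, EventBridge, ...
  kubernetesAlreadyStandard: boolean; // team already runs and staffs K8s
}

const LAMBDA_MAX_MINUTES = 15;

function suggestCompute(w: Workload): Compute {
  // GPU work needs instances you control, directly or through K8s nodes.
  if (w.needsGpu) return w.kubernetesAlreadyStandard ? "EKS" : "EC2 / ASG";
  // Short event-driven work is the native fit for Lambda.
  if (w.eventDriven && w.longestTaskMinutes <= LAMBDA_MAX_MINUTES) return "Lambda";
  // Otherwise prefer the platform the team already operates.
  if (w.kubernetesAlreadyStandard) return "EKS";
  return "ECS Fargate";
}

console.log(suggestCompute({ longestTaskMinutes: 1, needsGpu: false, eventDriven: true, kubernetesAlreadyStandard: false }));
console.log(suggestCompute({ longestTaskMinutes: 480, needsGpu: true, eventDriven: false, kubernetesAlreadyStandard: false }));
```

Treat the output as a starting point for discussion; cost at sustained load and team skills, which the function ignores, often decide the final answer.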

⚖️ Trade-off

ECS Fargate vs EKS: Fargate wins on time-to-production and operational simplicity. EKS wins when you need Kubernetes-specific tooling or multi-cloud portability. Lambda vs containers: as a rule of thumb, Lambda wins below roughly 100 req/s of sustained traffic for simple handlers; containers win for long-lived connections, WebSockets, and complex JVM apps. EC2 vs everything: EC2 remains the right choice for maximum control, licensing, and cost optimization at scale, but you own patching, AMIs, and capacity planning.

💡 Pro Tip

In system design interviews, state your assumptions first: QPS, p99 latency, team size, and burst factor. Then pick one primary compute and one fallback — e.g. "ECS Fargate for the API, Lambda for async webhooks." Interviewers reward explicit trade-offs over "we'd use Kubernetes because it's modern."

📦 Real World

Amazon.com internal services use a mix of EC2, ECS, EKS, and Lambda — no single compute winner. Monzo ran core banking on EC2/K8s early, then adopted Lambda for event pipelines. Pattern: start simple (Fargate or Lambda), split when metrics prove a boundary.