Compute: EC2, ECS, EKS & Lambda

AWS offers four primary compute services across three models: virtual machines (EC2), container orchestration (ECS and EKS), and functions (Lambda). Most production platforms use more than one: a Spring Boot API on ECS Fargate, batch jobs on Spot EC2, edge routing via Lambda@Edge, and a data pipeline on EKS managed node groups. This chapter covers how to pick instance types, scale safely, wire IAM roles correctly, and avoid the cold-start and cost traps that show up in real interviews and real incidents.


EC2 deep dive

EC2 is still the foundation of AWS compute — even when you run containers or Lambda, something underneath is often an EC2 instance. Understanding instance families, storage, and bootstrap patterns prevents over-provisioning and security gaps like open IMDSv1 metadata endpoints.

Instance families — the letter tells you the workload

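AWS names instance types as family, generation, optional attribute letters, then size: in m7g.large, m is the family, 7 the generation, g the processor attribute (Graviton), and large the size. The family letter tells you the workload profile; the generation number tells you how new the hardware is (higher = newer, usually better price/performance).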

| Family | Optimized for | Examples | Typical use |
|---|---|---|---|
| t (burstable) | Baseline CPU with burst credits | t4g.micro, t3.medium | Dev/staging, low-traffic APIs, bastion hosts; not sustained CPU |
| m (general purpose) | Balanced CPU, memory, network | m7i.large, m7g.xlarge | Spring Boot services, app servers, small databases |
| c (compute) | High CPU-to-memory ratio | c7i.2xlarge, c7g.4xlarge | Batch processing, transcoding, high-throughput APIs |
| r (memory) | High memory-to-CPU ratio | r7i.xlarge, r6g.2xlarge | In-memory caches, large JVM heaps, analytics |
| i (storage I/O) | High local NVMe IOPS | i4i.xlarge, i3en.2xlarge | NoSQL, time-series DBs, write-heavy workloads |
| p, g (GPU) | NVIDIA GPUs | p4d.24xlarge (p), g5.xlarge (g) | ML training/inference, rendering |
| inf (Inferentia) | AWS ML inference chips | inf2.xlarge | Cost-efficient model serving at scale |
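The naming convention above is regular enough to decode mechanically, which can help in cost reports or policy checks (for example, flagging non-Graviton types). This is a minimal sketch: `parseInstanceType` and `isGraviton` are hypothetical helpers, not an AWS API, and the regex assumes the common `<family><generation><attributes>.<size>` shape, so irregular names return null.

```typescript
// Hypothetical helper: split an EC2 instance type name into its parts.
// Assumed pattern: <family letters><generation digits><attribute letters>.<size>
interface InstanceTypeParts {
  family: string; // workload family, e.g. "m", "c", "r", "inf"
  generation: number; // higher = newer hardware
  attributes: string; // e.g. "g" (Graviton), "i" (Intel), "a" (AMD), "d" (local NVMe), "en"
  size: string; // e.g. "large", "2xlarge", "metal"
}

function parseInstanceType(name: string): InstanceTypeParts | null {
  const match = /^([a-z]+)(\d+)([a-z-]*)\.([a-z0-9-]+)$/.exec(name);
  if (match === null) return null; // irregular names such as "u-6tb1.metal" are not handled
  const [, family = "", generation = "", attributes = "", size = ""] = match;
  return { family, generation: Number(generation), attributes, size };
}

// Heuristic only: Graviton types carry a "g" among the attribute letters (m7g, c7gd, t4g).
function isGraviton(parts: InstanceTypeParts): boolean {
  return parts.attributes.includes("g");
}
```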

Graviton (ARM) for Java and Spring

Graviton instances (a g after the generation number, e.g. m7g, c7g, r7g) run on AWS-designed processors built on Arm Neoverse cores. For Java/Spring workloads on Amazon Corretto 17+ or GraalVM native images, Graviton often delivers 20–40% better price/performance than the equivalent x86 types (i for Intel, a for AMD), though the gain varies by workload.

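Use multi-arch images, verify JNI/native dependencies, and load-test JVM GC on ARM before the production cutover. For batch workloads, the Graviton discount compounds with Spot (interruptible capacity) or Savings Plans (steady baseline); note that Savings Plans don't apply to Spot usage.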

💡 Pro Tip

Start new Spring Boot projects on m7g.large in dev. If p99 latency and GC pause metrics match x86 baselines after a week of load testing, promote Graviton to staging and prod — don't assume ARM incompatibility without measuring.

Purchasing options

| Option | Commitment | Discount | Best for |
|---|---|---|---|
| On-Demand | None | Baseline price | Spiky/unpredictable load, short-lived environments, production baseline you can't interrupt |
| Reserved Instances (RI) | 1 or 3 years; specific instance family and region | Up to ~72% vs On-Demand | Steady-state baseline: always-on app servers and databases |
| Savings Plans | $/hr commitment for 1 or 3 years (Compute or EC2 Instance SP) | Similar to RIs; more flexible | Mixed instance types/regions; preferred over RIs by most teams now |
| Spot | None; can be interrupted with a 2-minute notice | Up to ~90% off | Fault-tolerant batch, CI runners, Karpenter on EKS, ASGs with mixed instances |
| Dedicated Hosts / Instances | Physical server isolation | Premium pricing | License compliance (BYOL), regulatory isolation; rarely needed otherwise |
💰 Cost

Savings Plans beat On-Demand immediately for any baseline fleet running 24/7. Layer Spot on top for interruptible capacity — never run stateful primary databases on Spot. Use aws compute-optimizer and Cost Explorer "Savings Plans recommendations" before committing to a 3-year term.
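To see why a 24/7 baseline is the right target for a commitment, here is a simplified break-even model. It assumes one instance-equivalent committed for every hour at a flat discount and ignores upfront-payment options; the 40% discount and $0.10/hour rate used in the checks are hypothetical numbers, not quotes.

```typescript
// Simplified model (an assumption, not AWS's billing logic): one instance-equivalent,
// committed for every hour of the month at a flat `discount` off the On-Demand rate.
const HOURS_PER_MONTH = 730;

// A commitment is billed for every hour, whether or not the capacity is used.
function monthlyCommittedCost(onDemandHourly: number, discount: number): number {
  return onDemandHourly * (1 - discount) * HOURS_PER_MONTH;
}

// On-Demand is billed only for the hours the instance actually runs.
function monthlyOnDemandCost(onDemandHourly: number, hoursRunning: number): number {
  return onDemandHourly * hoursRunning;
}

// The commitment is cheaper once utilization (hours running / hours in month) exceeds 1 - discount.
function breakEvenUtilization(discount: number): number {
  return 1 - discount;
}
```

With a 40% discount the break-even is 60% utilization: a dev fleet running only business hours (about 220 of 730 hours a month) stays cheaper On-Demand, while a 24/7 baseline saves the full 40%.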

EBS volume types

| Type | Use case | IOPS / throughput | Notes |
|---|---|---|---|
| gp3 | Default for most workloads | 3,000 IOPS / 125 MB/s baseline; IOPS and throughput scale independently of size | Replaces gp2: cheaper, and IOPS are decoupled from volume size |
| io2 | Mission-critical databases | Up to 256,000 IOPS (io2 Block Express); 99.999% durability | Provisioned IOPS; pay for what you provision |
| st1 | Throughput-heavy sequential reads | HDD-backed; low cost per GB | Big data, log processing; not for boot volumes |
| sc1 | Cold throughput storage | HDD-backed; lowest $/GB | Infrequently accessed bulk data |

AMI golden image workflow

A golden AMI is a hardened, tested base image baked by CI, not by manual console clicks. Pipeline: Packer/Ansible builds on AL2023 → CIS hardening → vulnerability scan → IMDSv2-only → register the AMI with an Approved=true tag → the launch template references the latest approved version. Expire stale AMIs with Data Lifecycle Manager (DLM) after 90 days. App deployments roll out a new AMI or launch template version; they don't SSH into running instances.

UserData runs once at first boot via cloud-init — agent install, volume mount, cluster join. Never embed secrets; fetch from SSM or Secrets Manager. Keep scripts idempotent and log to CloudWatch.

IMDSv2 — metadata service hardening

The Instance Metadata Service (IMDS) at 169.254.169.254 exposes instance identity and IAM role credentials. IMDSv1 uses simple HTTP GET — vulnerable to SSRF attacks that steal credentials. IMDSv2 requires a session token via PUT first.
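The two-step handshake can be sketched in TypeScript. `readMetadata` and the `FetchLike` type are illustrative names; the fetch implementation is injected so the flow can be exercised off-EC2, and on an instance running Node 18+ you could pass the global `fetch`. The endpoint paths and header names are the documented IMDSv2 ones; 21600 seconds (6 hours) is the maximum token TTL.

```typescript
// Minimal response/fetch shapes so the sketch doesn't depend on DOM typings.
type HttpResponse = { ok: boolean; status: number; text(): Promise<string> };
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string> },
) => Promise<HttpResponse>;

const IMDS = "http://169.254.169.254";

async function readMetadata(path: string, fetchFn: FetchLike, ttlSeconds = 21600): Promise<string> {
  // Step 1: PUT for a session token. An SSRF that can only issue GETs cannot do this.
  const tokenRes = await fetchFn(`${IMDS}/latest/api/token`, {
    method: "PUT",
    headers: { "X-aws-ec2-metadata-token-ttl-seconds": String(ttlSeconds) },
  });
  if (!tokenRes.ok) throw new Error(`token request failed: ${tokenRes.status}`);
  const token = await tokenRes.text();

  // Step 2: present the token on every metadata read.
  const res = await fetchFn(`${IMDS}/latest/meta-data/${path}`, {
    headers: { "X-aws-ec2-metadata-token": token },
  });
  if (!res.ok) throw new Error(`metadata read failed: ${res.status}`);
  return res.text();
}
```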

  • Enforce HttpTokens: required in launch templates (IMDSv2-only)
  • Keep HttpPutResponseHopLimit: 1 so the token response can't travel the extra network hop into a container; processes on the host itself are unaffected
  • Containers on EC2 using bridge networking need a hop limit of 2 to reach IMDS; on ECS, prefer task roles over letting containers read the instance role
🔒 Security

Capital One's 2019 breach exploited SSRF → IMDSv1 → IAM credentials. Account-level setting: require IMDSv2 on all new instances. Audit with Config rule ec2-imdsv2-check and remediate non-compliant launch templates.

🎯 Exam Tip

gp3 vs gp2: gp3 lets you provision IOPS independently of volume size — exam favorite. Spot interruption: 2-minute warning via instance metadata and EventBridge — design for graceful shutdown. Placement groups: cluster (low latency HPC), spread (max isolation), partition (large distributed systems).
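For the Spot interruption warning, a process on the instance can poll `spot/instance-action` through IMDS: it returns 404 until a notice is issued, then a small JSON body with the action and UTC time. A minimal sketch of turning that response into a shutdown deadline, assuming you have already read the status and body (for example with an IMDSv2 token); `secondsUntilInterruption` is a hypothetical helper name.

```typescript
// Body returned by an IMDS read of latest/meta-data/spot/instance-action once an
// interruption is scheduled; before that, the endpoint returns HTTP 404.
interface SpotInstanceAction {
  action: "terminate" | "stop" | "hibernate";
  time: string; // ISO 8601 UTC timestamp of the interruption
}

// Seconds left before the interruption, or null when no notice has been issued yet.
function secondsUntilInterruption(status: number, body: string, now: Date): number | null {
  if (status === 404) return null;
  if (status !== 200) throw new Error(`unexpected IMDS status ${status}`);
  const notice = JSON.parse(body) as SpotInstanceAction;
  const remainingMs = Date.parse(notice.time) - now.getTime();
  return Math.max(0, Math.floor(remainingMs / 1000));
}
```

A worker polling every few seconds can stop taking new jobs as soon as this returns a number and use the remaining seconds to checkpoint.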

Auto Scaling Groups

An Auto Scaling Group (ASG) maintains a desired count of EC2 instances across Availability Zones. Launch templates (replacing launch configurations) define what gets launched; scaling policies decide when. Production ASGs always pair with health checks, lifecycle hooks, and scale-in protection for stateful nodes.

ASG core concepts

| Setting | Purpose | Production guidance |
|---|---|---|
| Min / Desired / Max | Capacity bounds | Min ≥ 2 across AZs for HA; max caps runaway scaling bills |
| Launch template | AMI, instance type, SG, user data, IMDS config | Version every change; reference $Latest or pin an explicit version in the ASG (pinning keeps an untested version from launching on the next scale-out) |
| Health check | EC2 status vs ELB target health | Use ELB health checks for app-aware replacement; EC2-only checks miss app failures |
| Warm pool | Pre-initialized instances kept stopped, running, or hibernated | Faster scale-out for slow-booting JVM apps |

Scaling policy types

| Policy | Trigger | When to use |
|---|---|---|
| Target tracking | Keep a metric at a target (e.g. CPU 60%) | Default choice: simplest, self-tuning |
| Step scaling | Metric thresholds → add/remove N instances | Non-linear response: aggressive scale-out, conservative scale-in |
| Scheduled | Cron-like min/desired changes | Known traffic patterns: Black Friday, business-hours batch |
| Predictive scaling | ML forecast from historical metrics | Regular daily/weekly cycles; pre-warms capacity before the spike |
🔬 Under the Hood

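Target tracking creates and manages its CloudWatch alarms for you and adjusts capacity roughly in proportion to how far the metric is from the target. It scales out quickly but scales in gradually (the scale-in alarm waits for a longer run of low datapoints) to prevent flapping; simple scaling policies rely on a cooldown instead (default 300 seconds). Lifecycle hooks pause instance launch or termination so your app can warm up, drain connections, or flush state before the instance enters service or is terminated.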

Lifecycle hooks

Hooks fire on autoscaling:EC2_INSTANCE_LAUNCHING and autoscaling:EC2_INSTANCE_TERMINATING. While an instance is in Pending:Wait or Terminating:Wait, the ASG waits (up to the heartbeat timeout, default 3600 seconds) for a complete-lifecycle-action call, typically from a Lambda function triggered by EventBridge, an SQS consumer, or an operator.

  • Launch hook: run config management, register with service mesh, warm JVM before traffic
  • Terminate hook: drain queue, flush logs, deregister from Consul/Eureka
  • Set the heartbeat timeout to at least your maximum drain time plus a buffer; call record-lifecycle-action-heartbeat if the work can run longer
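A hook handler usually ends with a CompleteLifecycleAction call. The sketch below only builds the request parameters from the EventBridge lifecycle event; the field names follow the documented event and API shapes, and `completionFor` is a hypothetical helper. In a real Lambda you would pass the result to `CompleteLifecycleActionCommand` from `@aws-sdk/client-auto-scaling`.

```typescript
// Subset of the EventBridge event emitted for an ASG lifecycle hook.
interface LifecycleHookEvent {
  "detail-type": string; // e.g. "EC2 Instance-terminate Lifecycle Action"
  detail: {
    AutoScalingGroupName: string;
    LifecycleHookName: string;
    LifecycleActionToken: string;
    EC2InstanceId: string;
    LifecycleTransition: string; // "autoscaling:EC2_INSTANCE_LAUNCHING" or "...TERMINATING"
  };
}

// Mirrors the CompleteLifecycleAction API parameters.
interface CompleteLifecycleActionInput {
  AutoScalingGroupName: string;
  LifecycleHookName: string;
  LifecycleActionToken: string;
  InstanceId: string;
  LifecycleActionResult: "CONTINUE" | "ABANDON";
}

// `succeeded` = whether the hook's work (warm-up on launch, drain on terminate) finished.
function completionFor(event: LifecycleHookEvent, succeeded: boolean): CompleteLifecycleActionInput {
  const d = event.detail;
  const terminating = d.LifecycleTransition === "autoscaling:EC2_INSTANCE_TERMINATING";
  return {
    AutoScalingGroupName: d.AutoScalingGroupName,
    LifecycleHookName: d.LifecycleHookName,
    LifecycleActionToken: d.LifecycleActionToken,
    InstanceId: d.EC2InstanceId,
    // Terminating: always CONTINUE so other hooks still run. Launching: ABANDON a failed warm-up.
    LifecycleActionResult: terminating || succeeded ? "CONTINUE" : "ABANDON",
  };
}
```

The CONTINUE/ABANDON choice only changes the outcome for launch hooks, where ABANDON terminates the new instance. For terminate hooks both results let termination proceed, but ABANDON skips any remaining hooks, so CONTINUE is the safer default.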

Scale-in protection

SetInstanceProtection marks instances as protected from scale-in (not from manual termination or Spot interruption). Use for long-running batch jobs on shared ASGs — e.g. a 6-hour ETL worker shouldn't disappear mid-run because CPU dropped fleet-wide.

Launch template with IMDSv2, ASG, and target tracking

bash
# Base64-encode user data on one line (GNU and macOS base64 wrap differently; tr strips newlines)
USER_DATA=$(base64 < userdata.sh | tr -d '\n')

aws ec2 create-launch-template --launch-template-name app-v1 \
  --launch-template-data '{
    "ImageId": "ami-0abcdef1234567890",
    "InstanceType": "m7g.large",
    "IamInstanceProfile": { "Name": "app-ec2-role" },
    "MetadataOptions": {
      "HttpTokens": "required",
      "HttpPutResponseHopLimit": 1,
      "HttpEndpoint": "enabled"
    },
    "UserData": "'"$USER_DATA"'"
  }'

# ELB health checks only take effect once the ASG is attached to a target group ($TARGET_GROUP_ARN)
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name app-asg \
  --launch-template LaunchTemplateName=app-v1,Version='$Latest' \
  --min-size 2 --max-size 20 --desired-capacity 4 \
  --vpc-zone-identifier "subnet-aaa,subnet-bbb" \
  --target-group-arns "$TARGET_GROUP_ARN" \
  --health-check-type ELB --health-check-grace-period 300

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name app-asg \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 60.0
  }'
hcl
resource "aws_launch_template" "app" {
  name          = "app-v1"
  image_id      = data.aws_ami.golden.id
  instance_type = "m7g.large"

  iam_instance_profile { name = aws_iam_instance_profile.app.name }

  metadata_options {
    http_tokens                 = "required"
    http_put_response_hop_limit = 1
  }

  user_data = base64encode(file("${path.module}/userdata.sh"))
}

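resource "aws_autoscaling_group" "app" {
  name                      = "app-asg"
  vpc_zone_identifier       = var.private_subnet_ids
  min_size                  = 2
  max_size                  = 20
  desired_capacity          = 4
  health_check_type         = "ELB"
  health_check_grace_period = 300
  # ELB health checks only apply when the ASG is registered with a target group
  target_group_arns = [aws_lb_target_group.app.arn]

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }
}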

resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"
  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 60.0
  }
}
typescript
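import * as cdk from 'aws-cdk-lib';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
import * as ec2 from 'aws-cdk-lib/aws-ec2';

// Assumes vpc and appInstanceRole are defined elsewhere in this stack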

const asg = new autoscaling.AutoScalingGroup(this, 'AppAsg', {
  vpc,
  vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
  minCapacity: 2,
  maxCapacity: 20,
  desiredCapacity: 4,
  healthCheck: autoscaling.HealthCheck.elb({ grace: cdk.Duration.seconds(300) }),
  launchTemplate: new ec2.LaunchTemplate(this, 'LaunchTpl', {
    machineImage: ec2.MachineImage.lookup({ name: 'golden-app-*' }),
    instanceType: ec2.InstanceType.of(ec2.InstanceClass.M7G, ec2.InstanceSize.LARGE),
    role: appInstanceRole,
    requireImdsv2: true,
  }),
});

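// ELB health checks need target group registration; targetGroup is assumed to be an ALB target group defined elsewhere
asg.attachToApplicationTargetGroup(targetGroup);
asg.scaleOnCpuUtilization('CpuScaling', { targetUtilizationPercent: 60 });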
⚠️ Pitfall

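Scaling on CPU alone for JVM apps: heap usage, GC pauses, and thread-pool saturation can climb while average CPU stays under target, until latency spikes or the container is OOM-killed. Add custom CloudWatch metrics (request latency, queue depth, active threads) or scale on the ALB's request count per target (ALBRequestCountPerTarget) instead of raw CPU.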

ECS & Fargate

Amazon ECS runs Docker containers without you managing Kubernetes. Fargate is serverless containers — no EC2 to patch. For most Spring Boot microservice teams, ECS Fargate is the fastest path from Dockerfile to production with sane defaults.

Task definition anatomy

A task definition is the blueprint: container image, CPU/memory, port mappings, environment, secrets, logging, and IAM roles. A service runs N copies of a task definition with load balancer registration and rolling deployments.

| Field | Purpose |
|---|---|
| family | Logical name; each registration creates a new revision (order-service:1, order-service:2, and so on) |
| cpu / memory | Fargate requires valid pairs (CPU units: 1024 = 1 vCPU) |
| taskRoleArn | Runtime permissions for your app: S3, DynamoDB, SQS |
| executionRoleArn | ECS agent permissions: ECR pull, CloudWatch Logs, secrets injection |
| containerDefinitions | Image, ports, healthCheck, logConfiguration |
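Because Fargate rejects task definitions with unsupported CPU/memory pairs, a pre-deploy check can catch typos early. This sketch encodes the Linux Fargate size table as documented at the time of writing (8192 and 16384 CPU units need platform version 1.4.0 or later); re-check the numbers against current AWS docs, and treat `isValidFargateSize` as a hypothetical helper.

```typescript
// Allowed memory (MiB) per Fargate CPU value (CPU units; 1024 = 1 vCPU), Linux platform.
function allowedFargateMemoryMiB(cpu: number): number[] {
  // Inclusive range of GB values converted to MiB.
  const rangeGb = (minGb: number, maxGb: number, stepGb: number): number[] => {
    const out: number[] = [];
    for (let gb = minGb; gb <= maxGb; gb += stepGb) out.push(gb * 1024);
    return out;
  };
  switch (cpu) {
    case 256:
      return [512, 1024, 2048];
    case 512:
      return rangeGb(1, 4, 1);
    case 1024:
      return rangeGb(2, 8, 1);
    case 2048:
      return rangeGb(4, 16, 1);
    case 4096:
      return rangeGb(8, 30, 1);
    case 8192:
      return rangeGb(16, 60, 4);
    case 16384:
      return rangeGb(32, 120, 8);
    default:
      return []; // not a Fargate CPU value
  }
}

function isValidFargateSize(cpu: number, memoryMiB: number): boolean {
  return allowedFargateMemoryMiB(cpu).includes(memoryMiB);
}
```

For the task definition in this section, `isValidFargateSize(1024, 2048)` holds; `cpu: 1024, memory: 1024` would be rejected.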

Task role vs execution role

Task role (taskRoleArn):
  • Credentials injected into the container; your Spring app picks them up via the default credential chain
  • Least privilege per service: s3:GetObject on one prefix, not the entire account
  • Trust: ecs-tasks.amazonaws.com

Execution role (executionRoleArn):
  • Used by ECS/Fargate to pull the image from ECR and write logs; not visible to app code
  • Needs AmazonECSTaskExecutionRolePolicy plus Secrets Manager/SSM read access if the task definition references secrets
  • Trust: also ecs-tasks.amazonaws.com
  • One shared execution role per account is OK; task roles must be per-service

Fargate vs EC2 launch type

| Dimension | Fargate | EC2 launch type |
|---|---|---|
| Ops burden | No instances to manage | Patch AMIs, scale the EC2 ASG, manage capacity providers |
| Cost | Premium per vCPU/GB; predictable | Cheaper at scale with Spot/RIs; you optimize bin-packing |
| Networking | awsvpc only: each task gets its own ENI and IP | awsvpc (task density limited by ENIs per instance unless ENI trunking is enabled), bridge, or host mode |
| When to pick | Most microservices, variable load, small platform team | High density, GPU, custom kernel, heavy Spot savings |

Service discovery

ECS integrates with AWS Cloud Map for DNS-based discovery: orders.svc.local resolves to task IPs, and the ECS service registry auto-registers healthy tasks and deregisters them on stop. Alternatives: an ALB for HTTP services (preferred for external traffic), or App Mesh for an mTLS service mesh (AWS has announced end of support for App Mesh and points ECS users to ECS Service Connect).

Deployment circuit breaker

Enable deploymentCircuitBreaker with rollback — if new tasks fail health checks repeatedly, ECS stops the deployment and rolls back to the last stable revision. Without it, a bad image can flap indefinitely, draining capacity.

ECS task definition + Fargate service

bash
aws ecs register-task-definition --family order-service \
  --requires-compatibilities FARGATE --network-mode awsvpc \
  --cpu 1024 --memory 2048 \
  --task-role-arn arn:aws:iam::123456789012:role/order-service-task \
  --execution-role-arn arn:aws:iam::123456789012:role/ecsTaskExecutionRole \
  --container-definitions '[{
    "name": "app",
    "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/order-service:latest",
    "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/order-service",
        "awslogs-region": "eu-west-1",
        "awslogs-stream-prefix": "app"
      }
    }
  }]'

aws ecs create-service --cluster prod --service-name order-service \
  --task-definition order-service \
  --desired-count 3 --launch-type FARGATE \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-aaa,subnet-bbb],securityGroups=[sg-app],assignPublicIp=DISABLED}' \
  --load-balancers 'targetGroupArn=arn:aws:elasticloadbalancing:...,containerName=app,containerPort=8080' \
  --deployment-configuration 'maximumPercent=200,minimumHealthyPercent=100,deploymentCircuitBreaker={enable=true,rollback=true}'
hcl
resource "aws_ecs_task_definition" "order_service" {
  family                   = "order-service"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 1024
  memory                   = 2048
  task_role_arn            = aws_iam_role.task.arn
  execution_role_arn       = aws_iam_role.execution.arn

  container_definitions = jsonencode([{
    name  = "app"
    image = "${aws_ecr_repository.order_service.repository_url}:latest"
    portMappings = [{ containerPort = 8080 }]
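    logConfiguration = {
      logDriver = "awslogs"
      # Fargate's awslogs driver needs the region and a stream prefix (var.aws_region is assumed)
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.ecs.name
        "awslogs-region"        = var.aws_region
        "awslogs-stream-prefix" = "app"
      }
    }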
  }])
}

resource "aws_ecs_service" "order_service" {
  name            = "order-service"
  cluster         = aws_ecs_cluster.prod.id
  task_definition = aws_ecs_task_definition.order_service.arn
  desired_count   = 3
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 8080
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }
}
typescript
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';

const taskDef = new ecs.FargateTaskDefinition(this, 'OrderTask', {
  memoryLimitMiB: 2048,
  cpu: 1024,
  taskRole,
  executionRole: ecsTaskExecutionRole,
});

const container = taskDef.addContainer('app', {
  image: ecs.ContainerImage.fromEcrRepository(orderRepo, 'latest'),
  logging: ecs.LogDrivers.awsLogs({ streamPrefix: 'app', logGroup }),
});
container.addPortMappings({ containerPort: 8080 });

const service = new ecs.FargateService(this, 'OrderService', {
  cluster,
  taskDefinition: taskDef,
  desiredCount: 3,
  vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
  circuitBreaker: { rollback: true },
});
service.attachToApplicationTargetGroup(targetGroup);
📦 Real World

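Many startups, fintechs included, run internal services on ECS Fargate with a GitHub Actions → ECR → ECS deploy pipeline. Platform teams usually stay on Fargate until container density economics justify EC2 capacity providers, often once sustained usage reaches hundreds of vCPUs. Teams also outgrow ECS for non-cost reasons: Figma, for example, ran on ECS for years before migrating to EKS in 2024.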

EKS & Kubernetes

Amazon EKS runs upstream-compatible Kubernetes. You pay for the control plane (~$0.10/hr per cluster) plus worker compute. Choose EKS when you need the Kubernetes ecosystem (Operators, Helm charts, multi-cloud portability) or already have K8s expertise — not because "Kubernetes is industry standard" alone.

Control plane vs data plane

  • Control plane (AWS managed): kube-apiserver, etcd, controller-manager, scheduler — multi-AZ, not SSH-accessible
  • Data plane (your responsibility): worker nodes running pods — managed node groups, Karpenter-provisioned EC2, or Fargate profiles
  • Workers need IAM, VPC CNI, and kubelet — you own patching unless Karpenter churns nodes automatically

Managed node groups vs Karpenter

| Approach | How it scales | Pros | Cons |
|---|---|---|---|
| Managed node groups | ASG behind the scenes; you pick instance types | Simple, AWS-native, predictable | Slower bin-packing; manual instance type choices |
| Karpenter | NodePool CRD (formerly Provisioner) launches right-sized nodes for pending pods | Fast scale-out, Spot consolidation, optimal instance selection | Extra controller to operate; learning curve |
| EKS Fargate profiles | Serverless pods; no nodes | Zero node ops | No DaemonSets, limited instance control, higher cost |

IRSA — IAM Roles for Service Accounts

IRSA maps a Kubernetes service account to an IAM role via OIDC trust on the EKS cluster. Pods get temporary AWS credentials scoped to that role — the K8s-native equivalent of ECS task roles. Annotate the service account with eks.amazonaws.com/role-arn; use the AWS SDK default credential chain in your app — no access keys in Secrets.

yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: order-service
  namespace: prod
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/order-service-irsa
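The annotation alone isn't enough: the IAM role's trust policy must allow `sts:AssumeRoleWithWebIdentity` from the cluster's OIDC provider, scoped to one namespace and service account. A sketch that builds that document; the account ID and issuer URL are placeholders (the issuer comes from `aws eks describe-cluster --query cluster.identity.oidc.issuer`), and `irsaTrustPolicy` is a hypothetical helper.

```typescript
// Builds an IRSA trust policy scoped to system:serviceaccount:<namespace>:<serviceAccount>.
function irsaTrustPolicy(accountId: string, oidcIssuerUrl: string, namespace: string, serviceAccount: string) {
  // The IAM OIDC provider is identified by the issuer URL without its https:// scheme.
  const provider = oidcIssuerUrl.replace(/^https:\/\//, "");
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { Federated: `arn:aws:iam::${accountId}:oidc-provider/${provider}` },
        Action: "sts:AssumeRoleWithWebIdentity",
        Condition: {
          StringEquals: {
            // Only this service account may assume the role...
            [`${provider}:sub`]: `system:serviceaccount:${namespace}:${serviceAccount}`,
            // ...and only with a token minted for STS.
            [`${provider}:aud`]: "sts.amazonaws.com",
          },
        },
      },
    ],
  };
}
```

For the order-service account above: `JSON.stringify(irsaTrustPolicy("123456789012", issuerUrl, "prod", "order-service"), null, 2)` gives a document you can pass to `aws iam create-role --assume-role-policy-document`.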

When EKS vs ECS

| Choose ECS when… | Choose EKS when… |
|---|---|
| Small platform team, AWS-only, Docker Compose → ECS path | Existing K8s manifests, Helm charts, Operators (Prometheus, Strimzi) |
| Fargate-first, minimal control plane ops | Multi-cloud portability requirement (same YAML on GCP/Azure) |
| Simpler IAM (task roles) without OIDC setup | Advanced scheduling (affinity, taints, GPU sharing, service mesh at scale) |
| Lower control plane cost (no $0.10/hr/cluster) | Large platform org with a dedicated K8s SRE team |
⚖️ Trade-off

EKS is not free complexity. A three-person team running 8 microservices on ECS Fargate will move slower on EKS for months while learning ingress controllers, CNI, and upgrade cadence. Adopt EKS when K8s-specific capabilities unblock the product — not for resume-driven infrastructure.

🎯 Exam Tip

IRSA is the secure pod credential pattern, not mounting instance profile credentials. EKS Pod Identity (newer) delivers the same per-pod roles without per-cluster OIDC provider setup; know both exist. EKS control plane logs go to CloudWatch; enable audit logs for compliance questions.

Lambda & serverless

Lambda runs code without provisioning servers — pay per invocation and GB-second. Perfect for event-driven glue, APIs with spiky traffic, and edge logic. Java on Lambda has historically been synonymous with cold starts — SnapStart and GraalVM native change that equation.

Cold starts — the Java problem and fixes

A cold start happens when Lambda creates a new execution environment: download your code and layers, start the runtime (the JVM for Java), run static initialization, then invoke your handler. Cold starts of 3–10 seconds were common for large Spring Boot JARs.

| Mitigation | How it works | Trade-off |
|---|---|---|
| Java SnapStart | Firecracker snapshot of the initialized environment, restored on cold start (Java 11+ Corretto managed runtimes) | Managed runtimes only, so not for Spring Native/GraalVM binaries; state captured during init (random seeds, unique IDs, open connections) is shared by every restored environment and must be regenerated after restore; can't be combined with provisioned concurrency |
| GraalVM / Quarkus native | AOT compile to a native binary; sub-second cold starts | Build complexity; reflection config; longer compile times in CI |
| Smaller deployment package | Trim dependencies; avoid a fat Spring JAR if possible | May need an architectural split: Lambda for thin handlers only |
| Provisioned concurrency | Pre-warmed environments always ready | Cost: you pay even when idle |
🔬 Under the Hood

Lambda reuses execution environments (warm starts) when traffic is steady — same container, new invoke. SnapStart takes a memory snapshot after Init phase completes; restore skips JVM bootstrap. GraalVM native images skip JVM entirely — the binary is the handler process.

Concurrency model

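  • Account concurrency limit: default 1,000 per region; request an increase through Service Quotas or support
  • Reserved concurrency: carves a fixed slice out of the account pool for one function; it guarantees that capacity and also caps the function at it, and no other function can use the slice
  • Provisioned concurrency: pre-initialized environments; eliminates cold starts for that many concurrent requests
  • Throttling: when concurrency is exhausted, synchronous invokes get a 429 (TooManyRequestsException); asynchronous events are retried with backoff for up to 6 hours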

For APIs behind API Gateway: set reserved concurrency on critical functions so a runaway batch job can't starve payment webhooks. Use CloudWatch alarm on ConcurrentExecutions approaching account limit.
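Sizing reserved concurrency starts from the rule of thumb Lambda's docs use: concurrency ≈ requests per second × average duration in seconds. A minimal sketch; the helper names are hypothetical, and the 100-unit floor reflects Lambda keeping at least 100 concurrency unreserved per account.

```typescript
// Concurrency needed ≈ requests per second × average duration (seconds), rounded up.
function estimateConcurrency(requestsPerSecond: number, avgDurationMs: number): number {
  return Math.ceil((requestsPerSecond * avgDurationMs) / 1000);
}

// Lambda keeps at least 100 units unreserved, so total reservations can't exceed limit - 100.
function unreservedConcurrency(accountLimit: number, reservations: number[]): number {
  const reserved = reservations.reduce((sum, r) => sum + r, 0);
  const remaining = accountLimit - reserved;
  if (remaining < 100) {
    throw new Error(`only ${remaining} left unreserved; Lambda requires at least 100`);
  }
  return remaining;
}
```

For example, a payment webhook at 50 req/s with a 400 ms average duration needs about 20 concurrent environments (`estimateConcurrency(50, 400)`), so reserving somewhat more than that protects it from a noisy neighbor without draining the shared pool.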

Lambda@Edge

Lambda@Edge runs functions at CloudFront edge locations: URL rewrites, A/B headers, JWT validation at the CDN, bot detection. Limitations: shorter timeouts (5 s for viewer triggers, 30 s for origin triggers), smaller deployment packages (1 MB for viewer triggers, 50 MB for origin triggers), no VPC access, and functions must be created in us-east-1. For heavy logic, use CloudFront → regional Lambda, or CloudFront Functions for ultra-light transforms (< 1 ms).
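A viewer-request function is small enough to sketch in full. This assumes the Node.js runtime and the CloudFront event shape Lambda@Edge passes in (headers keyed by lowercase name, each an array of `{ key, value }`); the `ab` cookie and `x-ab-bucket` header are hypothetical. In a deployment you would export `handler` from the module.

```typescript
// Minimal types for the CloudFront request inside a Lambda@Edge event.
interface CfHeaders {
  [lowercaseName: string]: { key: string; value: string }[];
}
interface CfRequest {
  uri: string;
  method: string;
  querystring: string;
  headers: CfHeaders;
}
interface CfEvent {
  Records: { cf: { request: CfRequest } }[];
}

// Viewer-request sketch: map directory paths to index.html and tag an A/B bucket.
async function handler(event: CfEvent): Promise<CfRequest> {
  const record = event.Records[0];
  if (record === undefined) throw new Error("no CloudFront record in event");
  const request = record.cf.request;

  if (request.uri.endsWith("/")) {
    request.uri += "index.html";
  }

  // Read the hypothetical "ab" cookie; default to bucket "a".
  const cookie = request.headers["cookie"]?.[0]?.value ?? "";
  const bucket = /(?:^|;\s*)ab=(a|b)/.exec(cookie)?.[1] ?? "a";
  request.headers["x-ab-bucket"] = [{ key: "X-AB-Bucket", value: bucket }];

  // Returning the request tells CloudFront to continue to the cache/origin.
  return request;
}
```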

Lambda function with SnapStart-ready Java

bash
aws lambda create-function \
  --function-name order-webhook-handler \
  --runtime java17 \
  --role arn:aws:iam::123456789012:role/lambda-order-handler \
  --handler com.example.OrderHandler::handleRequest \
  --code S3Bucket=artifacts,S3Key=order-handler.zip \
  --memory-size 1024 \
  --timeout 30 \
  --snap-start ApplyOn=PublishedVersions \
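  --environment "Variables={SPRING_PROFILES_ACTIVE=lambda}"

# SnapStart only applies to published versions
aws lambda publish-version --function-name order-webhook-handler

# SnapStart can't be combined with provisioned concurrency on the same version,
# so route traffic through an alias (version number as returned by publish-version)
aws lambda create-alias \
  --function-name order-webhook-handler \
  --name live \
  --function-version 1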
hcl
resource "aws_lambda_function" "order_handler" {
  function_name = "order-webhook-handler"
  role          = aws_iam_role.lambda.arn
  handler       = "com.example.OrderHandler::handleRequest"
  runtime       = "java17"
  memory_size   = 1024
  timeout       = 30
  s3_bucket     = aws_s3_bucket.artifacts.id
  s3_key        = "order-handler.zip"
  publish       = true # SnapStart only applies to published versions

  snap_start {
    apply_on = "PublishedVersions"
  }

  environment {
    variables = { SPRING_PROFILES_ACTIVE = "lambda" }
  }
}

# SnapStart can't be combined with provisioned concurrency on the same version;
# route traffic through an alias on the latest published (snapshotted) version instead
resource "aws_lambda_alias" "live" {
  name             = "live"
  function_name    = aws_lambda_function.order_handler.function_name
  function_version = aws_lambda_function.order_handler.version
}
typescript
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as s3 from 'aws-cdk-lib/aws-s3';

const fn = new lambda.Function(this, 'OrderHandler', {
  runtime: lambda.Runtime.JAVA_17,
  handler: 'com.example.OrderHandler::handleRequest',
  code: lambda.Code.fromBucket(s3.Bucket.fromBucketName(this, 'Artifacts', 'artifacts'), 'order-handler.zip'),
  memorySize: 1024,
  timeout: cdk.Duration.seconds(30),
  role: lambdaRole,
  snapStart: lambda.SnapStartConf.ON_PUBLISHED_VERSIONS,
  environment: { SPRING_PROFILES_ACTIVE: 'lambda' },
});

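// SnapStart applies to published versions; currentVersion publishes one on each code/config change.
// Provisioned concurrency isn't supported together with SnapStart, so the alias only routes traffic.
const version = fn.currentVersion;
new lambda.Alias(this, 'Live', {
  aliasName: 'live',
  version,
});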
⚠️ Pitfall

Deploying full Spring Boot MVC to Lambda without SnapStart or native compile — API Gateway timeouts before the first response. Split: API on ECS/EKS, async events to Lambda handlers, or use Quarkus/Micronaut with GraalVM native for sub-second cold starts.

Compute selection matrix

No single compute service wins every workload. Use this matrix in architecture reviews and interviews — the right answer always starts with workload shape, team skills, and operational budget.

| Workload signal | EC2 / ASG | ECS Fargate | EKS | Lambda |
|---|---|---|---|---|
| Always-on HTTP API (Spring Boot) | ✓ Classic; full control | ✓✓ Sweet spot | ✓ If K8s is already standard | △ Spiky traffic only; cold-start risk |
| Long-running batch / GPU | ✓✓ Spot + ASG | △ Fine for long CPU batch; no GPU support | ✓ Jobs/CronJobs + Karpenter | ✗ 15-minute max timeout |
| Event-driven (S3, SQS, EventBridge) | △ Worker ASG polling | ✓ Container workers | ✓ K8s consumers | ✓✓ Native fit |
| Traffic pattern | Steady or predictable | Variable microservices | Complex scheduling needs | Sporadic / bursty |
| Ops team size | Needs EC2/AMI expertise | Minimal; AWS manages nodes | Needs K8s SRE capacity | Minimal; function-level |
| Startup / scale speed | Minutes (AMI boot) | ~60 s task start | Minutes (node + pod) | Milliseconds warm; cold starts add seconds |
| Cost at low traffic | △ Pay for min ASG size | △ Per-task hourly | △ Control plane + nodes | ✓✓ Pay per invoke |
| Cost at high sustained load | ✓✓ RI/Spot optimized | ✓ Good at mid-scale | ✓✓ Density + Spot | ✗ Expensive at volume |
⚖️ Trade-off

ECS Fargate vs EKS: Fargate wins on time-to-production and operational simplicity; EKS wins when you need Kubernetes-specific tooling or multi-cloud portability. Lambda vs containers: Lambda wins for simple handlers at low or bursty traffic; against a small always-on container fleet, the steady-state break-even ranges from tens to low hundreds of req/s depending on memory size and duration. Containers win for long connections, WebSockets, and complex JVM apps. EC2 vs everything: EC2 remains correct for maximum control, licensing, and cost optimization at scale, but you own patching, AMIs, and capacity planning.
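The break-even moves a lot with memory and duration, so it is worth running the numbers. This sketch compares a month of Lambda against always-on Fargate tasks using illustrative us-east-1 x86 list prices at the time of writing (check the pricing pages before relying on them); it ignores the free tier, API Gateway/ALB charges, and whether the chosen task count can actually carry the load.

```typescript
// Illustrative us-east-1 x86 list prices at the time of writing (assumption; verify before use).
const PRICES = {
  lambdaPerGbSecond: 0.0000166667,
  lambdaPerMillionRequests: 0.2,
  fargatePerVcpuHour: 0.04048,
  fargatePerGbHour: 0.004445,
};
const HOURS_PER_MONTH = 730;
const SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600;

// Lambda bills per request plus per GB-second of execution time.
function lambdaMonthlyCost(requestsPerSecond: number, avgDurationMs: number, memoryGb: number): number {
  const requests = requestsPerSecond * SECONDS_PER_MONTH;
  const gbSeconds = ((requests * avgDurationMs) / 1000) * memoryGb;
  return gbSeconds * PRICES.lambdaPerGbSecond + (requests / 1e6) * PRICES.lambdaPerMillionRequests;
}

// Fargate bills per vCPU-hour and GB-hour for as long as tasks run, traffic or not.
function fargateMonthlyCost(tasks: number, vcpu: number, memoryGb: number): number {
  const hourly = vcpu * PRICES.fargatePerVcpuHour + memoryGb * PRICES.fargatePerGbHour;
  return tasks * hourly * HOURS_PER_MONTH;
}
```

With 1 GB and 100 ms per request, Lambda costs roughly $4.90 per month per steady req/s, so two 1 vCPU / 2 GB tasks (about $72/month) break even near 15 req/s; a 128 MB, 20 ms handler pushes the break-even past 100 req/s.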

💡 Pro Tip

In system design interviews, state your assumptions first: QPS, p99 latency, team size, and burst factor. Then pick one primary compute and one fallback — e.g. "ECS Fargate for the API, Lambda for async webhooks." Interviewers reward explicit trade-offs over "we'd use Kubernetes because it's modern."

📦 Real World

Amazon.com internal services use a mix of EC2, ECS, EKS, and Lambda — no single compute winner. Monzo ran core banking on EC2/K8s early, then adopted Lambda for event pipelines. Pattern: start simple (Fargate or Lambda), split when metrics prove a boundary.