Compute: EC2, ECS, EKS & Lambda
AWS offers four primary compute services across three models: virtual machines (EC2), container orchestration (ECS and EKS), and functions (Lambda). Most production platforms use more than one: a Spring Boot API on ECS Fargate, batch jobs on Spot EC2, edge routing via Lambda@Edge, and a data pipeline on managed EKS node groups. This chapter covers how to pick instance types, scale safely, wire IAM roles correctly, and avoid the cold-start and cost traps that show up in real interviews and real incidents.
EC2 deep dive
EC2 is still the foundation of AWS compute — even when you run containers or Lambda, something underneath is often an EC2 instance. Understanding instance families, storage, and bootstrap patterns prevents over-provisioning and security gaps like open IMDSv1 metadata endpoints.
Instance families — the letter tells you the workload
AWS names instance types as family.size, e.g. m7g.large: m is the family, 7 is the generation (higher = newer, usually better price/performance), g is an attribute suffix (here Graviton), and large is the size. The family letter is the primary workload signal.
| Family | Optimized for | Examples | Typical use |
|---|---|---|---|
| t (burstable) | Baseline CPU with burst credits | t4g.micro, t3.medium | Dev/staging, low-traffic APIs, bastion hosts — not sustained CPU |
| m (general) | Balanced CPU, memory, network | m7i.large, m7g.xlarge | Spring Boot services, app servers, small databases |
| c (compute) | High CPU ratio | c7i.2xlarge, c7g.4xlarge | Batch processing, transcode, high-throughput APIs |
| r (memory) | High RAM ratio | r7i.xlarge, r6g.2xlarge | In-memory caches, JVM heaps, analytics |
| i (storage I/O) | High local NVMe IOPS | i4i.xlarge, i3en.2xlarge | NoSQL, time-series DBs, high-write workloads |
| p / g (GPU) | NVIDIA GPUs | p4d.24xlarge, g5.xlarge | ML training/inference, rendering |
| inf (Inferentia) | AWS ML inference chips | inf2.xlarge | Cost-efficient model serving at scale |
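The naming convention above can be split mechanically. The following is a minimal sketch, not an AWS API: `parseInstanceType` and `InstanceTypeParts` are hypothetical names, and the pattern only covers the common family + generation + attributes + size shape (unusual names such as the high-memory `u-` types return `null`).

```typescript
// Parsed pieces of an EC2 instance type name such as "m7g.large".
interface InstanceTypeParts {
  family: string;      // e.g. "m", "c", "inf"
  generation: number;  // e.g. 7
  attributes: string;  // e.g. "g" (Graviton), "i" (Intel), "" if none
  size: string;        // e.g. "large", "24xlarge"
}

// Hypothetical helper: returns null when the name does not fit the common shape.
function parseInstanceType(name: string): InstanceTypeParts | null {
  // lazy family letters, generation digits, optional attribute letters, ".", size
  const match = /^([a-z]+?)(\d+)([a-z-]*)\.([a-z0-9-]+)$/.exec(name);
  if (match === null) return null;
  return {
    family: match[1],
    generation: Number(match[2]),
    attributes: match[3],
    size: match[4],
  };
}

const graviton = parseInstanceType("m7g.large");
// graviton -> { family: "m", generation: 7, attributes: "g", size: "large" }
```

A helper like this is mostly useful in tooling, for example flagging fleets that still run older generations or non-Graviton attributes.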
Graviton (ARM) for Java and Spring
Graviton instances (*g suffix — e.g. m7g, c7g, r7g) use AWS-designed ARM Neoverse cores. For Java/Spring workloads on Amazon Corretto 17+ or GraalVM native images, Graviton often delivers 20–40% better price/performance vs equivalent x86 (*i Intel, *a AMD).
Use multi-arch images, verify JNI/native deps, and load-test JVM GC on ARM before prod cutover. Graviton Spot and Savings Plans stack well for batch workloads.
Start new Spring Boot projects on m7g.large in dev. If p99 latency and GC pause metrics match x86 baselines after a week of load testing, promote Graviton to staging and prod — don't assume ARM incompatibility without measuring.
Purchasing options
| Option | Commitment | Discount | Best for |
|---|---|---|---|
| On-Demand | None | Baseline price | Spiky/unpredictable load, short-lived environments, prod baseline you can't interrupt |
| Reserved Instances (RI) | 1 or 3 years; specific instance family/region | Up to ~72% vs On-Demand | Steady-state baseline — RDS-style always-on app servers |
| Savings Plans | $/hr commit (Compute or EC2 Instance SP) | Similar to RI; more flexible | Mixed instance types/regions — preferred over RI for most teams now |
| Spot | None; can be interrupted with 2-min notice | Up to ~90% off | Fault-tolerant batch, CI runners, Karpenter/EKS, ASG with mixed instances |
| Dedicated Hosts / Instances | Physical server isolation | Premium pricing | License compliance (BYOL), regulatory isolation — rarely needed otherwise |
Savings Plans beat On-Demand immediately for any baseline fleet running 24/7. Layer Spot on top for interruptible capacity — never run stateful primary databases on Spot. Use aws compute-optimizer and Cost Explorer "Savings Plans recommendations" before committing to a 3-year term.
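To make the layering argument concrete, here is a small arithmetic sketch. Every number in it is an assumption for illustration: the $0.10 hourly rate, the 30% Savings Plan discount, and the 70% Spot discount are made up, and `FleetSlice` and `fleetMonthlyCost` are hypothetical names.

```typescript
const HOURS_PER_MONTH = 730; // common approximation for one month

interface FleetSlice {
  instances: number;
  onDemandHourly: number; // USD per instance-hour (hypothetical rate)
  discount: number;       // 0 = On-Demand, 0.3 = 30% off, and so on
}

// Monthly cost of one slice of the fleet running 24/7.
function monthlyCost(slice: FleetSlice): number {
  return slice.instances * slice.onDemandHourly * (1 - slice.discount) * HOURS_PER_MONTH;
}

function fleetMonthlyCost(slices: FleetSlice[]): number {
  return slices.reduce((sum, s) => sum + monthlyCost(s), 0);
}

// Ten instances, all On-Demand.
const allOnDemand = fleetMonthlyCost([{ instances: 10, onDemandHourly: 0.10, discount: 0 }]);

// Same ten instances: four baseline on a Savings Plan, six interruptible on Spot.
const layered = fleetMonthlyCost([
  { instances: 4, onDemandHourly: 0.10, discount: 0.30 },
  { instances: 6, onDemandHourly: 0.10, discount: 0.70 },
]);
```

With these invented numbers the layered fleet comes to about $336 per month against $730 all On-Demand. Real discounts vary by instance family, region, and term, so treat the output as a shape, not a quote.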
EBS volume types
| Type | Use case | IOPS / throughput | Notes |
|---|---|---|---|
| gp3 | Default for most workloads | 3,000 IOPS / 125 MB/s baseline; independently scalable | Replace gp2 — cheaper, decouple IOPS from size |
| io2 | Mission-critical databases | Up to 256,000 IOPS; 99.999% durability (io2 Block Express) | Provisioned IOPS; pay for what you need |
| st1 | Throughput-heavy sequential reads | Low cost per GB; HDD-backed | Big data, log processing — not boot volumes |
| sc1 | Cold throughput storage | Lowest $/GB HDD | Infrequently accessed bulk data |
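The "decouple IOPS from size" point is easiest to see with numbers. This sketch assumes the published gp2 baseline of 3 IOPS per GiB, floored at 100 and capped at 16,000, and ignores gp2 burst credits; the function names are hypothetical.

```typescript
// gp2 baseline IOPS scale with volume size (assumed: 3 IOPS/GiB, min 100, max 16,000).
function gp2BaselineIops(sizeGiB: number): number {
  return Math.min(16000, Math.max(100, 3 * sizeGiB));
}

// gp3 baseline is flat, independent of volume size (from the table above).
const GP3_BASELINE_IOPS = 3000;

// Smallest gp2 volume whose baseline reaches the target IOPS.
function gp2SizeForIops(targetIops: number): number {
  return Math.ceil(targetIops / 3);
}

const gp2At100GiB = gp2BaselineIops(100);            // 300 IOPS baseline
const gp2SizeToMatchGp3 = gp2SizeForIops(GP3_BASELINE_IOPS); // 1000 GiB
```

Under these assumptions a 100 GiB gp2 volume has a 300 IOPS baseline, and you would need roughly 1,000 GiB of gp2 to match what gp3 provides at any size.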
AMI golden image workflow
A golden AMI is a hardened, tested base image baked by CI, not by manual console clicks. Pipeline: Packer/Ansible builds the image on AL2023 → CIS hardening → IMDSv2-only → vulnerability scan → register the AMI with an Approved=true tag → launch template references the latest approved version. App deployments swap AMIs or launch template versions rather than SSHing into running instances. Expire stale AMIs via DLM after 90 days.
UserData runs once at first boot via cloud-init — agent install, volume mount, cluster join. Never embed secrets; fetch from SSM or Secrets Manager. Keep scripts idempotent and log to CloudWatch.
IMDSv2 — metadata service hardening
The Instance Metadata Service (IMDS) at 169.254.169.254 exposes instance identity and IAM role credentials. IMDSv1 uses simple HTTP GET — vulnerable to SSRF attacks that steal credentials. IMDSv2 requires a session token via PUT first.
- Enforce HttpTokens: required in launch templates (IMDSv2-only)
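- Set HttpPutResponseHopLimit: 1 by default; the token response then cannot travel beyond the instance itself, which blocks containers on a bridge network from reaching IMDS
- Raise the hop limit to 2 only when Docker containers on EC2 genuinely need IMDS; prefer task roles on ECS instead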
Capital One's 2019 breach exploited SSRF → IMDSv1 → IAM credentials. Account-level setting: require IMDSv2 on all new instances. Audit with Config rule ec2-imdsv2-check and remediate non-compliant launch templates.
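The two-step IMDSv2 flow (PUT for a session token, then GET with that token) can be sketched as below. The endpoint path and header names are the documented IMDSv2 ones; `tokenRequest`, `metadataRequest`, `readMetadata`, and the injected `Send` function are hypothetical names chosen so the sketch does not depend on a particular HTTP client.

```typescript
const IMDS_BASE = "http://169.254.169.254";

interface HttpRequest {
  url: string;
  method: "GET" | "PUT";
  headers: Record<string, string>;
}

// Step 1: request a session token. The TTL may be 1 to 21600 seconds.
function tokenRequest(ttlSeconds: number): HttpRequest {
  return {
    url: `${IMDS_BASE}/latest/api/token`,
    method: "PUT",
    headers: { "X-aws-ec2-metadata-token-ttl-seconds": String(ttlSeconds) },
  };
}

// Step 2: read metadata, presenting the token. A plain GET without it is what IMDSv1 allowed.
function metadataRequest(token: string, path: string): HttpRequest {
  return {
    url: `${IMDS_BASE}/latest/meta-data/${path}`,
    method: "GET",
    headers: { "X-aws-ec2-metadata-token": token },
  };
}

// Assumption: `send` performs the HTTP call and resolves with the response body.
type Send = (req: HttpRequest) => Promise<string>;

async function readMetadata(send: Send, path: string): Promise<string> {
  const token = await send(tokenRequest(21600));
  return send(metadataRequest(token, path));
}
```

The PUT requirement is the point: a typical SSRF bug lets an attacker trigger a GET with a chosen URL, but not a PUT with a custom header, so the token step fails.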
gp3 vs gp2: gp3 lets you provision IOPS independently of volume size — exam favorite. Spot interruption: 2-minute warning via instance metadata and EventBridge — design for graceful shutdown. Placement groups: cluster (low latency HPC), spread (max isolation), partition (large distributed systems).
Auto Scaling Groups
An Auto Scaling Group (ASG) maintains a desired count of EC2 instances across Availability Zones. Launch templates (replacing launch configurations) define what gets launched; scaling policies decide when. Production ASGs always pair with health checks, lifecycle hooks, and scale-in protection for stateful nodes.
ASG core concepts
| Setting | Purpose | Production guidance |
|---|---|---|
| Min / Desired / Max | Capacity bounds | Min ≥ 2 across AZs for HA; max caps runaway scaling bills |
| Launch template | AMI, instance type, SG, user data, IMDS config | Version every change; use $Latest or explicit version in ASG |
| Health check | EC2 status vs ELB target health | Use ELB health for app-aware replacement — EC2-only misses app failures |
| Warm pool | Pre-initialized stopped instances | Faster scale-out for slow-boot JVM apps |
Scaling policy types
| Policy | Trigger | When to use |
|---|---|---|
| Target tracking | Maintain metric at target (e.g. CPU 60%) | Default choice — simplest, self-tuning |
| Step scaling | Metric thresholds → add/remove N instances | Non-linear response — aggressive scale-out, conservative scale-in |
| Scheduled | Cron-like min/desired changes | Known traffic patterns — Black Friday, business hours batch |
| Predictive scaling | ML forecast from historical metrics | Regular daily/weekly cycles — pre-warm before traffic spike |
Target tracking adds or removes capacity to keep the metric near the target: it scales out promptly when the metric rises above the target and scales in more gradually to protect availability. The default 300-second cooldown applies to simple scaling policies; target tracking and step scaling rely on instance warmup instead. Lifecycle hooks pause instance launch/terminate so your app can drain connections before the instance is terminated.
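A rough mental model for target tracking is proportional: if the fleet average is 50% above target, you need about 50% more instances. The sketch below encodes only that approximation. It is an assumption for reasoning about capacity, not AWS's published algorithm, and `estimateDesiredCapacity` is a hypothetical name.

```typescript
// Proportional estimate of the capacity needed to bring the metric back to target,
// clamped to the ASG's min and max size.
function estimateDesiredCapacity(
  currentInstances: number,
  currentMetric: number, // e.g. average CPU in percent
  targetMetric: number,  // e.g. 60
  minSize: number,
  maxSize: number,
): number {
  const raw = Math.ceil((currentInstances * currentMetric) / targetMetric);
  return Math.min(maxSize, Math.max(minSize, raw));
}

const scaleOut = estimateDesiredCapacity(4, 90, 60, 2, 20); // 6
const scaleIn = estimateDesiredCapacity(4, 30, 60, 2, 20);  // 2
const capped = estimateDesiredCapacity(10, 95, 60, 2, 12);  // 12, limited by max size
```

The third case shows why the max size matters: the estimate asks for 16 instances, but the group stops at 12 and the metric stays above target.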
Lifecycle hooks
Hooks fire on autoscaling:EC2_INSTANCE_LAUNCHING and EC2_INSTANCE_TERMINATING. While in Pending:Wait or Terminating:Wait, the ASG waits (up to heartbeat timeout, default 3600s) for a Lambda, SQS, or manual complete-lifecycle-action call.
- Launch hook: run config management, register with service mesh, warm JVM before traffic
- Terminate hook: drain queue, flush logs, deregister from Consul/Eureka
- Always set heartbeat timeout > your max drain time + buffer, otherwise the hook expires before the drain finishes
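The hook mechanics above reduce to two decisions: how long to wait, and what result to report. This sketch uses the parameter names of the CompleteLifecycleAction API, but the helper functions are hypothetical, and the 30 to 7200 second clamp reflects the accepted HeartbeatTimeout range as I understand it.

```typescript
// Heartbeat timeout sized to cover the drain time plus a buffer,
// clamped to the range the API is assumed to accept (30 to 7200 seconds).
function heartbeatTimeoutSeconds(maxDrainSeconds: number, bufferSeconds: number): number {
  const wanted = Math.ceil(maxDrainSeconds + bufferSeconds);
  return Math.min(7200, Math.max(30, wanted));
}

interface CompleteLifecycleActionParams {
  AutoScalingGroupName: string;
  LifecycleHookName: string;
  InstanceId: string;
  LifecycleActionResult: "CONTINUE" | "ABANDON";
}

// For a launch hook: CONTINUE puts the instance in service,
// ABANDON tells the ASG to terminate and replace it.
function launchHookCompletion(
  asgName: string,
  hookName: string,
  instanceId: string,
  warmupSucceeded: boolean,
): CompleteLifecycleActionParams {
  return {
    AutoScalingGroupName: asgName,
    LifecycleHookName: hookName,
    InstanceId: instanceId,
    LifecycleActionResult: warmupSucceeded ? "CONTINUE" : "ABANDON",
  };
}

const timeout = heartbeatTimeoutSeconds(600, 120); // 720
```

The resulting object is what a Lambda or agent would pass to the complete-lifecycle-action call once bootstrap finishes.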
Scale-in protection
SetInstanceProtection marks instances as protected from scale-in (not from manual termination or Spot interruption). Use for long-running batch jobs on shared ASGs — e.g. a 6-hour ETL worker shouldn't disappear mid-run because CPU dropped fleet-wide.
Launch template with IMDSv2, ASG, and target tracking
aws ec2 create-launch-template --launch-template-name app-v1 \
--launch-template-data '{
"ImageId": "ami-0abcdef1234567890",
"InstanceType": "m7g.large",
"IamInstanceProfile": { "Name": "app-ec2-role" },
"MetadataOptions": {
"HttpTokens": "required",
"HttpPutResponseHopLimit": 1,
"HttpEndpoint": "enabled"
},
"UserData": "'$(echo '#!/bin/bash' | base64)'"
}'
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name app-asg \
--launch-template LaunchTemplateName=app-v1,Version='$Latest' \
--min-size 2 --max-size 20 --desired-capacity 4 \
--vpc-zone-identifier "subnet-aaa,subnet-bbb" \
--health-check-type ELB --health-check-grace-period 300
aws autoscaling put-scaling-policy \
--auto-scaling-group-name app-asg \
--policy-name cpu-target-tracking \
--policy-type TargetTrackingScaling \
--target-tracking-configuration '{
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ASGAverageCPUUtilization"
},
"TargetValue": 60.0
}'
resource "aws_launch_template" "app" {
name = "app-v1"
image_id = data.aws_ami.golden.id
instance_type = "m7g.large"
iam_instance_profile { name = aws_iam_instance_profile.app.name }
metadata_options {
http_tokens = "required"
http_put_response_hop_limit = 1
}
user_data = base64encode(file("${path.module}/userdata.sh"))
}
resource "aws_autoscaling_group" "app" {
name = "app-asg"
vpc_zone_identifier = var.private_subnet_ids
min_size = 2
max_size = 20
desired_capacity = 4
health_check_type = "ELB"
launch_template {
id = aws_launch_template.app.id
version = "$Latest"
}
}
resource "aws_autoscaling_policy" "cpu_target" {
name = "cpu-target-tracking"
autoscaling_group_name = aws_autoscaling_group.app.name
policy_type = "TargetTrackingScaling"
target_tracking_configuration {
predefined_metric_specification {
predefined_metric_type = "ASGAverageCPUUtilization"
}
target_value = 60.0
}
}
import * as cdk from 'aws-cdk-lib';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
const asg = new autoscaling.AutoScalingGroup(this, 'AppAsg', {
vpc,
vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
minCapacity: 2,
maxCapacity: 20,
desiredCapacity: 4,
healthCheck: autoscaling.HealthCheck.elb({ grace: cdk.Duration.seconds(300) }),
launchTemplate: new ec2.LaunchTemplate(this, 'LaunchTpl', {
machineImage: ec2.MachineImage.lookup({ name: 'golden-app-*' }),
instanceType: ec2.InstanceType.of(ec2.InstanceClass.M7G, ec2.InstanceSize.LARGE),
role: appInstanceRole,
requireImdsv2: true,
}),
});
asg.scaleOnCpuUtilization('CpuScaling', { targetUtilizationPercent: 60 });
Common pitfall: scaling on CPU alone for JVM apps. The heap fills and GC pressure builds, but average CPU can still look acceptable until the OOM kill. Add custom CloudWatch metrics (request latency, queue depth, active threads) or scale on ALB RequestCountPerTarget instead of raw CPU.
ECS & Fargate
Amazon ECS runs Docker containers without you managing Kubernetes. Fargate is serverless containers — no EC2 to patch. For most Spring Boot microservice teams, ECS Fargate is the fastest path from Dockerfile to production with sane defaults.
Task definition anatomy
A task definition is the blueprint: container image, CPU/memory, port mappings, environment, secrets, logging, and IAM roles. A service runs N copies of a task definition with load balancer registration and rolling deployments.
| Field | Purpose |
|---|---|
| family | Logical name; each revision increments |
| cpu / memory | Fargate requires valid pairs (e.g. 1024 CPU units = 1 vCPU, which allows 2 to 8 GB of memory) |
| taskRoleArn | Runtime permissions — S3, DynamoDB, SQS (your app) |
| executionRoleArn | ECS agent permissions — ECR pull, CloudWatch Logs, secrets injection |
| containerDefinitions | Image, ports, healthCheck, logConfiguration |
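The "valid pairs" rule for Fargate trips people up at registration time, so a pre-flight check can help. The ranges below are the Fargate task-size combinations as I understand them from the ECS documentation; `isValidFargateSize` is a hypothetical helper, not part of any SDK.

```typescript
interface MemoryRange { minMiB: number; maxMiB: number; stepMiB: number; }

// CPU units -> allowed memory range (assumed from the ECS task size table).
const FARGATE_MEMORY_RANGES: Record<number, MemoryRange> = {
  512:   { minMiB: 1024,  maxMiB: 4096,   stepMiB: 1024 },
  1024:  { minMiB: 2048,  maxMiB: 8192,   stepMiB: 1024 },
  2048:  { minMiB: 4096,  maxMiB: 16384,  stepMiB: 1024 },
  4096:  { minMiB: 8192,  maxMiB: 30720,  stepMiB: 1024 },
  8192:  { minMiB: 16384, maxMiB: 61440,  stepMiB: 4096 },
  16384: { minMiB: 32768, maxMiB: 122880, stepMiB: 8192 },
};

function isValidFargateSize(cpuUnits: number, memoryMiB: number): boolean {
  // 256 CPU units (0.25 vCPU) allows only three fixed memory values.
  if (cpuUnits === 256) return [512, 1024, 2048].indexOf(memoryMiB) !== -1;
  const range = FARGATE_MEMORY_RANGES[cpuUnits];
  if (range === undefined) return false;
  return (
    memoryMiB >= range.minMiB &&
    memoryMiB <= range.maxMiB &&
    (memoryMiB - range.minMiB) % range.stepMiB === 0
  );
}

const orderServiceOk = isValidFargateSize(1024, 2048); // true
const tooLittleMemory = isValidFargateSize(1024, 1024); // false
```

Running a check like this in CI catches an invalid pair before the register-task-definition call rejects it.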
Task role vs execution role
Task role:
- Credentials injected into the container, used by your Spring app via the default credential chain
- Least-privilege per service: s3:GetObject on one prefix, not entire account
- Trust: ecs-tasks.amazonaws.com
Execution role:
- Used by ECS/Fargate to pull image from ECR and write logs; not visible to app code
- Needs AmazonECSTaskExecutionRolePolicy + secrets/SSM read if using secrets in task def
- One shared execution role per account is OK; task roles must be per-service
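For the task role, the two documents that matter are the trust policy (who may assume the role) and a narrowly scoped permission policy. The sketch below builds both as plain objects; the bucket and prefix names are hypothetical, and the helper names are illustrative only.

```typescript
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string;
  Resource?: string;
  Principal?: Record<string, string>;
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Trust policy: lets ECS tasks assume the role.
function ecsTaskTrustPolicy(): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [{
      Effect: "Allow",
      Principal: { Service: "ecs-tasks.amazonaws.com" },
      Action: "sts:AssumeRole",
    }],
  };
}

// Least-privilege permission: read objects under one prefix only.
function s3PrefixReadPolicy(bucket: string, prefix: string): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [{
      Effect: "Allow",
      Action: "s3:GetObject",
      Resource: `arn:aws:s3:::${bucket}/${prefix}*`,
    }],
  };
}

// Hypothetical bucket and prefix for an order service.
const invoiceRead = s3PrefixReadPolicy("orders-bucket", "invoices/");
```

`JSON.stringify(invoiceRead)` gives a document you could attach to the per-service task role, while the shared execution role keeps only the ECR and logging permissions.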
Fargate vs EC2 launch type
| Dimension | Fargate | EC2 launch type |
|---|---|---|
| Ops burden | No instances to manage | Patch AMIs, scale EC2 ASG, capacity providers |
| Cost | Premium per vCPU/GB; predictable | Cheaper at scale with Spot/RIs; you optimize packing |
| Networking | Each task gets ENI (IP per task in awsvpc mode) | Same awsvpc mode; density limited by ENIs per instance |
| When to pick | Most microservices, variable load, small platform team | High density, GPU, custom kernel, heavy Spot savings |
Service discovery
ECS integrates with AWS Cloud Map for DNS-based discovery — orders.svc.local resolves to task IPs. Alternative: ALB for HTTP services (preferred for external traffic), App Mesh for mTLS service mesh. Cloud Map + ECS service registry auto-registers healthy tasks and deregisters on stop.
Deployment circuit breaker
Enable deploymentCircuitBreaker with rollback — if new tasks fail health checks repeatedly, ECS stops the deployment and rolls back to the last stable revision. Without it, a bad image can flap indefinitely, draining capacity.
ECS task definition + Fargate service
aws ecs register-task-definition --family order-service \
--requires-compatibilities FARGATE --network-mode awsvpc \
--cpu 1024 --memory 2048 \
--task-role-arn arn:aws:iam::123456789012:role/order-service-task \
--execution-role-arn arn:aws:iam::123456789012:role/ecsTaskExecutionRole \
--container-definitions '[{
"name": "app",
"image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/order-service:latest",
"portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/order-service",
"awslogs-region": "eu-west-1",
"awslogs-stream-prefix": "app"
}
}
}]'
aws ecs create-service --cluster prod --service-name order-service \
--task-definition order-service \
--desired-count 3 --launch-type FARGATE \
--network-configuration 'awsvpcConfiguration={subnets=[subnet-aaa,subnet-bbb],securityGroups=[sg-app],assignPublicIp=DISABLED}' \
--load-balancers 'targetGroupArn=arn:aws:elasticloadbalancing:...,containerName=app,containerPort=8080' \
--deployment-configuration 'maximumPercent=200,minimumHealthyPercent=100,deploymentCircuitBreaker={enable=true,rollback=true}'
resource "aws_ecs_task_definition" "order_service" {
family = "order-service"
requires_compatibilities = ["FARGATE"]
network_mode = "awsvpc"
cpu = 1024
memory = 2048
task_role_arn = aws_iam_role.task.arn
execution_role_arn = aws_iam_role.execution.arn
container_definitions = jsonencode([{
name = "app"
image = "${aws_ecr_repository.order_service.repository_url}:latest"
portMappings = [{ containerPort = 8080 }]
logConfiguration = {
logDriver = "awslogs"
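# awslogs needs a region, and Fargate requires a stream prefix
options = {
"awslogs-group" = aws_cloudwatch_log_group.ecs.name
"awslogs-region" = "eu-west-1"
"awslogs-stream-prefix" = "app"
}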
}
}])
}
resource "aws_ecs_service" "order_service" {
name = "order-service"
cluster = aws_ecs_cluster.prod.id
task_definition = aws_ecs_task_definition.order_service.arn
desired_count = 3
launch_type = "FARGATE"
network_configuration {
subnets = var.private_subnet_ids
security_groups = [aws_security_group.app.id]
assign_public_ip = false
}
load_balancer {
target_group_arn = aws_lb_target_group.app.arn
container_name = "app"
container_port = 8080
}
deployment_circuit_breaker {
enable = true
rollback = true
}
}
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';
const taskDef = new ecs.FargateTaskDefinition(this, 'OrderTask', {
memoryLimitMiB: 2048,
cpu: 1024,
taskRole,
executionRole: ecsTaskExecutionRole,
});
const container = taskDef.addContainer('app', {
image: ecs.ContainerImage.fromEcrRepository(orderRepo, 'latest'),
logging: ecs.LogDrivers.awsLogs({ streamPrefix: 'app', logGroup }),
});
container.addPortMappings({ containerPort: 8080 });
const service = new ecs.FargateService(this, 'OrderService', {
cluster,
taskDefinition: taskDef,
desiredCount: 3,
vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
circuitBreaker: { rollback: true },
});
service.attachToApplicationTargetGroup(targetGroup);
Figma and many fintech startups run internal services on ECS Fargate with GitHub Actions → ECR → ECS deploy. Platform teams prefer Fargate until container density economics force EC2 capacity providers — typically around hundreds of vCPUs sustained.
EKS & Kubernetes
Amazon EKS runs upstream-compatible Kubernetes. You pay for the control plane (~$0.10/hr per cluster) plus worker compute. Choose EKS when you need the Kubernetes ecosystem (Operators, Helm charts, multi-cloud portability) or already have K8s expertise — not because "Kubernetes is industry standard" alone.
Control plane vs data plane
- Control plane (AWS managed): kube-apiserver, etcd, controller-manager, scheduler — multi-AZ, not SSH-accessible
- Data plane (your responsibility): worker nodes running pods — managed node groups, Karpenter-provisioned EC2, or Fargate profiles
- Workers need IAM, VPC CNI, and kubelet — you own patching unless Karpenter churns nodes automatically
Managed node groups vs Karpenter
| Approach | How it scales | Pros | Cons |
|---|---|---|---|
| Managed node groups | ASG behind the scenes; you pick instance types | Simple, AWS-native, predictable | Slower bin-packing; manual instance type choices |
| Karpenter | NodePool CRD (Provisioner in older releases) launches right-sized nodes per pending pods | Fast scale-out, Spot consolidation, optimal instance selection | Extra controller to operate; learning curve |
| EKS Fargate profiles | Serverless pods — no nodes | Zero node ops | No DaemonSets, limited instance control, higher cost |
IRSA — IAM Roles for Service Accounts
IRSA maps a Kubernetes service account to an IAM role via OIDC trust on the EKS cluster. Pods get temporary AWS credentials scoped to that role — the K8s-native equivalent of ECS task roles. Annotate the service account with eks.amazonaws.com/role-arn; use the AWS SDK default credential chain in your app — no access keys in Secrets.
apiVersion: v1
kind: ServiceAccount
metadata:
name: order-service
namespace: prod
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/order-service-irsa
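The other half of IRSA is the trust policy on the IAM role named in that annotation. A minimal sketch of its shape follows; the account ID, OIDC issuer path, and helper name are placeholders, and the issuer is passed without the `https://` prefix.

```typescript
interface IrsaStatement {
  Effect: "Allow";
  Principal: { Federated: string };
  Action: "sts:AssumeRoleWithWebIdentity";
  Condition: { StringEquals: Record<string, string> };
}

interface IrsaTrustPolicy {
  Version: "2012-10-17";
  Statement: IrsaStatement[];
}

// Builds a trust policy that only one service account in one namespace can use.
function irsaTrustPolicy(
  accountId: string,
  oidcIssuer: string, // e.g. "oidc.eks.<region>.amazonaws.com/id/<cluster-id>"
  namespace: string,
  serviceAccount: string,
): IrsaTrustPolicy {
  return {
    Version: "2012-10-17",
    Statement: [{
      Effect: "Allow",
      Principal: { Federated: `arn:aws:iam::${accountId}:oidc-provider/${oidcIssuer}` },
      Action: "sts:AssumeRoleWithWebIdentity",
      Condition: {
        StringEquals: {
          [`${oidcIssuer}:sub`]: `system:serviceaccount:${namespace}:${serviceAccount}`,
          [`${oidcIssuer}:aud`]: "sts.amazonaws.com",
        },
      },
    }],
  };
}

// Placeholder issuer ID; the real value comes from the cluster's OIDC provider URL.
const orderServiceTrust = irsaTrustPolicy(
  "123456789012",
  "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE",
  "prod",
  "order-service",
);
```

The `sub` condition is what scopes the role to a single service account; omitting it would let any pod in the cluster assume the role.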
When EKS vs ECS
| Choose ECS when… | Choose EKS when… |
|---|---|
| Small platform team, AWS-only, Docker Compose → ECS path | Existing K8s manifests, Helm charts, Operators (Prometheus, Strimzi) |
| Fargate-first, minimal control plane ops | Multi-cloud portability requirement (same YAML on GCP/Azure) |
| Simpler IAM (task roles) without OIDC setup | Advanced scheduling (affinity, taints, GPU sharing, service mesh at scale) |
| Lower control plane cost (no $0.10/hr/cluster) | Large platform org with dedicated K8s SRE team |
EKS is not free complexity. A three-person team running 8 microservices on ECS Fargate will move slower on EKS for months while learning ingress controllers, CNI, and upgrade cadence. Adopt EKS when K8s-specific capabilities unblock the product — not for resume-driven infrastructure.
IRSA is the secure pod credential pattern — not mounting instance profile creds. Pod Identity (newer) simplifies IRSA further — know both exist. EKS control plane logs go to CloudWatch — enable audit logs for compliance questions.
Lambda & serverless
Lambda runs code without provisioning servers — pay per invocation and GB-second. Perfect for event-driven glue, APIs with spiky traffic, and edge logic. Java on Lambda has historically been synonymous with cold starts — SnapStart and GraalVM native change that equation.
Cold starts — the Java problem and fixes
A cold start happens when Lambda creates a new execution environment: download the deployment package and layers, start the JVM, run static initialization, then invoke your handler. Java cold starts of 3–10 seconds were common on large Spring Boot JARs.
| Mitigation | How it works | Trade-off |
|---|---|---|
| Java SnapStart | Firecracker snapshot after init, restored on next invoke (Java 11+ Corretto) | Not for Spring Native; limited to supported runtimes; no uniqueness in static state; cannot be combined with provisioned concurrency |
| GraalVM / Quarkus native | AOT compile to native binary — sub-second cold starts | Build complexity; reflection config; longer compile times in CI |
| Smaller deployment package | Trim dependencies; avoid fat JAR Spring if possible | May need architectural split — Lambda for thin handlers only |
| Provisioned concurrency | Pre-warmed environments always ready | Cost — you pay even when idle |
Lambda reuses execution environments (warm starts) when traffic is steady — same container, new invoke. SnapStart takes a memory snapshot after Init phase completes; restore skips JVM bootstrap. GraalVM native images skip JVM entirely — the binary is the handler process.
Concurrency model
- Account concurrency limit — default 1000 per region; request increase via support
- Reserved concurrency — guarantees capacity for a function; also caps max (no steal from pool)
- Provisioned concurrency — pre-initialized environments; eliminates cold start for that count
- Throttling — when concurrency exhausted, synchronous invokes return 429; async retries with backoff
For APIs behind API Gateway: set reserved concurrency on critical functions so a runaway batch job can't starve payment webhooks. Use CloudWatch alarm on ConcurrentExecutions approaching account limit.
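When sizing reserved or provisioned concurrency, the usual starting estimate is requests per second multiplied by average duration in seconds. The sketch below applies that formula; the function names and the example traffic figures are hypothetical.

```typescript
// Concurrent executions needed for a steady request rate.
function requiredConcurrency(requestsPerSecond: number, avgDurationMs: number): number {
  return Math.ceil((requestsPerSecond * avgDurationMs) / 1000);
}

// True when the estimate fits under the account limit after other reservations.
function fitsAccountLimit(needed: number, accountLimit: number, reservedElsewhere: number): boolean {
  return needed <= accountLimit - reservedElsewhere;
}

const webhookConcurrency = requiredConcurrency(100, 500); // 50
const fits = fitsAccountLimit(webhookConcurrency, 1000, 900); // true: 50 <= 100
```

The estimate assumes steady traffic; bursts need headroom on top, which is the argument for alarming on ConcurrentExecutions well before the limit.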
Lambda@Edge
Lambda@Edge runs functions at CloudFront edge locations — rewrite URLs, A/B headers, JWT validation at the CDN, bot detection. Limitations: shorter timeout (5s viewer / 30s origin), smaller deployment package, no VPC. For heavy logic, use CloudFront → regional Lambda or CloudFront Functions for ultra-light transforms (< 1ms).
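As an example of the URL-rewrite use case, here is a minimal viewer-request sketch that maps directory-style paths to `index.html`. The event shape follows the CloudFront Lambda@Edge request structure (`Records[0].cf.request`), reduced to the one field used here; the rewrite rule itself is an assumption chosen for illustration.

```typescript
// Minimal subset of the CloudFront request event used by this handler.
interface CloudFrontRequest { uri: string; }
interface CloudFrontRequestEvent {
  Records: { cf: { request: CloudFrontRequest } }[];
}

// Pure rewrite rule: "/docs/" becomes "/docs/index.html"; other paths are untouched.
function rewriteUri(uri: string): string {
  return uri.endsWith("/") ? `${uri}index.html` : uri;
}

// Viewer-request handler: returning the request tells CloudFront to continue with it.
// Export this function as `handler` in the deployed module.
async function handler(event: CloudFrontRequestEvent): Promise<CloudFrontRequest> {
  const request = event.Records[0].cf.request;
  request.uri = rewriteUri(request.uri);
  return request;
}
```

Logic this small would also fit CloudFront Functions; Lambda@Edge becomes the better fit once the handler needs network calls or larger dependencies.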
Lambda function with SnapStart-ready Java
aws lambda create-function \
--function-name order-webhook-handler \
--runtime java17 \
--role arn:aws:iam::123456789012:role/lambda-order-handler \
--handler com.example.OrderHandler::handleRequest \
--code S3Bucket=artifacts,S3Key=order-handler.zip \
--memory-size 1024 \
--timeout 30 \
--snap-start ApplyOn=PublishedVersions \
--environment Variables={SPRING_PROFILES_ACTIVE=lambda}
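# Publishing a version is what triggers the SnapStart snapshot
aws lambda publish-version --function-name order-webhook-handler
# SnapStart does not support provisioned concurrency. Treat the next command as the
# alternative: run it only for a function created without --snap-start.
aws lambda put-provisioned-concurrency-config \
--function-name order-webhook-handler \
--qualifier 1 \
--provisioned-concurrent-executions 10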
resource "aws_lambda_function" "order_handler" {
function_name = "order-webhook-handler"
role = aws_iam_role.lambda.arn
handler = "com.example.OrderHandler::handleRequest"
runtime = "java17"
memory_size = 1024
timeout = 30
s3_bucket = aws_s3_bucket.artifacts.id
s3_key = "order-handler.zip"
# SnapStart applies only to published versions, so publish on every change
publish = true
snap_start {
apply_on = "PublishedVersions"
}
environment {
variables = { SPRING_PROFILES_ACTIVE = "lambda" }
}
}
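# Alternative to SnapStart, not an add-on: SnapStart does not support provisioned
# concurrency. Remove the snap_start block (and set publish = true so `version`
# is a real version number) before applying this resource.
resource "aws_lambda_provisioned_concurrency_config" "order_handler" {
function_name = aws_lambda_function.order_handler.function_name
qualifier = aws_lambda_function.order_handler.version
provisioned_concurrent_executions = 10
}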
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as s3 from 'aws-cdk-lib/aws-s3';
const fn = new lambda.Function(this, 'OrderHandler', {
runtime: lambda.Runtime.JAVA_17,
handler: 'com.example.OrderHandler::handleRequest',
code: lambda.Code.fromBucket(s3.Bucket.fromBucketName(this, 'Artifacts', 'artifacts'), 'order-handler.zip'),
memorySize: 1024,
timeout: cdk.Duration.seconds(30),
role: lambdaRole,
snapStart: lambda.SnapStartConf.ON_PUBLISHED_VERSIONS,
environment: { SPRING_PROFILES_ACTIVE: 'lambda' },
});
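const version = fn.currentVersion;
// SnapStart does not support provisioned concurrency, so the alias carries none.
// To use provisionedConcurrentExecutions instead, drop snapStart from the function.
new lambda.Alias(this, 'Live', {
aliasName: 'live',
version,
});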
Common pitfall: deploying full Spring Boot MVC to Lambda without SnapStart or native compilation, so the cold start is long enough that API Gateway times out before the first response. Split instead: API on ECS/EKS, async events to Lambda handlers, or use Quarkus/Micronaut with GraalVM native for sub-second cold starts.
Compute selection matrix
No single compute service wins every workload. Use this matrix in architecture reviews and interviews — the right answer always starts with workload shape, team skills, and operational budget.
| Workload signal | EC2 / ASG | ECS Fargate | EKS | Lambda |
|---|---|---|---|---|
| Always-on HTTP API (Spring Boot) | ✓ Classic; full control | ✓✓ Sweet spot | ✓ If K8s already standard | △ Spiky only; cold start risk |
| Long-running batch / GPU | ✓✓ Spot + ASG | △ CPU batch only; no GPU support | ✓ Jobs/CronJob + Karpenter | ✗ 15-min max timeout |
| Event-driven (S3, SQS, EventBridge) | △ Worker ASG polling | ✓ Container workers | ✓ K8s consumers | ✓✓ Native fit |
| Traffic pattern | Steady or predictable | Variable microservices | Complex scheduling needs | Sporadic / bursty |
| Ops team size | Needs EC2/AMI expertise | Minimal — AWS manages nodes | Needs K8s SRE capacity | Minimal — function-level |
| Startup / scale speed | Minutes (AMI boot) | ~60s task start | Minutes (node + pod) | Milliseconds (warm); seconds on cold start |
| Cost at low traffic | △ Min ASG size cost | △ Per-task hourly | △ Control plane + nodes | ✓✓ Pay per invoke |
| Cost at high sustained load | ✓✓ RI/Spot optimized | ✓ Good mid-scale | ✓✓ Density + Spot | ✗ Expensive at volume |
ECS Fargate vs EKS: Fargate wins on time-to-production and operational simplicity. EKS wins when you need Kubernetes-specific tooling or multi-cloud portability. Lambda vs containers: Lambda wins for simple handlers below roughly 100 req/s of steady traffic; containers win for long connections, WebSockets, and complex JVM apps. EC2 vs everything: EC2 remains correct for maximum control, licensing, and cost optimization at scale, but you own patching, AMIs, and capacity planning.
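The Lambda-versus-containers crossover can be estimated with a few lines of arithmetic. All prices below are assumptions for illustration (approximate list prices that vary by region and change over time), and the function names are hypothetical; check the current pricing pages before relying on any figure.

```typescript
// Assumed prices in USD; illustrative only.
const LAMBDA_PER_MILLION_REQUESTS = 0.20;
const LAMBDA_PER_GB_SECOND = 0.0000166667;
const FARGATE_PER_VCPU_HOUR = 0.04048;
const FARGATE_PER_GB_HOUR = 0.004445;

const SECONDS_PER_MONTH = 30 * 24 * 3600; // 30-day month
const HOURS_PER_MONTH = 30 * 24;

// Lambda: pay per request plus per GB-second of execution time.
function lambdaMonthlyCost(rps: number, avgDurationMs: number, memoryGb: number): number {
  const requests = rps * SECONDS_PER_MONTH;
  const gbSeconds = requests * (avgDurationMs / 1000) * memoryGb;
  return (requests / 1e6) * LAMBDA_PER_MILLION_REQUESTS + gbSeconds * LAMBDA_PER_GB_SECOND;
}

// Fargate: pay for always-on tasks regardless of traffic.
function fargateMonthlyCost(tasks: number, vcpu: number, memoryGb: number): number {
  return tasks * HOURS_PER_MONTH * (vcpu * FARGATE_PER_VCPU_HOUR + memoryGb * FARGATE_PER_GB_HOUR);
}

const fargateFleet = fargateMonthlyCost(3, 1, 2);     // three 1 vCPU / 2 GB tasks
const lambdaLow = lambdaMonthlyCost(10, 100, 1);      // 10 req/s, 100 ms, 1 GB
const lambdaHigh = lambdaMonthlyCost(100, 100, 1);    // 100 req/s, 100 ms, 1 GB
```

Under these assumed prices the three-task Fargate service costs about $107 per month, Lambda at 10 req/s about $48, and Lambda at 100 req/s about $484. The crossover point moves a lot with handler duration and memory size, so rerun the numbers with your own measurements.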
In system design interviews, state your assumptions first: QPS, p99 latency, team size, and burst factor. Then pick one primary compute and one fallback — e.g. "ECS Fargate for the API, Lambda for async webhooks." Interviewers reward explicit trade-offs over "we'd use Kubernetes because it's modern."
Amazon.com internal services use a mix of EC2, ECS, EKS, and Lambda — no single compute winner. Monzo ran core banking on EC2/K8s early, then adopted Lambda for event pipelines. Pattern: start simple (Fargate or Lambda), split when metrics prove a boundary.