Storage: S3, EBS & EFS
AWS storage is three different animals: S3 is object storage for files, backups, static assets, and data lakes; EBS is block storage attached to EC2 like a virtual disk; EFS is shared NFS for multiple instances. Pick wrong and you pay 10× for the wrong tier, lose data on instance termination, or expose a bucket to the internet. This chapter covers the object model, storage classes, encryption, lifecycle, replication, and when to use each service in production.
S3 deep dive
Amazon S3 stores objects (files + metadata) in buckets. There is no directory hierarchy — only flat keys that look like paths. Understanding the object model, storage classes, and consistency model is the foundation for everything else: backups, static hosting, data lakes, and cross-region DR.
The object model
Every S3 object has a key (e.g. uploads/2024/invoice.pdf), a value (0 bytes to 5 TB), and metadata (system + user-defined). Buckets are globally unique names in the s3.amazonaws.com namespace — choose names carefully; you cannot rename a bucket without creating a new one and copying objects.
| Concept | Details | Production note |
|---|---|---|
| Bucket | Container in a region; name is global | One bucket per environment per purpose — avoid mega-buckets with mixed sensitivity |
| Key / prefix | Flat namespace; / is a convention, not a folder | Design prefixes for lifecycle rules, IAM conditions, and S3 Inventory reports |
| Version ID | Present when versioning enabled; null for unversioned | Enable on production buckets — protects against accidental overwrite and ransomware |
| Storage class | Per-object tier — Standard, IA, Glacier, etc. | Set via upload or lifecycle transition; wrong class = wrong cost profile |
| Object Lock | WORM retention in governance or compliance mode | Requires versioning; enable at bucket creation (enabling it on an existing bucket has also been supported since late 2023) |
S3 is strongly consistent for all operations: after a successful PUT, overwrite, or DELETE, subsequent GET and LIST requests immediately reflect the change. This changed in December 2020; before that, new-object PUTs were read-after-write consistent, but overwrite PUTs and DELETEs were only eventually consistent. For exam purposes: S3 is strongly consistent today.
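To make the "flat keys that look like paths" idea concrete, here is a minimal Python sketch of how a prefix-and-delimiter listing groups flat keys into apparent folders. The function name `list_level` and the sample keys are illustrative assumptions; it only imitates the objects / common-prefixes split of an S3 list response and is not the service's implementation.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Group flat keys the way a delimiter-based listing does.

    Returns (objects, common_prefixes) for one "directory level".
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the first delimiter is rolled up into one "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical keys: the bucket only ever stores these full strings.
keys = [
    "uploads/2024/invoice.pdf",
    "uploads/2024/receipt.pdf",
    "uploads/2023/invoice.pdf",
    "uploads/readme.txt",
    "logs/app.log",
]
print(list_level(keys, prefix="uploads/"))
```

The "folders" exist only in the listing result, which is why prefix design matters for lifecycle rules and IAM conditions.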
Storage classes — comparison
Storage class determines durability, availability, minimum storage duration, retrieval fees, and access latency. Match the class to access pattern — not every object belongs in Standard.
| Storage class | Use case | Min duration | Retrieval | Availability |
|---|---|---|---|---|
| S3 Standard | Hot data — frequent access, low latency | None | Instant, no fee | 99.99% |
| S3 Standard-IA | Infrequent access — backups, DR copies | 30 days | Instant + per-GB fee | 99.9% |
| S3 One Zone-IA | Recreatable infrequent data — lower cost, single AZ | 30 days | Instant + per-GB fee | 99.5% |
| S3 Intelligent-Tiering | Unknown or changing access patterns — auto-moves tiers | None (small monitoring fee) | Instant (no retrieval fee in Frequent/Infrequent tiers) | 99.9% |
| S3 Glacier Instant Retrieval | Archive with millisecond access — quarterly access OK | 90 days | Instant + per-GB fee | 99.9% |
| S3 Glacier Flexible Retrieval | Archive — minutes to hours retrieval (formerly Glacier) | 90 days | Expedited (1–5 min), Standard (3–5 hr), Bulk (5–12 hr) | 99.99% (after restore) |
| S3 Glacier Deep Archive | Long-term compliance — annual access or less | 180 days | Standard (12 hr), Bulk (48 hr) | 99.99% (after restore) |
Standard-IA and Glacier tiers charge retrieval fees — a "cheap" archive bucket that gets read daily will cost more than Standard. One Zone-IA saves ~20% vs Standard-IA but loses cross-AZ redundancy; only use for data you can rebuild. Intelligent-Tiering adds a small monitoring fee per object but eliminates manual tier management — good default for mixed workloads with objects > 128 KB.
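The retrieval-fee trade-off can be sketched with a few lines of arithmetic. The per-GB prices below are illustrative assumptions, not quoted rates, and the model ignores request fees and minimum-duration charges; substitute current pricing for your region before drawing conclusions.

```python
def monthly_cost(stored_gb, read_gb, storage_price, retrieval_price=0.0):
    """Storage plus retrieval cost for one month, in USD."""
    return stored_gb * storage_price + read_gb * retrieval_price


# Assumed, illustrative prices in USD per GB (per month for storage).
STANDARD = {"storage_price": 0.023}
STANDARD_IA = {"storage_price": 0.0125, "retrieval_price": 0.01}

stored_gb = 1_000
for full_reads_per_month in (0.1, 1, 2, 30):
    read_gb = stored_gb * full_reads_per_month
    std = monthly_cost(stored_gb, read_gb, **STANDARD)
    ia = monthly_cost(stored_gb, read_gb, **STANDARD_IA)
    print(f"{full_reads_per_month:>4} full reads/month: "
          f"Standard ${std:.2f} vs Standard-IA ${ia:.2f}")
```

Under these assumed prices, Standard-IA stops being cheaper once the dataset is read roughly once per month, and a daily read pattern makes it many times more expensive than Standard.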
Lifecycle rules
Lifecycle configurations automatically transition objects between storage classes or expire (delete) them based on object age, optionally filtered by prefix, tags, or object size. S3 evaluates rules roughly once per day, so transitions and expirations are not real-time.
- Transition — move to IA after 30 days, Glacier after 90 days, Deep Archive after 365 days
- Expiration — delete objects or noncurrent versions after N days
- Abort incomplete multipart uploads — after 7 days to stop paying for orphaned parts
- Filter — apply rules to prefix (logs/) or object tags
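The transition and expiration actions listed above can be sketched as a small function that answers "which class is an object in at a given age?". The helper name `storage_class_at` and the tuple format are assumptions for illustration; real transitions happen during the daily lifecycle run, so they can lag the nominal day.

```python
def storage_class_at(age_days, transitions, expire_after=None, initial="STANDARD"):
    """Return the storage class for an object of the given age, or None once expired.

    transitions: list of (days, storage_class) pairs, like a rule's Transitions.
    """
    if expire_after is not None and age_days >= expire_after:
        return None
    current = initial
    for days, storage_class in sorted(transitions):
        if age_days >= days:
            current = storage_class
    return current


rule = [(30, "STANDARD_IA"), (90, "GLACIER"), (365, "DEEP_ARCHIVE")]
for age in (10, 30, 120, 400):
    print(age, storage_class_at(age, rule))
```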
Versioning
With versioning enabled, every PUT creates a new version; DELETE adds a delete marker (doesn't remove data). Restore by deleting the delete marker or copying a previous version. Pair with lifecycle rules to expire noncurrent versions after N days — otherwise storage grows forever.
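A toy model helps show why a DELETE on a versioned key is reversible. The class below is a sketch under simplifying assumptions: version IDs are sequential strings (real S3 version IDs are opaque), and `VersionedKey` is a hypothetical name, not an SDK type.

```python
DELETE_MARKER = object()  # stands in for S3's delete marker


class VersionedKey:
    """One key in a versioning-enabled bucket; the newest version is last."""

    def __init__(self):
        self.versions = []  # list of (version_id, body or DELETE_MARKER)
        self._counter = 0

    def _new_id(self):
        self._counter += 1
        return f"v{self._counter}"

    def put(self, body):
        version_id = self._new_id()
        self.versions.append((version_id, body))
        return version_id

    def delete(self):
        """A plain DELETE only stacks a delete marker on top."""
        version_id = self._new_id()
        self.versions.append((version_id, DELETE_MARKER))
        return version_id

    def delete_version(self, version_id):
        """Deleting a specific version ID removes it permanently."""
        self.versions = [v for v in self.versions if v[0] != version_id]

    def get(self):
        if not self.versions or self.versions[-1][1] is DELETE_MARKER:
            return None  # behaves like "not found"
        return self.versions[-1][1]


key = VersionedKey()
key.put(b"draft")
key.put(b"final")
marker = key.delete()
print(key.get())            # None: the delete marker hides the data
key.delete_version(marker)
print(key.get())            # b'final': removing the marker restores it
print(len(key.versions))    # 2: both versions still consume storage
```

The last line illustrates why noncurrent versions need an expiration rule: nothing in this model ever shrinks on its own.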
Replication — CRR and SRR
S3 Replication copies objects from a source bucket to a destination bucket automatically. Requires versioning on both buckets. IAM role must allow replication actions.
| Type | Scope | Typical use |
|---|---|---|
| CRR (Cross-Region) | Source region → different region | DR, lower latency for global users, compliance residency copy |
| SRR (Same-Region) | Source → bucket in same region | Aggregate logs, separate prod/analytics copies, compliance isolation |
| RTC (Replication Time Control) | Add-on for CRR or SRR with a 15-minute SLA | Regulated DR with predictable RPO; additional cost |
CRR copies across regions (DR, compliance); SRR within the same region (log aggregation). Both require versioning on source and destination. RTC adds a 15-minute replication SLA.
Production bucket — encryption + lifecycle
aws s3api create-bucket --bucket my-app-artifacts-prod-eu \
--region eu-west-1 \
--create-bucket-configuration LocationConstraint=eu-west-1
aws s3api put-bucket-versioning --bucket my-app-artifacts-prod-eu \
--versioning-configuration Status=Enabled
aws s3api put-bucket-encryption --bucket my-app-artifacts-prod-eu \
--server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "aws:kms",
"KMSMasterKeyID": "alias/my-app-s3-key"
},
"BucketKeyEnabled": true
}]
}'
aws s3api put-public-access-block --bucket my-app-artifacts-prod-eu \
--public-access-block-configuration \
BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
cat > /tmp/lifecycle.json <<'EOF'
{
"Rules": [{
"ID": "archive-and-expire",
"Status": "Enabled",
"Filter": { "Prefix": "uploads/" },
"Transitions": [
{ "Days": 30, "StorageClass": "STANDARD_IA" },
{ "Days": 90, "StorageClass": "GLACIER" }
],
"NoncurrentVersionExpiration": { "NoncurrentDays": 30 },
"AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
}]
}
EOF
aws s3api put-bucket-lifecycle-configuration --bucket my-app-artifacts-prod-eu \
--lifecycle-configuration file:///tmp/lifecycle.json
# Terraform equivalent of the CLI commands above
resource "aws_s3_bucket" "artifacts" {
bucket = "my-app-artifacts-prod-eu"
}
resource "aws_s3_bucket_versioning" "artifacts" {
bucket = aws_s3_bucket.artifacts.id
versioning_configuration { status = "Enabled" }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "artifacts" {
bucket = aws_s3_bucket.artifacts.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "aws:kms"
kms_master_key_id = aws_kms_key.s3.arn
}
bucket_key_enabled = true
}
}
resource "aws_s3_bucket_public_access_block" "artifacts" {
bucket = aws_s3_bucket.artifacts.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_s3_bucket_lifecycle_configuration" "artifacts" {
bucket = aws_s3_bucket.artifacts.id
rule {
id = "archive-and-expire"
status = "Enabled"
filter { prefix = "uploads/" }
transition {
days = 30
storage_class = "STANDARD_IA"
}
transition {
days = 90
storage_class = "GLACIER"
}
noncurrent_version_expiration { noncurrent_days = 30 }
abort_incomplete_multipart_upload { days_after_initiation = 7 }
}
}
// AWS CDK (TypeScript) equivalent
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as kms from 'aws-cdk-lib/aws-kms';
import { Duration, RemovalPolicy } from 'aws-cdk-lib';
const key = kms.Key.fromLookup(this, 'S3Key', { aliasName: 'alias/my-app-s3-key' });
new s3.Bucket(this, 'ArtifactsBucket', {
bucketName: 'my-app-artifacts-prod-eu',
versioned: true,
encryption: s3.BucketEncryption.KMS,
encryptionKey: key,
bucketKeyEnabled: true,
blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
enforceSSL: true,
lifecycleRules: [{
id: 'archive-and-expire',
prefix: 'uploads/',
transitions: [
{ storageClass: s3.StorageClass.INFREQUENT_ACCESS, transitionAfter: Duration.days(30) },
{ storageClass: s3.StorageClass.GLACIER, transitionAfter: Duration.days(90) },
],
noncurrentVersionExpiration: Duration.days(30),
abortIncompleteMultipartUploadAfter: Duration.days(7),
}],
removalPolicy: RemovalPolicy.RETAIN,
});
Versioning must be enabled on both source and destination for replication. CRR does not replicate existing objects by default — only new/changed objects after rule creation (unless you use S3 Batch Replication for backfill). Glacier and Deep Archive objects cannot be replicated directly — transition happens at destination per its lifecycle rules.
S3 security & performance
Most S3 breaches are misconfiguration, not sophisticated attacks. Block Public Access, bucket policies, encryption enforcement, and presigned URLs are your toolkit. On the performance side: prefix design, multipart upload, and S3 Select reduce latency and cost at scale.
Block Public Access
Four account-level and bucket-level settings that override any policy making a bucket public. Enable at the account level in Organizations — defense in depth with SCPs and bucket policies. Even with Block Public Access on, a misconfigured bucket policy can still grant overly broad access to authenticated AWS principals — BPA only blocks anonymous/public access.
Bucket policies vs IAM policies
S3 bucket policies are resource-based: they attach to the bucket and name the principals they apply to. Within a single account, a bucket policy alone can grant access. For cross-account access, the bucket policy must allow the external principal, and that principal also needs an identity-based IAM policy in its own account that allows the S3 action. Use bucket policies for: CloudFront OAC, cross-account read, denying unencrypted uploads, requiring VPC endpoint access.
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "DenyUnencryptedUploads",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::my-app-artifacts-prod-eu/*",
"Condition": {
"StringNotEquals": {
"s3:x-amz-server-side-encryption": "aws:kms"
}
}
}]
}
Presigned URLs and POST
Presigned URLs grant temporary access to a specific object (GET or PUT) without making the bucket public. Generated by signing with IAM credentials — expiry from seconds to 7 days (SigV4). Presigned POST allows browser direct upload via HTML form — common for user file uploads from a Spring Boot app without proxying bytes through your API.
Generate a presigned download URL
# Presigned GET — share invoice download for 15 minutes
aws s3 presign s3://my-app-artifacts-prod-eu/invoices/2024/inv-001.pdf \
--expires-in 900 \
--region eu-west-1
# Presigned PUT: `aws s3 presign` only generates GET URLs (it has no option for the HTTP method),
# so upload URLs are generated with an SDK at runtime and returned to the browser,
# for example with S3Presigner in the Spring app.
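# Verify the URL works (the signature covers the GET method, so a HEAD request via curl -I would be rejected)
curl -s -o /dev/null -w "%{http_code}\n" "$(aws s3 presign s3://my-app-artifacts-prod-eu/invoices/2024/inv-001.pdf --expires-in 300 --region eu-west-1)"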
# Presign at runtime — grant s3:GetObject on the prefix to the app role.
resource "aws_iam_role_policy" "presign" {
role = aws_iam_role.api.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = ["s3:GetObject"]
Resource = "${aws_s3_bucket.artifacts.arn}/invoices/*"
}]
})
}
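// AWS CDK (TypeScript): grant the task role read access to the invoices prefix.
// `taskRole` and `bucket` are assumed to be defined elsewhere in the stack.
import * as iam from 'aws-cdk-lib/aws-iam';
taskRole.addToPolicy(new iam.PolicyStatement({
  actions: ['s3:GetObject'],
  resources: [`${bucket.bucketArn}/invoices/*`],
}));
// Runtime: S3Presigner.create().presignGetObject(...) with a 15-minute expiry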
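To show what "signing with IAM credentials" means for a presigned URL, here is a standard-library Python sketch of SigV4 query-string signing for a GET request. It is a simplified illustration, not a replacement for an SDK presigner: it assumes long-term credentials (no session token), a virtual-hosted regional endpoint, and `host` as the only signed header. The function name `presign_get` and the placeholder credentials are hypothetical.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote

MAX_EXPIRY = 7 * 24 * 3600  # SigV4 presigned URLs are capped at 7 days


def _hmac(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get(bucket, key, region, access_key, secret_key, expires_in, now=None):
    """Build a SigV4 query-string-signed GET URL (sketch; no session token support)."""
    if not 1 <= expires_in <= MAX_EXPIRY:
        raise ValueError("expires_in must be between 1 second and 7 days")
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{date}/{region}/s3/aws4_request"
    path = "/" + quote(key, safe="/-_.~")

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_in),
        "X-Amz-SignedHeaders": "host",
    }
    query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )

    # The HTTP method is part of what gets signed, so the URL is valid for GET only.
    canonical_request = "\n".join(
        ["GET", path, query, f"host:{host}", "", "host", "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: date, then region, then service, then a fixed terminator.
    signing_key = ("AWS4" + secret_key).encode("utf-8")
    for part in (date, region, "s3", "aws4_request"):
        signing_key = _hmac(signing_key, part)
    signature = hmac.new(
        signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"https://{host}{path}?{query}&X-Amz-Signature={signature}"


# Placeholder credentials for illustration only.
print(presign_get("my-app-artifacts-prod-eu", "invoices/2024/inv-001.pdf",
                  "eu-west-1", "AKIAIOSFODNN7EXAMPLE", "example-secret", 900))
```

Because the signature is computed locally from the secret key, generating a URL makes no call to S3; permissions are only checked when the URL is used.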
Encryption — SSE-S3, SSE-KMS, SSE-C
| Type | Keys managed by | When to use |
|---|---|---|
| SSE-S3 (AES256) | AWS — no KMS charges | Default for non-sensitive bulk storage; simplest setup |
| SSE-KMS | AWS KMS — audit trail, key rotation, cross-account | Production default for PII/financial data; enable Bucket Key to cut KMS API costs |
| SSE-C | Customer provides key per request | Rare — you manage key lifecycle; AWS never stores the key |
Enforce encryption at rest with a bucket policy Deny on s3:PutObject when s3:x-amz-server-side-encryption is missing or wrong. Pair it with a Deny when aws:SecureTransport is false to reject plain HTTP. For SSE-KMS, the caller also needs permissions on the KMS key: kms:GenerateDataKey to upload and kms:Decrypt to download (multipart uploads need both). A missing kms:Decrypt is a common cause of "access denied" on GetObject.
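The two Deny statements described above can be combined in one bucket policy. The Python sketch below only builds and prints the JSON document; the function name `bucket_policy` and the Sid `DenyInsecureTransport` are illustrative choices, and applying the policy to a bucket is left out.

```python
import json


def bucket_policy(bucket):
    """Build a policy that denies non-KMS uploads and any plain-HTTP request."""
    arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{arn}/*",
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
                },
            },
            {
                # Sid chosen for this sketch; covers bucket-level and object-level calls.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [arn, f"{arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


print(json.dumps(bucket_policy("my-app-artifacts-prod-eu"), indent=2))
```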
Prefix strategy and request rate
S3 scales automatically, but request rates are bounded per prefix: at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix, beyond which S3 can return 503 Slow Down while it repartitions. For high-throughput workloads, spread load with hex hash prefixes: uploads/a3/f9/object-id instead of uploads/2024/06/10/object-id when date clustering creates hotspots. For most backend apps, date-based prefixes are fine.
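One way to produce hash-based prefixes like the one in the example key is to take the leading hex characters of a digest of the object ID. This is a sketch: the choice of SHA-256, two levels of two characters, and the name `hashed_key` are assumptions, and any stable hash would serve.

```python
import hashlib


def hashed_key(object_id, root="uploads", levels=2):
    """Spread keys across prefixes using the leading hex characters of a hash."""
    digest = hashlib.sha256(object_id.encode("utf-8")).hexdigest()
    shards = [digest[i * 2:(i + 1) * 2] for i in range(levels)]
    return "/".join([root, *shards, object_id])


for object_id in ("order-1001", "order-1002", "order-1003"):
    print(hashed_key(object_id))
```

The mapping is deterministic, so a reader can recompute the key from the object ID alone; the trade-off is that listing by date is no longer possible from the key layout.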
Multipart upload
Required for objects > 5 GB; recommended for > 100 MB. Upload parts in parallel (5 MB–5 GB each, max 10,000 parts). Failed uploads leave parts that cost money — lifecycle rule to abort incomplete uploads after 7 days is mandatory. Complete with CompleteMultipartUpload; list in-progress with ListMultipartUploads.
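The part limits above translate into a small sizing calculation. The sketch below picks a part size that respects the 5 MiB minimum, the 5 GiB maximum, and the 10,000-part cap; the 64 MiB preferred size and the name `plan_parts` are assumptions for illustration.

```python
MIB = 1024 ** 2
MIN_PART = 5 * MIB           # minimum part size (the last part may be smaller)
MAX_PART = 5 * 1024 * MIB    # maximum part size, 5 GiB
MAX_PARTS = 10_000


def _ceil_div(a, b):
    return -(-a // b)


def plan_parts(object_size, preferred_part_size=64 * MIB):
    """Return (part_size, part_count) in bytes that fits the multipart limits."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size when the preferred size would need more than 10,000 parts.
    part_size = max(preferred_part_size, MIN_PART, _ceil_div(object_size, MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("object is too large for 10,000 parts of at most 5 GiB")
    return part_size, _ceil_div(object_size, part_size)


print(plan_parts(1024 * MIB))          # 1 GiB object
print(plan_parts(5 * 1024 ** 4))       # 5 TiB object forces larger parts
```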
S3 Select and events
- S3 Select: SQL on CSV/JSON/Parquet in place; filter without a full download (AWS closed S3 Select to new customers in 2024, so check availability before designing around it)
- Event notifications — Lambda, SQS, SNS, EventBridge on object create/delete (at-least-once)
Object Lock (WORM)
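Retention or legal hold prevents deletion of object versions. Governance mode lets principals with the s3:BypassGovernanceRetention permission override retention; Compliance mode lets nobody, including the root user, delete a version or shorten its retention until expiry. Requires versioning; enable Object Lock when creating the bucket (enabling it on an existing bucket has also been supported since late 2023).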
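The difference between the two modes can be written as a small decision function. This is a simplified model, not S3's evaluation logic: `can_bypass_governance` stands for a caller that holds s3:BypassGovernanceRetention and explicitly requests the bypass, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def can_delete_version(mode, retain_until, now,
                       can_bypass_governance=False, legal_hold=False):
    """Decide whether a locked object version may be deleted at time `now`."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError(f"unknown Object Lock mode: {mode}")
    if legal_hold:
        return False   # a legal hold blocks deletion regardless of dates
    if now >= retain_until:
        return True    # the retention period has expired
    # Before expiry, only governance mode can be overridden.
    return mode == "GOVERNANCE" and can_bypass_governance


retain_until = datetime(2030, 1, 1, tzinfo=timezone.utc)
today = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(can_delete_version("GOVERNANCE", retain_until, today, can_bypass_governance=True))
print(can_delete_version("COMPLIANCE", retain_until, today, can_bypass_governance=True))
```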
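Presigned URLs are authorized with the signer's permissions. Each URL is limited to the single operation and key that was signed, but an over-privileged signer (for example an admin role with s3:*) can mint URLs for any object, and a URL stops working once the signer's credentials expire or are revoked. Sign with a minimal role scoped to the specific key prefix. Never log presigned URLs: they are bearer tokens until expiry.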
Enable S3 Bucket Key with SSE-KMS: it reduces KMS API calls by up to 99% for high-throughput buckets. S3 obtains a short-lived bucket-level key from KMS and uses it to create per-object data keys, instead of calling KMS for every object. Costs drop significantly on workloads with millions of small objects.
EBS deep dive
Amazon EBS provides block storage volumes attached to EC2 instances — like a virtual hard drive. Data persists independently of the instance lifecycle (unlike instance store). Choose volume type based on IOPS, throughput, and cost; snapshot to S3 for backup and cross-region DR.
Block storage fundamentals
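EBS volumes live in a single Availability Zone. An EC2 instance must be in the same AZ to attach. You can detach and reattach to another instance in the same AZ (stop the instance first for root volumes). Size and type can be modified online for most volume types. The maximum number of attached volumes depends on the instance type: many Nitro instances share a limit of 28 attachments across volumes and network interfaces, and some newer instance types support up to 128 volumes.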
Volume types
| Type | Use case | IOPS | Throughput | Notes |
|---|---|---|---|---|
| gp3 | General purpose — boot volumes, apps, databases | 3,000–16,000 (independent of size) | 125–1,000 MB/s | Default choice — decouple IOPS/throughput from capacity |
| gp2 | Legacy general purpose | 3 IOPS/GB | 128–250 MB/s | Migrate to gp3 |
| io2 Block Express | Mission-critical databases — Oracle, SAP HANA | Up to 256,000 | Up to 4,000 MB/s | 99.999% durability; supports Multi-Attach |
| io2 | High-IOPS databases | Up to 64,000 | Up to 1,000 MB/s | Supports Multi-Attach (available on Provisioned IOPS volumes only: io1 and io2) |
| st1 | Throughput-optimized: big data, logs, Kafka | Up to 500 | Up to 500 MB/s | HDD; cannot be boot volume |
| sc1 | Cold HDD: infrequent access | Up to 250 | Up to 250 MB/s | Lowest cost block storage; cannot be boot volume |
gp3 is ~20% cheaper than gp2 at the same size with baseline 3,000 IOPS included. You pay separately for provisioned IOPS and throughput above baseline — right-size instead of over-provisioning a 1 TB gp2 when 100 GB gp3 with 3,000 IOPS suffices. st1/sc1 are cheaper per GB but HDD latency — never use for database data files.
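The over-provisioning point is easier to see with the gp2 baseline formula written out. The sketch below uses the 3 IOPS/GB rule from the table, clamped to gp2's 100 to 16,000 IOPS range, and compares it with gp3's flat 3,000 baseline; the function names are illustrative and gp2 burst credits are deliberately ignored.

```python
GP3_BASELINE_IOPS = 3_000


def gp2_baseline_iops(size_gib):
    """gp2 baseline scales at 3 IOPS per GiB, clamped to 100..16,000."""
    return min(max(3 * size_gib, 100), 16_000)


def gp2_size_for_iops(target_iops):
    """Smallest gp2 volume (GiB) whose baseline reaches target_iops."""
    if target_iops > 16_000:
        raise ValueError("gp2 tops out at 16,000 IOPS")
    return max(1, -(-target_iops // 3))  # ceiling division


for size in (100, 500, 1_000):
    print(f"{size:>5} GiB: gp2 baseline {gp2_baseline_iops(size):>5} IOPS, "
          f"gp3 baseline {GP3_BASELINE_IOPS} IOPS")
print("gp2 size needed for 3,000 baseline IOPS:", gp2_size_for_iops(3_000), "GiB")
```

A 100 GiB gp2 volume has a 300 IOPS baseline, so matching gp3's included 3,000 IOPS on gp2 means paying for about 1,000 GiB of capacity.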
Snapshots and encryption
EBS snapshots are incremental backups in S3 (AWS-managed). Copy cross-region for DR; use DLM for automated schedules. Enable account-level EBS encryption by default — all new volumes use KMS; no performance penalty on Nitro.
Multi-Attach (io1 and io2)
One volume on up to 16 instances in the same AZ — requires cluster-aware FS (Oracle RAC, GFS2). Standard ext4/xfs without clustering will corrupt. Use EFS for shared POSIX without cluster software.
EBS vs instance store
| Feature | EBS | Instance store |
|---|---|---|
| Persistence | Survives instance stop/start; independent lifecycle | Ephemeral — lost on stop, terminate, or hardware failure |
| Performance | Network-attached; gp3/io2 predictable | Local NVMe: lowest latency, highest IOPS on storage-optimized and d-suffixed instance types (for example I3, M5d, R5d) |
| Use case | Boot volumes, databases, anything that must persist | Caches, temp processing, Kafka log dirs (with replication), Spark shuffle |
| Snapshots | Yes — incremental to S3 | No — data gone when instance gone |
Create a gp3 volume and attach
aws ec2 create-volume \
--availability-zone eu-west-1a \
--size 100 \
--volume-type gp3 \
--iops 3000 \
--throughput 125 \
--encrypted \
--kms-key-id alias/my-app-ebs-key \
--tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=order-db-data}]'
# Attach to running instance (data volume — not root)
aws ec2 attach-volume \
--volume-id vol-0abc123def456 \
--instance-id i-0fedcba987654 \
--device /dev/sdf
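# On the instance: mkfs, mount, fstab (on Nitro instances /dev/sdf typically appears as /dev/nvme1n1; confirm with lsblk)
# sudo mkfs -t xfs /dev/nvme1n1
# sudo mkdir -p /data && sudo mount /dev/nvme1n1 /data
# echo "UUID=$(sudo blkid -s UUID -o value /dev/nvme1n1) /data xfs defaults,nofail 0 2" | sudo tee -a /etc/fstab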
# Terraform equivalent
resource "aws_ebs_volume" "order_db_data" {
availability_zone = "eu-west-1a"
size = 100
type = "gp3"
iops = 3000
throughput = 125
encrypted = true
kms_key_id = aws_kms_key.ebs.arn
tags = { Name = "order-db-data" }
}
resource "aws_volume_attachment" "order_db" {
device_name = "/dev/sdf"
volume_id = aws_ebs_volume.order_db_data.id
instance_id = aws_instance.order_db.id
}
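// AWS CDK (TypeScript); `key` is assumed to be a kms.IKey defined elsewhere in the stack.
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
new ec2.Volume(this, 'OrderDbData', {
  availabilityZone: 'eu-west-1a',
  size: cdk.Size.gibibytes(100),
  volumeType: ec2.EbsDeviceVolumeType.GP3,
  iops: 3000,
  throughput: 125,
  encrypted: true,
  encryptionKey: key,
});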
gp3 when the question says "general purpose SSD" or "cost-optimize boot volume." io2 when IOPS > 16,000 or Multi-Attach is required. st1 for streaming sequential reads (log processing), not random I/O. Instance store when latency matters and data is ephemeral/replicated elsewhere.
Storage patterns
Production storage is never one service — it's backup tiers, cross-region copies, lifecycle automation, and serving static assets without hitting your Spring Boot app. These patterns appear in every Well-Architected review and Solutions Architect exam scenario.
Backup strategy — 3-2-1 on AWS
Three copies (prod + snapshot + cross-region), two media types (EBS block + S3 object), one offsite (CRR or AWS Backup vault copy).
| Source | Backup mechanism | Restore target |
|---|---|---|
| EBS volumes | DLM snapshots → optional cross-region copy | New volume in any AZ (same region) or DR region |
| RDS / Aurora | Automated backups + manual snapshots | Point-in-time restore, cross-region read replica |
| S3 buckets | Versioning + CRR + lifecycle to Glacier | Restore version, fail over to DR bucket |
| EFS | AWS Backup or EFS-to-EFS backup | Restore to new filesystem |
| Hybrid / on-prem | AWS Storage Gateway or DataSync → S3 | EC2 in cloud or reverse sync |
Cross-region DR with S3
Primary bucket with versioning + CRR to a DR region; lifecycle moves DR copies to Standard-IA. On regional failure, point config/DNS at the DR bucket. Enable RTC for RPO < 15 minutes.
Spring Boot static assets on S3 + CloudFront
Don't serve static JS/CSS from your JVM — offload to S3 behind CloudFront with Origin Access Control (OAC). CI uploads hashed assets; Spring templates reference CDN URLs in prod.
- CI runs aws s3 sync with long cache on hashed assets, short cache on index.html
- CloudFront OAC → private S3 origin; ACM cert on custom domain
- Spring Boot serves API only in prod; templates reference CDN URLs, not classpath static
# CI deploy after ./mvnw package
aws s3 sync target/classes/static/ s3://my-app-static-prod/assets/ \
--cache-control "public, max-age=31536000, immutable" --exclude "index.html"
aws s3 cp target/classes/static/index.html s3://my-app-static-prod/index.html \
--cache-control "public, max-age=60"
aws cloudfront create-invalidation --distribution-id E1234567890ABC \
--paths "/index.html" "/assets/*"
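The caching split in the deploy commands (long-lived hashed assets, short-lived index.html) can also be expressed as a per-key rule, which is handy when an upload script sets headers object by object. This sketch assumes hashed assets carry a hex content hash before the extension, such as app.3f9a1c2b.js; the pattern and the name `cache_control_for` are illustrative.

```python
import re

# Assumed naming convention: <name>.<8+ hex chars>.<extension>
HASHED_ASSET = re.compile(r"\.[0-9a-f]{8,}\.(js|css|png|svg|woff2)$")


def cache_control_for(key):
    """Pick a Cache-Control header value for an uploaded object key."""
    if HASHED_ASSET.search(key):
        # Content-hashed files never change under the same name.
        return "public, max-age=31536000, immutable"
    # Entry points such as index.html must pick up new deploys quickly.
    return "public, max-age=60"


for key in ("assets/app.3f9a1c2b.js", "assets/site.9e107d9d372b.css", "index.html"):
    print(key, "->", cache_control_for(key))
```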
Operational checklist
- Account-level Block Public Access + SCP deny on public bucket policies; S3 Access Analyzer weekly
- Versioning on prod buckets; lifecycle expires noncurrent versions; abort incomplete MPU after 7 days
- CRR for critical buckets; EBS snapshot cross-region copy via DLM or AWS Backup
- EBS encryption by default; S3 SSE-KMS with Bucket Key; org-wide Storage Lens for cost anomalies
Enable S3 Object Ownership = Bucket owner enforced to disable ACLs — simplifies permissions to bucket policies and IAM only. For ransomware resilience: versioning + Object Lock (compliance mode) + cross-account replication to a security account where the destination bucket policy denies deletes from prod roles.
When the scenario says "share files across 50 EC2 instances in multiple AZs" → EFS. "Lowest cost archival with retrieval in milliseconds" → Glacier Instant Retrieval, not Deep Archive. "Prevent accidental deletion for compliance" → Object Lock or MFA Delete on versioning. "Static website with global users" → S3 + CloudFront, not public S3 website endpoint alone.