Messaging: SQS, SNS, EventBridge & Kinesis

SQS deep dive

Amazon SQS is a fully managed message queue. Producers send messages; consumers poll and process them. SQS handles durability, retries, and at-least-once delivery — your job is to make handlers idempotent and tune visibility timeout so messages don't get processed twice (or lost).

Standard vs FIFO queues

Feature	Standard queue	FIFO queue (.fifo suffix)
Delivery order	Best-effort ordering; occasional duplicates	Strict FIFO within a message group; exactly-once processing (deduplication)
Throughput	Nearly unlimited	300 API calls/s per action by default (3,000 msg/s with batches of 10); high throughput mode raises the quota
Use when	High throughput, order doesn't matter, idempotent handlers	Order matters (payments, inventory), deduplication required
Message group ID	N/A	Required on send — parallel groups, ordered within group
Deduplication	None (at-least-once)	Content-based or explicit MessageDeduplicationId (5 min window)

Visibility timeout

When a consumer receives a message, SQS hides it from other consumers for the visibility timeout (default 30 seconds, max 12 hours). If the consumer deletes the message before timeout expires, it's gone. If processing fails or times out, the message becomes visible again — another consumer can retry it. Set visibility timeout to longer than your p99 processing time.

Too short → duplicate processing while first consumer still working
Too long → slow recovery when consumer crashes mid-process
Use ChangeMessageVisibility to extend timeout for long-running jobs
For Lambda event source mappings, set the queue's visibility timeout yourself: Lambda does not adjust it, and AWS recommends at least six times the function timeout
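
The visibility rules above can be modelled with a toy in-memory queue. This is a minimal sketch for reasoning about timing, not an SQS client: the class name, method names, and the explicit `now` argument are all hypothetical.

```python
class InMemoryQueue:
    """Toy model of SQS visibility timeout (hypothetical, not a real client)."""

    def __init__(self, visibility_timeout=30.0):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message_id -> (body, invisible_until)

    def send(self, message_id, body):
        self._messages[message_id] = (body, 0.0)

    def receive(self, now):
        """Return the first visible message and hide it until now + timeout."""
        for message_id, (body, invisible_until) in self._messages.items():
            if now >= invisible_until:
                self._messages[message_id] = (body, now + self.visibility_timeout)
                return message_id, body
        return None

    def change_visibility(self, message_id, now, timeout):
        """Mimic ChangeMessageVisibility: the new timeout counts from the call."""
        body, _ = self._messages[message_id]
        self._messages[message_id] = (body, now + timeout)

    def delete(self, message_id):
        self._messages.pop(message_id, None)


queue = InMemoryQueue(visibility_timeout=30)
queue.send("m1", "order-42")
first = queue.receive(now=0)                        # consumer A receives the message
hidden = queue.receive(now=10)                      # still in flight, nothing visible
queue.change_visibility("m1", now=20, timeout=60)   # long job extends the timeout
still_hidden = queue.receive(now=45)                # would have reappeared at t=30
redelivered = queue.receive(now=81)                 # never deleted, so it is retried
```

Without the extension at t=20, the second consumer would have received `m1` at t=30 while the first was still working on it, which is the duplicate-processing case described above.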

Long polling

Short polling (WaitTimeSeconds=0) returns immediately, often empty — wastes API calls and increases cost. Long polling waits up to 20 seconds for messages to arrive, reducing empty responses and lowering costs. Enable via queue attribute ReceiveMessageWaitTimeSeconds=20 or per-request WaitTimeSeconds.
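
A rough back-of-the-envelope comparison for a single consumer on an idle queue. The once-per-second short-polling loop is an assumption chosen for illustration; real pollers vary.

```python
SECONDS_PER_DAY = 86_400


def empty_receives_per_day(seconds_per_request):
    """Upper bound on ReceiveMessage calls one idle consumer makes per day."""
    return int(SECONDS_PER_DAY / seconds_per_request)


# Assumption: the short-polling loop issues one request per second.
short_polling = empty_receives_per_day(1.0)
# With WaitTimeSeconds=20, an idle queue answers once every 20 seconds.
long_polling = empty_receives_per_day(20.0)
```

Under these assumptions the idle consumer drops from 86,400 to 4,320 billable requests per day.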

Dead-letter queues (DLQ)

Once a message has been received maxReceiveCount times without being deleted, SQS moves it to the configured DLQ. This keeps poison messages from blocking the main queue and preserves them for debugging. Always monitor DLQ depth with CloudWatch alarms: a growing DLQ means failures are accumulating silently in production.
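
The redrive behaviour can be sketched as a small simulation. The function and its arguments are hypothetical, and messages are modelled as unique strings; the point is only to show a poison message being retried up to maxReceiveCount and then set aside while healthy messages keep flowing.

```python
def drain(messages, handler, max_receive_count=3):
    """Simulate a queue with a redrive policy; returns (processed, dead_lettered)."""
    receive_counts = {message: 0 for message in messages}
    pending = list(messages)
    processed, dead_lettered = [], []
    while pending:
        message = pending.pop(0)
        if receive_counts[message] >= max_receive_count:
            # Already received max_receive_count times: move to the DLQ.
            dead_lettered.append(message)
            continue
        receive_counts[message] += 1
        try:
            handler(message)
            processed.append(message)      # success means the message is deleted
        except Exception:
            pending.append(message)        # failure means it becomes visible again
    return processed, dead_lettered


attempts = []


def flaky_handler(message):
    attempts.append(message)
    if message == "poison":
        raise ValueError("cannot parse payload")


processed, dead_lettered = drain(["a", "poison", "b"], flaky_handler)
```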

Batching

SendMessageBatch sends up to 10 messages per API call; ReceiveMessage retrieves up to 10; DeleteMessageBatch removes up to 10. Batching reduces API costs and improves throughput — especially important for FIFO queues with lower per-queue limits.
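
A minimal sketch of chunking outgoing messages into SendMessageBatch-sized groups. The `Id` and `MessageBody` keys follow the batch entry shape; the helper name is hypothetical, and the sketch ignores the combined payload size limit that also applies to a batch.

```python
def to_batches(bodies, batch_size=10):
    """Split message bodies into lists of SendMessageBatch entries."""
    if not 1 <= batch_size <= 10:
        raise ValueError("SendMessageBatch accepts 1 to 10 entries")
    batches = []
    for start in range(0, len(bodies), batch_size):
        chunk = bodies[start:start + batch_size]
        batches.append([
            # Id only has to be unique within a single batch request.
            {"Id": str(start + offset), "MessageBody": body}
            for offset, body in enumerate(chunk)
        ])
    return batches


batches = to_batches([f"order-{n}" for n in range(23)])
```

Each inner list could then be passed as the `Entries` argument of one batch call, so 23 messages cost three requests instead of 23.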

Lambda event source mapping

Lambda can poll SQS on your behalf, so you don't need long-running pollers on EC2 or ECS. Configure BatchSize (up to 10 for FIFO queues; up to 10,000 for standard queues, which requires a batching window), MaximumBatchingWindowInSeconds, and FunctionResponseTypes: ReportBatchItemFailures for partial batch failure (retry only the failed messages, not the whole batch). Lambda deletes messages from the queue only after the invocation succeeds.
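
A sketch of a handler that reports partial batch failures. The `batchItemFailures` / `itemIdentifier` response shape is what ReportBatchItemFailures expects; the `process` function and its validation rule are hypothetical stand-ins for real business logic.

```python
import json


def process(order):
    """Hypothetical business logic: raise to signal that the message failed."""
    if order.get("orderId") is None:
        raise ValueError("missing orderId")


def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only these message IDs are returned to the queue for retry.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


sample_event = {
    "Records": [
        {"messageId": "1", "body": '{"orderId": "A1"}'},
        {"messageId": "2", "body": '{"note": "no id"}'},
        {"messageId": "3", "body": "not json"},
    ]
}
response = handler(sample_event, None)
```

For FIFO queues a handler would normally stop at the first failure and report that message and everything after it, so that ordering within the group is preserved; the sketch above does not do that.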

Extended client for messages >256 KB

SQS message body limit is 256 KB. For larger payloads, use the SQS Extended Client Library (Java) or store the payload in S3 and send a pointer in SQS. The extended client automatically offloads to S3 when body exceeds threshold and rehydrates on receive. Always encrypt S3 objects (SSE-S3 or SSE-KMS) when storing message payloads.
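
The pointer pattern can be sketched as follows. A plain dict stands in for S3, and the pointer format, bucket name, and key prefix are hypothetical; the real Extended Client Library marks offloaded messages in its own way.

```python
import json
import uuid

MAX_INLINE_BYTES = 256 * 1024  # body limit stated above


def to_sqs_body(payload, store, bucket="example-payload-bucket"):
    """Return the body to send: the payload itself, or a pointer if it is too large."""
    if len(payload.encode("utf-8")) <= MAX_INLINE_BYTES:
        return payload
    key = f"sqs-payloads/{uuid.uuid4()}"
    store[(bucket, key)] = payload          # stand-in for an S3 PutObject
    return json.dumps({"s3Bucket": bucket, "s3Key": key})


def from_sqs_body(body, store):
    """Rehydrate a received body, following the pointer when there is one."""
    try:
        pointer = json.loads(body)
    except json.JSONDecodeError:
        return body
    if isinstance(pointer, dict) and set(pointer) == {"s3Bucket", "s3Key"}:
        return store[(pointer["s3Bucket"], pointer["s3Key"])]  # stand-in for GetObject
    return body


fake_s3 = {}
small_body = to_sqs_body("hello", fake_s3)
large_payload = "x" * (300 * 1024)
large_body = to_sqs_body(large_payload, fake_s3)
```

The consumer needs read access to the bucket, and objects should be cleaned up (for example with a lifecycle rule) once the message has been processed.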

Create a production queue with DLQ

# Create DLQ first
aws sqs create-queue --queue-name orders-dlq.fifo \
  --attributes FifoQueue=true,ContentBasedDeduplication=true

DLQ_ARN=$(aws sqs get-queue-attributes --queue-url $(aws sqs get-queue-url \
  --queue-name orders-dlq.fifo --query QueueUrl --output text) \
  --attribute-names QueueArn --query 'Attributes.QueueArn' --output text)

# Main FIFO queue with redrive policy
aws sqs create-queue --queue-name orders.fifo \
  --attributes '{
    "FifoQueue": "true",
    "ContentBasedDeduplication": "true",
    "VisibilityTimeout": "120",
    "ReceiveMessageWaitTimeSeconds": "20",
    "RedrivePolicy": "{\"deadLetterTargetArn\":\"'"$DLQ_ARN"'\",\"maxReceiveCount\":\"3\"}"
  }'

resource "aws_sqs_queue" "orders_dlq" {
  name                        = "orders-dlq.fifo"
  fifo_queue                  = true
  content_based_deduplication = true
}

resource "aws_sqs_queue" "orders" {
  name                        = "orders.fifo"
  fifo_queue                  = true
  content_based_deduplication = true
  visibility_timeout_seconds  = 120
  receive_wait_time_seconds   = 20

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.orders_dlq.arn
    maxReceiveCount     = 3
  })
}

import * as sqs from 'aws-cdk-lib/aws-sqs';
import { Duration } from 'aws-cdk-lib';

const dlq = new sqs.Queue(this, 'OrdersDlq', {
  fifo: true,
  contentBasedDeduplication: true,
});

const queue = new sqs.Queue(this, 'OrdersQueue', {
  fifo: true,
  contentBasedDeduplication: true,
  visibilityTimeout: Duration.seconds(120),
  receiveMessageWaitTime: Duration.seconds(20),
  deadLetterQueue: { queue: dlq, maxReceiveCount: 3 },
});

⚠️ Pitfall

Setting visibility timeout to 30 seconds when your Lambda timeout is 60 seconds: the message becomes visible again while Lambda is still running, causing duplicate order processing. Rule of thumb for event source mappings: visibility timeout ≥ 6 × Lambda timeout, plus the batching window. Partial batch failure reporting complements this by limiting retries to the messages that actually failed; it does not replace a correct timeout.

🔬 Under the Hood

SQS stores messages redundantly across multiple AZs within a region. A message is considered "in flight" after ReceiveMessage until it is deleted or its visibility timeout expires. Standard queues spread messages across many servers, which is why ordering isn't guaranteed and occasional duplicates occur. FIFO queues preserve order within each message group ID and track deduplication IDs over a 5-minute interval.

💰 Cost

SQS pricing is per request (send, receive, delete). Long polling reduces empty receives — often cuts costs 50%+ vs short polling. First 1 million requests/month free. Extended client adds S3 PUT/GET costs for large payloads. FIFO queues cost ~25% more per request than standard.

SNS deep dive

Amazon SNS is a publish/subscribe notification service. One publisher sends to a topic; SNS delivers to all subscribers — SQS queues, Lambda functions, HTTP endpoints, email, SMS. It's the fan-out layer that lets one event trigger many independent consumers without the publisher knowing who they are.

Pub/sub model

Producers call Publish on a topic ARN. SNS handles delivery to every subscribed endpoint. Subscribers never poll the topic — SQS queues buffer messages for async workers; Lambda gets invoked directly; HTTP endpoints receive POST requests. Message size limit: 256 KB (same as SQS).

Fan-out to SQS — the canonical pattern

Subscribe multiple SQS queues to one SNS topic. Each queue gets a copy of every message (unless filtered). This decouples the publisher from N consumers and gives each consumer independent retry/DLQ semantics via SQS. Raw message delivery (RawMessageDelivery=true) skips SNS metadata wrapping — the SQS body is exactly what the publisher sent.

flowchart LR
  PUB["Order Service\n(SNS Publish)"]
  TOPIC["SNS Topic\norder-events"]
  Q1["SQS: inventory"]
  Q2["SQS: notifications"]
  Q3["SQS: analytics"]
  L1["Lambda: fraud-check"]
  PUB --> TOPIC
  TOPIC --> Q1
  TOPIC --> Q2
  TOPIC --> Q3
  TOPIC --> L1
  Q1 --> W1["Inventory Worker"]
  Q2 --> W2["Email Service"]
  Q3 --> W3["Kinesis Firehose"]

Filter policies

Not every subscriber needs every message. SNS subscription filter policies route messages based on message attributes (the default) or, with FilterPolicyScope set to MessageBody, on properties of a JSON message body. Example: the inventory queue receives only {"eventType": ["order.placed", "order.cancelled"]}; the analytics queue receives everything. Filtering reduces noise, cost, and unnecessary Lambda invocations.

{
  "eventType": ["order.placed"],
  "amount": [{ "numeric": [">=", 1000] }]
}
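
How such a policy is evaluated can be sketched with a small matcher. It covers only exact-value and numeric conditions, which is a deliberately reduced subset of what SNS supports, and the function names are hypothetical. Keys are combined with AND; the values listed for one key are combined with OR.

```python
import operator

_NUMERIC_OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
}


def matches(policy, attributes):
    """True if every policy key has at least one condition matching the attribute."""
    for key, conditions in policy.items():
        if key not in attributes:
            return False
        if not any(_condition_matches(c, attributes[key]) for c in conditions):
            return False
    return True


def _condition_matches(condition, value):
    if isinstance(condition, dict) and "numeric" in condition:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        spec = condition["numeric"]            # e.g. [">=", 1000] or [">", 0, "<=", 100]
        pairs = zip(spec[0::2], spec[1::2])
        return all(_NUMERIC_OPS[op](value, bound) for op, bound in pairs)
    return condition == value                  # exact match


policy = {"eventType": ["order.placed"], "amount": [{"numeric": [">=", 1000]}]}
large_order = matches(policy, {"eventType": "order.placed", "amount": 1500})
small_order = matches(policy, {"eventType": "order.placed", "amount": 999})
cancelled = matches(policy, {"eventType": "order.cancelled", "amount": 5000})
```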

FIFO topics

SNS FIFO topics (.fifo suffix) deliver messages in order to FIFO SQS subscriptions. Use them when strict ordering must be preserved end-to-end, from the publisher through SNS to the SQS consumer. Standard topics cannot deliver to FIFO queues, and FIFO topics support only SQS subscriptions, so they cannot push directly to Lambda, HTTP, email, or SMS. Throughput: 300 msg/s per topic (3,000 with batching), same as SQS FIFO limits.

Subscribe SQS to SNS with filter policy

# FIFO topic names must end in .fifo
aws sns create-topic --name order-events.fifo \
  --attributes FifoTopic=true,ContentBasedDeduplication=true

aws sns subscribe \
  --topic-arn arn:aws:sns:eu-west-1:123456789012:order-events.fifo \
  --protocol sqs \
  --notification-endpoint arn:aws:sqs:eu-west-1:123456789012:inventory.fifo \
  --attributes '{
    "RawMessageDelivery": "true",
    "FilterPolicy": "{\"eventType\":[\"order.placed\"]}"
  }'

resource "aws_sns_topic" "order_events" {
  name                        = "order-events.fifo"
  fifo_topic                  = true
  content_based_deduplication = true
}

resource "aws_sns_topic_subscription" "inventory" {
  topic_arn            = aws_sns_topic.order_events.arn
  protocol             = "sqs"
  endpoint             = aws_sqs_queue.inventory.arn
  raw_message_delivery = true
  filter_policy = jsonencode({
    eventType = ["order.placed"]
  })
}

import * as sns from 'aws-cdk-lib/aws-sns';
import * as subs from 'aws-cdk-lib/aws-sns-subscriptions';

const topic = new sns.Topic(this, 'OrderEvents', {
  fifo: true,
  contentBasedDeduplication: true,
});

topic.addSubscription(new subs.SqsSubscription(inventoryQueue, {
  rawMessageDelivery: true,
  filterPolicy: {
    eventType: sns.SubscriptionFilter.stringFilter({
      allowlist: ['order.placed'],
    }),
  },
}));

🔒 Security

When SNS delivers to SQS, the queue policy must allow sqs:SendMessage from the SNS topic ARN. The CDK SqsSubscription construct adds this statement automatically; with Terraform (aws_sqs_queue_policy) or the raw CLI you must attach the queue policy yourself. Use an aws:SourceArn condition to prevent confused deputy attacks.
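
A sketch of the queue policy described above, built as a Python dict so it can be serialised into the queue's Policy attribute. The ARNs reuse the example account and names from this section; the helper name and the Sid are hypothetical.

```python
import json


def sns_to_sqs_policy(queue_arn, topic_arn):
    """Queue policy that lets exactly one SNS topic send to one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowSnsTopicToSend",
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Without this condition any SNS topic could deliver to the queue.
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }


policy = sns_to_sqs_policy(
    "arn:aws:sqs:eu-west-1:123456789012:inventory.fifo",
    "arn:aws:sns:eu-west-1:123456789012:order-events.fifo",
)
policy_json = json.dumps(policy)
```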

🎯 Exam Tip

SNS + SQS fan-out is the exam answer when one event must trigger multiple independent microservices with separate retry semantics. SNS alone (no SQS) when you need immediate push to Lambda/HTTP without buffering. Filter policies when subscribers need subsets — not separate topics per event type.

EventBridge

Amazon EventBridge is a serverless event router — think SNS with structured events, content-based routing, cross-account delivery, and native integrations with 20+ AWS services and SaaS partners. It's the backbone of event-driven architectures that span accounts, regions, and third-party systems.

Event buses

Default bus — receives events from AWS services in your account (EC2 state change, S3 object created, etc.)
Custom buses — your application events; isolate domains (billing bus, orders bus)
Partner buses — SaaS providers (Datadog, Zendesk, Auth0) publish directly to your account
Archive — store events from a bus for replay and audit (retention is configurable in days, or indefinite)

Rules, patterns, and targets

A rule matches events via an event pattern (JSON filter on source, detail-type, detail fields) and routes to targets — Lambda, SQS, SNS, Step Functions, ECS tasks, API destinations (HTTP), another event bus (cross-account/region). One event can match multiple rules; one rule can have multiple targets.

{
  "source": ["com.mycompany.orders"],
  "detail-type": ["OrderPlaced"],
  "detail": {
    "amount": [{ "numeric": [">=", 500] }],
    "region": ["eu-west-1"]
  }
}
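
For reference, a sketch of a PutEvents entry that the pattern above would match. The `Source`, `DetailType`, `Detail`, and `EventBusName` keys are the PutEvents entry fields, which surface in the delivered event as `source`, `detail-type`, and `detail`. The bus name, the `orderId` field, and the helper name are hypothetical.

```python
import json


def order_placed_entry(order_id, amount, region, bus_name="orders-bus"):
    """Build one PutEvents entry; Detail must be a JSON string, not a dict."""
    return {
        "EventBusName": bus_name,
        "Source": "com.mycompany.orders",
        "DetailType": "OrderPlaced",
        "Detail": json.dumps({"orderId": order_id, "amount": amount, "region": region}),
    }


entry = order_placed_entry("A1", 750, "eu-west-1")
detail = json.loads(entry["Detail"])
```

An order with `amount` below 500 or a different `region` would still be accepted by the bus, but the rule would not forward it to its targets.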

EventBridge Scheduler

Replaces CloudWatch Events scheduled rules with more flexibility: one-time schedules, rate/cron expressions, flexible time windows (jitter), and dead-letter support. Invoke Lambda, SQS, ECS, or Step Functions on a schedule without maintaining an EventBridge rule per job. Supports timezone-aware cron.

Archive and replay

Archive all events (or a filtered subset) on a bus. A replay sends archived events back to the bus they were archived from, optionally restricted to specific rules. This is useful for reprocessing after bug fixes, testing new consumers against production event history, or disaster recovery drills. Replayed events carry a replay-name field so consumers can tell them apart from live events.

EventBridge Pipes

Point-to-point integration: single source → optional enrichment/filter → single target. Sources include SQS, Kinesis, DynamoDB streams, MSK. Targets include Step Functions, Lambda, ECS, etc. Pipes handle polling, batching, and partial failure — simpler than Lambda poller + custom code for stream-to-target pipelines. Enrichment step can call Lambda or API Gateway to hydrate events before delivery.

Cross-account event routing

Account A creates a rule on a custom bus with target = Account B's event bus. B's bus policy allows A to events:PutEvents. Enables centralized audit/logging accounts, multi-account event-driven architectures without SNS topic sprawl.

📦 Real World

Capital One and other regulated enterprises use EventBridge as the central nervous system across AWS accounts — CloudTrail, Config, GuardDuty findings, and application events all route to a security account bus for automated remediation Step Functions workflows.

⚖️ Trade-off

EventBridge vs SNS: EventBridge adds schema registry, archive/replay, cross-account buses, and richer filtering, at the cost of typically higher delivery latency and a different pricing model (charged per event published rather than per publish request and delivery). Use SNS for simple fan-out within one account; EventBridge when events need routing logic, audit trails, or multi-account topology.

Kinesis streams

Amazon Kinesis Data Streams is a real-time data streaming platform. Producers write records to shards; consumers read in order within a shard. Use Kinesis when you need high throughput, strict ordering per partition key, multiple concurrent consumers reading the same stream, or replay from any point in time.

Shards and capacity

A stream is divided into shards. Each shard ingests up to 1 MB/s or 1,000 records/s and emits up to 2 MB/s. The partition key determines shard assignment: Kinesis takes the MD5 hash of the key and routes the record to the shard whose hash key range contains that 128-bit value. Scale by splitting/merging shards or use on-demand mode (auto-scales, pay per GB ingested). Data retention: 24 hours default, up to 365 days (extended retention costs extra).

Hot shards

If most records share the same partition key (e.g. all events use "default"), one shard gets overloaded while others sit idle, and PutRecords is throttled despite available capacity elsewhere. Fix: design partition keys for even distribution (user ID, order ID, device ID, or any other high-cardinality value). Monitor IncomingRecords and WriteProvisionedThroughputExceeded per shard in CloudWatch; shard-level metrics must be enabled on the stream first.
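
The effect of key choice can be illustrated with a sketch of the key-to-shard mapping. It assumes a stream whose shards own equal-width, contiguous slices of the 128-bit MD5 hash space, which is a simplification (after splits and merges the ranges are uneven); the function name is hypothetical.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for(partition_key, shard_count):
    """Index of the shard whose (assumed equal-width) hash range holds the key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


# One constant key: every record lands on the same shard.
hot = Counter(shard_for("default", 4) for _ in range(10_000))

# High-cardinality keys: records spread across all four shards.
spread = Counter(shard_for(f"user-{n}", 4) for n in range(10_000))
```

`hot` ends up with a single entry holding all 10,000 records, while `spread` has four entries of roughly similar size.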

Enhanced fan-out

Standard consumers share 2 MB/s per shard across all consumers — N consumers divide bandwidth. Enhanced fan-out (EFO) gives each registered consumer dedicated 2 MB/s per shard at ~70 ms latency (vs ~200 ms shared). Worth it when you have 3+ consumers on the same stream (analytics + real-time + audit). Costs per consumer-shard hour plus data retrieval.

Consumers

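KCL (Kinesis Client Library) — Java library (other languages via the MultiLangDaemon) that manages shard leases and checkpoints in DynamoDB
Kafka clients — Kinesis Data Streams does not expose a Kafka-compatible API; existing Kafka producers and consumers need Amazon MSK instead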
Lambda — event source mapping with batch size and bisect-on-error
Firehose — managed delivery to S3/Redshift/OpenSearch that can read from a data stream as its source, with no consumer code to write (see storage chapter)

SQS vs Kinesis — when to pick which

Requirement	SQS	Kinesis
Message deleted after processing	✅ Yes (pull, delete)	❌ Records persist for retention period
Multiple consumers, same data	❌ Each needs own queue (SNS fan-out)	✅ Native — all consumers read same stream
Replay from history	❌ Gone after delete	✅ Reset iterator to any timestamp/sequence
Ordering	FIFO only (limited throughput)	✅ Per partition key, high throughput
Operational complexity	✅ Minimal — no shards	⚠️ Shard management, hot key tuning
Typical use case	Task queues, async jobs, decoupling	Clickstreams, IoT telemetry, log aggregation

💡 Pro Tip

Start with SQS for work queues. Reach for Kinesis only when you need replay, multiple independent consumers on the same data, or ordering at high throughput. Many teams over-engineer with Kinesis when SNS→SQS fan-out would be simpler and cheaper.

🎯 Exam Tip

"Real-time analytics on clickstream data with multiple applications consuming the same events" → Kinesis Data Streams. "Decouple order placement from fulfillment with retries" → SQS. "Deliver stream data to S3 without writing consumer code" → Kinesis Data Firehose (not Streams).

Step Functions

AWS Step Functions orchestrates multi-step workflows as state machines. Define steps, retries, error handling, parallel branches, and human approval gates in JSON (ASL — Amazon States Language) or visually. Integrates natively with Lambda, SQS, SNS, ECS, DynamoDB, and 220+ services via service integrations.

Standard vs Express workflows

Feature	Standard	Express
Duration	Up to 1 year	Up to 5 minutes
Execution semantics	Exactly-once (at-most-once for tasks)	At-least-once — design for idempotency
Pricing	Per state transition	Per execution + duration + memory
Use when	Long-running, human approval, audit trail needed	High-volume, short workflows (IoT, streaming ETL)
Execution history	Full history in console/API	CloudWatch Logs only (optional)

State types that matter

Task — invoke Lambda, SQS send, SNS publish, ECS run task, API Gateway, etc.
Choice — branch on input fields (if/else without Lambda)
Parallel — run branches concurrently, wait for all
Map — iterate array items (inline or distributed for large datasets)
Wait — delay N seconds or until timestamp (no Lambda sleep billing)
Retry / Catch — declarative error handling with backoff — replaces try/catch Lambda chains
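
The declarative Retry fields translate into a simple delay schedule. A minimal sketch, assuming no jitter: the first retry waits IntervalSeconds, and each later retry multiplies the previous wait by BackoffRate, optionally capped by MaxDelaySeconds. The function name is hypothetical.

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate, max_delay_seconds=None):
    """Seconds waited before each retry attempt, in order."""
    delays = []
    for attempt in range(max_attempts):
        delay = interval_seconds * backoff_rate ** attempt
        if max_delay_seconds is not None:
            delay = min(delay, max_delay_seconds)
        delays.append(delay)
    return delays


# IntervalSeconds=2, MaxAttempts=3, BackoffRate=2
schedule = retry_delays(2, 3, 2)   # [2, 4, 8]
```

With those values a failing task is retried after 2, 4, and 8 seconds, so a persistent error surfaces roughly 14 seconds after the first failure, plus task run time.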

Integration with Lambda and SQS

Step Functions supports the sqs:sendMessage.waitForTaskToken integration: it sends a message containing a task token to SQS and pauses; an external worker processes the message and calls SendTaskSuccess or SendTaskFailure to resume the workflow. This is the pattern for human approval or long-running external systems without polling. Lambda tasks use synchronous invoke, so Step Functions waits for the response.
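
The worker side of the callback pattern might look like the sketch below. It only decides which callback to make and with which arguments, so it runs without AWS access; the message fields `orderId` and `taskToken` and the `charge` callable are assumptions about the payload and business logic. The returned names correspond to the SendTaskSuccess and SendTaskFailure operations.

```python
import json


def callback_for(sqs_body, charge):
    """Run the work for one message and describe the Step Functions callback to make."""
    message = json.loads(sqs_body)
    token = message["taskToken"]
    try:
        result = charge(message["orderId"])
    except Exception as error:
        return "send_task_failure", {
            "taskToken": token,
            "error": type(error).__name__,
            "cause": str(error),
        }
    # The output must be a JSON string; it becomes the state's result.
    return "send_task_success", {"taskToken": token, "output": json.dumps(result)}


body = json.dumps({"orderId": "A1", "taskToken": "tok-123"})
ok_name, ok_args = callback_for(body, lambda order_id: {"orderId": order_id, "charged": True})


def declined(order_id):
    raise RuntimeError("card declined")


fail_name, fail_args = callback_for(body, declined)
```

A real worker would pass the returned arguments to the matching method on a Step Functions client and delete the SQS message only after the callback succeeds.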

Order fulfillment state machine

# payments.fifo is a FIFO queue, so the SQS task must pass a MessageGroupId.
# Assumes orderId is a string and the queue has content-based deduplication enabled.
aws stepfunctions create-state-machine \
  --name order-fulfillment \
  --role-arn arn:aws:iam::123456789012:role/StepFunctionsExecutionRole \
  --definition '{
    "StartAt": "ValidateOrder",
    "States": {
      "ValidateOrder": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:validate-order",
        "Retry": [{ "ErrorEquals": ["Lambda.ServiceException"], "IntervalSeconds": 2, "MaxAttempts": 3, "BackoffRate": 2 }],
        "Next": "ChargePayment"
      },
      "ChargePayment": {
        "Type": "Task",
        "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
        "Parameters": {
          "QueueUrl": "https://sqs.eu-west-1.amazonaws.com/123456789012/payments.fifo",
          "MessageGroupId.$": "$.orderId",
          "MessageBody": { "orderId.$": "$.orderId", "taskToken.$": "$$.Task.Token" }
        },
        "Next": "ShipOrder"
      },
      "ShipOrder": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:ship-order",
        "End": true
      }
    }
  }'

resource "aws_sfn_state_machine" "order_fulfillment" {
  name     = "order-fulfillment"
  role_arn = aws_iam_role.sfn_exec.arn

  definition = jsonencode({
    StartAt = "ValidateOrder"
    States = {
      ValidateOrder = {
        Type     = "Task"
        Resource = aws_lambda_function.validate_order.arn
        Retry = [{
          ErrorEquals     = ["Lambda.ServiceException"]
          IntervalSeconds = 2
          MaxAttempts     = 3
          BackoffRate     = 2
        }]
        Next = "ChargePayment"
      }
      ChargePayment = {
        Type     = "Task"
        Resource = "arn:aws:states:::sqs:sendMessage.waitForTaskToken"
        Parameters = {
          QueueUrl    = aws_sqs_queue.payments.url
          MessageBody = {
            "orderId.$"    = "$.orderId"
            "taskToken.$"  = "$$.Task.Token"
          }
        }
        Next = "ShipOrder"
      }
      ShipOrder = {
        Type     = "Task"
        Resource = aws_lambda_function.ship_order.arn
        End      = true
      }
    }
  })
}

import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';

const validate = new tasks.LambdaInvoke(this, 'ValidateOrder', {
  lambdaFunction: validateFn,
  outputPath: '$.Payload',
});

const charge = new tasks.SqsSendMessage(this, 'ChargePayment', {
  queue: paymentsQueue,
  messageBody: sfn.TaskInput.fromObject({
    orderId: sfn.JsonPath.stringAt('$.orderId'),
    taskToken: sfn.JsonPath.taskToken,
  }),
  integrationPattern: sfn.IntegrationPattern.WAIT_FOR_TASK_TOKEN,
});

const ship = new tasks.LambdaInvoke(this, 'ShipOrder', {
  lambdaFunction: shipFn,
});

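new sfn.StateMachine(this, 'OrderFulfillment', {
  // definitionBody replaces the deprecated `definition` property
  definitionBody: sfn.DefinitionBody.fromChainable(validate.next(charge).next(ship)),
});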

⚠️ Pitfall

Chaining 5 Lambdas where each invokes the next — you lose visibility, retry logic, and pay for idle wait time. Step Functions Standard workflows cost per state transition — a 50-step workflow at 1M executions/month adds up. Use Express for high-volume short flows; consolidate steps where possible.
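
A rough cost sketch for the 50-step example. The price of 0.025 USD per 1,000 state transitions is an assumption used for illustration; actual pricing depends on region and may change, and the free tier is ignored.

```python
def standard_workflow_cost(executions, transitions_per_execution, usd_per_1000_transitions):
    """Monthly state-transition charge for Standard workflows, in USD."""
    transitions = executions * transitions_per_execution
    return transitions / 1000 * usd_per_1000_transitions


# Assumed price: 0.025 USD per 1,000 state transitions.
monthly_usd = standard_workflow_cost(1_000_000, 50, 0.025)
```

Under that assumption the workflow costs about 1,250 USD per month in transitions alone; halving the step count halves the bill.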

Messaging patterns

Production architectures combine these services into repeatable patterns. The SNS→SQS→Lambda fan-out is the AWS equivalent of a message broker with independent consumer groups. Event-driven microservices trade synchronous coupling for eventual consistency — your job is to make that consistency explicit and observable.

SNS → SQS → Lambda (canonical fan-out)

Order service publishes OrderPlaced to SNS topic
SNS fan-out to inventory SQS, notification SQS, analytics SQS (with filter policies)
Each queue triggers its own Lambda or ECS worker with independent scaling and DLQ
Failed inventory update doesn't block email notification — isolation by design

Event-driven microservices principles

Idempotent consumers — at-least-once delivery means duplicates happen; use dedup keys in DynamoDB
Schema evolution — version event payloads (eventVersion: 2); EventBridge schema registry helps
Outbox pattern — write business data + outbound event in same DB transaction; separate poller publishes to SNS/EventBridge
Saga pattern — Step Functions or choreographed SNS events for distributed transactions without 2PC
Observability — trace correlation IDs through SNS attributes → SQS message attributes → Lambda X-Ray segments
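
The idempotent-consumer principle can be sketched as follows. An in-memory set stands in for a DynamoDB table with a conditional put, and the `eventId` field and class names are hypothetical. The claim is released if the side effect fails so that a retry can succeed.

```python
class ProcessedEvents:
    """In-memory stand-in for a dedup table keyed by event ID."""

    def __init__(self):
        self._seen = set()

    def claim(self, event_id):
        """Return True only for the first caller that claims this event ID."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True

    def release(self, event_id):
        self._seen.discard(event_id)


def handle_once(event, processed, side_effect):
    """Apply side_effect at most once per eventId; returns True if it ran."""
    if not processed.claim(event["eventId"]):
        return False                      # duplicate delivery, skip
    try:
        side_effect(event)
    except Exception:
        processed.release(event["eventId"])   # allow a later retry
        raise
    return True


reserved = []
table = ProcessedEvents()
event = {"eventId": "evt-1", "orderId": "A1"}
first_delivery = handle_once(event, table, reserved.append)
second_delivery = handle_once(event, table, reserved.append)
```

In a real system the claim and the side effect should share one transaction, or the side effect should itself be safe to repeat, because a crash between the two steps would otherwise leave the event claimed but unprocessed.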

Spring Cloud AWS integration

Spring Cloud AWS 3.x (built on AWS SDK v2) provides @SqsListener for declarative queue consumption, SqsTemplate for sending, and SNS publish support. Configure credentials through IAM task roles rather than access keys in application.yml.

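import io.awspring.cloud.sqs.annotation.SqsListener;
import org.springframework.stereotype.Service;

@Service
public class OrderEventConsumer {
  // InventoryService and OrderPlacedEvent are application types defined elsewhere
  private final InventoryService inventoryService;

  public OrderEventConsumer(InventoryService inventoryService) {
    this.inventoryService = inventoryService;
  }

  @SqsListener("${app.queue.orders}")
  public void handleOrderPlaced(OrderPlacedEvent event) {
    // Idempotent: check processedEvents table before side effects
    inventoryService.reserveStock(event.orderId(), event.items());
  }
}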

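import io.awspring.cloud.sqs.operations.SqsTemplate;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import software.amazon.awssdk.services.sqs.SqsAsyncClient;

@Configuration
class MessagingConfig {
  @Bean
  SqsTemplate sqsTemplate(SqsAsyncClient client) {
    return SqsTemplate.builder().sqsAsyncClient(client).build();
  }
}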

For ECS/Fargate Spring services, run the SQS listener in the same task as your HTTP server (or in a dedicated worker task definition). Set the queue's visibility timeout to exceed worst-case handler duration. Tune spring.cloud.aws.sqs.listener.max-concurrent-messages for throughput.

Service comparison — SQS vs SNS vs EventBridge vs Kinesis

Dimension	SQS	SNS	EventBridge	Kinesis
Primary model	Queue (pull)	Pub/sub (push)	Event router (push)	Stream (pull)
Consumers	One consumer group per queue	Many subscribers per topic	Many targets per rule	Many consumers per stream
Message persistence	Until deleted (14 days max)	None (fire and forget)	Archive optional (configurable retention)	24h–365 days retention
Ordering	FIFO queues only	FIFO topics → FIFO subs	No ordering guarantee	Per partition key
Filtering	Consumer-side only	Subscription filter policies	Event pattern rules	Consumer-side only
Replay	❌	❌	✅ Archive & replay	✅ Iterator reset
Cross-account	Queue policy	Topic policy	Native bus-to-bus	Resource policy
AWS service events	❌	❌	✅ Default bus	❌
Throughput	Standard: unlimited; FIFO: 3K/s	Standard: unlimited; FIFO: 3K/s	Soft limits; high scale	Shard-limited; on-demand scales
Best for	Work queues, buffering, DLQ	Fan-out notifications	Event routing, schedules, SaaS	Analytics, IoT, replay

$ # Approximate message counts (not exact — eventual consistency)
$ aws sqs get-queue-attributes --queue-url $QUEUE_URL \
  --attribute-names ApproximateNumberOfMessages,ApproximateNumberOfMessagesNotVisible
→ Visible: 42 | In flight: 7
$ # Peek at DLQ without deleting (use receive, inspect, let visibility expire)
$ aws sqs receive-message --queue-url $DLQ_URL --max-number-of-messages 1 \
  --attribute-names All --message-attribute-names All
$ # Redrive DLQ messages back to the source queue (asynchronous move task; omit --destination-arn to return messages to their original source queues)
$ aws sqs start-message-move-task \
  --source-arn $DLQ_ARN --destination-arn $QUEUE_ARN

🔒 Security

Encrypt queues with SSE-SQS (SQS-owned keys) or SSE-KMS, and topics with SNS server-side encryption backed by KMS; choose customer-managed keys when audit/compliance requires control over the key. KMS adds latency and request cost to sends and receives, so benchmark before mandating it org-wide. Use VPC endpoints for SQS/SNS when workers in private subnets must not traverse NAT for AWS API calls.

📦 Real World

Deliveroo and Monzo run event-driven architectures on SNS/SQS at scale — order lifecycle events fan out to fulfillment, rider dispatch, and analytics without synchronous coupling. Both teams treat DLQ depth as a primary paging metric and enforce idempotency keys at the database layer.

🎯 Exam Tip

When the question mentions "decouple microservices" + "retries" + "scale independently" → SNS + SQS. When it mentions "schedule a Lambda every Tuesday" → EventBridge Scheduler (not CloudWatch Events legacy). When it mentions "reprocess events after fixing a bug" → EventBridge archive/replay or Kinesis iterator reset.