IAM: Identity & Access Management
IAM is the most important AWS service because it gates every other service. Get IAM wrong and nothing else matters — public S3 buckets, leaked access keys, overprivileged Lambda roles, and cross-account trust misconfigurations are all IAM failures. This chapter covers how AWS evaluates every API call, how to write policies that are tight enough to pass audit but loose enough to deploy, and how production teams grant access without long-lived credentials.
IAM fundamentals
Everything in AWS is an API call. IAM decides whether that call is allowed. There is no separate "login server" per service — one IAM engine evaluates all requests, whether they come from the console, CLI, SDK, Terraform, or CDK.
Everything is an API call
When you click "Create bucket" in the console, the browser sends a CreateBucket request to S3, which IAM authorizes as the s3:CreateBucket action. When your Spring Boot app on ECS reads a secret, the SDK signs secretsmanager:GetSecretValue with the task role's credentials. When Terraform plans or refreshes an aws_instance, the provider calls DescribeInstances, so the IAM user or role behind the provider needs ec2:DescribeInstances. Same engine, same evaluation order, every time.
IAM is global
IAM users, groups, roles, and policies exist once per account, not per region. You create a role once and can attach it to EC2 instances in any region. You can still restrict where actions run with the aws:RequestedRegion condition key, but the IAM entities themselves are global. Route 53 and CloudFront are also global services; most others are regional.
Core components
| Component | What it is | Production guidance |
|---|---|---|
| Users | Long-term identity with optional access keys and console password | Avoid for humans — use SSO. OK for legacy CI with rotation; prefer OIDC roles |
| Groups | Collection of users; policies attach to the group | Map SSO groups to permission sets instead of IAM groups when possible |
| Roles | Temporary credentials via sts:AssumeRole | Default for everything — EC2, ECS, Lambda, cross-account, CI/CD |
| Policies | JSON documents: Effect, Action, Resource, Condition | Customer-managed for app-specific; AWS managed to bootstrap, then narrow |
| Identity providers | SAML/OIDC/OAuth trust for federation | Corporate IdP → IAM Identity Center; GitHub → OIDC provider for Actions |
IAM is a global service. Its control plane runs in a single region (us-east-1 in the commercial partition), and changes replicate to every region, so a newly attached policy may not take effect immediately. Each regional service endpoint evaluates authorization locally against that replicated data. When you assume a role, STS returns temporary credentials (access key ID, secret access key, and session token) valid for 15 minutes to 12 hours. The session token is what makes them temporary; without it, the access key alone is useless.
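The following minimal TypeScript sketch makes the distinction concrete by classifying a credential set from its access key ID prefix. Long-term IAM user keys start with AKIA. STS-issued temporary keys start with ASIA and only work together with a session token. The AwsCredentialSet interface and classifyCredentials function are hypothetical helpers, not SDK types.

```typescript
// Sketch: classify credentials by access key ID prefix.
// AKIA = long-term IAM user key; ASIA = temporary key issued by STS.
interface AwsCredentialSet {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken?: string;
}

type CredentialKind = "long-term" | "temporary" | "invalid";

function classifyCredentials(creds: AwsCredentialSet): CredentialKind {
  if (creds.accessKeyId.startsWith("AKIA")) {
    return "long-term";
  }
  if (creds.accessKeyId.startsWith("ASIA")) {
    // A temporary key without its session token is rejected by AWS.
    return creds.sessionToken ? "temporary" : "invalid";
  }
  // Any other prefix is treated as invalid in this sketch.
  return "invalid";
}

const fromAssumeRole: AwsCredentialSet = {
  accessKeyId: "ASIAEXAMPLE123",
  secretAccessKey: "example-secret",
  sessionToken: "example-session-token",
};
console.log(classifyCredentials(fromAssumeRole)); // temporary
```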
IAM itself is free: you pay nothing for users, roles, or policies. The costs of getting it wrong are indirect. An overly broad s3:* policy enables data exfiltration. Without an SCP restricting regions, leaked credentials can launch crypto-mining instances in regions nobody monitors. IAM Identity Center is also free. IAM Access Analyzer is free for external access findings and policy validation, while unused access analysis and custom policy checks are billed separately.
Policy evaluation logic
AWS evaluates policies in a fixed order. One explicit Deny anywhere overrides any number of Allow statements. If nothing allows the action, the default is implicit deny — the request fails.
flowchart TD
START["Incoming API request\n(principal + action + resource)"]
START --> EXPLICIT{"Explicit Deny\nin any policy?"}
EXPLICIT -->|Yes| DENY["❌ DENY"]
EXPLICIT -->|No| SCP{"SCP allows?\n(org guardrail)"}
SCP -->|No| DENY
SCP -->|Yes| RP{"Resource-based policy\nallows?"}
RP -->|Explicit Deny| DENY
RP -->|Allow| ALLOW["✅ ALLOW"]
RP -->|No match| ID{"Identity-based policies\n(user/role/group) allow?"}
ID -->|Explicit Deny| DENY
ID -->|Allow| PB{"Permission boundary\nallows?"}
ID -->|No Allow| IMPLICIT["❌ Implicit DENY"]
PB -->|No| DENY
PB -->|Yes| SP{"Session policy\n(AssumeRole) allows?"}
SP -->|No| DENY
SP -->|Yes| ALLOW
Evaluation order (simplified — exam-critical):
- Explicit Deny — any policy, any type → immediate deny
- SCPs — organization guardrails; cannot grant permissions, only deny or pass-through
- Resource-based policies — S3 bucket policy, KMS key policy, SQS queue policy, Lambda resource policy
- Identity-based policies — attached to user, group, or role
- Permission boundaries — cap maximum permissions even if identity policies allow more
- Session policies — passed during AssumeRole to further restrict the session
- Implicit deny — default if no Allow matched
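The flowchart can be read as a short decision function. The sketch below is a simplified model with three assumptions. Each policy layer has already been reduced to a yes/no answer. Every explicit Deny is folded into one flag. Cross-account requests, where both the resource policy and the caller's identity policy must allow, are out of scope. All names are hypothetical.

```typescript
// Simplified model of the evaluation flowchart; not the complete AWS algorithm.
interface EvaluationInput {
  anyExplicitDeny: boolean;      // an explicit Deny matched in any policy type
  scpAllows: boolean;            // SCPs pass the action through (true outside Organizations)
  resourcePolicyAllows: boolean; // e.g. an S3 bucket policy Allow for this principal
  identityPolicyAllows: boolean; // user, group, or role policies
  boundaryAllows: boolean;       // permission boundary (true when none is attached)
  sessionPolicyAllows: boolean;  // session policy (true when none was passed)
}

type Decision = "ALLOW" | "DENY" | "IMPLICIT_DENY";

function evaluateRequest(input: EvaluationInput): Decision {
  // 1. An explicit Deny anywhere wins.
  if (input.anyExplicitDeny) return "DENY";
  // 2. SCPs only filter; if they do not pass the action through, stop.
  if (!input.scpAllows) return "DENY";
  // 3. Same-account shortcut from the flowchart: a resource-policy Allow is enough.
  if (input.resourcePolicyAllows) return "ALLOW";
  // 4. Otherwise an identity policy must allow the action.
  if (!input.identityPolicyAllows) return "IMPLICIT_DENY";
  // 5. Boundaries and session policies can only narrow what was granted.
  if (!input.boundaryAllows || !input.sessionPolicyAllows) return "DENY";
  return "ALLOW";
}

// A role whose identity policy allows the call but whose boundary does not:
console.log(
  evaluateRequest({
    anyExplicitDeny: false,
    scpAllows: true,
    resourcePolicyAllows: false,
    identityPolicyAllows: true,
    boundaryAllows: false,
    sessionPolicyAllows: true,
  }),
); // DENY
```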
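SCPs never grant permissions; they only filter what identity policies can grant. A common trap is saying "We attached a PowerUserAccess SCP to the OU, so developers have power-user access." They don't. The SCP only sets the ceiling, and each principal still needs an identity policy that grants the actions. Resource-based policies work differently. Within one account, an S3 bucket policy that names a role can grant access even if the role's identity policy is silent. Across accounts, both sides must allow the action: the resource policy in the owning account and the identity policy in the caller's account.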
Attaching AdministratorAccess to a role "temporarily for debugging" and forgetting to remove it. Combined with a permissive trust policy ("Principal": {"AWS": "*"}), you've created a persistent backdoor. Use aws sts assume-role with MFA and time-bounded break-glass roles instead.
IAM policy deep dive
Policies are JSON. Master the structure once and you can read any AWS permission document — IAM policies, S3 bucket policies, KMS key policies, and SCPs all share the same grammar.
Policy structure
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "AllowReadOwnBucketPrefix",
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:ListBucket"],
"Resource": [
"arn:aws:s3:::my-app-artifacts",
"arn:aws:s3:::my-app-artifacts/${aws:username}/*"
],
"Condition": {
"Bool": { "aws:SecureTransport": "true" },
"StringEquals": { "aws:RequestedRegion": "eu-west-1" }
}
}]
}
| Element | Purpose |
|---|---|
| Version | Always 2012-10-17 for new policies |
| Effect | Allow or Deny |
| Action | API operations — service:Operation or wildcards |
| Resource | ARN(s) the action applies to; required in identity policies, use "*" for actions that don't support resource-level permissions |
| Condition | Optional constraints — IP, MFA, tags, encryption headers, source VPC |
| Principal | Only in resource-based policies (role trust policies included): who is allowed or denied |
Wildcards — when to use and when to avoid
- s3:Get* — acceptable during prototyping; replace with explicit actions in production
- s3:* on arn:aws:s3:::* — almost never acceptable for application roles
- ec2:Describe* — common for read-only ops roles; low risk (no mutations)
- ? — single-character wildcard in some resource patterns; rarely needed
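The sketch below shows why s3:Get* is broader than it looks. It implements a simplified IAM-style wildcard match, where * matches any run of characters and ? matches exactly one. It assumes action names compare case-insensitively and ARNs case-sensitively. It ignores the per-segment rules IAM applies to ARNs. The function names are illustrative.

```typescript
// Simplified IAM-style glob: '*' matches any run of characters, '?' matches one.
function globToRegExp(pattern: string, ignoreCase: boolean): RegExp {
  const body = pattern
    .split("")
    .map((ch) => {
      if (ch === "*") return ".*";
      if (ch === "?") return ".";
      // Escape every other regex metacharacter so it matches literally.
      return ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    })
    .join("");
  return new RegExp(`^${body}$`, ignoreCase ? "i" : "");
}

// Action names are treated as case-insensitive.
function actionMatches(pattern: string, action: string): boolean {
  return globToRegExp(pattern, true).test(action);
}

// Resource ARNs are treated as case-sensitive.
function resourceMatches(pattern: string, arn: string): boolean {
  return globToRegExp(pattern, false).test(arn);
}

console.log(actionMatches("s3:Get*", "s3:GetObject"));                 // true
console.log(actionMatches("s3:Get*", "s3:GetBucketPolicy"));           // true
console.log(actionMatches("ec2:Describe*", "ec2:TerminateInstances")); // false
console.log(
  resourceMatches("arn:aws:s3:::my-app-artifacts/orders/*", "arn:aws:s3:::my-app-artifacts/orders/2024/42.json"),
); // true
```

The second line is the point: s3:Get* also covers GetBucketPolicy, GetObjectAcl, and every other Get action S3 has or adds later.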
Condition operators and keys
| Operator | Use case | Example key |
|---|---|---|
| StringEquals | Exact match | aws:RequestedRegion, aws:PrincipalTag/Team |
| StringLike | Pattern match with * | s3:prefix = uploads/${aws:username}/* |
| IpAddress | Restrict to CIDR | aws:SourceIp — corporate VPN only |
| DateLessThan | Time-bound access | aws:CurrentTime — contractor expiry |
| Bool | True/false flags | aws:SecureTransport = true (HTTPS only) |
| ArnLike | ARN pattern | aws:SourceArn — Lambda can only be invoked by this SNS topic |
| NumericLessThan | Thresholds | s3:max-keys — limit list operations |
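Two rules trip people up when combining these operators. Multiple values listed for one key are ORed. Separate keys and operators in the same Condition block are ANDed. The sketch below models only the five operators it lists and assumes a flat key/value request context. It treats a missing key as no match and does not model the ...IfExists variants. It is illustrative, not the AWS implementation.

```typescript
// Minimal sketch of a few IAM condition operators.
// Values for one key are ORed; keys and operators in a block are ANDed.
type ConditionBlock = Record<string, Record<string, string | string[]>>;
type RequestContext = Record<string, string | undefined>;

function likeToRegExp(pattern: string): RegExp {
  const body = pattern
    .split("")
    .map((ch) => (ch === "*" ? ".*" : ch === "?" ? "." : ch.replace(/[.+^${}()|[\]\\]/g, "\\$&")))
    .join("");
  return new RegExp(`^${body}$`);
}

function testOperator(op: string, actual: string, expected: string): boolean {
  switch (op) {
    case "StringEquals":
      return actual === expected;
    case "StringLike":
      return likeToRegExp(expected).test(actual);
    case "Bool":
      return actual.toLowerCase() === expected.toLowerCase();
    case "NumericLessThan":
      return Number(actual) < Number(expected);
    case "DateLessThan":
      return Date.parse(actual) < Date.parse(expected);
    default:
      throw new Error(`operator not modelled in this sketch: ${op}`);
  }
}

function conditionsMatch(block: ConditionBlock, ctx: RequestContext): boolean {
  return Object.entries(block).every(([op, keys]) =>
    Object.entries(keys).every(([key, values]) => {
      const actual = ctx[key];
      // Positive operators fail when the key is absent from the request.
      if (actual === undefined) return false;
      const list = Array.isArray(values) ? values : [values];
      return list.some((v) => testOperator(op, actual, v)); // OR across values
    }),
  );
}

const example: ConditionBlock = {
  Bool: { "aws:SecureTransport": "true" },
  StringEquals: { "aws:RequestedRegion": ["eu-west-1", "us-east-1"] },
};
console.log(conditionsMatch(example, { "aws:SecureTransport": "true", "aws:RequestedRegion": "us-east-1" }));  // true
console.log(conditionsMatch(example, { "aws:SecureTransport": "false", "aws:RequestedRegion": "eu-west-1" })); // false
```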
High-value condition keys for backend engineers:
- aws:MultiFactorAuthPresent — require MFA for sensitive operations
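- aws:CalledVia: allow an action only when a specific AWS service (for example CloudFormation or Athena) makes the request on the principal's behalf. For confused-deputy protection on service principals, use aws:SourceArn and aws:SourceAccount instead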
- aws:SourceVpc / aws:SourceVpce — only from your VPC or endpoint
- ec2:Region — legacy; prefer aws:RequestedRegion
Policy variables
Dynamic policies use ${aws:username}, ${aws:userid}, ${aws:PrincipalTag/key} in Resource or Condition values — one policy template for all developers instead of one policy per person.
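The minimal sketch below shows how a template like the one in the policy structure example resolves for each caller. It assumes a flat request context and treats an unresolvable variable as no match, returning null. Note that aws:username exists only for IAM users, so role sessions would not match such a resource. IAM's escape variables are not handled. resolvePolicyVariables is a hypothetical helper.

```typescript
// Sketch: resolve ${...} policy variables in a Resource or Condition value.
// Returns null when any variable is missing from the request context, which
// this sketch treats as "statement does not match". IAM escape variables
// such as ${*} and ${$} are not handled.
function resolvePolicyVariables(
  template: string,
  context: Record<string, string | undefined>,
): string | null {
  const missing: string[] = [];
  const resolved = template.replace(/\$\{([^}]+)\}/g, (_whole: string, key: string) => {
    const value = context[key];
    if (value === undefined) {
      missing.push(key);
      return "";
    }
    return value;
  });
  return missing.length > 0 ? null : resolved;
}

const resourceTemplate = "arn:aws:s3:::my-app-artifacts/${aws:username}/*";

console.log(resolvePolicyVariables(resourceTemplate, { "aws:username": "alice" }));
// arn:aws:s3:::my-app-artifacts/alice/*
console.log(resolvePolicyVariables(resourceTemplate, {})); // null (e.g. a role session)
```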
NotAction and NotResource
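NotAction means "every action except these." Combined with Deny, it is a standard pattern. For example, "Effect": "Deny" with "NotAction": ["s3:GetObject"] denies everything except s3:GetObject, and the region SCP later in this chapter uses the same shape. Combined with Allow, it is dangerous, because it silently grants every action outside the list, including actions AWS adds in the future. NotResource behaves the same way for ARNs. Prefer explicit Allow lists.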
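The sketch below demonstrates the difference between Deny + NotAction and Allow + NotAction. It only answers "does this statement's action list cover this call?" and uses a simplified matcher where * is the only wildcard. ActionStatement and statementCovers are hypothetical names.

```typescript
// Sketch: does a statement's Action / NotAction list cover a given API call?
// Simplified matcher: only '*' is a wildcard, compared case-insensitively.
interface ActionStatement {
  effect: "Allow" | "Deny"; // carried for readability; coverage ignores it
  action?: string[];
  notAction?: string[];
}

function globMatch(pattern: string, value: string): boolean {
  const body = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${body}$`, "i").test(value);
}

function statementCovers(stmt: ActionStatement, call: string): boolean {
  if (stmt.action) return stmt.action.some((p) => globMatch(p, call));
  // NotAction inverts the list: every action NOT matched is covered.
  if (stmt.notAction) return !stmt.notAction.some((p) => globMatch(p, call));
  return false;
}

// Deny + NotAction: "deny everything except s3:GetObject", a reasonable guardrail.
const denyAllButRead: ActionStatement = { effect: "Deny", notAction: ["s3:GetObject"] };
console.log(statementCovers(denyAllButRead, "iam:CreateUser")); // true (so the call is denied)

// Allow + NotAction: meant as "everything but IAM", yet it also grants Organizations, KMS, ...
const allowAllButIam: ActionStatement = { effect: "Allow", notAction: ["iam:*"] };
console.log(statementCovers(allowAllButIam, "organizations:LeaveOrganization")); // true
```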
Inline vs managed policies
Managed policies:
- AWS managed (e.g. AmazonS3ReadOnlyAccess): maintained by AWS, a good starting point
- Customer managed: your reusable policies; versioned, attachable to multiple roles, reviewable in PRs
- Up to 10 managed policies per user, role, or group by default (the quota can be raised to 20)
Inline policies:
- Embedded directly in one user, role, or group; deleted when that entity is deleted
- No versioning, no reuse, hard to audit across accounts
- Acceptable only for generated one-off break-glass policies
Example: least-privilege S3 read for a Spring service role
cat > /tmp/order-service-s3-policy.json <<'EOF'
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:PutObject"],
"Resource": "arn:aws:s3:::my-app-artifacts/orders/*",
"Condition": { "Bool": { "aws:SecureTransport": "true" } }
}]
}
EOF
aws iam create-policy --policy-name OrderServiceS3Access \
--policy-document file:///tmp/order-service-s3-policy.json
aws iam attach-role-policy --role-name order-service-ecs-task \
--policy-arn arn:aws:iam::123456789012:policy/OrderServiceS3Access
resource "aws_iam_policy" "order_service_s3" {
name = "OrderServiceS3Access"
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = ["s3:GetObject", "s3:PutObject"]
Resource = "${aws_s3_bucket.artifacts.arn}/orders/*"
Condition = {
Bool = { "aws:SecureTransport" = "true" }
}
}]
})
}
resource "aws_iam_role_policy_attachment" "order_service" {
role = aws_iam_role.ecs_task.name
policy_arn = aws_iam_policy.order_service_s3.arn
}
import * as iam from 'aws-cdk-lib/aws-iam';
import * as s3 from 'aws-cdk-lib/aws-s3';
const bucket = s3.Bucket.fromBucketName(this, 'Artifacts', 'my-app-artifacts');
taskRole.addToPolicy(new iam.PolicyStatement({
effect: iam.Effect.ALLOW,
actions: ['s3:GetObject', 's3:PutObject'],
resources: [`${bucket.bucketArn}/orders/*`],
conditions: { Bool: { 'aws:SecureTransport': 'true' } },
}));
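For defense in depth, enforce HTTPS on S3 in two places: aws:SecureTransport in the identity policy, plus a bucket policy that denies any request where aws:SecureTransport is false. If you require SSE-KMS, also add a bucket policy that denies uploads without the matching x-amz-server-side-encryption header. S3 has applied SSE-S3 to new objects by default since January 2023, so that deny mainly matters for enforcing KMS.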
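The bucket-policy half of that pairing is a single Deny statement. The sketch below generates it for a given bucket name, assuming the bucket is in the standard aws partition (arn:aws:s3:::). The PolicyStatement interface and denyInsecureTransport helper are illustrative. The statement names both the bucket ARN and the object ARN pattern, so bucket-level and object-level calls are both covered.

```typescript
// Sketch: generate the standard "deny any non-TLS request" bucket policy.
// Assumes the aws partition; the bucket name below is a placeholder.
interface PolicyStatement {
  Sid: string;
  Effect: "Allow" | "Deny";
  Principal: string | Record<string, string | string[]>;
  Action: string | string[];
  Resource: string | string[];
  Condition?: Record<string, Record<string, string | string[]>>;
}

function denyInsecureTransport(bucketName: string): PolicyStatement {
  const bucketArn = `arn:aws:s3:::${bucketName}`;
  return {
    Sid: "DenyInsecureTransport",
    Effect: "Deny",
    Principal: "*",
    Action: "s3:*",
    // Bucket ARN for bucket-level calls, /* for object-level calls.
    Resource: [bucketArn, `${bucketArn}/*`],
    // aws:SecureTransport is "false" for plain-HTTP requests.
    Condition: { Bool: { "aws:SecureTransport": "false" } },
  };
}

const bucketPolicy = {
  Version: "2012-10-17",
  Statement: [denyInsecureTransport("my-app-artifacts")],
};

// Save the output and pass it to: aws s3api put-bucket-policy --bucket ... --policy file://...
console.log(JSON.stringify(bucketPolicy, null, 2));
```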
IAM roles — the right way to grant access
Never use long-term access keys if a role can be used. Roles provide temporary credentials automatically rotated by AWS. Every compute service on AWS is designed around roles — fighting that design means credential sprawl and 3 AM incidents.
Trust policy vs permissions policy
Every role has two documents: a trust policy (who can assume the role) and one or more permissions policies (what the role can do once assumed). Confusing them is one of the most common IAM mistakes. Granting s3:* in the trust policy does nothing, and putting ec2.amazonaws.com in a permissions policy does nothing.
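One way to catch that mix-up in code review is a small lint over both documents. The sketch below models only the fields it checks. It assumes trust policies should contain nothing but sts:* actions (sts:AssumeRole, sts:AssumeRoleWithWebIdentity, sts:TagSession, and so on). The interfaces and function names are hypothetical.

```typescript
// Sketch: lint trust and permissions policies for the classic mix-up.
// Only the fields the checks need are modelled.
interface RoleStatement {
  Effect: "Allow" | "Deny";
  Principal?: unknown;
  Action: string | string[];
}

interface RolePolicyDocument {
  Version: string;
  Statement: RoleStatement[];
}

const asList = (a: string | string[]): string[] => (Array.isArray(a) ? a : [a]);

// Trust policy: every statement needs a Principal and only sts:* actions.
function lintTrustPolicy(doc: RolePolicyDocument): string[] {
  const problems: string[] = [];
  for (const s of doc.Statement) {
    if (s.Principal === undefined) {
      problems.push("trust statement has no Principal");
    }
    const nonSts = asList(s.Action).filter((a) => !a.toLowerCase().startsWith("sts:"));
    if (nonSts.length > 0) {
      problems.push(`trust policy lists non-STS actions: ${nonSts.join(", ")}`);
    }
  }
  return problems;
}

// Permissions policy: a Principal element means it was written as a trust policy.
function lintPermissionsPolicy(doc: RolePolicyDocument): string[] {
  return doc.Statement.filter((s) => s.Principal !== undefined).map(
    () => "permissions policy has a Principal; that belongs in the trust policy",
  );
}

const ecsTrust: RolePolicyDocument = {
  Version: "2012-10-17",
  Statement: [
    { Effect: "Allow", Principal: { Service: "ecs-tasks.amazonaws.com" }, Action: "sts:AssumeRole" },
  ],
};
console.log(lintTrustPolicy(ecsTrust).length); // 0
```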
| Pattern | Trust principal | Typical permissions |
|---|---|---|
| EC2 instance profile | ec2.amazonaws.com | S3, SSM, CloudWatch — app needs on the instance |
| ECS task role | ecs-tasks.amazonaws.com | Per-service: RDS via proxy, SQS publish, Secrets Manager — not shared with the EC2 host role |
| Lambda execution role | lambda.amazonaws.com | CloudWatch Logs + exactly what the handler calls — avoid AdministratorAccess |
| Cross-account role | The caller's account root (or a specific role ARN in that account), plus an sts:ExternalId condition for third parties | Read-only audit, shared services, CI deploy to prod account |
| Service-linked role | Pre-created by AWS service | Auto-managed; don't delete — breaks the service (e.g. AWSServiceRoleForECS) |
ECS task role vs EC2 instance role
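On ECS with the EC2 launch type, the container instance has an instance profile, which the ECS agent uses (for example to pull images from ECR and send logs). Each task gets its own task role. The ECS agent serves the task role's credentials to that task's containers through a task credentials endpoint, which the SDK default credential chain finds via the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable. Your Spring service therefore gets only the S3/DynamoDB permissions it needs, not the host's ECR pull permissions. On Fargate there is no instance profile. The task role covers your code, and the task execution role covers image pulls and logs.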
Cross-account access
When Account A assumes a role in Account B, three things must be in place. B's role trust policy allows A's principal. B's permissions policy grants the actions. A's principal must also be allowed sts:AssumeRole on that role ARN by its own identity policy. Use an External ID when a third party assumes into your account. It prevents confused-deputy attacks, where another customer of the same third party tricks it into assuming your role.
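The sketch below builds the trust policy that Account B's role would carry for a third party in Account A. It assumes the third party has issued a unique External ID for you. Trusting arn:aws:iam::<account>:root delegates the decision to Account A, whose own identity policies must still allow sts:AssumeRole. The account ID and External ID are placeholders, and crossAccountTrustPolicy is a hypothetical helper.

```typescript
// Sketch: trust policy for a role in Account B assumed by a third party in
// Account A, gated on an External ID. All IDs are placeholders.
function crossAccountTrustPolicy(trustedAccountId: string, externalId: string) {
  if (!/^\d{12}$/.test(trustedAccountId)) {
    throw new Error(`not a 12-digit AWS account ID: ${trustedAccountId}`);
  }
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        // ":root" trusts the whole account; Account A's own IAM policies
        // then decide which of its principals may call sts:AssumeRole.
        Principal: { AWS: `arn:aws:iam::${trustedAccountId}:root` },
        Action: "sts:AssumeRole",
        // Without the agreed External ID, nothing allows the AssumeRole call.
        Condition: { StringEquals: { "sts:ExternalId": externalId } },
      },
    ],
  };
}

const vendorTrust = crossAccountTrustPolicy("111122223333", "vendor-issued-unique-id");
console.log(JSON.stringify(vendorTrust, null, 2));
```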
Role chaining
Assume role A, then from A assume role B — maximum session duration 1 hour for chained roles. Useful for hub-and-spoke access patterns; adds latency (two STS calls). Prefer direct trust where possible.
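The duration rules can be expressed as a small validator. This sketch makes three assumptions. The role's MaxSessionDuration is set between 1 and 12 hours. The minimum request is 900 seconds. A chained AssumeRole that asks for more than one hour is rejected rather than silently shortened. validateSessionDuration is a hypothetical name.

```typescript
// Sketch of the session-duration rules. MaxSessionDuration is a role setting
// (1 to 12 hours); requestedSeconds is the caller's DurationSeconds.
const MIN_SESSION_SECONDS = 900;  // 15 minutes
const CHAINED_MAX_SECONDS = 3600; // role chaining cap: 1 hour

function validateSessionDuration(
  requestedSeconds: number,
  roleMaxSessionSeconds: number,
  chained: boolean,
): { ok: true } | { ok: false; reason: string } {
  // A chained session can never exceed one hour, whatever the role allows.
  const ceiling = chained ? Math.min(roleMaxSessionSeconds, CHAINED_MAX_SECONDS) : roleMaxSessionSeconds;
  if (requestedSeconds < MIN_SESSION_SECONDS) {
    return { ok: false, reason: `below the ${MIN_SESSION_SECONDS}s minimum` };
  }
  if (requestedSeconds > ceiling) {
    return { ok: false, reason: `exceeds the ${ceiling}s ceiling` };
  }
  return { ok: true };
}

// 4-hour session on a role whose max is 12 hours: fine directly, rejected when chained.
console.log(validateSessionDuration(4 * 3600, 12 * 3600, false)); // { ok: true }
console.log(validateSessionDuration(4 * 3600, 12 * 3600, true));
```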
Create an ECS task role (production pattern)
aws iam create-role --role-name order-service-task \
--assume-role-policy-document '{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": { "Service": "ecs-tasks.amazonaws.com" },
"Action": "sts:AssumeRole"
}]
}'
# Reference in task definition:
# "taskRoleArn": "arn:aws:iam::123456789012:role/order-service-task"
resource "aws_iam_role" "ecs_task" {
name = "order-service-task"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = "sts:AssumeRole"
Principal = { Service = "ecs-tasks.amazonaws.com" }
}]
})
}
resource "aws_ecs_task_definition" "app" {
family = "order-service"
task_role_arn = aws_iam_role.ecs_task.arn
execution_role_arn = aws_iam_role.ecs_execution.arn
# ...
}
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as iam from 'aws-cdk-lib/aws-iam';
const taskRole = new iam.Role(this, 'OrderServiceTaskRole', {
assumedBy: new iam.ServicePrincipal('ecs-tasks.amazonaws.com'),
description: 'Runtime permissions for order-service containers',
});
new ecs.FargateTaskDefinition(this, 'TaskDef', {
taskRole,
// executionRole for ECR pull + logs — separate, tighter role
});
Netflix pioneered cross-account IAM roles for their microservices — each service assumes a role scoped to its dataset, with no long-lived keys on instances. Stripe uses strict role boundaries between PCI and non-PCI accounts. Both treat IAM roles as the only credential mechanism in production.
One role per service vs shared role: shared roles simplify ops but violate least privilege when services diverge (payments needs KMS, catalog doesn't). Start with one role per ECS service / Lambda function; merge only when permissions are identical and lifecycle is identical.
Least privilege in practice
Least privilege isn't a one-time policy write — it's a continuous loop: deploy with broader AWS managed policies, observe actual usage in CloudTrail, narrow to customer-managed policies, repeat. Tools exist at every step.
The narrowing workflow
- Start with AWS managed policy closest to need (e.g. AmazonDynamoDBFullAccess in dev only)
- Enable CloudTrail in all regions; log to a security account
- Run IAM Access Advisor — see last-accessed services per user/role; remove unused actions
- Use IAM Access Analyzer — detect resources shared externally; validate policies before deploy
- Generate policies from CloudTrail with Access Analyzer policy generator (last 90 days of actual API calls)
- Replace with customer-managed policy; attach permission boundary for delegated admin teams
Permission boundaries
A permission boundary caps what a role or user can do even if attached policies allow more. Use case: let a product team's CI create its own roles, but require every role it creates to carry a boundary that denies iam:*, organizations:*, and s3:DeleteBucket. You enforce this with the iam:PermissionsBoundary condition key on iam:CreateRole. The effective permission is the intersection of boundary and identity policies.
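The sketch below computes that intersection using exact action names, assuming no wildcards, conditions, or Deny statements. An action present only in the boundary (dynamodb:PutItem below) is not granted: a boundary limits permissions but never adds any. The action lists and the effectiveActions helper are illustrative.

```typescript
// Sketch: effective permissions = identity-policy grants intersected with boundary grants.
// Exact action names only; wildcards, conditions, and Deny statements are left out.
function effectiveActions(identityAllows: string[], boundaryAllows: string[]): string[] {
  const boundary = new Set(boundaryAllows.map((a) => a.toLowerCase()));
  return identityAllows.filter((a) => boundary.has(a.toLowerCase()));
}

const delegatedRolePolicy = ["s3:GetObject", "s3:PutObject", "iam:CreateUser", "sqs:SendMessage"];
const teamBoundary = ["s3:GetObject", "s3:PutObject", "sqs:SendMessage", "dynamodb:PutItem"];

console.log(effectiveActions(delegatedRolePolicy, teamBoundary).join(", "));
// s3:GetObject, s3:PutObject, sqs:SendMessage
```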
CloudTrail → Athena: find what a role actually used
-- Run against CloudTrail logs table in Athena
SELECT eventSource, eventName, count(*) AS calls
FROM cloudtrail_logs
WHERE userIdentity.arn LIKE '%order-service-task%'
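-- eventTime is an ISO-8601 string in the standard CloudTrail table, so parse it first
AND from_iso8601_timestamp(eventTime) > date_add('day', -90, current_timestamp)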
GROUP BY eventSource, eventName
ORDER BY calls DESC
LIMIT 50;
# Simulate: can this role publish to our queue?
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/order-service-task \
  --action-names sqs:SendMessage \
  --resource-arns arn:aws:sqs:eu-west-1:123456789012:orders.fifo
# → EvalDecision: allowed | explicitDeny | implicitDeny
# Last-accessed data is generated asynchronously: start the job, then fetch it by JobId
aws iam generate-service-last-accessed-details \
  --arn arn:aws:iam::123456789012:role/order-service-task
aws iam get-service-last-accessed-details --job-id <JobId from the previous call>
# Lint a policy document before deploying it
aws accessanalyzer validate-policy --policy-type IDENTITY_POLICY \
  --policy-document file://policy.json
Run aws iam simulate-principal-policy in CI when changing IAM — assert that deploy roles cannot call iam:CreateUser or s3:DeleteBucket. Treat IAM policy changes like application code: PR review + automated checks.
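Below is a sketch of such a CI gate. It assumes the job has already run aws iam simulate-principal-policy with --output json and parsed the result. Only the EvaluationResults fields the check reads are modelled. The forbidden action list and the forbiddenButAllowed helper are hypothetical.

```typescript
// Sketch of a CI gate over `aws iam simulate-principal-policy --output json`.
// Only the fields read here are modelled; feed it the parsed CLI output.
interface SimulationResult {
  EvalActionName: string;
  EvalResourceName: string;
  EvalDecision: "allowed" | "explicitDeny" | "implicitDeny";
}

interface SimulationOutput {
  EvaluationResults: SimulationResult[];
}

// Returns "action on resource" for every forbidden action the simulation allowed.
function forbiddenButAllowed(output: SimulationOutput, forbidden: string[]): string[] {
  const banned = new Set(forbidden.map((a) => a.toLowerCase()));
  return output.EvaluationResults
    .filter((r) => r.EvalDecision === "allowed" && banned.has(r.EvalActionName.toLowerCase()))
    .map((r) => `${r.EvalActionName} on ${r.EvalResourceName}`);
}

// Example input shaped like the CLI's JSON (values are illustrative).
const sample: SimulationOutput = {
  EvaluationResults: [
    { EvalActionName: "iam:CreateUser", EvalResourceName: "*", EvalDecision: "implicitDeny" },
    { EvalActionName: "s3:DeleteBucket", EvalResourceName: "*", EvalDecision: "allowed" },
  ],
};

const violations = forbiddenButAllowed(sample, ["iam:CreateUser", "s3:DeleteBucket"]);
if (violations.length > 0) {
  console.error(`deploy role is too broad: ${violations.join("; ")}`);
  // In a real pipeline, exit non-zero here (e.g. process.exit(1) under Node).
}
```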
A permission boundary constrains only the user or role it is attached to. It does not limit the admin who sets it, and it grants nothing by itself. SCPs apply to every principal in the member accounts under an OU, including each member account's root user. They never apply to the organization's management account or to service-linked roles. Know the difference: SCP = org-wide guardrail; boundary = per-role cap; session policy = per-assume-role session cap.
Identity federation
Humans and CI pipelines should never hold long-lived AWS access keys. Federation exchanges IdP tokens for temporary AWS credentials via STS — same roles and policies as always, but no keys to leak.
| Mechanism | Use case | How it works |
|---|---|---|
| SAML 2.0 | Corporate IdP (Okta, AD FS, Azure AD) | User logs in via IdP → SAML assertion → AssumeRoleWithSAML |
| OIDC | GitHub Actions, GitLab CI, Spacelift | OIDC token from provider → AssumeRoleWithWebIdentity — no stored AWS keys in GitHub |
| IAM Identity Center | Multi-account human access (successor to AWS SSO) | Central portal → permission sets → temporary role in target account |
| Cognito User Pools | App user authentication (sign-up/sign-in) | JWT for your app — not the same as AWS console access |
| Cognito Identity Pools | Mobile/web apps needing AWS credentials | Federated identity → temporary AWS creds scoped by IAM role mapping |
GitHub Actions → AWS (OIDC) — the modern CI pattern
# 1. Create OIDC provider (once per account)
aws iam create-open-id-connect-provider \
--url https://token.actions.githubusercontent.com \
--client-id-list sts.amazonaws.com \
--thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1
# 2. Role trust policy — only repo main branch can assume
# "Condition": {
# "StringEquals": { "token.actions.githubusercontent.com:aud": "sts.amazonaws.com" },
# "StringLike": { "token.actions.githubusercontent.com:sub": "repo:myorg/my-app:ref:refs/heads/main" }
# }
resource "aws_iam_openid_connect_provider" "github" {
url = "https://token.actions.githubusercontent.com"
client_id_list = ["sts.amazonaws.com"]
thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}
resource "aws_iam_role" "github_deploy" {
name = "github-actions-deploy"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = "sts:AssumeRoleWithWebIdentity"
Principal = {
Federated = aws_iam_openid_connect_provider.github.arn
}
Condition = {
StringEquals = {
"token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
}
StringLike = {
"token.actions.githubusercontent.com:sub" = "repo:myorg/my-app:ref:refs/heads/main"
}
}
}]
})
}
import * as iam from 'aws-cdk-lib/aws-iam';
const provider = new iam.OpenIdConnectProvider(this, 'GitHubOidc', {
url: 'https://token.actions.githubusercontent.com',
clientIds: ['sts.amazonaws.com'],
});
new iam.Role(this, 'GitHubDeployRole', {
assumedBy: new iam.WebIdentityPrincipal(provider.openIdConnectProviderArn, {
StringEquals: {
'token.actions.githubusercontent.com:aud': 'sts.amazonaws.com',
},
StringLike: {
'token.actions.githubusercontent.com:sub': 'repo:myorg/my-app:ref:refs/heads/main',
},
}),
managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonEC2ContainerRegistryPowerUser')],
});
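The sub condition is what keeps the trust tight. The sketch below applies the same StringLike matching to several GitHub subject claims. It assumes GitHub's default claim format: repo:<org>/<repo>:ref:<ref> for branch pushes and repo:<org>/<repo>:pull_request for pull requests. The org and repo names are placeholders, and stringLike is a simplified hypothetical matcher.

```typescript
// Sketch: how the StringLike condition on token.actions.githubusercontent.com:sub
// treats different GitHub workflow contexts. Org/repo names are placeholders.
function stringLike(pattern: string, value: string): boolean {
  const body = pattern
    .split("")
    .map((ch) => (ch === "*" ? ".*" : ch === "?" ? "." : ch.replace(/[.+^${}()|[\]\\]/g, "\\$&")))
    .join("");
  return new RegExp(`^${body}$`).test(value);
}

const trustedSub = "repo:myorg/my-app:ref:refs/heads/main";

const candidateSubs = [
  "repo:myorg/my-app:ref:refs/heads/main",      // push to main
  "repo:myorg/my-app:ref:refs/heads/feature-x", // push to a feature branch
  "repo:myorg/my-app:pull_request",             // pull_request workflow
  "repo:otherorg/my-app:ref:refs/heads/main",   // a different org's repo
];
for (const sub of candidateSubs) {
  console.log(`${stringLike(trustedSub, sub) ? "ALLOW" : "DENY "} ${sub}`);
}

// A loose wildcard like "repo:myorg/my-app:*" would also admit pull_request runs.
console.log(stringLike("repo:myorg/my-app:*", "repo:myorg/my-app:pull_request")); // true
```

Jobs that target a GitHub environment present a repo:<org>/<repo>:environment:<name> subject instead. A main-branch-only condition will not match them, so if you deploy through environments, trust the environment subject explicitly.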
IAM Identity Center (AWS SSO)
For human access across multiple accounts: connect Okta/Azure AD once, define permission sets (templates of IAM policies), assign to groups per account. Engineers get 1–12 hour sessions via the SSO portal — no IAM users, no access keys. This is the AWS-recommended enterprise pattern and appears frequently on the SA Pro exam.
IAM users vs Identity Center: IAM users scale poorly (no central offboarding, key rotation burden). Identity Center adds setup complexity but gives one place to revoke access when someone leaves. Exception: break-glass IAM user with MFA in a sealed envelope — max two, never used in normal operations.
IAM security best practices
A checklist distilled from AWS Well-Architected Security pillar, incident post-mortems, and SA exam rubrics. If you implement only this section, you'll be ahead of most production accounts.
- Root account: MFA enabled (hardware key). Access keys deleted. Never used for daily work. Email goes to a group alias, not one person.
- Break-glass: separate procedure for emergency root use, documented, audited, requires two people. Credentials in physical safe.
- SCPs everywhere: deny leaving org, deny disabling CloudTrail, deny unapproved regions, deny root API calls except from break-glass IP.
- CloudTrail all regions: multi-region trail, log file validation, deliver to security account S3 with MFA delete. Alert on StopLogging.
- GuardDuty on: detects credential exfiltration, unusual API geography, crypto mining, Tor activity. Enable org-wide from admin account.
- Roles not keys: EC2/ECS/Lambda use roles. CI uses OIDC. Never hardcode keys in application.properties or GitHub secrets long-term.
Credential hygiene
- Rotate IAM access keys at least every 90 days if they must exist; prefer elimination over rotation
- Alert on access keys unused > 45 days (Config rule or Access Analyzer)
- Deny iam:CreateAccessKey for human users via SCP
- Use aws:MultiFactorAuthPresent for sensitive console/API operations
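The sketch below automates the first two rules. It assumes you already have an inventory of keys with creation and last-used timestamps, for example exported from an IAM credential report. The KeyRecord shape, the default thresholds, and the keyFindings helper are hypothetical.

```typescript
// Sketch: flag access keys that break the 90-day rotation and 45-day idle rules.
// KeyRecord is a hypothetical shape; fill it from your own key inventory.
interface KeyRecord {
  userName: string;
  accessKeyId: string;
  createdAt: Date;
  lastUsedAt?: Date; // undefined means the key was never used
}

const DAY_MS = 24 * 60 * 60 * 1000;

function keyFindings(keys: KeyRecord[], asOf: Date, maxAgeDays = 90, maxIdleDays = 45): string[] {
  const findings: string[] = [];
  for (const k of keys) {
    const ageDays = Math.floor((asOf.getTime() - k.createdAt.getTime()) / DAY_MS);
    // A never-used key has been idle since it was created.
    const idleSince = k.lastUsedAt ?? k.createdAt;
    const idleDays = Math.floor((asOf.getTime() - idleSince.getTime()) / DAY_MS);
    if (ageDays > maxAgeDays) {
      findings.push(`${k.userName}/${k.accessKeyId}: ${ageDays} days old, rotate or delete`);
    }
    if (idleDays > maxIdleDays) {
      findings.push(`${k.userName}/${k.accessKeyId}: unused for ${idleDays} days, deactivate`);
    }
  }
  return findings;
}

const asOf = new Date("2025-06-01T00:00:00Z");
const inventory: KeyRecord[] = [
  { userName: "legacy-ci", accessKeyId: "AKIAEXAMPLE1", createdAt: new Date("2025-01-01T00:00:00Z"), lastUsedAt: new Date("2025-05-30T00:00:00Z") },
  { userName: "old-script", accessKeyId: "AKIAEXAMPLE2", createdAt: new Date("2025-04-01T00:00:00Z") },
];
for (const finding of keyFindings(inventory, asOf)) {
  console.log(finding);
}
```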
Example SCP: deny risky actions org-wide
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyLeaveOrg",
"Effect": "Deny",
"Action": "organizations:LeaveOrganization",
"Resource": "*"
},
{
"Sid": "DenyDisableAudit",
"Effect": "Deny",
"Action": [
"cloudtrail:DeleteTrail",
"cloudtrail:StopLogging",
"guardduty:DeleteDetector",
"guardduty:DisassociateFromMasterAccount",
"guardduty:DisassociateFromAdministratorAccount"
],
"Resource": "*"
},
{
"Sid": "DenyUnapprovedRegions",
"Effect": "Deny",
"NotAction": [
"iam:*", "organizations:*", "route53:*", "support:*",
"budgets:*", "ce:*", "cloudfront:*", "globalaccelerator:*"
],
"Resource": "*",
"Condition": {
"StringNotEquals": {
"aws:RequestedRegion": ["eu-west-1", "us-east-1"]
}
}
}
]
}
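To sanity-check the DenyUnapprovedRegions statement before rolling it out, the sketch below mirrors its NotAction list and StringNotEquals condition. It assumes a simplified wildcard match and ignores the other statements and every non-SCP policy layer. deniedByRegionScp is a hypothetical helper.

```typescript
// Sketch: would the DenyUnapprovedRegions statement deny this call?
// Mirrors its NotAction list and StringNotEquals condition; other statements ignored.
const exemptGlobalServices = [
  "iam:*", "organizations:*", "route53:*", "support:*",
  "budgets:*", "ce:*", "cloudfront:*", "globalaccelerator:*",
];
const approvedRegions = ["eu-west-1", "us-east-1"];

function matchesActionGlob(pattern: string, action: string): boolean {
  const body = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${body}$`, "i").test(action);
}

function deniedByRegionScp(action: string, requestedRegion: string): boolean {
  // NotAction: the Deny applies only to actions NOT in the exempt list.
  const inScope = !exemptGlobalServices.some((p) => matchesActionGlob(p, action));
  // StringNotEquals with several values: true when the region matches none of them.
  const unapproved = !approvedRegions.includes(requestedRegion);
  return inScope && unapproved;
}

console.log(deniedByRegionScp("ec2:RunInstances", "ap-southeast-2")); // true
console.log(deniedByRegionScp("ec2:RunInstances", "eu-west-1"));      // false
console.log(deniedByRegionScp("iam:CreateRole", "ap-southeast-2"));   // false
```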
Storing AWS_ACCESS_KEY_ID in a Spring application.yml committed to Git is a classic leak, even in a private repo. Use the ECS task role plus the default credential chain (DefaultCredentialsProvider in the AWS SDK for Java v2). For local dev, use aws sso login profiles, never production keys on laptops.
Airbnb and Slack run multi-account AWS Organizations with SCPs denying public resource creation at the org level. Even if a developer's role allows s3:PutBucketPublicAccessBlock and they try to switch Block Public Access off, the SCP layer blocks the call. Defense in depth: SCP + resource policy + Block Public Access settings.
When the exam asks how to stop principals in every account from doing Y, the answer is almost always an SCP on an OU, not an IAM policy on individual users. When it asks about limiting what a delegated admin can grant, answer permission boundary. When it asks about CI without long-lived keys, answer OIDC + AssumeRoleWithWebIdentity.