IAM: Identity & Access Management

IAM is the most important AWS service because it gates every other service. Get IAM wrong and nothing else matters — public S3 buckets, leaked access keys, overprivileged Lambda roles, and cross-account trust misconfigurations are all IAM failures. This chapter covers how AWS evaluates every API call, how to write policies that are tight enough to pass audit but loose enough to deploy, and how production teams grant access without long-lived credentials.

Audience: developer · devops · architect | Global service | Free

IAM fundamentals

Everything in AWS is an API call. IAM decides whether that call is allowed. There is no separate "login server" per service — one IAM engine evaluates all requests, whether they come from the console, CLI, SDK, Terraform, or CDK.

Everything is an API call

When you click "Create bucket" in the console, the browser sends s3:CreateBucket to the S3 control plane. When your Spring Boot app on ECS reads a secret, the SDK signs a secretsmanager:GetSecretValue request with the task role's credentials. When Terraform refreshes an aws_instance resource, the IAM user or role behind the provider must have ec2:DescribeInstances. Same engine, same evaluation order, every time.

IAM is global

IAM users, groups, roles, and policies exist once per account, not per region. You create a role once and can attach it to EC2 instances in any region. You can still restrict actions to specific regions with condition keys such as aws:RequestedRegion, but the IAM entities themselves are global. Route53 and CloudFront are also global services; most others are regional.

Core components

| Component | What it is | Production guidance |
|---|---|---|
| Users | Long-term identity with optional access keys and console password | Avoid for humans; use SSO. Acceptable for legacy CI with key rotation, but prefer OIDC roles |
| Groups | Collection of users; policies attach to the group | Map SSO groups to permission sets instead of IAM groups when possible |
| Roles | Identity without long-term credentials; principals get temporary credentials via sts:AssumeRole | Default for everything: EC2, ECS, Lambda, cross-account, CI/CD |
| Policies | JSON documents: Effect, Action, Resource, Condition | Customer managed for app-specific access; AWS managed to bootstrap, then narrow |
| Identity providers | SAML 2.0 or OIDC trust for federation | Corporate IdP → IAM Identity Center; GitHub → OIDC provider for Actions |
🔬 Under the Hood

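IAM is a global service: its control plane runs in us-east-1, and changes to users, roles, and policies replicate to every region, so they are eventually consistent and may take a short time to take effect. Authorization itself is evaluated by each service's regional endpoint. When you assume a role, STS returns temporary credentials (access key ID + secret access key + session token) valid from 15 minutes up to the role's maximum session duration (at most 12 hours). The session token is what marks them as temporary: requests signed with the access key but without the token are rejected.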

💰 Cost

IAM itself is free: you pay nothing for users, roles, or policies. Costs appear indirectly: overly broad s3:* policies enable data exfiltration; missing SCPs enable crypto-mining in rogue regions. IAM Identity Center is also offered at no additional charge. IAM Access Analyzer is free for external access findings and policy validation; unused-access findings and custom policy checks are billed separately.

Policy evaluation logic

AWS evaluates policies in a fixed order. One explicit Deny anywhere overrides any number of Allow statements. If nothing allows the action, the default is implicit deny — the request fails.

flowchart TD
  START["Incoming API request\n(principal + action + resource)"]
  START --> EXPLICIT{"Explicit Deny\nin any policy?"}
  EXPLICIT -->|Yes| DENY["❌ DENY"]
  EXPLICIT -->|No| SCP{"SCP allows?\n(org guardrail)"}
  SCP -->|No| DENY
  SCP -->|Yes| RP{"Resource-based policy\nallows?"}
  RP -->|Explicit Deny| DENY
  RP -->|Allow| ALLOW["✅ ALLOW"]
  RP -->|No match| ID{"Identity-based policies\n(user/role/group) allow?"}
  ID -->|Explicit Deny| DENY
  ID -->|Allow| PB{"Permission boundary\nallows?"}
  ID -->|No Allow| IMPLICIT["❌ Implicit DENY"]
  PB -->|No| DENY
  PB -->|Yes| SP{"Session policy\n(AssumeRole) allows?"}
  SP -->|No| DENY
  SP -->|Yes| ALLOW

Evaluation order (simplified — exam-critical):

  1. Explicit Deny — any policy, any type → immediate deny
  2. SCPs — organization guardrails; cannot grant permissions, only deny or pass-through
  3. Resource-based policies — S3 bucket policy, KMS key policy, SQS queue policy, Lambda resource policy
  4. Identity-based policies — attached to user, group, or role
  5. Permission boundaries — cap maximum permissions even if identity policies allow more
  6. Session policies — passed during AssumeRole to further restrict the session
  7. Implicit deny — default if no Allow matched
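The ordered checks above can be modeled as a small TypeScript function. This is a teaching sketch under simplifying assumptions, not AWS's implementation: each policy layer is reduced to a hypothetical `Layer` value (`"allow"`, `"deny"`, or `"none"` when no statement matches), wildcards and conditions are ignored, and an absent SCP, boundary, or session policy is treated as not restricting the request. It follows the branch order of the flowchart.

```typescript
// Outcome of matching one policy layer against a request (hypothetical model).
type Layer = "allow" | "deny" | "none";

interface RequestLayers {
  scp?: Layer;      // undefined = account not governed by SCPs
  resource?: Layer; // undefined = resource has no resource-based policy
  identity: Layer;
  boundary?: Layer; // undefined = no permission boundary attached
  session?: Layer;  // undefined = no session policy passed to AssumeRole
}

type Decision = "ALLOW" | "EXPLICIT_DENY" | "IMPLICIT_DENY";

function evaluate(r: RequestLayers): Decision {
  const all = [r.scp, r.resource, r.identity, r.boundary, r.session];
  // 1. An explicit Deny in any layer wins immediately.
  if (all.some((l) => l === "deny")) return "EXPLICIT_DENY";
  // 2. SCPs must allow the action; they filter but never grant.
  if (r.scp !== undefined && r.scp !== "allow") return "IMPLICIT_DENY";
  // 3. A resource-based Allow is sufficient on its own (same-account case, as in the flowchart).
  if (r.resource === "allow") return "ALLOW";
  // 4. Otherwise an identity-based Allow is required...
  if (r.identity !== "allow") return "IMPLICIT_DENY";
  // 5-6. ...and any boundary or session policy present must also allow it.
  if (r.boundary !== undefined && r.boundary !== "allow") return "IMPLICIT_DENY";
  if (r.session !== undefined && r.session !== "allow") return "IMPLICIT_DENY";
  // Everything that applies allowed the request.
  return "ALLOW";
}

// Identity policy allows, but the permission boundary does not cover the action.
console.log(evaluate({ identity: "allow", boundary: "none" })); // IMPLICIT_DENY
```

Changing `boundary` to `"allow"` flips the result to `ALLOW`: the boundary caps, the identity policy grants.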
🎯 Exam Tip

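SCPs never grant permissions; they only filter what identity and resource policies can grant. A common trap: "We attached an SCP allowing ec2:* to the OU, so developers there can now launch instances." They can't unless their own IAM policies also allow it. Resource-based policies work differently: within one account, a resource policy such as an S3 bucket policy can grant access on its own, without any identity policy. Across accounts, both sides must allow the action: the bucket policy in the owning account and an identity policy on the caller's role in the other account.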

⚠️ Pitfall

Attaching AdministratorAccess to a role "temporarily for debugging" and forgetting to remove it. Combined with a permissive trust policy ("Principal": {"AWS": "*"}), you've created a persistent backdoor. Use aws sts assume-role with MFA and time-bounded break-glass roles instead.

IAM policy deep dive

Policies are JSON. Master the structure once and you can read any AWS permission document — IAM policies, S3 bucket policies, KMS key policies, and SCPs all share the same grammar.

Policy structure

json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowListOwnPrefix",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-app-artifacts",
      "Condition": {
        "StringLike": { "s3:prefix": "${aws:username}/*" },
        "Bool": { "aws:SecureTransport": "true" }
      }
    },
    {
      "Sid": "AllowReadOwnObjects",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-app-artifacts/${aws:username}/*",
      "Condition": {
        "Bool": { "aws:SecureTransport": "true" },
        "StringEquals": { "aws:RequestedRegion": "eu-west-1" }
      }
    }
  ]
}
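| Element | Purpose |
|---|---|
| Version | Always 2012-10-17 for new policies (policy variables such as ${aws:username} only work with this version) |
| Sid | Optional statement label; handy in reviews and diffs |
| Effect | Allow or Deny |
| Action | API operations: service:Operation or wildcards |
| Resource | ARN(s) the action applies to; use "*" for actions that don't support resource-level permissions |
| Condition | Optional constraints: IP, MFA, tags, encryption headers, source VPC |
| Principal | Only in resource-based policies (including role trust policies): who is allowed or denied |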

Wildcards — when to use and when to avoid

  • s3:Get* — acceptable during prototyping; replace with explicit actions in production
  • s3:* on arn:aws:s3:::* — almost never acceptable for application roles
  • ec2:Describe* — common for read-only ops roles; low risk (no mutations)
  • ? — single-character wildcard in some resource patterns; rarely needed
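These wildcard rules can be approximated by converting a pattern into an anchored regular expression. A minimal sketch with hypothetical helpers (`globToRegExp`, `actionMatches`, `resourceMatches`), assuming case-insensitive comparison for actions (IAM treats action names that way) and case-sensitive comparison for ARNs. Note that `*` also matches `/`, so one ARN wildcard spans any number of path segments.

```typescript
// Convert an IAM-style pattern to an anchored RegExp: * = any run of characters, ? = exactly one.
function globToRegExp(pattern: string, caseInsensitive: boolean): RegExp {
  const body = pattern
    .split("")
    .map((ch) => {
      if (ch === "*") return ".*";
      if (ch === "?") return ".";
      // Escape every other regex metacharacter so it matches literally.
      return ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    })
    .join("");
  return new RegExp("^" + body + "$", caseInsensitive ? "i" : "");
}

// Action names compare case-insensitively.
function actionMatches(pattern: string, action: string): boolean {
  return globToRegExp(pattern, true).test(action);
}

// ARNs compare case-sensitively in this sketch.
function resourceMatches(pattern: string, arn: string): boolean {
  return globToRegExp(pattern, false).test(arn);
}

console.log(actionMatches("s3:Get*", "s3:GetObject"));                  // true
console.log(actionMatches("s3:Get*", "s3:PutObject"));                  // false
console.log(actionMatches("ec2:Describe*", "EC2:DescribeInstances"));   // true
console.log(resourceMatches("arn:aws:s3:::my-app-artifacts/orders/*",
  "arn:aws:s3:::my-app-artifacts/orders/2024/1.json"));                 // true
```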

Condition operators and keys

| Operator | Use case | Example key |
|---|---|---|
| StringEquals | Exact match | aws:RequestedRegion, aws:PrincipalTag/Team |
| StringLike | Pattern match with * | s3:prefix = uploads/${aws:username}/* |
| IpAddress | Restrict to CIDR | aws:SourceIp: corporate VPN only |
| DateLessThan | Time-bound access | aws:CurrentTime: contractor expiry |
| Bool | True/false flags | aws:SecureTransport = true (HTTPS only) |
| ArnLike | ARN pattern | aws:SourceArn: Lambda can only be invoked by this SNS topic |
| NumericLessThan | Thresholds | s3:max-keys: limit list operations |
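IAM combines conditions in a fixed way: every operator block and every key inside a statement must be satisfied (logical AND), while multiple values listed for one key are alternatives (logical OR). A minimal sketch of that rule covering only StringEquals and Bool, assuming a hypothetical flat `RequestContext` map of condition keys; an absent key counts as no match, which is how these operators behave without an `...IfExists` suffix.

```typescript
// Condition keys present on a (hypothetical) request; absent keys are simply missing.
type RequestContext = { [key: string]: string | undefined };
// Mirrors the policy JSON: { Operator: { conditionKey: value | [values] } }.
type ConditionBlock = { [operator: string]: { [key: string]: string | string[] } };

function testOperator(op: string, actual: string | undefined, expected: string[]): boolean {
  // Without an ...IfExists suffix, these operators do not match an absent key.
  if (actual === undefined) return false;
  switch (op) {
    case "StringEquals": // case-sensitive exact match
    case "Bool": // values are the strings "true" / "false" in this sketch
      return expected.some((v) => v === actual); // OR across listed values
    default:
      throw new Error(`operator not modelled in this sketch: ${op}`);
  }
}

function conditionsMatch(block: ConditionBlock, ctx: RequestContext): boolean {
  // AND across operators, and AND across keys inside each operator.
  return Object.keys(block).every((op) =>
    Object.keys(block[op]).every((key) => {
      const raw = block[op][key];
      const expected = typeof raw === "string" ? [raw] : raw;
      return testOperator(op, ctx[key], expected);
    })
  );
}

const condition: ConditionBlock = {
  Bool: { "aws:SecureTransport": "true" },
  StringEquals: { "aws:RequestedRegion": ["eu-west-1", "us-east-1"] },
};

console.log(conditionsMatch(condition, { "aws:SecureTransport": "true", "aws:RequestedRegion": "us-east-1" }));  // true
console.log(conditionsMatch(condition, { "aws:SecureTransport": "false", "aws:RequestedRegion": "eu-west-1" })); // false
```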

High-value condition keys for backend engineers:

  • aws:MultiFactorAuthPresent — require MFA for sensitive operations
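  • aws:CalledVia: allow an action only when a named service (for example cloudformation.amazonaws.com) makes the request on the principal's behalf
  • aws:SourceArn / aws:SourceAccount: in resource policies, pin which resource or account a service may act for (confused deputy prevention)
  • aws:SourceVpc / aws:SourceVpce: only from your VPC or VPC endpoint (present only on requests that go through a VPC endpoint)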
  • ec2:Region — legacy; prefer aws:RequestedRegion

Policy variables

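Dynamic policies use ${aws:username}, ${aws:userid}, ${aws:PrincipalTag/key} in Resource or Condition values: one policy template for all developers instead of one policy per person. ${aws:username} exists only for IAM users; for roles, use ${aws:userid} (role ID plus session name) or a principal tag.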
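Substitution happens per request: IAM replaces each `${...}` placeholder with the caller's value before matching. A minimal sketch of that step with a hypothetical `substitute` helper; it returns `null` when a referenced key is missing from the request, and this sketch treats that as "the statement does not match".

```typescript
// Replace each ${key} placeholder with the caller's value from the request context.
// Returns null if any referenced key is missing; this sketch treats that as "no match".
function substitute(template: string, ctx: { [key: string]: string }): string | null {
  const missing: string[] = [];
  const result = template.replace(/\$\{([^}]+)\}/g, (_placeholder: string, key: string) => {
    const value = ctx[key];
    if (value === undefined) {
      missing.push(key);
      return "";
    }
    return value;
  });
  return missing.length > 0 ? null : result;
}

const resourceTemplate = "arn:aws:s3:::my-app-artifacts/${aws:username}/*";

// A hypothetical IAM user named alice gets her own prefix.
console.log(substitute(resourceTemplate, { "aws:username": "alice" }));
// arn:aws:s3:::my-app-artifacts/alice/*

// A role session carries no aws:username, so the template cannot resolve.
console.log(substitute(resourceTemplate, { "aws:userid": "AROAEXAMPLEID:order-service" })); // null
```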

NotAction and NotResource

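NotAction means "every action except these": a Deny with NotAction ["s3:GetObject"] denies everything except s3:GetObject. NotResource works the same way for resources. Both are powerful and dangerous: combined with Allow, they easily grant far more than intended. Use explicit Allow lists instead when possible.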
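The only difference between `Action` and `NotAction` is which side of the match the statement covers. A minimal sketch, assuming only exact or trailing-`*` action patterns and a hypothetical `statementApplies` helper, shows why an `Allow` with `NotAction` grants more than it appears to.

```typescript
interface Statement {
  Effect: "Allow" | "Deny";
  Action?: string[];
  NotAction?: string[];
}

// Simplified matcher: exact match, or prefix match for patterns ending in "*".
function matchesAny(patterns: string[], action: string): boolean {
  const a = action.toLowerCase();
  return patterns.some((p) => {
    const q = p.toLowerCase();
    return q.charAt(q.length - 1) === "*" ? a.indexOf(q.slice(0, -1)) === 0 : a === q;
  });
}

// Action: the statement covers the listed actions. NotAction: it covers everything else.
function statementApplies(s: Statement, action: string): boolean {
  if (s.Action) return matchesAny(s.Action, action);
  if (s.NotAction) return !matchesAny(s.NotAction, action);
  return false;
}

const allowAllButIam: Statement = { Effect: "Allow", NotAction: ["iam:*"] };
console.log(statementApplies(allowAllButIam, "s3:DeleteBucket")); // true: granted!
console.log(statementApplies(allowAllButIam, "iam:CreateUser"));  // false
```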

Inline vs managed policies

Managed policies:

  • AWS managed: e.g. AmazonS3ReadOnlyAccess; maintained by AWS, a good starting point
  • Customer managed: your reusable policies; versioned, attachable to multiple roles, reviewable in PRs
  • Up to 10 managed policies per user, role, or group by default (the quota can be increased)

Inline policies:

  • Embedded directly in one user, role, or group; deleted when that entity is deleted
  • No versioning, no reuse, hard to audit across accounts
  • Acceptable only for generated one-off break-glass policies

Example: least-privilege S3 read for a Spring service role

bash
cat > /tmp/order-service-s3-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::my-app-artifacts/orders/*",
    "Condition": { "Bool": { "aws:SecureTransport": "true" } }
  }]
}
EOF

aws iam create-policy --policy-name OrderServiceS3Access \
  --policy-document file:///tmp/order-service-s3-policy.json

aws iam attach-role-policy --role-name order-service-task \
  --policy-arn arn:aws:iam::123456789012:policy/OrderServiceS3Access
hcl
resource "aws_iam_policy" "order_service_s3" {
  name = "OrderServiceS3Access"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject", "s3:PutObject"]
      Resource = "${aws_s3_bucket.artifacts.arn}/orders/*"
      Condition = {
        Bool = { "aws:SecureTransport" = "true" }
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "order_service" {
  role       = aws_iam_role.ecs_task.name
  policy_arn = aws_iam_policy.order_service_s3.arn
}
typescript
import * as iam from 'aws-cdk-lib/aws-iam';
import * as s3 from 'aws-cdk-lib/aws-s3';

// taskRole is the order-service task role (an iam.Role) defined elsewhere in this stack.
const bucket = s3.Bucket.fromBucketName(this, 'Artifacts', 'my-app-artifacts');

taskRole.addToPolicy(new iam.PolicyStatement({
  effect: iam.Effect.ALLOW,
  actions: ['s3:GetObject', 's3:PutObject'],
  resources: [`${bucket.bucketArn}/orders/*`],
  conditions: { Bool: { 'aws:SecureTransport': 'true' } },
}));
🔒 Security

Enforce HTTPS on S3 with aws:SecureTransport in the identity policy and a bucket policy deny when the condition is false — defense in depth. Pair with bucket policy denying uploads without x-amz-server-side-encryption.
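The bucket-policy half of that pairing is a Deny that fires when `aws:SecureTransport` is false, plus a Deny on uploads that omit the encryption header. A minimal sketch that builds both statements as a plain object ready for `JSON.stringify`, assuming the bucket name `my-app-artifacts` from the earlier examples; `hardenedBucketPolicy` is a hypothetical helper.

```typescript
interface PolicyStatement {
  Sid: string;
  Effect: "Allow" | "Deny";
  Principal: string;
  Action: string;
  Resource: string | string[];
  Condition: { [operator: string]: { [key: string]: string } };
}

interface BucketPolicy {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Hypothetical helper: deny plain-HTTP access and uploads that omit the SSE header.
function hardenedBucketPolicy(bucket: string): BucketPolicy {
  const bucketArn = `arn:aws:s3:::${bucket}`;
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "DenyInsecureTransport",
        Effect: "Deny",
        Principal: "*",
        Action: "s3:*",
        // Cover the bucket itself (ListBucket etc.) and its objects (GetObject etc.).
        Resource: [bucketArn, `${bucketArn}/*`],
        Condition: { Bool: { "aws:SecureTransport": "false" } },
      },
      {
        Sid: "DenyUnencryptedUploads",
        Effect: "Deny",
        Principal: "*",
        Action: "s3:PutObject",
        Resource: `${bucketArn}/*`,
        // Null = "true" matches requests where the header is absent.
        Condition: { Null: { "s3:x-amz-server-side-encryption": "true" } },
      },
    ],
  };
}

console.log(JSON.stringify(hardenedBucketPolicy("my-app-artifacts"), null, 2));
```

Saved to a file, the output can be applied with `aws s3api put-bucket-policy --bucket my-app-artifacts --policy file://bucket-policy.json`. Because both statements are Deny with `Principal: "*"`, no identity policy anywhere can override them.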

IAM roles — the right way to grant access

Never use long-term access keys if a role can be used. Roles provide temporary credentials automatically rotated by AWS. Every compute service on AWS is designed around roles — fighting that design means credential sprawl and 3 AM incidents.

Trust policy vs permissions policy

Every role has two documents: a trust policy (who can assume the role) and one or more permissions policies (what the role can do once assumed). Confusing them is one of the most common IAM mistakes: granting s3:* in the trust policy does nothing, and putting ec2.amazonaws.com in a permissions policy does nothing.

| Pattern | Trust principal | Typical permissions |
|---|---|---|
| EC2 instance profile | ec2.amazonaws.com | S3, SSM, CloudWatch: what the app on the instance needs |
| ECS task role | ecs-tasks.amazonaws.com | Per service: RDS via proxy, SQS publish, Secrets Manager; not shared with the EC2 host role |
| Lambda execution role | lambda.amazonaws.com | CloudWatch Logs plus exactly what the handler calls; avoid AdministratorAccess |
| Cross-account role | The trusted account (arn:aws:iam::<account-id>:root) or a specific role ARN in it, plus sts:ExternalId for third parties | Read-only audit, shared services, CI deploy to prod account |
| Service-linked role | The owning AWS service, which creates it | Auto-managed; don't delete, it breaks the service (e.g. AWSServiceRoleForECS) |

ECS task role vs EC2 instance role

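On ECS with the EC2 launch type, the container instance has an instance profile (pull images from ECR, send logs). Each task gets its own task role, whose temporary credentials the ECS agent serves from a per-task credentials endpoint that the SDK locates through the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable. Your Spring service gets only the S3/DynamoDB permissions it needs, not the host's ECR pull permissions. On Fargate there is no instance profile: the task role covers your application, and the separate task execution role covers image pulls and log delivery.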

Cross-account access

Account A assumes a role in account B: the role's trust policy in B allows A's principal; the role's permissions policies in B grant the actions; and A's identity policy must also allow sts:AssumeRole on the role ARN, because cross-account access needs an Allow on both sides. Use an External ID when a third party assumes into your account; it prevents confused deputy attacks, where another customer of that third party tricks it into assuming your role.
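The account B side of that setup is a trust policy that names account A and, for third parties, requires an External ID. A minimal sketch that builds the document; the account ID `111122223333` and the External ID value are placeholders, and `crossAccountTrustPolicy` is a hypothetical helper.

```typescript
interface TrustStatement {
  Effect: "Allow";
  Principal: { AWS: string };
  Action: "sts:AssumeRole";
  Condition?: { StringEquals: { "sts:ExternalId": string } };
}

interface TrustPolicy {
  Version: "2012-10-17";
  Statement: TrustStatement[];
}

// Trust policy for the role in account B; trustedAccountId is account A.
function crossAccountTrustPolicy(trustedAccountId: string, externalId?: string): TrustPolicy {
  if (!/^\d{12}$/.test(trustedAccountId)) {
    throw new Error(`not a 12-digit account ID: ${trustedAccountId}`);
  }
  const statement: TrustStatement = {
    Effect: "Allow",
    // Account root: any principal in account A whose own policies allow sts:AssumeRole on this role.
    Principal: { AWS: `arn:aws:iam::${trustedAccountId}:root` },
    Action: "sts:AssumeRole",
  };
  if (externalId !== undefined) {
    // Third-party case: the caller must pass exactly this value as ExternalId.
    statement.Condition = { StringEquals: { "sts:ExternalId": externalId } };
  }
  return { Version: "2012-10-17", Statement: [statement] };
}

console.log(JSON.stringify(crossAccountTrustPolicy("111122223333", "example-external-id"), null, 2));
```

The caller in account A then supplies the value with `aws sts assume-role --role-arn <role ARN> --role-session-name <name> --external-id example-external-id`.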

Role chaining

Assume role A, then from A assume role B — maximum session duration 1 hour for chained roles. Useful for hub-and-spoke access patterns; adds latency (two STS calls). Prefer direct trust where possible.
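The session-length limits can be checked before calling STS. A minimal sketch, assuming the AssumeRole rules as documented: `DurationSeconds` must be at least 900 and at most the role's maximum session duration (itself 1 to 12 hours), and a chained call that asks for more than 3600 seconds fails. `validateDuration` is a hypothetical helper.

```typescript
const MIN_SESSION_SECONDS = 900;          // 15 minutes
const CHAINED_MAX_SESSION_SECONDS = 3600; // 1-hour cap for role chaining

// Returns an error message, or null if the requested duration should be accepted.
function validateDuration(requested: number, roleMaxSessionSeconds: number, chained: boolean): string | null {
  if (requested < MIN_SESSION_SECONDS) {
    return `DurationSeconds must be at least ${MIN_SESSION_SECONDS}`;
  }
  // Chained sessions are capped at one hour regardless of the role's own maximum.
  const limit = chained ? Math.min(roleMaxSessionSeconds, CHAINED_MAX_SESSION_SECONDS) : roleMaxSessionSeconds;
  if (requested > limit) {
    return `DurationSeconds ${requested} exceeds the limit of ${limit}${chained ? " (role chaining)" : ""}`;
  }
  return null;
}

console.log(validateDuration(7200, 43200, false)); // null
console.log(validateDuration(7200, 43200, true));
// DurationSeconds 7200 exceeds the limit of 3600 (role chaining)
```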

Create an ECS task role (production pattern)

bash
aws iam create-role --role-name order-service-task \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }]
  }'

# Reference in task definition:
# "taskRoleArn": "arn:aws:iam::123456789012:role/order-service-task"
hcl
resource "aws_iam_role" "ecs_task" {
  name = "order-service-task"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

resource "aws_ecs_task_definition" "app" {
  family                   = "order-service"
  task_role_arn            = aws_iam_role.ecs_task.arn
  execution_role_arn       = aws_iam_role.ecs_execution.arn
  # ...
}
typescript
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as iam from 'aws-cdk-lib/aws-iam';

const taskRole = new iam.Role(this, 'OrderServiceTaskRole', {
  assumedBy: new iam.ServicePrincipal('ecs-tasks.amazonaws.com'),
  description: 'Runtime permissions for order-service containers',
});

new ecs.FargateTaskDefinition(this, 'TaskDef', {
  taskRole,
  // executionRole for ECR pull + logs — separate, tighter role
});
📦 Real World

Netflix pioneered cross-account IAM roles for their microservices — each service assumes a role scoped to its dataset, with no long-lived keys on instances. Stripe uses strict role boundaries between PCI and non-PCI accounts. Both treat IAM roles as the only credential mechanism in production.

⚖️ Trade-off

One role per service vs shared role: shared roles simplify ops but violate least privilege when services diverge (payments needs KMS, catalog doesn't). Start with one role per ECS service / Lambda function; merge only when permissions are identical and lifecycle is identical.

Least privilege in practice

Least privilege isn't a one-time policy write — it's a continuous loop: deploy with broader AWS managed policies, observe actual usage in CloudTrail, narrow to customer-managed policies, repeat. Tools exist at every step.

The narrowing workflow

  1. Start with AWS managed policy closest to need (e.g. AmazonDynamoDBFullAccess in dev only)
  2. Enable CloudTrail in all regions; log to a security account
  3. Run IAM Access Advisor — see last-accessed services per user/role; remove unused actions
  4. Use IAM Access Analyzer — detect resources shared externally; validate policies before deploy
  5. Generate policies from CloudTrail with Access Analyzer policy generator (last 90 days of actual API calls)
  6. Replace with customer-managed policy; attach permission boundary for delegated admin teams

Permission boundaries

A permission boundary caps what a role/user can do even if attached policies allow more. Use case: let a product team create their own roles in CI, but cap them with a boundary that denies iam:*, organizations:*, and s3:DeleteBucket. The effective permission is the intersection of boundary and identity policies.
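Ignoring explicit denies and wildcards, the intersection rule reduces to a set operation: an action is effective only if both the identity policies and the boundary allow it. A minimal sketch over plain action names, with hypothetical action lists and a hypothetical `effectivePermissions` helper.

```typescript
// Effective permissions = identity-policy allows AND boundary allows.
function effectivePermissions(identityAllows: string[], boundaryAllows: string[]): string[] {
  const inBoundary: { [action: string]: boolean } = {};
  boundaryAllows.forEach((a) => {
    inBoundary[a.toLowerCase()] = true;
  });
  return identityAllows.filter((a) => inBoundary[a.toLowerCase()] === true);
}

const identity = ["s3:GetObject", "s3:DeleteBucket", "iam:CreateRole", "sqs:SendMessage"];
const boundaryList = ["s3:GetObject", "s3:PutObject", "sqs:SendMessage"];

console.log(effectivePermissions(identity, boundaryList));
// [ 's3:GetObject', 'sqs:SendMessage' ]
```

Note that s3:PutObject is in the boundary but still not effective: a boundary never grants anything the identity policies don't.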

CloudTrail → Athena: find what a role actually used

sql
-- Run against the CloudTrail logs table in Athena (the table name depends on your setup)
SELECT eventSource, eventName, count(*) AS calls
FROM cloudtrail_logs
WHERE userIdentity.arn LIKE '%order-service-task%'
  -- eventTime is stored as an ISO 8601 string, so parse it before comparing
  AND from_iso8601_timestamp(eventTime) > date_add('day', -90, current_timestamp)
GROUP BY eventSource, eventName
ORDER BY calls DESC
LIMIT 50;
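The query returns eventSource/eventName pairs, and for many services the IAM action is the service prefix from eventSource plus eventName (`s3.amazonaws.com` + `GetObject` → `s3:GetObject`). A minimal sketch of that conversion with a hypothetical `toActions` helper and made-up rows. The mapping is an assumption that does not hold everywhere (CloudWatch, for example, logs as `monitoring.amazonaws.com` but uses the `cloudwatch` prefix), and S3 object calls such as GetObject only appear if CloudTrail data events are enabled, so review the list, or use the Access Analyzer policy generator, before turning it into a policy.

```typescript
interface UsageRow {
  eventSource: string; // e.g. "sqs.amazonaws.com"
  eventName: string;   // e.g. "SendMessage"
  calls: number;
}

// Naive mapping "<prefix>.amazonaws.com" + "Name" -> "<prefix>:Name", de-duplicated and sorted.
// Assumption: correct for many services, not all.
function toActions(rows: UsageRow[]): string[] {
  const seen: { [action: string]: boolean } = {};
  const actions: string[] = [];
  rows.forEach((row) => {
    const prefix = row.eventSource.split(".")[0];
    const action = `${prefix}:${row.eventName}`;
    if (!seen[action]) {
      seen[action] = true;
      actions.push(action);
    }
  });
  return actions.sort();
}

// Hypothetical rows, e.g. merged from two query runs.
const rows: UsageRow[] = [
  { eventSource: "sqs.amazonaws.com", eventName: "SendMessage", calls: 1200 },
  { eventSource: "s3.amazonaws.com", eventName: "GetObject", calls: 310 },
  { eventSource: "sqs.amazonaws.com", eventName: "SendMessage", calls: 4 },
];
console.log(toActions(rows)); // [ 's3:GetObject', 'sqs:SendMessage' ]
```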
terminal — simulate and audit permissions
$ # Simulate: can this role publish to our queue?
$ aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/order-service-task \
  --action-names sqs:SendMessage \
  --resource-arns arn:aws:sqs:eu-west-1:123456789012:orders.fifo
→ EvalDecision: allowed | explicitDeny | implicitDeny
$ aws iam generate-service-last-accessed-details \
  --arn arn:aws:iam::123456789012:role/order-service-task
$ aws iam get-service-last-accessed-details --job-id <JobId returned above>
$ aws accessanalyzer validate-policy --policy-type IDENTITY_POLICY \
  --policy-document file://policy.json
💡 Pro Tip

Run aws iam simulate-principal-policy in CI when changing IAM — assert that deploy roles cannot call iam:CreateUser or s3:DeleteBucket. Treat IAM policy changes like application code: PR review + automated checks.

🎯 Exam Tip

A permission boundary limits only the user or role it is attached to: an admin sets it on delegated roles, and it does not restrict the admin who sets it. SCPs apply to every principal in the member accounts under an OU, including the root user, but never to the management account or to service-linked roles. Know the difference: SCP = org-wide guardrail; boundary = per-role cap; session policy = per-assume-role session cap.

Identity federation

Humans and CI pipelines should never hold long-lived AWS access keys. Federation exchanges IdP tokens for temporary AWS credentials via STS — same roles and policies as always, but no keys to leak.

| Mechanism | Use case | How it works |
|---|---|---|
| SAML 2.0 | Corporate IdP (Okta, AD FS, Azure AD) | User logs in via IdP → SAML assertion → AssumeRoleWithSAML |
| OIDC | GitHub Actions, GitLab CI, Spacelift | OIDC token from provider → AssumeRoleWithWebIdentity; no stored AWS keys in GitHub |
| IAM Identity Center | Multi-account human access (replaces AWS SSO) | Central portal → permission sets → temporary role in target account |
| Cognito User Pools | App user authentication (sign-up/sign-in) | JWT for your app; not the same as AWS console access |
| Cognito Identity Pools | Mobile/web apps needing AWS credentials | Federated identity → temporary AWS creds scoped by IAM role mapping |

GitHub Actions → AWS (OIDC) — the modern CI pattern

bash
# 1. Create OIDC provider (once per account)
aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1

# 2. Role trust policy — only repo main branch can assume
# "Condition": {
#   "StringEquals": { "token.actions.githubusercontent.com:aud": "sts.amazonaws.com" },
#   "StringLike": { "token.actions.githubusercontent.com:sub": "repo:myorg/my-app:ref:refs/heads/main" }
# }
hcl
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

resource "aws_iam_role" "github_deploy" {
  name = "github-actions-deploy"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = "sts:AssumeRoleWithWebIdentity"
      Principal = {
        Federated = aws_iam_openid_connect_provider.github.arn
      }
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          "token.actions.githubusercontent.com:sub" = "repo:myorg/my-app:ref:refs/heads/main"
        }
      }
    }]
  })
}
typescript
import * as iam from 'aws-cdk-lib/aws-iam';

const provider = new iam.OpenIdConnectProvider(this, 'GitHubOidc', {
  url: 'https://token.actions.githubusercontent.com',
  clientIds: ['sts.amazonaws.com'],
});

new iam.Role(this, 'GitHubDeployRole', {
  assumedBy: new iam.WebIdentityPrincipal(provider.openIdConnectProviderArn, {
    StringEquals: {
      'token.actions.githubusercontent.com:aud': 'sts.amazonaws.com',
    },
    StringLike: {
      'token.actions.githubusercontent.com:sub': 'repo:myorg/my-app:ref:refs/heads/main',
    },
  }),
  managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonEC2ContainerRegistryPowerUser')],
});
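The `sub` condition is where most OIDC trust mistakes happen. A minimal sketch that checks sample GitHub `sub` claims against the `StringLike` patterns used in these examples, with IAM's `*`/`?` glob semantics (case-sensitive, as StringLike is); `subMatches` is a hypothetical helper. It shows that a pull-request run is rejected by the main-branch pattern, while a broad `repo:myorg/my-app:*` pattern lets any branch or pull request assume the deploy role.

```typescript
// StringLike semantics: * matches any run of characters, ? exactly one; case-sensitive.
function subMatches(pattern: string, sub: string): boolean {
  const body = pattern
    .split("")
    .map((ch) => (ch === "*" ? ".*" : ch === "?" ? "." : ch.replace(/[.+^${}()|[\]\\]/g, "\\$&")))
    .join("");
  return new RegExp("^" + body + "$").test(sub);
}

const mainOnly = "repo:myorg/my-app:ref:refs/heads/main";
const tooBroad = "repo:myorg/my-app:*";

// Sample sub claims for different kinds of workflow runs.
const pushToMain = "repo:myorg/my-app:ref:refs/heads/main";
const pullRequest = "repo:myorg/my-app:pull_request";
const featureBranch = "repo:myorg/my-app:ref:refs/heads/feature-x";

console.log(subMatches(mainOnly, pushToMain));    // true
console.log(subMatches(mainOnly, pullRequest));   // false
console.log(subMatches(tooBroad, pullRequest));   // true
console.log(subMatches(tooBroad, featureBranch)); // true
```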

IAM Identity Center (AWS SSO)

For human access across multiple accounts: connect Okta/Azure AD once, define permission sets (templates of IAM policies), assign to groups per account. Engineers get 1–12 hour sessions via the SSO portal — no IAM users, no access keys. This is the AWS-recommended enterprise pattern and appears frequently on the SA Pro exam.

⚖️ Trade-off

IAM users vs Identity Center: IAM users scale poorly (no central offboarding, key rotation burden). Identity Center adds setup complexity but gives one place to revoke access when someone leaves. Exception: break-glass IAM user with MFA in a sealed envelope — max two, never used in normal operations.

IAM security best practices

A checklist distilled from AWS Well-Architected Security pillar, incident post-mortems, and SA exam rubrics. If you implement only this section, you'll be ahead of most production accounts.

  • Root account

    MFA enabled (hardware key). Access keys deleted. Never used for daily work. Email goes to a group alias, not one person.

  • Break-glass

    Separate procedure for emergency root use — documented, audited, requires two people. Credentials in physical safe.

  • SCPs everywhere

    Deny leaving org, deny disabling CloudTrail, deny unapproved regions, deny root API calls except from break-glass IP.

  • CloudTrail all regions

    Multi-region trail, log file validation, deliver to security account S3 with MFA delete. Alert on StopLogging.

  • GuardDuty on

    Detects credential exfiltration, unusual API geography, crypto mining, Tor activity. Enable org-wide from admin account.

  • Roles not keys

    EC2/ECS/Lambda use roles. CI uses OIDC. Never hardcode keys in application.properties or GitHub secrets long-term.

Credential hygiene

  • Rotate IAM access keys at least every 90 days if they must exist; prefer eliminating keys over rotating them
  • Alert on access keys unused for more than 45 days (AWS Config rule or Access Analyzer unused-access findings)
  • Deny iam:CreateAccessKey for human users via SCP
  • Use aws:MultiFactorAuthPresent for sensitive console/API operations
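Those thresholds translate directly into a periodic check. A minimal sketch over hypothetical key records whose shape loosely mirrors what `aws iam list-access-keys` (creation date) and `aws iam get-access-key-last-used` (last-used date) report; it flags keys older than 90 days or idle for more than 45. `auditKey` and the key IDs are hypothetical.

```typescript
interface AccessKeyRecord {
  accessKeyId: string;
  created: Date;
  lastUsed: Date | null; // null = never used
}

const DAY_MS = 24 * 60 * 60 * 1000;
const MAX_AGE_DAYS = 90;
const MAX_IDLE_DAYS = 45;

function daysBetween(a: Date, b: Date): number {
  return Math.floor((b.getTime() - a.getTime()) / DAY_MS);
}

// Returns the reasons a key should be rotated or disabled (empty = compliant).
function auditKey(key: AccessKeyRecord, now: Date): string[] {
  const findings: string[] = [];
  const age = daysBetween(key.created, now);
  if (age > MAX_AGE_DAYS) findings.push(`age ${age}d > ${MAX_AGE_DAYS}d: rotate or delete`);
  // A never-used key counts as idle since its creation date.
  const idle = daysBetween(key.lastUsed ?? key.created, now);
  if (idle > MAX_IDLE_DAYS) findings.push(`unused ${idle}d > ${MAX_IDLE_DAYS}d: deactivate`);
  return findings;
}

const now = new Date("2024-06-01T00:00:00Z");
const oldButActive: AccessKeyRecord = {
  accessKeyId: "AKIAEXAMPLE1", // placeholder key ID
  created: new Date("2024-01-01T00:00:00Z"),
  lastUsed: new Date("2024-05-30T00:00:00Z"),
};
console.log(auditKey(oldButActive, now)); // [ 'age 152d > 90d: rotate or delete' ]
```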

Example SCP: deny risky actions org-wide

json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyLeaveOrg",
      "Effect": "Deny",
      "Action": "organizations:LeaveOrganization",
      "Resource": "*"
    },
    {
      "Sid": "DenyDisableAudit",
      "Effect": "Deny",
      "Action": [
        "cloudtrail:DeleteTrail",
        "cloudtrail:StopLogging",
        "guardduty:DeleteDetector",
        "guardduty:DisassociateFromMasterAccount"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyUnapprovedRegions",
      "Effect": "Deny",
      "NotAction": [
        "iam:*", "organizations:*", "route53:*", "support:*",
        "budgets:*", "ce:*", "cloudfront:*", "globalaccelerator:*"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["eu-west-1", "us-east-1"]
        }
      }
    }
  ]
}
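The DenyUnapprovedRegions statement combines two ideas from earlier: `NotAction` exempts the listed global services, and `StringNotEquals` over a list fires only when the requested region matches none of the values. A minimal sketch of how that single statement decides a request, assuming only exact and `service:*` patterns; `regionScpDenies` is a hypothetical helper and the lists are copied from the SCP above.

```typescript
const EXEMPT_ACTIONS = [
  "iam:*", "organizations:*", "route53:*", "support:*",
  "budgets:*", "ce:*", "cloudfront:*", "globalaccelerator:*",
];
const APPROVED_REGIONS = ["eu-west-1", "us-east-1"];

// True when `pattern` (exact or "service:*") matches `action`, case-insensitively.
function actionMatches(pattern: string, action: string): boolean {
  const p = pattern.toLowerCase();
  const a = action.toLowerCase();
  return p.slice(-2) === ":*" ? a.indexOf(p.slice(0, -1)) === 0 : a === p;
}

// Models only the DenyUnapprovedRegions statement of the SCP.
function regionScpDenies(action: string, requestedRegion: string): boolean {
  const exempt = EXEMPT_ACTIONS.some((p) => actionMatches(p, action));    // NotAction
  const unapproved = APPROVED_REGIONS.every((r) => r !== requestedRegion); // StringNotEquals over a list
  return !exempt && unapproved;
}

console.log(regionScpDenies("ec2:RunInstances", "ap-southeast-1")); // true
console.log(regionScpDenies("ec2:RunInstances", "eu-west-1"));      // false
console.log(regionScpDenies("iam:CreateRole", "ap-southeast-1"));   // false
```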
⚠️ Pitfall

Storing AWS_ACCESS_KEY_ID in Spring application.yml committed to Git — even a private repo. Use ECS task role + default credential chain (DefaultCredentialsProvider in AWS SDK v2). For local dev, use aws sso login profiles, never production keys on laptops.

📦 Real World

Airbnb and Slack run multi-account AWS Organizations with SCPs denying public resource creation at the org level: even if a developer tries to switch off S3 Block Public Access (s3:PutBucketPublicAccessBlock), the SCP layer denies the call. Defense in depth: SCP + resource policy + Block Public Access settings.

🎯 Exam Tip

When the exam asks how to prevent principals across many accounts from doing Y, the answer is almost always an SCP on an OU, not IAM policies on individual users. When it asks about limiting what a delegated admin can grant, answer permission boundary. When it asks about CI without long-lived keys, answer OIDC + AssumeRoleWithWebIdentity.