Custom service-to-service authentication using IAM/STS

Lately I’ve wanted to be able to use IAM directly for authentication. Specifically, I wanted a way for a service to verify that a request from another service came from a particular IAM role and that the request’s auth is still valid. I want this because I want to avoid the chicken-and-egg problem of bootstrapping auth systems in the cloud: you need some secret on your instances that allows them to talk to an auth system before services can authenticate to each other. Amazon already provides this in the form of IAM roles. Unfortunately, only AWS services can verify these credentials. However, it’s possible to abuse IAM and STS to achieve this goal.

DynamoDB offers fine-grained IAM resource protection. If we have a table with a string hash key of role_name, we can allow a service with the role example-production-iad to access its item in the table using an IAM policy attached to the example-production-iad role. We can then pass the role credentials from example-production-iad into another service, which allows that service to fetch the item using the passed-in credentials. If the fetch succeeds, we’ve verified that the service making the request is example-production-iad, since we’re limiting access to that item to example-production-iad.

Of course, it’s insecure to pass IAM credentials from one service into another as-is, since the receiving service would then hold the entire permission set of the passed-in role. Thankfully, we can limit the scope of an assumed role as much as we want by passing in the policy we want to limit the token to.

First, modify the role’s trust relationship so the role is allowed to assume itself:

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com",
        "AWS": "arn:aws:iam::12345:role/example-production-iad"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Then give the service access to read its own item in DynamoDB:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "dynamodb:GetItem"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:dynamodb:us-east-1:12345:table/authnz"
            ],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": [
                        "example-production-iad"
                    ],
                    "dynamodb:Attributes": [
                        "role_name"
                    ]
                },
                "StringEquals": {
                    "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
                }
            }
        }
    ]
}

Note that the above limits the returned attributes to the role_name attribute, which is also the hash key, meaning the service can’t get anything back that it isn’t already sending in.
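
For reference, the table itself is trivial: a single string hash key. Here’s a sketch of creating it and adding a service’s item with boto (throughput values are arbitrary):

from boto.dynamodb2.fields import HashKey
from boto.dynamodb2.table import Table

# One item per role; the hash key is the role's name.
table = Table.create('authnz', schema=[HashKey('role_name')],
                     throughput={'read': 5, 'write': 1})
# (Wait for the table to become ACTIVE before writing.)
table.put_item(data={'role_name': 'example-production-iad'})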

Now, we can limit the scope of the token by assuming the role:

import boto
import boto.sts

conn = boto.sts.connect_to_region('us-east-1')
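# Assume our own role, attaching an inline policy that scopes the
# returned credentials down to the single GetItem described above.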
role = conn.assume_role("arn:aws:iam::12345:role/example-production-iad", 'auth', policy='{"Version":"2012-10-17","Statement":[{"Sid":"example1234","Effect":"Allow","Action":["dynamodb:GetItem"],"Condition":{"ForAllValues:StringEquals":{"dynamodb:LeadingKeys":"example-production-iad","dynamodb:Attributes":["role_name"]},"StringEquals":{"dynamodb:Select":"SPECIFIC_ATTRIBUTES"}},"Resource":["arn:aws:dynamodb:us-east-1:12345:table/authnz"]}]}')

These role credentials are now only allowed to get the role_name field of the role’s own item in the DynamoDB table.
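
On the receiving side, verification is a single GetItem made with the caller’s passed-in credentials rather than our own. A minimal sketch (the function and the way credentials are transported are hypothetical; the table and attribute names come from the policy above):

import boto.dynamodb2
from boto.dynamodb2.table import Table

def verify_caller(claimed_role, access_key, secret_key, session_token):
    # Connect to DynamoDB with the credentials the caller handed us.
    conn = boto.dynamodb2.connect_to_region(
        'us-east-1',
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        security_token=session_token,
    )
    table = Table('authnz', connection=conn)
    try:
        # Request only role_name, matching the policy's attribute limits.
        item = table.get_item(role_name=claimed_role,
                              attributes=['role_name'])
    except Exception:
        # AccessDenied or expired credentials: not who they claim to be.
        return False
    return item['role_name'] == claimed_role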

There are a couple of problems with this so far:

  1. This solution allows the target service to turn around and authenticate as example-production-iad to other services, since the token doesn’t carry any information about its intended scope (from service/to service).
  2. If you allow a role to assume itself, the assumed role credentials can be used to assume the role again, which lets you extend the lifetime of the token via another token, which you can use to get yet another token with yet another extended lifetime… you can probably see where this is going:
>>> conn = boto.sts.connect_to_region('us-east-1')
>>> role = conn.assume_role("arn:aws:iam::12345:role/example-production-iad", 'auth', policy='{"Version":"2012-10-17","Statement":[{"Sid":"example1234","Effect":"Allow","Action":["dynamodb:GetItem"],"Condition":{"ForAllValues:StringEquals":{"dynamodb:LeadingKeys":"example-production-iad","dynamodb:Attributes":["role_name"]},"StringEquals":{"dynamodb:Select":"SPECIFIC_ATTRIBUTES"}},"Resource":["arn:aws:dynamodb:us-east-1:12345:table/authnz"]}]}', duration_seconds=900)
>>> 
>>> conn2 = boto.sts.connect_to_region('us-east-1', aws_access_key_id=role.credentials.access_key, aws_secret_access_key=role.credentials.secret_key, security_token=role.credentials.session_token)
>>> role2 = conn2.assume_role("arn:aws:iam::12345:role/example-production-iad", 'auth', policy='{"Version":"2012-10-17","Statement":[{"Sid":"example1234","Effect":"Allow","Action":["dynamodb:GetItem"],"Condition":{"ForAllValues:StringEquals":{"dynamodb:LeadingKeys":"example-production-iad","dynamodb:Attributes":["role_name"]},"StringEquals":{"dynamodb:Select":"SPECIFIC_ATTRIBUTES"}},"Resource":["arn:aws:dynamodb:us-east-1:12345:table/authnz"]}]}', duration_seconds=900)
>>> role.credentials.expiration
u'2015-04-30T18:47:23Z'
>>> role2.credentials.expiration
u'2015-04-30T18:52:43Z'

As you can see, once you assume a role, you can use the currently unexpired credentials to re-assume the role, which gives you a new set of credentials that expires further in the future. Once you assume a role, you can assume it forever. This isn’t any better than using IAM users (in fact, it’s probably much, much worse).

Let’s make some changes to make this more secure. First, let’s add an example-production-iad-auth role that the example-production-iad role can assume. This solves problem #2, since we won’t let the auth role assume itself or other roles. We can also limit the scope of this role directly on the role’s policy, rather than having to limit the scope when assuming the role.
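
The trust relationship on example-production-iad-auth then names only the service role as a principal (a sketch mirroring the trust policy above, minus the EC2 service principal, so the auth role can’t assume itself):

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::12345:role/example-production-iad"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}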

Next, let’s change the DynamoDB table to add some scoping data. Rather than just having a primary hash key of the role’s name, let’s also add from and to fields, which will contain sets of role names. As a hypothetical example (exactly which role names belong in each set depends on the convention you pick for the checks below), an item might look like:
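
{
    "role_name": "example-production-iad",
    "from": ["example-production-iad"],
    "to": ["targetservice-production-iad", "targetservice2-production-iad"]
}

Now let’s modify the IAM policy for example-production-iad-auth: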

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "dynamodb:GetItem"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:dynamodb:us-east-1:12345:table/authnz"
            ],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": [
                        "example-production-iad"
                    ],
                    "dynamodb:Attributes": [
                        "role_name",
                        "from"
                    ]
                },
                "StringEquals": {
                    "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
                }
            }
        },
        {
            "Action": [
                "dynamodb:GetItem"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:dynamodb:us-east-1:12345:table/authnz"
            ],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": [
                        "targetservice-production-iad",
                        "targetservice2-production-iad"
                    ],
                    "dynamodb:Attributes": [
                        "role_name",
                        "to"
                    ]
                },
                "StringEquals": {
                    "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
                }
            }
        }
    ]
}

Now we’re allowing example-production-iad-auth to get both the example-production-iad item and the targetservice-production-iad item, but we limit access to the from and to fields respectively. This lets us scope the assumed role from example-production-iad to targetservice-production-iad. Notice that we’re allowing access from example-production-iad to multiple target services, so we’ll again want to use STS’s scope-limiting functionality to restrict the assumed role from example-production-iad to whichever service it’s actually authenticating to.
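
Assuming the auth role then looks much like before, except the inline policy we pass names only the single target we’re about to call. A sketch (mirroring the role policy above, restricted to targetservice-production-iad):

import json

import boto.sts

# Mirror the auth role's policy, but only for the one target service
# this request is going to.
scope_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem"],
            "Resource": ["arn:aws:dynamodb:us-east-1:12345:table/authnz"],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": ["example-production-iad"],
                    "dynamodb:Attributes": ["role_name", "from"]
                },
                "StringEquals": {"dynamodb:Select": "SPECIFIC_ATTRIBUTES"}
            }
        },
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem"],
            "Resource": ["arn:aws:dynamodb:us-east-1:12345:table/authnz"],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": ["targetservice-production-iad"],
                    "dynamodb:Attributes": ["role_name", "to"]
                },
                "StringEquals": {"dynamodb:Select": "SPECIFIC_ATTRIBUTES"}
            }
        }
    ]
})

conn = boto.sts.connect_to_region('us-east-1')
role = conn.assume_role(
    "arn:aws:iam::12345:role/example-production-iad-auth", 'auth',
    policy=scope_policy, duration_seconds=900)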

Note that making to and from lists of IAM role names also buys us a nice feature: when we fetch the item, we can examine the returned from field to ensure the sender is still in the list, letting us revoke a service’s access immediately.
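
On the target’s side, verification is now two GetItem calls with the passed-in credentials. A minimal sketch (error handling is elided, and the membership check follows the revocation convention described above; adjust it to whatever convention you chose for from and to):

import boto.dynamodb2
from boto.dynamodb2.table import Table

def verify_request(sender, target, access_key, secret_key, session_token):
    # Use the caller's passed-in credentials, not our own.
    conn = boto.dynamodb2.connect_to_region(
        'us-east-1',
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        security_token=session_token,
    )
    table = Table('authnz', connection=conn)
    try:
        # The token can only read the sender's from field and the to
        # field of targets the sender may talk to, so either read
        # failing means the token wasn't scoped sender -> target.
        sender_item = table.get_item(role_name=sender,
                                     attributes=['role_name', 'from'])
        table.get_item(role_name=target, attributes=['role_name', 'to'])
    except Exception:
        return False
    # Checking the from list lets us revoke a service immediately by
    # removing it from the item, without waiting for tokens to expire.
    return sender in (sender_item['from'] or ())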

There are still a few more problems with this solution:

  1. We can’t have future-scoped auth tokens. STS only supports assumed-role credentials that are created now and expire within 15 minutes to 1 hour. This is a hard limitation at this point in time. For things like callback jobs enqueued far in the future, this is problematic.
  2. We have to create two roles for every service. Assuming 100 services and 3 environments, that’s a lot of roles to maintain (though with good orchestration, that’s not a major problem).
  3. For every auth we’re doing a batch-get of two DynamoDB items, which could get expensive quickly. We could cache a verification for the lifetime of the credentials, but only the caller knows that lifetime; you can’t look up the expiration of a token. At best we can cache for whatever period of time we’re willing to accept the risk for, as in the sketch after this list.
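
A minimal sketch of that time-bounded cache (the verify callable and the TTL are hypothetical; the TTL is just however much revocation lag you’ll accept):

import time

CACHE_TTL = 60  # seconds of revocation lag we're willing to accept
_cache = {}

def cached_verify(sender, target, creds, verify):
    # creds must be hashable, e.g. (access_key, secret_key, session_token).
    key = (sender, target, creds)
    hit = _cache.get(key)
    if hit is not None and time.time() - hit[0] < CACHE_TTL:
        return hit[1]
    result = verify(sender, target, creds)
    _cache[key] = (time.time(), result)
    return result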

The nicest thing about this solution is that it has built-in key rotation: STS assumed roles are limited to a minimum lifetime of 15 minutes and a maximum of 1 hour, there’s no rate limiting on assuming roles, and there’s no cost associated with assuming roles.

It’s of course possible to extend this concept to authorization as well. We can add a policy field to our DynamoDB items, and when a service authenticates a request, it can also get back the set of actions the requester is allowed to perform on which resources. Basically, at that point we’re extending STS and IAM into a complete auth system.

One thing you may be asking yourself is: do we actually need to use DynamoDB here? If we’re doing authentication by calling a resource protected by fine-grained access policy, then we can use any service that supports this, including S3, which is incredibly cheap per call and has relatively high rate limits. The biggest reason for using DynamoDB is that you get reliable performance guarantees and fairly low latency; S3’s latency is often quite high and completely unpredictable. If you need to auth every request, you have to know how much latency overhead you’re adding.

This auth experiment was something I did as a hackathon project over a couple of days. Since then I’ve found a much better method of accomplishing service-to-service authentication using AWS, and that’s by abusing KMS. I’ll get into that in my next blog post.
