How to safely prevent your AWS users from making calls outside of your ecosystem.
Building a secure environment revolves around making incremental improvements targeted to thwart specific types of threats. A common way in which an intruder can infiltrate your network is through obtaining credentials which grant access to a restricted system. Credentials can leak in a myriad of ways, from developers accidentally committing them to source control, to services logging them in a widely accessible location, to users leaving them unencrypted on a compromised host. They could even leak through being sent over a channel incorrectly assumed to be secure.
Fortunately, there are several ways to combat these sorts of mistakes. The most common of these attempt to limit the window in which a leaked credential could be used by an attacker, either through frequently rotating credentials or by regularly scanning through source code and logs. But what if you could make the credentials effectively worthless outside of the environment in which you expect them to be used? And arguably more importantly, how can you enable such a restriction without causing pain to existing automation and users who may use their credentials in ways you hadn’t anticipated?
This guide dives into some technical examples, so we expect it to provide the most value to system administrators looking to improve their security posture.
Before starting this IAM credential lockdown, you'll need to set up the following:

- A VPC with VPC endpoints for the AWS services your users call
- A CloudTrail trail recording your account's API activity to an S3 bucket
- An Athena table configured to query those CloudTrail logs
Already, we’ve thrown around quite a few AWS service names. This overall strategy can be adapted to work on other popular cloud providers like Google Cloud Platform or Microsoft Azure. Some service name mappings can be found in the table below:
| Amazon Web Services (AWS) | Google Cloud Platform (GCP) | Microsoft Azure |
| --- | --- | --- |
| Virtual Private Cloud | Virtual Private Cloud | Azure Virtual Network |
| CloudTrail | Cloud Audit Logs | Azure Audit Logs |
| Athena | BigQuery | Azure Synapse Analytics |
| Identity and Access Management (IAM) | Identity and Access Management | Azure Identity Management |
| Simple Storage Service (S3) | Cloud Storage | Azure Storage |
With the above satisfied, the effectiveness of this strategy will depend on the percentage of IAM users calling out to resources in the same region as the request's origin. We'll dive more into that limitation later.
AWS allows access control to be configured through policies attached to the user making a request or through policies attached to the target resource directly. In this case, it makes the most sense to apply the restriction via a policy assigned to IAM users: if you limit power at the source, you don't have to worry about limiting it at every destination. Say you have a user whose permitted S3 buckets are defined through a regex; it's much easier to apply this restriction to the user itself than to apply the same restriction on every current and future bucket matching that regex.
We’ll dive right in by introducing the most basic version of the policy we wish to enforce, then gradually add elements as we try to account for more cases.
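A minimal sketch of that starting point, written here as a boto3-ready Python dict, might look like the following. The VPC ID and the exemption tag name (vpc-lockdown-exempt) are placeholders to replace with your own values.

```python
# Minimal sketch of the deny-outside-VPC policy as a Python dict.
# The VPC ID and the "vpc-lockdown-exempt" tag name are placeholders.
TRUSTED_VPC_IDS = ["vpc-0123456789abcdef0"]

deny_outside_vpc_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyRequestsOutsideTrustedVpcs",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                # Matches when the request did not arrive through a trusted VPC endpoint.
                "StringNotEquals": {"aws:SourceVpc": TRUSTED_VPC_IDS},
                # Matches when the caller's exemption tag is missing or set to "false";
                # both conditions must match for the Deny to apply.
                "BoolIfExists": {"aws:PrincipalTag/vpc-lockdown-exempt": "false"},
            },
        }
    ],
}
```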
The primary function of this policy is to deny all requests not originating from a trusted VPC. In the first condition, we take advantage of the aws:sourceVpc metadata AWS passes along with requests made through a VPC endpoint. The second condition allows us to quickly and easily maintain an allowlist of users who are exempt from this policy: we simply check for the presence of a boolean tag on the caller, and if the tag is not present or is explicitly set to false, we enforce the first condition as usual.1
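As an illustration, exempting a user could then be as simple as tagging them with boto3; the user name and tag name below are the same hypothetical placeholders used in the sketch above.

```python
import boto3

iam = boto3.client("iam")

# Exempt a specific user from the VPC lockdown policy by setting the boolean tag.
# The user name and tag name are placeholders.
iam.tag_user(
    UserName="break-glass-admin",
    Tags=[{"Key": "vpc-lockdown-exempt", "Value": "true"}],
)
```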
So now we're ready to apply this policy to every user in our account, right? Not quite.
Before enforcing this rule on a given IAM user, you first want to be confident that you are not going to block an existing workflow and suddenly cause their requests to return 403s. Here’s where having a CloudTrail configured to record AWS API activity comes in handy.
To recap, CloudTrail is AWS's audit logging service that pipes logs into S3, and Athena provides a way to run queries against data stored in S3 buckets. Together, they provide a powerful way to increase visibility into the events that take place in your account. More details, including how to partition CloudTrail logs for performance and scalability, can be found in these AWS docs.
Using Athena to query CloudTrail's dataset, we can determine with some confidence whether a given user is likely to be making their requests exclusively through a set of VPCs. If that is the case, they are a good candidate for the policy defined above.
The table name can vary based on your CloudTrail and Athena configuration, as mentioned earlier, but the basic idea is to query Athena for the specific fields needed to fully understand how users are calling out to AWS services:
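A sketch of such a query, submitted here through boto3's Athena client, might look like the following; the table name, database, date filter, and results location are all placeholders.

```python
import boto3

athena = boto3.client("athena")

# Which identities call which services, from where, and through which VPC endpoints?
# The table name ("cloudtrail_logs"), database, date filter, and results bucket
# below are all placeholders.
QUERY = """
SELECT useridentity.arn AS user_arn,
       eventsource,
       eventname,
       awsregion,
       sourceipaddress,
       vpcendpointid,
       count(*)         AS call_count
FROM   cloudtrail_logs
WHERE  eventtime >= '2024-01-01T00:00:00Z'
GROUP  BY 1, 2, 3, 4, 5, 6
"""

athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results-bucket/"},
)
```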
We created an automated job to run these queries hourly against the previous hour's data and store the results in a database. This gave us the ability to quickly answer questions like:

- Which users are already making all of their requests through our VPC endpoints, and are therefore good candidates for lockdown?
- Which users, and which of their workflows, would have been denied had the policy already been in place?
Current limitations of CloudTrail
At the time of this writing, CloudTrail only logs control plane calls for most services. This means that, with some exceptions like S3 for which CloudTrail logs both data and control plane actions, only a subset of calls to all other AWS services will be logged. Details on what events CloudTrail records can be found in the official documentation. In practice, this does not pose a real concern to our approach, given that if a user makes a control plane request to service A through a VPC endpoint, and their request is logged via CloudTrail, all data plane requests from the same user and client configuration can be expected to go through that same VPC endpoint for service A.
As data accumulates using the process highlighted above, the confidence in gauging how a user generally behaves will also increase. After defining a threshold at which to lock down a user, applying the policy is as easy as:
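For example, a boto3 sketch of that step might look like the following, where the user name and policy name are placeholders and the policy document is the dict from the earlier sketch.

```python
import json

import boto3


def apply_vpc_lockdown(user_name: str, policy_document: dict) -> None:
    """Attach the deny-outside-VPC document as an inline policy on an IAM user."""
    iam = boto3.client("iam")
    iam.put_user_policy(
        UserName=user_name,
        PolicyName="DenyRequestsOutsideTrustedVpcs",  # placeholder policy name
        PolicyDocument=json.dumps(policy_document),
    )


# e.g. apply_vpc_lockdown("example-service-user", deny_outside_vpc_policy),
# where deny_outside_vpc_policy is the dict from the earlier sketch.
```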
It's essential to roll the policy out to the first few users in small batches, with extensive monitoring, before ramping up the lockdown rate.
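One way to monitor a batch, assuming the same CloudTrail table in Athena, is to watch for new access-denied errors from the users that were just locked down. A hypothetical query for that could look like the following, run with the same start_query_execution pattern shown earlier.

```python
# Hypothetical monitoring query: surface access-denied errors from a user in the
# current lockdown batch. The table name, user ARN filter, and date are placeholders.
MONITORING_QUERY = """
SELECT useridentity.arn AS user_arn,
       eventsource,
       eventname,
       errorcode,
       count(*)         AS denied_calls
FROM   cloudtrail_logs
WHERE  errorcode IN ('AccessDenied', 'Client.UnauthorizedOperation')
  AND  useridentity.arn LIKE '%example-service-user%'
  AND  eventtime >= '2024-01-01T00:00:00Z'
GROUP  BY 1, 2, 3, 4
"""
```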
What would an engineering blog post be without highlighting some edge cases we came across? No doubt every organization will use different sets of AWS services in slightly different ways, but the following were some of the use cases we considered.
There are use cases in which you may want to allow IAM users belonging to one account to be used from within another. The VPC metadata this policy relies on (i.e. aws:sourceVpc) is passed along for calls made from VPC endpoints within the same region as the target resource. In practice, it appears AWS even includes aws:sourceVpc for calls made from one account to another, as long as both are owned by the same underlying organization and the calls are made within a single region. Due to the number of differences in how organizations manage their AWS accounts, we decided to omit solving for cross-region calls in this guide.
Developers often need to be able to run code locally that makes calls out to their AWS account. Unless they are doing so from an SSH session on an EC2 instance placed within one of the trusted VPCs, they will be blocked by our policy in its current form. The simplest way we found to solve this was to direct our locally running AWS clients to route their requests through a proxy we created within our VPC. Properly restricting access to this proxy is essential to avoid introducing an unintended backdoor around the policy.
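As a sketch, pointing a locally running boto3 client at such a proxy could look like the following; the proxy hostname is hypothetical.

```python
import boto3
from botocore.config import Config

# Route locally running clients through a proxy hosted inside the VPC.
# The proxy hostname is hypothetical; access to the proxy itself must be
# tightly restricted so it does not become a backdoor.
proxied = Config(proxies={"https": "http://aws-proxy.internal.example.com:8080"})

s3 = boto3.client("s3", config=proxied)
s3.list_buckets()  # this request now reaches AWS from inside the VPC
```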
It may be desirable in some cases to allow public access to S3 objects directly. This can be accomplished through the use of presigned URLs. The current policy just needs a tweak to account for this specific type of request:
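A sketch of that tweak, again written as a Python dict with placeholder VPC IDs and tag name, might look like this.

```python
# Sketch of the presigned-URL-aware variant. TRUSTED_VPC_IDS and the
# "vpc-lockdown-exempt" tag are the same placeholders used earlier.
TRUSTED_VPC_IDS = ["vpc-0123456789abcdef0"]

presigned_aware_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Deny every action except s3:GetObject when outside the trusted VPCs
            # (and the caller is not exempt).
            "Sid": "DenyAllButGetObjectOutsideVpc",
            "Effect": "Deny",
            "NotAction": "s3:GetObject",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:SourceVpc": TRUSTED_VPC_IDS},
                "BoolIfExists": {"aws:PrincipalTag/vpc-lockdown-exempt": "false"},
            },
        },
        {
            # Deny s3:GetObject outside the trusted VPCs only when the request is
            # NOT authenticated with a presigned URL (s3:authType REST-QUERY-STRING).
            "Sid": "DenyGetObjectOutsideVpcUnlessPresigned",
            "Effect": "Deny",
            "Action": "s3:GetObject",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:SourceVpc": TRUSTED_VPC_IDS,
                    "s3:authType": "REST-QUERY-STRING",
                },
                "BoolIfExists": {"aws:PrincipalTag/vpc-lockdown-exempt": "false"},
            },
        },
    ],
}
```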
Here, we want different logic depending on whether the user is calling s3:GetObject using presigned URL auth or not. To handle this, we split the policy into two statements: the first denies all actions except s3:GetObject, and the second denies s3:GetObject exclusively. Besides the substitution of NotAction for Action, the second statement adds a third condition so that presigned URL calls avoid the deny: it only matches when the request's s3:authType is not REST-QUERY-STRING, the value carried by presigned URL authenticated calls.
Through experimenting with the above policies, we found that some Athena calls were still being denied despite having a properly configured Athena VPC endpoint in the region. Upon inspecting the metadata of these requests in CloudTrail, we found that they were actually being proxied through the public Athena AWS endpoint. Simply adding an additional condition to the deny statement (sketched below) allowed us to avoid hitting the deny on calls made on behalf of Athena.
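One way to express that exemption, assuming the aws:CalledVia condition key, is to extend the deny statement's conditions as in the following sketch; the VPC ID and tag name remain placeholders.

```python
# One possible exemption for requests proxied by Athena: extend the deny
# statement's conditions so they no longer match when athena.amazonaws.com
# appears in the aws:CalledVia chain. ForAllValues also evaluates to true when
# the key is absent, so direct calls from outside the VPCs are still denied.
deny_condition_with_athena_exemption = {
    "StringNotEquals": {"aws:SourceVpc": ["vpc-0123456789abcdef0"]},  # placeholder VPC ID
    "BoolIfExists": {"aws:PrincipalTag/vpc-lockdown-exempt": "false"},
    "ForAllValues:StringNotEquals": {"aws:CalledVia": ["athena.amazonaws.com"]},
}
```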
Alternatively, you may want to use the aws:ViaAWSService key, a boolean that is true whenever a request is made by an AWS service on your behalf, to generalize this rule and allow any calls made on behalf of another AWS service.
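A sketch of that generalized variant might swap the service-specific check for a boolean one, so the deny only matches calls made directly by the user.

```python
# Generalized variant: the deny only matches when the request was NOT made by
# an AWS service on the caller's behalf (aws:ViaAWSService is true for such calls).
deny_condition_with_service_exemption = {
    "StringNotEquals": {"aws:SourceVpc": ["vpc-0123456789abcdef0"]},  # placeholder VPC ID
    "BoolIfExists": {
        "aws:PrincipalTag/vpc-lockdown-exempt": "false",
        "aws:ViaAWSService": "false",
    },
}
```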
These safeguards alone will not magically solve all of the threats that can arise from leaked credentials. They instead act as another step that, when taken, minimizes the risk credentials pose when they fall into the wrong hands.
Going beyond the strategies already discussed, it's equally important to consider how to limit an attacker's options inside a set of VPCs. This policy, for example, would not block an attacker executing calls from a compromised host that holds unencrypted credentials, so practicing least privilege on your IAM users can limit the potential impact of a compromised user. One should also consider whether every host within a VPC needs access to the public internet: if an attacker does manage to break in and access sensitive data, preventing them from exfiltrating it can make a significant difference in the severity of the impact.
While it may seem daunting to consider every vector of attack, it's important to remember that security is built in layers, where each layer added on top of the last has a meaningful impact on protecting your customers and their data.
Figure 1: How several conditions and/or condition keys are evaluated. Source: the official AWS IAM policy documentation.
1 As a recap of IAM policy evaluation: multiple values listed for a single condition key are evaluated with logical OR, while multiple condition operators or keys within one Condition block are evaluated with logical AND. In this policy, that means the request is denied only when both conditions match, i.e. the request does not come from a trusted VPC and the caller does not carry the exemption tag; satisfying either check is enough to escape the Deny. For more details, see Figure 1 at the end of this post, or refer to the official IAM policy documentation.
Want to work on a team that's just as invested in how you work as what you're working on? Check out our open positions and apply.