Thanks for this great post. I actually just signed up to say that I wasted a considerable amount of time trying to create my own SigV4, mostly to avoid replay attacks, especially when the server sits behind an L7 load balancer such as Cloudflare. It was implemented like this: the server creates an ECDSA/Ed25519 key pair, hands the private key over to the user, and keeps only the public key. Then, for every request to the actual APIs, the user uses the private key to create a JWT that contains attributes such as the userID, sessionID, the request path, etc. While this method obviously looked much more secure than just a long-term static JWT or an access token, the real problem I found was clock skew: if the client doesn't have a well-synchronized clock, the server deems the JWT invalid, since such a JWT should be very short-lived by design, otherwise what's the point of having it in the first place? If your clients are browsers or running on PCs, as opposed to servers, containers, etc., this will be a huge problem for you. In fact, AWS clients implement their own clock skew correction: https://aws.amazon.com/blogs/developer/clock-skew-correction...
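For anyone curious, the per-request token step looked roughly like the sketch below, using PyJWT and the cryptography package (the claim names and the 60-second lifetime are just illustrative, not tied to any particular service):

```python
# Sketch: the client signs a short-lived, per-request JWT with its Ed25519
# private key. Claim names (user_id, session_id, path) and the 60-second
# lifetime are illustrative, not prescriptive.
import time

import jwt  # PyJWT >= 2.0, which supports the EdDSA algorithm
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# In the real scheme the server generates this pair and hands the private key
# to the client, keeping only the public key.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

def sign_request(user_id: str, session_id: str, path: str) -> str:
    now = int(time.time())
    claims = {
        "user_id": user_id,
        "session_id": session_id,
        "path": path,        # bind the token to a single request path
        "iat": now,
        "exp": now + 60,     # very short-lived by design
    }
    return jwt.encode(claims, private_key, algorithm="EdDSA")

# Server side: verify against the stored public key. Clock skew surfaces here
# as jwt.ExpiredSignatureError unless some leeway is allowed.
token = sign_request("user-123", "sess-456", "/v1/widgets")
print(jwt.decode(token, public_key, algorithms=["EdDSA"], leeway=30))
```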
I ended up implementing an additional auth API endpoint where you use your long-lived secret to periodically request a short-lived JWT that is only valid for a few hours; this short-lived token is what's used to authenticate users against the actual APIs.
Wow, I'd never heard of HTTP Message Signatures until now. However, like I said, clock skew is a huge problem with this method. I found out about it much later, after implementation, when I used the client on an old Windows machine whose clock wasn't synchronized, and I spent about a day before I figured out it was the clock skew, which made this whole method look suspicious to me even though it's obviously much more secure. It could be great in a microservice environment where you're pretty sure that all your endpoints have well-synchronized clocks, but when your clients are browsers and PCs, it's only a matter of time before you experience it.
As for the second approach, yes, it's very similar to OAuth2. In fact, I guess this is the method GCP APIs use with service accounts: the clients use their long-lived secrets to get an OAuth2 access token, and that access token is, I guess, a JWT that also contains authorization information such as scopes.
I was pretty sure that you could approximate this in AWS IAM just with policy conditions on “aws:CurrentTime”.
But after much googling, it appears not! It seems like a fairly simple additional variable AWS could pass through and support conditions on, so you could say “only allow this request if the signing time is < 1 min ago”; the closest you can get is a policy that is only valid after X date. It's a shame!
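For reference, this is about as close as IAM gets today: an absolute date window on “aws:CurrentTime” rather than a relative "signed less than a minute ago" condition. A minimal sketch with boto3 (the role, policy, bucket, and timestamps are made up):

```python
# Sketch: time-box a policy with an absolute window on aws:CurrentTime.
# There is no relative "request signed < 1 min ago" condition key.
import json

import boto3  # assumes IAM permissions; all names below are placeholders

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
        "Condition": {
            "DateGreaterThan": {"aws:CurrentTime": "2021-08-15T00:00:00Z"},
            "DateLessThan":    {"aws:CurrentTime": "2021-08-15T00:10:00Z"},
        },
    }],
}

boto3.client("iam").put_role_policy(
    RoleName="example-role",
    PolicyName="time-boxed-access",
    PolicyDocument=json.dumps(policy),
)
```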
This is neat, although it throws a wrench into my recent move to encrypt my AWS secret access key with my TPM to prevent the key from being used anywhere besides my laptop[0].
We actually just did a livestream today at AWS about how to store a credential in an HSM to avoid having IAM credentials in clear text. You can see it here - https://www.twitch.tv/aws/video/1156973272
We haven’t released the code yet but are in the process. If you think this could work for you or you’d just like to see how we did it DM me on Twitter @timmattison and I’ll give you the code ASAP.
3 years ago, I implemented a library to do this for Python/boto3: https://github.com/pyauth/exile - because of assumptions that the sigv4 signers make in botocore, it actually has to do a bit of monkeypatching to get them to work. I tried to get the change merged upstream to make the botocore signers more flexible (https://github.com/boto/botocore/issues/1689) but my PR is still open.
Wow, I tried this (or another similar project) a few years back and loved it. Nice work! Sorry it is still pending though. I don’t work on the SDKs unfortunately.
On the auth side, the major change since then is that you can use the IoT credentials provider to provide certificate based auth to all services (https://docs.aws.amazon.com/iot/latest/developerguide/author...). You don’t need to be using any of the other IoT services. It was created to make it easier for devices to use AWS services but can be used by anyone/anything.
What we did was combine the AWS CLI feature to source credentials from an external process (https://docs.aws.amazon.com/cli/latest/userguide/cli-configu...) with a script to do the certificate based auth. This allows you to obtain STS credentials using a certificate and pass them to the CLI (access key, secret key, session token). Your secure hardware just needs to do the normal work of assisting in the mutual TLS auth which in our case was done with curl and Zymbit’s OpenSSL engine. We are releasing that code along with a SoftHSM2 setup so people can see how it works in a test environment.
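For a rough idea of how the pieces fit together, here is a sketch of a credential_process helper (the endpoint, role alias, thing name, and certificate paths are placeholders; the setup described above used curl with Zymbit's OpenSSL engine rather than plain key files):

```python
#!/usr/bin/env python3
# Sketch: fetch STS credentials from the IoT credentials provider over mutual
# TLS and print them in the JSON shape the AWS CLI's credential_process expects.
# Endpoint, role alias, thing name, and certificate paths are placeholders.
import json

import requests

IOT_CREDENTIALS_URL = (
    "https://EXAMPLE-ats.credentials.iot.us-east-1.amazonaws.com"
    "/role-aliases/example-role-alias/credentials"
)

resp = requests.get(
    IOT_CREDENTIALS_URL,
    cert=("/path/to/device.cert.pem", "/path/to/device.key.pem"),  # mTLS client cert/key
    headers={"x-amzn-iot-thingname": "example-thing"},
    timeout=10,
)
resp.raise_for_status()
creds = resp.json()["credentials"]

# credential_process contract: a single JSON object on stdout.
print(json.dumps({
    "Version": 1,
    "AccessKeyId": creds["accessKeyId"],
    "SecretAccessKey": creds["secretAccessKey"],
    "SessionToken": creds["sessionToken"],
    "Expiration": creds["expiration"],
}))
```

In ~/.aws/config this gets wired up under the profile as something like credential_process = /path/to/iot-credentials.py.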
In this case it was a Zymbit Zymkey 4i which contains a secure element with a bunch of additional functionality (tamper detection, etc) and works with a Raspberry Pi. Now I’m wondering how easily it could be adapted for use on a laptop with a TPM…
If you are looking for some references besides my linked code, this comment[0] on the tpm2-tools repo will probably be useful. FWIW, I've moved my workflow over to having long lived aws keys protected by my TPM and then I generate session credentials from that for normal aws cli usage.
Feels odd that AWS still uses secret tokens all these years later. Take GCP for example. They use your personal identity (via OAuth, service accounts use certs), and you can use FIDO tokens to authenticate your local keys (e.g. for gcloud CLI). I was very surprised to see that awscli still doesn't support Yubikeys.
I'm a big fan of aws-vault [1], which helps securely store your tokens and use them to obtain temporary credentials which are time-limited and constrained to a specific IAM role.
It's not as good as having something that supports a hardware token, of course, but it's better than the default awscli suggestion to keep the secrets around in plaintext either on disk or in env vars.
aws-vault is one of the standard answers for this problem, and once you have it set up, the ergonomics are in some ways superior to that of manually managed AWS secrets. Highly recommended.
But only for the web console, not via the CLI. The API for generating short-term keys (which aws-vault uses) only supports TOTP, not FIDO/U2F/WebAuthn.
Or, use saml2aws with your system keychain (pass/gpg for headless systems) and don't use aws-vault at all. With the credential_process option in the AWS CLI config, you never even need to re-authenticate manually: the AWS CLI calls saml2aws, which uses your keychain-stored SSO credentials and automatically refreshes your AWS temporary session.
The article is vague (no link to docs, no formula, no block diagram), but it seems to say the fundamental process still runs off a pre-shared secret. That is the secret you're trying to secure, and one that AWS knows.
However, that secret is no longer used for the signing. Instead, it's combined with a date and a service description to generate a public-private key pair, which is not itself the long-term secret (both AWS and you can derive it if you know the algorithm). Your requests are now signed with this private key (client side) and can be verified by the services with the public key (server side).
AWS still knows your secret, but it keeps it in a more secured location versus out on the farms. The farms would need a way to request your public key for the day from the secured location, so they can verify the sender. But the farms no longer have access to your secret key, so they cannot sign other requests masquerading as you.
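As a toy illustration of that idea (not AWS's published derivation, which the article doesn't spell out), both ends can deterministically derive the same ECDSA key pair from the long-term secret plus the date and service, and the request-handling fleet only ever needs the public half:

```python
# Toy illustration only -- NOT AWS's actual construction. Shows how a
# deterministic ECDSA key pair can be derived from a long-term secret plus
# date/service, so the verifying fleet only needs the public key.
import hashlib
import hmac

from cryptography.hazmat.primitives.asymmetric import ec

# Order of the P-256 curve, used to map the HMAC output to a valid scalar.
P256_ORDER = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551

def derive_keypair(secret_key: str, date: str, service: str):
    seed = hmac.new(
        ("AWS4" + secret_key).encode(),
        f"{date}/{service}".encode(),
        hashlib.sha256,
    ).digest()
    # Naive reduction; a real scheme would specify a bias-free mapping.
    scalar = int.from_bytes(seed, "big") % (P256_ORDER - 1) + 1
    private_key = ec.derive_private_key(scalar, ec.SECP256R1())
    return private_key, private_key.public_key()

# The client and the secured key service derive identical pairs independently.
client_priv, _ = derive_keypair("EXAMPLE-SECRET", "20210815", "s3")
_, fleet_pub = derive_keypair("EXAMPLE-SECRET", "20210815", "s3")
```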
If the ECDSA key pair is still derived using an HMAC keyed by the secret key, I can do that with the key loaded in the TPM. Otherwise it is unlikely to work with my current setup.
Once the key is loaded into the TPM as an HMAC key, you can no longer get the plaintext key back out. You have to use the TPM to perform any HMAC operations using the key.
Not quite. aws-vault still reads the secret out of the backend to create session credentials. You likewise could read the backend directly to get the credentials (assuming you are authenticated).
With the secret stored in a TPM, it cannot be extracted. Instead, you ask the TPM to execute the hmac() function on your behalf with the secret that only it can read.
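For standard SigV4 this works out neatly, because the documented signing-key derivation is just a chain of HMAC-SHA256 calls, and only the very first one needs the long-term secret. A sketch, with the TPM call as a stand-in for whatever your tooling (exile, tpm2 utilities) actually exposes:

```python
# The SigV4 signing-key derivation documented by AWS: a chain of HMAC-SHA256
# calls. Only the first step touches the long-term secret, so that one step
# can be delegated to the TPM and the secret never leaves it.
import hashlib
import hmac

SECRET_ACCESS_KEY = "EXAMPLE-SECRET"  # in the TPM setup this is sealed in hardware

def hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def tpm_hmac(msg: str) -> bytes:
    # Stand-in: in the real setup this call goes to the TPM, and the
    # "AWS4" + secret key never exists in process memory.
    return hmac_sha256(("AWS4" + SECRET_ACCESS_KEY).encode(), msg)

def derive_signing_key(date: str, region: str, service: str) -> bytes:
    k_date = tpm_hmac(date)                  # the only step needing the secret
    k_region = hmac_sha256(k_date, region)   # everything below can run in software
    k_service = hmac_sha256(k_region, service)
    return hmac_sha256(k_service, "aws4_request")

signing_key = derive_signing_key("20210815", "us-east-1", "s3")
```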
Is there a benefit to companies all using the exact same web service request signature scheme?
One downside is that it would make widespread web service API abuse even easier than it already is. One would need to understand even less about the mechanics and workings of same-scheme services across companies. Yes, this is security through obscurity, but why leave more surface area open than necessary?
> Is there a benefit to companies all using the exact same web service request signature scheme?
Well, there are some things to consider.
First, Amazon AWS is a very rich target. It hosts a very large quantity of services that would be ripe for exploitation. I'm sure attackers would love to find holes in the AWS mechanism.
Second, because of number 1, I'm sure the AWS services are not only a prime target for attack, but are also actively being attacked, by everyone from nefarious black-hat ne'er-do-wells to state-level agencies. I would fully expect US, Russian, Chinese, and other state intelligence agencies to be very interested in an exploit of something as ripe as the AWS system.
Third, Amazon has the motivation, due to number 1, to keep on top of and ensure that its technique is sound and robust. Not only that, it has the resources to do it.
Fourth, the technique is open, documented, and available to all. No skullduggery is required at the algorithm level to analyze and understand it. Thesis-seeking white-hat PhD students can have at it and advance the field.
So, if there were to be "one algorithm", Amazon and AWS have the experience and know-how to hold theirs up high on a pedestal labeled "You could do a lot worse".
I wonder why the subkeys are valid for one day and not something much shorter, like 15 minutes. IIUC, if the frontend IAM servers of a service get compromised, you can only safely accept traffic for that service again the next day, when a new subkey becomes valid.
I also wonder whether those subkeys are copied over to the frontend servers every day (that might be a lot of data to copy), or requested and cached (possibly high latency for the first call).
I took it as a way to handle clock skew. Like many others, I've had lots of problems with clients whose clocks are pretty inaccurate. Though I agree with you: I don't think I've ever had a client with a skew of a full day.
You could choose a grace period independently. The downside is that you have to keep all currently accepted keys around, so there is a limit somewhere on storage costs.
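A small sketch of that trade-off: the verifier keeps every subkey whose day still overlaps the grace window, which is cheap for a 15-minute grace period but grows with whatever window you choose (the derivation below is a placeholder, not AWS's):

```python
# Sketch: decouple the grace period from the subkey lifetime by keeping every
# subkey whose validity window still overlaps "now minus grace".
# The derivation function and the master secret are placeholders.
import datetime as dt
import hashlib
import hmac

GRACE = dt.timedelta(minutes=15)
MASTER_SECRET = b"EXAMPLE-MASTER"  # stand-in for the secured key service

def subkey_for(day: dt.date) -> bytes:
    return hmac.new(MASTER_SECRET, day.isoformat().encode(), hashlib.sha256).digest()

def acceptable_subkeys(now: dt.datetime) -> list[bytes]:
    days = {now.date(), (now - GRACE).date()}  # today, plus yesterday near midnight
    return [subkey_for(d) for d in sorted(days)]
```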
> a VP and Distinguished Engineer at Amazon Web Services
Any further translation of what VP and "Distinguished Engineer" mean here? It sounds like both are honorary and don't really mean much, but then "Distinguished" makes it seem like they received some sort of Nobel Prize of computing as well.
And if neither of them really means anything, then this guy is really "just" a developer, right?
In the org I work in, "Distinguished" is simply a level of seniority (and pay grade). For engineers we have: grad -> engineer -> senior -> specialist -> principal -> chief -> distinguished.
The further/higher you go, the less "programming" you usually do and the more design/architecture work, plus line management.
Distinguished Engineer is the top “level” for engineers at Amazon. It’s equivalent to a VP position for managers. So it’s a job title, similar to Sr Engineer.
Colm is an extremely smart engineer who has designed some really big systems. Distinguished is the title they give you when they want to pay you VP bucks so you stay at Amazon.
In my non-Amazon experience, VP (Vice President [of something]) was a position in the corporate hierarchy; Distinguished Engineer was a technical-track recognition of long-term achievement, more an accolade like you say (but probably with some compensation-based recognition, I imagine).
Colm is a fairly badass engineer. He was a senior principal when I left. Sounds like they've added another level since then and he jumped to VP. It somewhat makes sense, as he leads the business in both a technical and a business sense. Amazon’s traditional issue is that they only know how to level managers. Turns out when the company starts accruing some really senior people, you have to accommodate them.
This is a fairly fluffy piece of writing. His internal writings were much deeper.
I specifically put the quotes around it to avoid comments like yours. "Just" in terms of superfluous titles, not that becoming a developer is easy or anything. Seems I've managed to get my answers though, so thank you everyone (except argc)!
I’m sure he has exceeded this by now. If you can afford to keep genius around then you should.
It’s hard to get familiar with engineers at Amazon. They have spokespeople who release things written by others, and their external contribution process is difficult at best.
It isn’t a surprise people think he is a nobody. I still remember him after five years.
Really? Looks like someone's blog on technical things that they're interested in and there's an 'About' link right where you'd expect that tell you who they are.
I flat out don't trust AWS SIG; it falls into the classic encryption antipattern of signing the meaning and not the bytes. Any time a transformation of the data is required before or after the signature is applied, you open up a hole for attackers to exploit. See V1 and V3, which were flat-out insecure and have been abandoned.
You often can't sign the bytes, because the systems handling the request can reorder them, and there is an unbounded number of byte sequences mapping to a given meaning. That's the whole reason we have canonicalization.
Canonicalization is treacherous and generalist developers shouldn't freelance their own formats. Canonicalization problems are one of the reasons we're at SIGV4 and not just, like, SIG. But Amazon has at this point thought about this as carefully as any organization on the Internet.
Which is why it's good advice to, if you're going for a signed-request API, just shoplift SIGV4 from them.
Could you explain (even just a link if you have one) what you and GP mean by 'bytes vs. meaning'?
I can't really imagine what the latter is, other than perhaps signing byte values within a structure rather than the structure as a whole, and if that's it I don't understand how it's a non trivial difference, or what you mean by 'byte sequences mapping to a given meaning'.
The parent is noting that what actually gets signed in SIGV4 isn't the raw request, but a pseudo-request derived from the raw; the pseudo is the "meaning" (in their terminology); a cryptography engineer would call it the "canonicalized" request.
What they'd like instead is a signature format that simply signed the raw request, the "bytes".
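To make the distinction concrete, here's a toy example of why the raw bytes are hard to sign directly: two byte-wise different requests with the same meaning collapse to one canonical form. This mirrors the general idea behind SigV4's canonical request, not its exact rules:

```python
# Toy canonicalization: requests that differ in header case, ordering, and
# whitespace (different bytes) carry the same meaning, so they must be reduced
# to one canonical string before signing. Not AWS's exact SigV4 rules.
def canonicalize(method: str, path: str, headers: dict[str, str]) -> str:
    canonical_headers = "".join(
        f"{name.lower().strip()}:{value.strip()}\n"
        for name, value in sorted(headers.items(), key=lambda kv: kv[0].lower())
    )
    return f"{method.upper()}\n{path}\n{canonical_headers}"

a = canonicalize("GET", "/bucket/key",
                 {"Host": "s3.amazonaws.com", "X-Amz-Date": "20210815T000000Z"})
b = canonicalize("get", "/bucket/key",
                 {"x-amz-date": " 20210815T000000Z", "host": "s3.amazonaws.com "})
assert a == b  # same meaning, same canonical form, same signature
```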