> If you didn’t, here’s how the exploit works. The comparison operator, “!==”, is vulnerable to a timing attack. The string comparison compares the two keys one byte at a time, stopping as soon as it finds an offset where the two strings do not match. As a result, checkApiKey will take longer if the two keys start with the same bytes. It’s sort of like if the error message itself said “wrong key, but you got the first 2 bytes right!”.
There was a similar vulnerability but using space instead of time on TOPS-10 on the DEC PDP-10. That was back in the day when passwords were stored in plain text. When the operating system had to check a password it just did a string compare of the string supplied by the caller and the plain text string.
Crucially, it did not copy the caller's string into kernel memory before comparing. It compared the plain text in kernel memory with the caller string in user memory.
The trick then to password cracking was to start by guessing first characters. To test a first character you place it at some memory address where the next address is invalid, and you install a handler for access faults. Then check the password, telling the kernel your guess is longer than 1 character.
If you get an access fault you know that first character was right and the string comparison continued on into the invalid address. So try first characters until you get an access fault.
Once you have the first character, repeat, but now start the guess 2 characters before the invalid memory: the known first character followed by your guess for the second character, and tell the kernel the password is at least 3 characters. Same for the rest of the characters.
(I don't remember if the kernel's string compare checked length first. If it did, add a first step above of determining the password length. Same idea...start near invalid memory, and guess lengths until you find the one that gives an access fault).
I've written about, and given a lot of references to, attacks, countermeasures, fighting the compilers, and even non-cryptographic side-channel attacks used to uncover someone's Twitter identity:
Kyber Slash (and Kyber Slash II) is a recent one in a very modern, PQC-finalist algorithm that is missing from the list if you're looking to expand! It was caused by an integer division by a known constant (KYBER_Q = 3329) with a secret-dependent numerator: some optimizing compilers emit a hardware division instruction, whose latency depends on the operands, instead of replacing the division by a constant with a constant-time multiply-and-shift.
Do any of these attacks matter for single-tenant computers where all network packets are sent on a hardware timer (say, 10 kHz) independent of crypto computing?
Doesn't that mitigate any side-channel timing attacks from the start?
The article cites JavaScript, which is not constant time. There's no sure way to do constant-time operations in JavaScript and thus no secure way to do crypto directly in JavaScript. Browsers like Firefox depend on low-level calls, which should be implemented in languages capable of constant-time code.
JavaScript needs something like constant-time WASM in order to do crypto securely, but the only constant-time WASM project on GitHub has just 16 stars and its last commit was two years ago, so it doesn't appear to have much interest. https://github.com/WebAssembly/constant-time
However, for JavaScript, I recommend Paul's library Noble which is "hardened to be algorithmically constant time". It is by far the best library available for JavaScript. https://github.com/paulmillr/noble-secp256k1
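For what it's worth, a rough usage sketch of noble-secp256k1 based on its README (the exact function names, e.g. signAsync vs. sign, vary between versions, so treat this as illustrative rather than authoritative):

import * as secp from '@noble/secp256k1';

// Illustrative only: API details differ across noble versions.
const privKey = secp.utils.randomPrivateKey();     // 32-byte private key
const pubKey = secp.getPublicKey(privKey);         // corresponding public key
const msgHash = new Uint8Array(32).fill(1);        // placeholder 32-byte message hash
const signature = await secp.signAsync(msgHash, privKey);
console.log(secp.verify(signature, msgHash, pubKey));  // true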
There are two main issues with SubtleCrypto: a minor problem and a major one. The minor problem is that it doesn't support much. Thirteen-year-old algorithms like Ed25519 and Ed25519ph are still largely unsupported, and browser support moves at a glacial pace. Ed25519ph isn't supported by any browser, even though it has been adopted by NIST (see FIPS 186-5). This is where libraries like Paul's must be used. For browser support see https://caniuse.com/mdn-api_subtlecrypto_sign_ed25519
The major problem with SubtleCrypto is that it requires the original message to sign and verify: SubtleCrypto.sign(algorithm, key, data) and SubtleCrypto.verify(algorithm, key, signature, data).
This means that when signing a 2 GB document, Go can easily verify the signature using the 256-bit digest, while JavaScript requires the whole 2 GB document to be loaded into SubtleCrypto.verify before verification, which is an enormous waste of resources.
If it isn't clear from the above what the problem is, let me rephrase: SubtleCrypto's sign and verify always rehash, so a developer cannot give it the digest. Instead, developers are forced to give it the whole document so that SubtleCrypto can rehash it. This is a significant design blunder that others, like Go, avoided, and it will push developers toward libraries like Paul's, which cannot offer the constant-time guarantees of native code.
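To make the point concrete, a minimal sketch (assuming a runtime whose SubtleCrypto supports Ed25519, which as noted above is far from universal):

// The entire document must be passed as `data`; the API offers no way to
// hand over a precomputed digest instead.
async function verifyDocument(publicKey, signature, documentBytes) {
  return crypto.subtle.verify("Ed25519", publicKey, signature, documentBytes);
}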
>If you didn’t, here’s how the exploit works. The comparison operator, “!==”, is vulnerable to a timing attack. The string comparison compares the two keys one byte at a time, stopping as soon as it finds an offset where the two strings do not match. As a result, checkApiKey will take longer if the two keys start with the same bytes. It’s sort of like if the error message itself said “wrong key, but you got the first 2 bytes right!”.
I understand that it is technically a vulnerability, but you will not be able to measure these nanosecond fluctuations over the network. Hell, you wouldn't even be able to measure them with direct physical access to the machine, even if you sampled every attempt a trillion times.
Thanks to thinking like that, many a system has been hacked. I can't think of any statistical methods that would recover this information, therefore they don't exist.
The fun thing about physical systems is that in spite of complexity in the stack, there are a lot of Gaussian processes in play which are very easy to clean up with some processing.
All I know is that it doesn't take even one microsecond to compare 32 bytes - it takes nanoseconds. And yes, I can't think of a way to meaningfully measure that delta over a real network, where every packet passes through dozens of switches and routers, each of which introduces enough entropy to completely drown out any attempt to measure things with such precision.
But more importantly, what kind of system allows itself to be pounded with thousands or millions of requests to its authentication endpoint?
- a repeated event, when aggregated (summed or averaged), tends to look more and more like a normal distribution the more samples you have
- the standard deviation of that average shrinks roughly with the square root of the number of samples
- going from microseconds to nanoseconds needs a 1000x gain in precision, and 1000² = 1,000,000
So, starting from an event with a noise amplitude of microseconds, I think you may be able to measure a difference of nanoseconds with ~1 million samples.
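A toy simulation of that back-of-the-envelope math (not a real attack; the 5 ns gap and the 1 µs Gaussian jitter are made-up numbers purely to illustrate the averaging):

// One "response time" sample: a fixed mean plus Gaussian jitter (Box-Muller transform).
function sample(meanNs, jitterNs) {
  const u = 1 - Math.random(), v = Math.random();
  return meanNs + jitterNs * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function averaged(meanNs, jitterNs, n) {
  let sum = 0;
  for (let i = 0; i < n; i++) sum += sample(meanNs, jitterNs);
  return sum / n;
}

const N = 1_000_000;                   // samples per guess
const fast = averaged(1000, 1000, N);  // wrong first byte: 1000 ns mean, 1000 ns jitter
const slow = averaged(1005, 1000, N);  // right first byte: 5 ns slower on average
console.log((slow - fast).toFixed(1), "ns estimated difference");
// The standard error of each mean is roughly 1000/sqrt(1e6) = 1 ns, so the 5 ns gap is visible.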
---
So yeah, unlikely in normal scenarios. But if you could, for example, rent a server within 10 km of your target, you could reduce the latency noise by 10-100x, which reduces the number of samples you need by the square of that: a 100x-10,000x reduction.
And what if it's possible to rent a server in the same datacenter as your target? ...well, K.O.
What’s the exact algorithm you would write for that?
Let's say you have a bool, $is_match, and whenever two chars of the string don't match you write false to it.
It takes slightly longer to write false than to do nothing, so you're still leaking the total number of matching characters.
I think speculative execution on the CPU would amplify this effect. For example:
aaaaaaaaa != ccccccccc will be much faster than ababababa != aaaaaaaaa, because in the first example the CPU will correctly predict, and speculatively execute on the assumption, that the two characters are different. They were different last time; they'll probably be different again.
// Throw an error if inputKey is not correct.
// inputKey and correctKey are both strings
function checkApiKey(inputKey, correctKey) {
  var isMatch = true;
  for (var i = 0; i < inputKey.length; i++) {
    if (inputKey.charAt(i) !== correctKey.charAt(i)) {
      isMatch = false;
    }
  }
  if (!isMatch) {
    throw new Error("wrong key");
  }
}
So what I meant is, don't return early when isMatch = false, instead walk the entire key regardless.
But if you have an incorrect key, you'll always execute the whole for loop & one write to `isMatch`.
I'm assuming the function would take the same amount of time to run regardless of which value of "i" the write occurs at. Wouldn't that then close off the attack vector of enumerating keys and checking the time?
As in, the only input that would cause the write to `isMatch` to be skipped would be the correct key, in which case you don't need to brute force the solution.
Branch prediction and speculative execution can affect that. For instance failing on the first byte could cause more iterations to be thrown away and re-executed than failing on the last byte.
In theory that would work but compilers and toolchains are far too clever now and always liable to change, so there's no way to know that in the future it won't be optimised back into shortcutting and revealing the timings.
I think that’s still affected by speculative execution. The CPU will process the step ahead assuming that a[i] != b[i] while it is doing the a[i] XOR b[i] check.
So it’ll be faster if: most characters match, most characters don’t match or there are runs of matching characters.
There are various methods for comparing two long strings in constant time.
For instance, you can compute a bitwise XOR of the two strings, then a bitwise OR of the words of the result.
Whether the final result is zero or not shows whether the strings are equal, without providing any information about where they differ, as the standard scanning method does.
This is the general method: all the operations required in the worst case must always be executed, without clever tricks that, for certain particular inputs, skip some operations as unnecessary and thereby reveal that those inputs have been encountered.
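A minimal JavaScript sketch of that idea, assuming both values are Node.js Buffers and that their length is public (in Node you would normally just call the built-in crypto.timingSafeEqual instead):

function constantTimeEqual(a, b) {
  if (a.length !== b.length) return false;  // length is assumed to be public
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a[i] ^ b[i];                    // accumulate differences without branching on secret data
  }
  return diff === 0;
}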
The kind of key comparison that has been given as an example of a mistake was already forbidden by Shannon's principle of confusion (published in a classified paper in 1945, declassified in 1949).
Many people claim to speak about Shannon's "diffusion and confusion", but they have not actually read Shannon's paper, so they guess wrongly at what Shannon wrote. What Shannon named "confusion" is that in any operation where a secret key or any other secret value is involved, any part of the result, or any other observable property, must depend equally on all the bits of the secret value (i.e. the bits of the secret value must be confused), so that the enemy cannot extract any kind of information that applies to only a part of the secret value rather than to the whole.
You assume that Boolean operations are constant time, and whether that holds depends on the microarchitecture and how sophisticated the optimization layers are (e.g. nothing prevents the compiler, or even the CPU, from short-circuiting the OR as soon as the first XOR is non-zero).
Computing a MAC of the two input values under a once-off random key is actually much stronger.
More to the point, this highlights that the goal is to lower the SNR for the attacker, and constant-time computation is only one of two non-exclusive ways to achieve that; the other is adding sufficient noise to the attacker's measurements.
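A rough sketch of that MAC idea using Node's built-in crypto (a double-HMAC style trick; even if the final comparison leaks timing, what leaks is a prefix of an HMAC output the attacker can neither predict nor control):

const crypto = require("crypto");

function macThenCompare(a, b) {
  const key = crypto.randomBytes(32);  // once-off key, fresh for every comparison
  const macA = crypto.createHmac("sha256", key).update(a).digest();
  const macB = crypto.createHmac("sha256", key).update(b).digest();
  return macA.equals(macB);            // an ordinary comparison is now harmless to leak
}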
As it is written in the parent article, the verification of the code generated by the compiler is mandatory whenever a constant-time algorithm is written in a high-level language.
The short-circuiting could be prevented, e.g., by storing the result in a volatile variable before testing whether it is zero, because the compiler cannot assume that the value read back is the same one that was just written. Nevertheless, it is better to just check the generated assembly code and disable optimizations for that function, or use inline assembly if necessary.
Binary Boolean operations have been executed in constant time on all electronic computers, from the vacuum-tube machines until now.
It is impossible to make them execute in variable time (because the bit-pair operations are all independent and all of them must be executed), unless you do it on purpose by inserting unnecessary delays that depend on the operands.
For no other operation implemented in hardware is it as certain that it is done in constant time. Even for word additions, it would be possible to make an implementation where the time to propagate the carry to the most significant bit varies with the pattern of "1" bits in the operands, but no mainstream CPU has used such adders during the last half century. Such adders were used only in some early vacuum-tube computers, which used serial adders instead of the parallel adders found in all modern CPUs.
Your argument boils down to "no hardware implementation so far in history has optimized word-level Boolean operations, so future implementations won't either".
I think that is just an assumption, and I would not take that risk for high-security applications, unless for specific CPU models where the behavior has been measured in practice (certainly not by just auditing the assembly, which is itself already too high-level). After all, never in history have we had such a level of sophistication.
Right now, all-zero register values can lead to speed gains, so they could be observable. And both ARM and Intel have introduced flags in their latest ISAs that permit future CPUs to perform operations with data-dependent timing when the flag is not set. Boolean operations are officially marked as potentially affected by that flag.
No, my argument is that it is impossible to optimize the binary Boolean operations in a way in which they will have variable execution time.
Moreover, they are the only commonly encountered operations that are implemented in hardware and for which this is unconditionally true.
Therefore they are the first choice for the implementation of any constant-time algorithm.
Most modern CPUs have many other operations that are executed in constant time, like additions and subtractions, but for those, variable-time implementations are also possible.
As I have already said, for any bitwise binary boolean operation, the elementary operations having a pair of bits as operands are independent, so there exists no way of performing the complete operation without doing all the sub-operations, unlike for other simple operations, like additions, shifts or rotations, where it is possible to detect conditions that allow an earlier completion of the operation.
The hardware implementation can do the sub-operations sequentially or concurrently, in any order, but it always must execute all of them, regardless of the values of the input operands, so they will take the same time.
Only in an asynchronous CPU and only for certain kinds of logic gate implementations it is possible to have a completion time for a binary Boolean operation that varies with the values of the input bits, but for the most common asynchronous circuit synthesis methods, which use dual rail encoding for bits, i.e. each bit is encoded as a pair of complementary bits, the binary boolean operations remain constant-time, like in the normal synchronous CPUs.
Let's assume that you have a simple XOR between two registers.
If the CPU can pre-label a register as having no bits set (and they can, or can speculate on it), then during scheduling it could theoretically just drop the XOR and transfer or rename the relevant register instead, which may lead to a tiny timing difference that can be measured and exploited.
That is just one simple counter-example showing that the assumptions you present are not necessarily valid on modern, complex CPUs. Many more examples are possible, which is of course not to say that they are implemented today. But they could be implemented in the future (without us knowing, and again, both ARM and Intel are explicit about that), therefore the security of a security-sensitive piece of code should not solely rely on that.
Yeah, I agree (though I am not OP) that hashing is not absolutely perfect, but it's still much, much better than leaking the actual password, and it would be cool to at least consider it as a feasible solution.
Getting the first character of the hash is trivial, but as you get further and further out you need to do more and more computational work locally to find a string that hashes to the prefix you already know. And at the end you end up with the complete hash, which is itself extremely hard to get anything useful out of (all of this assuming you use a hash that is resistant to preimage attacks, and preferably salted as well).
> What's the difference between a plaintext key and a hash in this context?
> The attacker can crack the first byte of the key by trying all 256 possibilities, and observing which one caused the comparison to take longer.
OK, as a thought experiment: the algorithm is passed plaintext data A (the input that the hacker is trying to guess) and hashed data B.
It hashes A before starting the comparison and then does a naive check, and we have code like:
const crypto = require("crypto");  // sha256 done via Node's crypto so the example is runnable

function checkApiKey(inputKey, correctHashedKey) {
  const inputHashedKey = crypto.createHash("sha256").update(inputKey).digest("hex");
  if (inputHashedKey !== correctHashedKey) {
    throw new Error("wrong key");
  }
}
If we're using a decently secure hash algorithm, for every bit that changes in the input, we will create (indistinguishable from) random changes throughout the output. An attacker on this system can't just continually tweak the first byte of input to figure out when a longer comparison takes place, because there's no guarantee that doing that will ever result in the first byte of the hash matching the first byte of the stored hash. And in theory that doesn't actually give away any information about the full password, because if you've matched byte one, adding the next byte to your guess changes the whole hash in unpredictable ways.
Constant time still removes this sort of avenue entirely, but this would seem to confound the attack (or at least make it a much harder problem to solve).
Happy to be educated otherwise if someone that's actually an expert would like to chime in?
Basically you can figure out the hash using the aforementioned timing attack.
You can now perform an offline attack against the password that produced the hash.
It’s like you hacked into the database and got the password hashes.
Then, you fall back to something like hashcat.
If, with a bit of luck for the attacker, you hashed your password with a single round of SHA and did not use something like Argon2id, then you're toast. If your password is simple, you're toast too.
On the other hand, if you hashed a totally random key, the safety will rely on whether the hashing algorithm itself is constant time or not. You've just moved the problem.
> Basically you can figure out the hash using the aforementioned timing attack.
All popular hashing procedures (SHA, Argon2, etc.) to my knowledge produce very different outputs for small differences in the input. This means that even with one round of SHA[0] your comparison timings turn very erratic. I won't go as far as saying that they turn random, because somebody will jump on it, but what they do achieve is to render timing attacks at least impractical for the next few decades.
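A quick way to see that avalanche effect for yourself, using Node's built-in crypto (illustrative only; any modern hash behaves similarly):

const crypto = require("crypto");

function sha256(s) { return crypto.createHash("sha256").update(s).digest(); }

// Count how many bits differ between two digests.
function differingBits(a, b) {
  let bits = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x) { bits += x & 1; x >>= 1; }
  }
  return bits;
}

console.log(differingBits(sha256("password1"), sha256("password2")), "of 256 bits differ");
// Typically around 128 of the 256 bits change, even for a one-character difference.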
You could use a non-constant salt for the hash: for instance, use a sufficiently random number, or start with one and increment it, so the comparison is hash(input+salt) == hash(secret+salt). That way each attempt gets the attacker, at best, part of a different hash.
This is also assuming the secret is known in plain text – for properly stored passwords you now have hash2(hash1(input+salt1)+salt2) == hash2(prehashed-secret+salt2), with salt2 changing on each attempt.
Of course this is more computationally expensive than a more direct time-invariant string compare such as bitwise-XOR-secret-with-guess-then-bitwise-OR-all-bytes-in-the-result-and-compare-that-to-0, so messing around with hashes & salts is probably not what you would do. The XOR-then-OR method isn't perfect either: it will still leak at least the length of the secret if it is unhashed (hashing just before the comparison won't save you here, because the hashing itself is not time-invariant).
If you check hash(a) == hash(b), the attacker cannot produce a valid hash, because cryptographic hash functions are one way. This assumes you use a secure hash function, like not md5.
Even if they discover hash(b), they can't produce b.
What people always forget when they make statements about hashing being one way is that these algorithms are designed to be incredibly fast.
So if what you’re hashing has low entropy/randomness (say a password) then you can totally brute force it.
In fact if there’s not a salt you can look it up in a rainbow table, saving yourself some compute. Googling a hash also works sometimes. If you want to brute force look at hashcat.
Something like SHA256 protects about 128 bits of entropy. If you go much lower than that, you’ll be brute forced. You can look at hashcat benchmarks online to estimate the time and money necessary to perform the attack. Anyone can crack an average simple password using a single round of SHA256 in no time.
For passwords you often intentionally use an algorithm that is known to be somewhat slow. You want to rate-limit password attempts anyway, and generally you should not be getting enough logins for an inefficient algorithm to matter (if you are, you apply denial-of-service protection rather than fixing your password algorithm).
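For example, a sketch using scrypt, which ships with Node (the parameters below are placeholders, not a tuned recommendation; Argon2id via a third-party package is another common choice):

const crypto = require("crypto");

function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  // Deliberately expensive: N, r and p control the CPU/memory cost, unlike a single SHA-256 round.
  const hash = crypto.scryptSync(password, salt, 64, { N: 16384, r: 8, p: 1 });
  return { salt, hash };
}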
The problem is that as an attacker sending requests to the server, what you control is the API key, not the hash of the API key. To test a specific hash like "c000...0000" you would need to find an input to the hash function that will yield this output. Cryptographic hash functions have a property called preimage resistance that means this is prohibitively difficult (see e.g. https://crypto.stackexchange.com/a/1174).
But you're not passing the hash to the service. You're passing the key. The hashing is done on a system not controlled by you, so you need to guess a plaintext that has the prefix you need.
This is not an easy problem, and in fact is what is used as proof-of-work for many cryptocurrencies.
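A toy illustration of why that's hard: even finding an input whose hash merely starts with a chosen prefix costs roughly 256^k attempts for a k-byte prefix, which is exactly the kind of search proof-of-work schemes rely on:

const crypto = require("crypto");

function findPrefixPreimage(prefixHex) {
  for (let i = 0; ; i++) {
    const candidate = "guess-" + i;
    const digest = crypto.createHash("sha256").update(candidate).digest("hex");
    if (digest.startsWith(prefixHex)) return { candidate, attempts: i + 1 };
  }
}

console.log(findPrefixPreimage("c0"));  // a one-byte prefix: ~256 attempts on average
// Each additional byte of required prefix multiplies the expected work by 256.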
The article briefly mentions the possibility of accessing secrets at the hardware level by monitoring the resources of other threads. I have no idea if this is only possible in theory but completely impractical because it takes so much effort. Have there been attacks like that in the past?
What situation would a computation like the one in that article actually happen in?
All key-checking procedures I have ever encountered depended either on hash comparison or on asymmetric crypto (or a combination of them), both of which are mostly immune to practical timing attacks as far as I know.
There's a good chunk of the article about that. "Constant-time" is a bit of a misnomer; "secret-independent resource consumption" is a better way to say it. If you just add a "sleep(rand)" or equivalent, you can still break the secret using statistics or side-channel attacks.
Throttling can still result in an effective DoS for the affected user, as they get stuck in the queue behind the brute force attempts. Throttling based on source address is not practical either given many brute force attempts use many hacked hosts as their sources to get around this very sort of limit.