> If you didn’t, here’s how the exploit works. The comparison operator, “!==”, is vulnerable to a timing attack. The string comparison compares the two keys one byte at a time, stopping as soon as it finds an offset where the two strings do not match. As a result, checkApiKey will take longer if the two keys start with the same bytes. It’s sort of like if the error message itself said “wrong key, but you got the first 2 bytes right!”.
There was a similar vulnerability but using space instead of time on TOPS-10 on the DEC PDP-10. That was back in the day when passwords were stored in plain text. When the operating system had to check a password it just did a string compare of the string supplied by the caller and the plain text string.
Crucially, it did not copy the caller's string into kernel memory before comparing. It compared the plain text in kernel memory with the caller string in user memory.
The trick then to password cracking was to start by guessing first characters. To test a first character you place it at some memory address where the next address is invalid, and you install a handler for access faults. Then check the password, telling the kernel your guess is longer than 1 character.
If you get an access fault you know that first character was right and the string comparison continued on into the invalid address. So try first characters until you get an access fault.
Once you have the first character, repeat, but now start the guess 2 characters before the invalid memory: the known first character followed by your guess for the second character, and tell the kernel the password is at least 3 characters. Same for the rest of the characters.
(I don't remember if the kernel's string compare checked length first. If it did, add a first step above of determining the password length. Same idea...start near invalid memory, and guess lengths until you find the one that gives an access fault).
I've written about, and given a lot of references to, attacks, countermeasures, fighting the compilers, and even non-cryptographic side-channel attacks used to uncover someone's Twitter identity:
Kyber Slash (and Kyber Slash II) is a recent one in a very modern, PQC-finalist algorithm that is missing from the list if you're looking to expand! It was caused by an integer division by a known constant (KYBER_Q = 3329) with a secret-dependent numerator: some optimizing compilers emit a hardware division instruction, whose latency depends on the operands, instead of replacing the division by a constant with a constant-time multiply-and-shift.
Do any of these attacks matter for single-tenant computers where all network packets are sent on a hardware timer (say, 10 kHz) independent of crypto computing?
Doesn't that mitigate any side-channel timing attacks from the start?
The article cites JavaScript, which is not constant time. There's no sure way to do constant-time operations in JavaScript and thus no secure way to do crypto directly in JavaScript. Browsers like Firefox depend on low-level calls, which should be implemented in languages capable of constant-time code.
JavaScript needs something like constant-time WASM in order to do crypto securely, but the only constant-time WASM project on GitHub has just 16 stars and its last commit was two years ago, so it doesn't appear to have much interest. https://github.com/WebAssembly/constant-time
However, for JavaScript, I recommend Paul's library Noble which is "hardened to be algorithmically constant time". It is by far the best library available for JavaScript. https://github.com/paulmillr/noble-secp256k1
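For what it's worth, a rough usage sketch of noble-secp256k1 based on its README (the exact function names, e.g. signAsync vs. sign, vary between versions, so treat this as illustrative rather than authoritative):

import * as secp from '@noble/secp256k1';

// Illustrative only: API details differ across noble versions.
const privKey = secp.utils.randomPrivateKey();     // 32-byte private key
const pubKey = secp.getPublicKey(privKey);         // corresponding public key
const msgHash = new Uint8Array(32).fill(1);        // placeholder 32-byte message hash
const signature = await secp.signAsync(msgHash, privKey);
console.log(secp.verify(signature, msgHash, pubKey));  // true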
There are two main issues with SubtleCrypto: a minor problem and a major one. The minor problem is that it doesn't support much. Thirteen-year-old algorithms like Ed25519 and Ed25519ph are still largely unsupported, and browser support moves at a glacial pace. Ed25519ph isn't supported by any browser, even though it has been adopted by NIST (see FIPS 186-5). This is where libraries like Paul's must be used. For browser support see https://caniuse.com/mdn-api_subtlecrypto_sign_ed25519
The major problem with SubtleCrypto is that it requires the original message to sign and verify: SubtleCrypto.sign(algorithm, key, data) and SubtleCrypto.verify(algorithm, key, signature, data).
This means that when signing a 2 GB document, Go can easily verify the signature using the 256-bit digest, while JavaScript requires the whole 2 GB document to be loaded into SubtleCrypto.verify before verification, which is an enormous waste of resources.
If it isn't clear from the above what the problem is, let me rephrase: SubtleCrypto's sign and verify always rehash, so a developer cannot give it the digest. Instead, developers are forced to give it the whole document so that SubtleCrypto can rehash it. This is a significant design blunder that others, like Go, avoided, and it will push developers toward libraries like Paul's, which cannot offer the constant-time guarantees of native code.
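To make the point concrete, a minimal sketch (assuming a runtime whose SubtleCrypto supports Ed25519, which as noted above is far from universal):

// The entire document must be passed as `data`; the API offers no way to
// hand over a precomputed digest instead.
async function verifyDocument(publicKey, signature, documentBytes) {
  return crypto.subtle.verify("Ed25519", publicKey, signature, documentBytes);
}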
>If you didn’t, here’s how the exploit works. The comparison operator, “!==”, is vulnerable to a timing attack. The string comparison compares the two keys one byte at a time, stopping as soon as it finds an offset where the two strings do not match. As a result, checkApiKey will take longer if the two keys start with the same bytes. It’s sort of like if the error message itself said “wrong key, but you got the first 2 bytes right!”.
I understand that it is technically a vulnerability, but you will not be able to measure these nanosecond fluctuations over the network. Hell, you wouldn't even be able to measure them with direct physical access to the machine, even if you sampled every attempt a trillion times.
Thanks to thinking like that, many a system has been hacked. I can't think of any statistical methods that would recover this information, therefore they don't exist.
The fun thing about physical systems is that in spite of complexity in the stack, there are a lot of Gaussian processes in play which are very easy to clean up with some processing.
All I know is that it doesn't take even one microsecond to compare 32 bytes - it takes nanoseconds. And yes, I can't think of a way to meaningfully measure that delta over a real network, where every packet passes through dozens of switches and routers, each of which introduces enough entropy to completely drown out any attempt to measure things with such precision.
But more importantly, what kind of system allows itself to be pounded with thousands or millions of requests to its authentication endpoint?
- a repeated event, when aggregated (summed or averaged), tends to look more and more like a normal distribution the more samples you have
- the standard deviation of that average shrinks roughly with the square root of the number of samples
- going from microseconds to nanoseconds needs a 1000x gain in precision, and 1000² = 1,000,000
So, starting from an event with a noise amplitude of microseconds, I think you may be able to measure a difference of nanoseconds with ~1 million samples.
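A toy simulation of that back-of-the-envelope math (not a real attack; the 5 ns gap and the 1 µs Gaussian jitter are made-up numbers purely to illustrate the averaging):

// One "response time" sample: a fixed mean plus Gaussian jitter (Box-Muller transform).
function sample(meanNs, jitterNs) {
  const u = 1 - Math.random(), v = Math.random();
  return meanNs + jitterNs * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function averaged(meanNs, jitterNs, n) {
  let sum = 0;
  for (let i = 0; i < n; i++) sum += sample(meanNs, jitterNs);
  return sum / n;
}

const N = 1_000_000;                   // samples per guess
const fast = averaged(1000, 1000, N);  // wrong first byte: 1000 ns mean, 1000 ns jitter
const slow = averaged(1005, 1000, N);  // right first byte: 5 ns slower on average
console.log((slow - fast).toFixed(1), "ns estimated difference");
// The standard error of each mean is roughly 1000/sqrt(1e6) = 1 ns, so the 5 ns gap is visible.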
---
So yeah, unlikely in normal scenarios. But if you could, for example, rent a server within 10 km of your target, you could reduce the latency noise by 10-100x, which reduces the number of samples you need by the square of that: a 100x-10,000x reduction.
And what if it's possible to rent a server in the same datacenter as your target? ...well, K.O.
What’s the exact algorithm you would write for that?
Let's say you have a bool, $is_match, and whenever two chars of the string don't match you write false to it.
It takes slightly longer to write false than to do nothing, so you're still leaking the total number of matching characters.
I think speculative execution on the CPU would amplify this effect. For example:
aaaaaaaaa != ccccccccc will be much faster than ababababa != aaaaaaaaa, because in the first example the CPU will correctly predict, and speculatively execute on the assumption, that the two characters are different. They were different last time; they'll probably be different again.
// Throw an error if inputKey is not correct.
// inputKey and correctKey are both strings
function checkApiKey(inputKey, correctKey) {
  var isMatch = true;
  for (var i = 0; i < inputKey.length; i++) {
    if (inputKey.charAt(i) !== correctKey.charAt(i)) {
      isMatch = false;
    }
  }
  if (!isMatch) {
    throw new Error("wrong key");
  }
}
So what I meant is, don't return early when isMatch = false, instead walk the entire key regardless.
But if you have an incorrect key, you'll always execute the whole for loop & one write to `isMatch`.
I'm assuming the function would take the same amount of time to run regardless of which value of "i" the write occurs at. Wouldn't that then close off the attack vector of enumerating keys and checking the time?
As in, the only input that would cause the write to `isMatch` to be skipped would be the correct key, in which case you don't need to brute force the solution.
Branch prediction and speculative execution can affect that. For instance failing on the first byte could cause more iterations to be thrown away and re-executed than failing on the last byte.
In theory that would work but compilers and toolchains are far too clever now and always liable to change, so there's no way to know that in the future it won't be optimised back into shortcutting and revealing the timings.
I think that’s still affected by speculative execution. The CPU will process the step ahead assuming that a[i] != b[i] while it is doing the a[i] XOR b[i] check.
So it’ll be faster if: most characters match, most characters don’t match or there are runs of matching characters.
There are various methods for comparing two long strings in constant time.
For instance, you can compute a bitwise XOR of the two strings, then a bitwise OR of the words of the result.
Whether the final result is zero or not shows whether the strings are equal, without providing any information about where they differ, as the standard scanning method does.
This is the general method: all the operations required in the worst case must always be executed, without clever tricks that, for certain particular inputs, skip some operations as unnecessary and thereby reveal that those inputs have been encountered.
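A minimal JavaScript sketch of that idea, assuming both values are Node.js Buffers and that their length is public (in Node you would normally just call the built-in crypto.timingSafeEqual instead):

function constantTimeEqual(a, b) {
  if (a.length !== b.length) return false;  // length is assumed to be public
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a[i] ^ b[i];                    // accumulate differences without branching on secret data
  }
  return diff === 0;
}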
The kind of key comparison that has been given as an example of a mistake was already forbidden by Shannon's principle of confusion (published in a classified paper in 1945, declassified in 1949).
Many people claim to speak about Shannon's "diffusion and confusion", but they have not actually read Shannon's paper, so they guess wrongly at what Shannon wrote. What Shannon named "confusion" is that in any operation where a secret key or any other secret value is involved, any part of the result, or any other observable property, must depend equally on all the bits of the secret value (i.e. the bits of the secret value must be confused), so that the enemy cannot extract any kind of information that applies to only a part of the secret value rather than to the whole.
You assume that Boolean operations are constant time, and whether that holds depends on the microarchitecture and how sophisticated the optimization layers are (e.g. nothing prevents the compiler, or even the CPU, from short-circuiting the OR as soon as the first XOR is non-zero).
Computing a MAC of the two input values under a once-off random key is actually much stronger.
More to the point, this highlights that the goal is to lower the SNR for the attacker, and constant-time computation is only one of two non-exclusive ways to achieve that; the other is adding sufficient noise to the attacker's measurements.
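A rough sketch of that MAC idea using Node's built-in crypto (a double-HMAC style trick; even if the final comparison leaks timing, what leaks is a prefix of an HMAC output the attacker can neither predict nor control):

const crypto = require("crypto");

function macThenCompare(a, b) {
  const key = crypto.randomBytes(32);  // once-off key, fresh for every comparison
  const macA = crypto.createHmac("sha256", key).update(a).digest();
  const macB = crypto.createHmac("sha256", key).update(b).digest();
  return macA.equals(macB);            // an ordinary comparison is now harmless to leak
}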
As it is written in the parent article, the verification of the code generated by the compiler is mandatory whenever a constant-time algorithm is written in a high-level language.
The short-circuiting could be prevented, e.g., by storing the result in a volatile variable before testing whether it is zero, because the compiler cannot assume that the value read back is the same one that was just written. Nevertheless, it is better to just check the generated assembly code and disable optimizations for that function, or use inline assembly if necessary.
Binary Boolean operations have been executed in constant time on all electronic computers, from the vacuum-tube machines until now.
It is impossible to make them execute in variable time (because the bit-pair operations are all independent and all of them must be executed), unless you do it on purpose by inserting unnecessary delays that depend on the operands.
For no other operation implemented in hardware is it as certain that it is done in constant time. Even for word additions, it would be possible to make an implementation where the time to propagate the carry to the most significant bit varies with the pattern of "1" bits in the operands, but no mainstream CPU has used such adders during the last half century. Such adders were used only in some early vacuum-tube computers, which used serial adders instead of the parallel adders found in all modern CPUs.
Your argument boils down to "no hardware implementation so far in history has optimized word-level Boolean operations, so future implementations won't either".
I think that is just an assumption, and I would not take that risk for high-security applications, unless for specific CPU models where the behavior has been measured in practice (certainly not by just auditing the assembly, which is itself already too high-level). After all, never in history have we had such a level of sophistication.
Right now, all-zero register values can lead to speed gains, so they could be observable. And both ARM and Intel have introduced flags in their latest ISAs that permit future CPUs to perform operations with data-dependent timing when the flag is not set. Boolean operations are officially marked as potentially affected by that flag.
No, my argument is that it is impossible to optimize the binary Boolean operations in a way in which they will have variable execution time.
Moreover, they are the only commonly encountered operations that are implemented in hardware and for which this is unconditionally true.
Therefore they are the first choice for the implementation of any constant-time algorithm.
Most modern CPUs have many other operations that are executed in constant time, like additions and subtractions, but for those, variable-time implementations are also possible.
As I have already said, for any bitwise binary boolean operation, the elementary operations having a pair of bits as operands are independent, so there exists no way of performing the complete operation without doing all the sub-operations, unlike for other simple operations, like additions, shifts or rotations, where it is possible to detect conditions that allow an earlier completion of the operation.
The hardware implementation can do the sub-operations sequentially or concurrently, in any order, but it always must execute all of them, regardless of the values of the input operands, so they will take the same time.
Only in an asynchronous CPU and only for certain kinds of logic gate implementations it is possible to have a completion time for a binary Boolean operation that varies with the values of the input bits, but for the most common asynchronous circuit synthesis methods, which use dual rail encoding for bits, i.e. each bit is encoded as a pair of complementary bits, the binary boolean operations remain constant-time, like in the normal synchronous CPUs.
Let's assume that you have a simple XOR between two registers.
If the CPU can pre-label a register as having no bits set (and they can, or can speculate on it), then during scheduling it could theoretically just drop the XOR and transfer or rename the relevant register instead, which may lead to a tiny timing difference that can be measured and exploited.
That is just one simple counter-example showing that the assumptions you present are not necessarily valid on modern, complex CPUs. Many more examples are possible, which is of course not to say that they are implemented today. But they could be implemented in the future (without us knowing, and again, both ARM and Intel are explicit about that), therefore the security of a security-sensitive piece of code should not solely rely on that.
Yeah, I agree (though I am not OP) that hashing is not absolutely perfect, but it's still much, much better than leaking the actual password, and it would be cool to at least consider it as a feasible solution.
Getting the first character of the hash is trivial, but as you get further and further out you need to do more and more computational work locally to find a string that hashes to the prefix you already know. And at the end you end up with the complete hash, which is itself extremely hard to get anything useful out of (all of this assuming you use a hash that is resistant to preimage attacks, and preferably salted as well).
> What's the difference between a plaintext key and a hash in this context?
> The attacker can crack the first byte of the key by trying all 256 possibilities, and observing which one caused the comparison to take longer.
OK, as a thought experiment: the algorithm is passed plaintext data A (the input that the hacker is trying to guess) and hashed data B.
It hashes A before starting the comparison and then does a naive check, and we have code like:
const crypto = require("crypto");  // sha256 done via Node's crypto so the example is runnable

function checkApiKey(inputKey, correctHashedKey) {
  const inputHashedKey = crypto.createHash("sha256").update(inputKey).digest("hex");
  if (inputHashedKey !== correctHashedKey) {
    throw new Error("wrong key");
  }
}
If we're using a decently secure hash algorithm, for every bit that changes in the input, we will create (indistinguishable from) random changes throughout the output. An attacker on this system can't just continually tweak the first byte of input to figure out when a longer comparison takes place, because there's no guarantee that doing that will ever result in the first byte of the hash matching the first byte of the stored hash. And in theory that doesn't actually give away any information about the full password, because if you've matched byte one, adding the next byte to your guess changes the whole hash in unpredictable ways.
Constant time still removes this sort of avenue entirely, but this would seem to confound the attack (or at least make it a much harder problem to solve).
Happy to be educated otherwise if someone that's actually an expert would like to chime in?
Basically you can figure out the hash using the aforementioned timing attack.
You can now perform an offline attack against the password that produced the hash.
It’s like you hacked into the database and got the password hashes.
Then, you fall back to something like hashcat.
If, with a bit of luck for the attacker, you hashed your password with a single round of SHA and did not use something like Argon2id, then you're toast. If your password is simple, you're toast too.
On the other hand, if you hashed a totally random key, the safety will rely on whether the hashing algorithm itself is constant time or not. You've just moved the problem.
> Basically you can figure out the hash using the aforementioned timing attack.
All popular hashing procedures (SHA, Argon2, etc.) to my knowledge produce very different outputs for small differences in the input. This means that even with one round of SHA[0] your comparison timings turn very erratic. I won't go as far as saying that they turn random, because somebody will jump on it, but what they do achieve is to render timing attacks at least impractical for the next few decades.
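A quick way to see that avalanche effect for yourself, using Node's built-in crypto (illustrative only; any modern hash behaves similarly):

const crypto = require("crypto");

function sha256(s) { return crypto.createHash("sha256").update(s).digest(); }

// Count how many bits differ between two digests.
function differingBits(a, b) {
  let bits = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x) { bits += x & 1; x >>= 1; }
  }
  return bits;
}

console.log(differingBits(sha256("password1"), sha256("password2")), "of 256 bits differ");
// Typically around 128 of the 256 bits change, even for a one-character difference.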
You could use a non-constant salt for the hash: for instance, use a sufficiently random number, or start with one and increment it, so the comparison is hash(input+salt) == hash(secret+salt). That way each attempt gets the attacker, at best, part of a different hash.
This is also assuming the secret is known in plain text – for properly stored passwords you now have hash2(hash1(input+salt1)+salt2) == hash2(prehashed-secret+salt2), with salt2 changing on each attempt.
Of course this is more computationally expensive than a more direct time-invariant string compare such as bitwise-XOR-secret-with-guess-then-bitwise-OR-all-bytes-in-the-result-and-compare-that-to-0, so messing around with hashes & salts is probably not what you would do. The XOR-then-OR method isn't perfect either: it will still leak at least the length of the secret if it is unhashed (hashing just before the comparison won't save you here, because the hashing itself is not time-invariant).
If you check hash(a) == hash(b), the attacker cannot produce a valid hash, because cryptographic hash functions are one way. This assumes you use a secure hash function, like not md5.
Even if they discover hash(b), they can't produce b.
What people always forget when they make statements about hashing being one way is that these algorithms are designed to be incredibly fast.
So if what you’re hashing has low entropy/randomness (say a password) then you can totally brute force it.
In fact if there’s not a salt you can look it up in a rainbow table, saving yourself some compute. Googling a hash also works sometimes. If you want to brute force look at hashcat.
Something like SHA256 protects about 128 bits of entropy. If you go much lower than that, you’ll be brute forced. You can look at hashcat benchmarks online to estimate the time and money necessary to perform the attack. Anyone can crack an average simple password using a single round of SHA256 in no time.
For passwords you often intentionally use an algorithm that is known to be somewhat slow. You want to rate-limit password attempts anyway, and generally you should not be getting enough logins for an inefficient algorithm to matter (if you are, you apply denial-of-service protection rather than fixing your password algorithm).
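For example, a sketch using scrypt, which ships with Node (the parameters below are placeholders, not a tuned recommendation; Argon2id via a third-party package is another common choice):

const crypto = require("crypto");

function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  // Deliberately expensive: N, r and p control the CPU/memory cost, unlike a single SHA-256 round.
  const hash = crypto.scryptSync(password, salt, 64, { N: 16384, r: 8, p: 1 });
  return { salt, hash };
}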
The problem is that as an attacker sending requests to the server, what you control is the API key, not the hash of the API key. To test a specific hash like "c000...0000" you would need to find an input to the hash function that will yield this output. Cryptographic hash functions have a property called preimage resistance that means this is prohibitively difficult (see e.g. https://crypto.stackexchange.com/a/1174).
But you're not passing the hash to the service. You're passing the key. The hashing is done on a system not controlled by you, so you need to guess a plaintext that has the prefix you need.
This is not an easy problem, and in fact is what is used as proof-of-work for many cryptocurrencies.
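A toy illustration of why that's hard: even finding an input whose hash merely starts with a chosen prefix costs roughly 256^k attempts for a k-byte prefix, which is exactly the kind of search proof-of-work schemes rely on:

const crypto = require("crypto");

function findPrefixPreimage(prefixHex) {
  for (let i = 0; ; i++) {
    const candidate = "guess-" + i;
    const digest = crypto.createHash("sha256").update(candidate).digest("hex");
    if (digest.startsWith(prefixHex)) return { candidate, attempts: i + 1 };
  }
}

console.log(findPrefixPreimage("c0"));  // a one-byte prefix: ~256 attempts on average
// Each additional byte of required prefix multiplies the expected work by 256.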
The article briefly mentions the possibility of accessing secrets at the hardware level by monitoring the resources of other threads. I have no idea if this is only possible in theory but completely impractical because it takes so much effort. Have there been attacks like that in the past?
What situation would a computation like the one in that article actually happen in?
All key-checking procedures I have ever encountered depended either on hash comparison or on asymmetric crypto (or a combination of them), both of which are mostly immune to practical timing attacks as far as I know.
There's a good chunk of the article about that. "Constant-time" is a bit of a misnomer; "secret-independent resource consumption" is a better way to say it. If you just add a "sleep(rand)" or equivalent, you can still break the secret using statistics or side-channel attacks.
Throttling can still result in an effective DoS for the affected user, as they get stuck in the queue behind the brute force attempts. Throttling based on source address is not practical either given many brute force attempts use many hacked hosts as their sources to get around this very sort of limit.