If you're going to pick a hash specifically to avoid length-extension attacks, pick SHA-2 512/256, which isn't vulnerable to length extension attacks. Or, for that matter, pick Blake2, which is a slightly less idiosyncratic choice than SHA-3. Everyone --- including the authors of SHA-3 --- is unhappy with the SHA-3 parameters and resulting performance.
SHA-2 512/256 is much faster than SHA-3 and supported by more libraries.
The notion that by recommending SHA-2 512/256 you're setting people up to use prefix MAC SHA-2 512 or SHA-2 256 is kind of silly. You could similarly argue that by telling people to use SHA-3, you're risking that they "fall back" to SHA-2. Either way, you're calling for a specific hash.
The reality is that some people like SHA-3 because it's interesting and because its primitives can be used for a variety of non-hashing applications. That's true! Nobody is saying SHA-3 shouldn't exist. They're just saying: there's no good reason to use it in a modern application.
(If you're not clear on length-extension attacks: they're the reason we use HMAC. HMAC-SHA2 isn't vulnerable to length extension attacks; neither is HMAC-SHA1 or HMAC-MD5 --- both of which, fun fact, can't currently be attacked, despite the weakness of their underlying hash. But if you use SHA-2 512/256, you don't have to use HMAC.)
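(A minimal sketch of the difference, using Python's standard hmac and hashlib modules; the key and message here are made up:)

    import hashlib
    import hmac

    key = b"server-side secret"
    msg = b"user=alice&role=user"

    # HMAC keys the hash and re-hashes the result, so a leaked tag never
    # leaks resumable internal state:
    tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

    # The bare prefix construction H(key || msg) is what length extension
    # breaks for SHA-256 and SHA-512; SHA-512/256 escapes it because its
    # output hides half of the internal state.
    prefix_tag = hashlib.sha256(key + msg).hexdigest()  # extendable!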
To be fair, while I do agree with you, SHA-2 512/256 is an especially confusingly named hash. A _lot_ of people naturally read it as "SHA-2 512 or 256", which it is not; but thing/otherthing is a common enough pattern that the confusion is understandable.
SHA-2 256t or something would be much clearer. As the article notes, changing it now is not trivial, but given the potential for confusion it might be worth doing nonetheless.
Regarding MD5.
I advise people against using MD5 in new designs, even for HMAC-MD5. Not because that construct is broken (it isn't), but because it adds yet another dependency on an algorithm we generally consider too weak to use.
While the portable, pure C implementation of BLAKE2 that is used almost exclusively by libraries (e.g. Python, OpenSSL) is already faster than MD5...
...there are even faster implementations of BLAKE2 around. E.g. Samuel Neves' AVX2 implementation is almost 40% faster on Haswell. libsodium ships a different AVX2 implementation (AFAIK considered experimental by its original author, which I find a bit strange).
For this kind of thing, I think it might be worth doing a "web browser" mode that uses 128-bit vectors and a "web server" mode that uses 256-bit vectors.
Yes-ish. I don't sweat HMAC-MD5, nor HMAC-SHA-1. What really sucks though is when you can't use them (even though for interop reasons you must) because some stupid auditor (typically software) goes "oh no! MD5 found!! alert! alert!".
Make sure it can easily be replaced if needed though. I also use it in contexts where I need some consistent "randomization" (like your hash ring).
The thing is that even though security might not be an issue in your application, you might discover in the future that it is. For example, Python dicts rely on a hash table to store key-value pairs, which doesn't seem like something that would be concerning (you're not encrypting or keeping anything secret), but Python applications then started being targeted by collision attacks. The attacks' goal was to insert many elements that hashed to the same value, which caused DoS. The fix was to use a salt value that is different on every run.
The point is that something that might not seem like a security issue might turn out to be one later on. It doesn't necessarily mean we should always use strong cryptographic hash functions (in many cases they might be too expensive), but at least write code in such a way that we can easily replace the hash with something else if we need to.
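(You can see Python's fix directly; string hashes have been randomized per process since 3.3, via SipHash since 3.4:)

    # Run this twice: the output differs between runs, because Python
    # seeds its string hash randomly per process.
    # Set PYTHONHASHSEED=0 to disable the randomization and compare.
    print(hash("hello"))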
BLAKE2 is faster and has features you might find useful in that sort of scenario. And it doesn't forfeit security, should you decide to use this in an unexpectedly security-sensitive application.
If the use case isn't security critical, and you know for a fact it won't become security critical later, and MD5 is convenient for whatever reason, then just use MD5.
Pardon my frustration, but: Jesus Christ, people. Why do we need to sell everyone on the fancy new thing in a hypothetical situation where it is defined not to matter?
Because now people are carrying an MD5 implementation around with them, and when the next vaguely-hash-shaped problem comes up, they're gonna reach for that without really evaluating, b/c they've used it before, it's at the top of their mental cache, and it's even more convenient now. :/
I guess it's not entirely equivalent but I'm still sorta reeling from a "why should we introduce [something like bcrypt] for password storage when we already have these other perfectly fine hash functions in use in our codebase" convo at a previous project.
No, no, no. Then you add another dependency on MD5. And MD5 is a very slow, complex hash function compared to the alternatives here. Use MurmurHash, CityHash, or any of a number of really fast, non-secure hash functions instead.
A situation where it is defined not to matter is unrealistic. Things you don't think are "security critical" have a tendency to unexpectedly become so; at least that's my impression. Why not use a more secure algorithm?
Exactly this. I don't like it when people try to sell new things even though my use case is perfectly fine with the older solution. I understand the security implications, yet there's more to this topic than just security.
If you're using it as part of an auth mechanism then sure, don't use it. If you're using it for anything else (checksum, verification etc.) where you don't expect any adversarial action then it's fine.
You should not use MD5 for anything. It's completely broken, and that's just asking for trouble if someone swaps out whatever your checksums are checking. The extra computational cost is worth the security. Even if you're hosting something no one cares about, it's still worth it to keep your small audience safe from the random malicious person you thought you'd never see.
I'd much rather have it as a well known meme that "md5 is broken, don't use it. Period." than a list of caveats like "ok for non security" or "ok with hmac".
I've seen many claims from good engineers of the sort "this design is ok because it's not for security" when it absolutely was.
A simpler rule wins, even if it costs a bit now. It'll save someone in a bigger way later.
I think the point they were making was that since we know of weaknesses in certain situations, we should avoid using it even in the safe situations, so that we can eventually drop it from crypto libraries altogether. There are alternatives that we can all move to.
Let's say I make a hash of an asset and put it into a very long-time archive (like 100 years or whatever). What checksum algorithm should I use to be sure it will be readily available in software after 100 years? And to make the choice more realistic, let's assume that the algorithm must be sufficiently fast today (like md5 is).
> Everyone --- including the authors of SHA-3 --- is unhappy with the SHA-3 parameters and resulting performance.
Would another variant of Keccak, not the one that became SHA-3, provide more competitive results? Or does Keccak in general not have compelling properties compared to the alternatives?
Yes, there is one: KangarooTwelve, by the Keccak authors.
The problem is, there's just not a lot of reasons to use it. SHA-2 isn't broken; in fact, there are hash cryptographers who think SHA-2 may never be broken.
> there are hash cryptographers who think SHA-2 may never be broken.
It is a dangerous assertion, especially if it makes people build things that aren't future-proof enough to support more than one hash type.
Otherwise, KangarooTwelve is faster than SHA-2[0], which may be an incentive. But it arrived very late compared to BLAKE2, and as a result it is present in far fewer libraries.
That's not necessarily a dangerous assertion. SHA-2 is theoretically immune to foreseeable improvements in computational power. If you use something like SHA-512, a birthday attack requires searching through roughly 2^256 candidates, a number in the same league as the count of atoms in the observable universe. You can't meaningfully close that gap by scaling up inordinate amounts of computing power.
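(Back-of-the-envelope, using a commonly cited estimate of roughly 10^80 atoms:)

    # Birthday bound for a 512-bit digest: ~2**256 evaluations.
    work = 2 ** 256
    atoms = 10 ** 80  # rough, commonly cited estimate
    print(f"{work:.3e}")  # ~1.158e+77
    print(work / atoms)   # within a few orders of magnitude of the atom count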
If you can't rely on raw computational power to force obsolescence, you need either a major paradigm shift (a la quantum computing) or a clever cryptanalytic attack that bypasses the actual difficulty.
Quantum computers do not currently pose a serious threat to SHA-2. Grover's algorithm offers a quadratic speedup for brute-force search (preimages), not an exponential one. That's impressive, but not enough on its own.
That leaves the last category, which is a clever cryptanalytic attack. This is possible, especially with novel mathematics. But it's not reliable for predicting risk.
So really, the only "future-proofing" we can do for SHA-2 is against vague improvements in the underlying math. In any practical sense of the term, it's not a dangerous assertion to claim SHA-2 may never be broken. As Thomas said, research will certainly continue on cryptographic hash functions regardless; moreover, "improvements in math might happen" is not reliable enough to calibrate forced obsolescence or regular updates to hash standards in the future.
Given the foregoing, some researchers choose to act as though SHA-2 will never be broken, because there's no meaningful way to assert that it probably will (or to predict when or how), and because there are so many more productive areas of research to focus on, where the current algorithms absolutely will be broken in the future. You can't productively work on future-proofing something any further once the space of coherent threats has been reduced to "this might happen in the future somehow", without real specifics.
> "improvements in math might happen" is not reliable enough to calibrate forced obsolescence or regular updates
Definitely: it doesn't make much sense to move away from a hash that doesn't seem in danger of being broken, just because it is old — a clever new attack can be found for a newer hash, too (or for both).
Besides, usually there's a lapse of time between finding a major weakness and a practical collision.
On the other hand, when picking a hash… well, having a faster and less error-prone hash is nice.
Additionally, the primitives used in the SHA-2 core are well-enough understood and accepted that for SHA-2 to be broken, pretty much all the hashes are going to end up broken --- is how the rest of the logic would go.
Yeah, that's Aumasson. He compared it to finding that P=NP; sure it could happen, but it probably won't, and by the time we get that far "future-proofing" suddenly ceases to be a coherent concept for a lot more than just SHA-2.
Would it also be fair to say that novel math could just as easily be discovered that would break SHA-3, and so, switching to SHA-3 would not reduce risk? (Or, worse: might it increase it, since the SHA-3 constructions have been subject to less cryptographic research and so might be more likely to fall to some kind of new math?)
It's arguable that SHA-3 is better understood than SHA-2, at least in the open research community. The rationales behind every component of SHA-3 are well documented, and the design is very conservative, both in structure and number of rounds. More so than SHA-2.
It is in theory possible that a new technique could break SHA-3 and not SHA-2, but the opposite seems far more plausible.
I'm not so sure about that. In the history of cryptography almost everything was broken in the end (though some peculiar stuff took a couple hundred years), but modern cryptography is very, very different from classic cryptography: breaking it is far more difficult, and the intervals at which algorithms fall apart seem to be increasing.
For example, DES held out for about 20-25 years or so, with some cracks showing earlier than that. AES is now ~20 years old, and we haven't discovered a single practical crack in it. SHA-2 is of a similar vintage and no significant issues have been discovered. The previous generation of hashes only resisted ten years or so until the first real issues showed up.
What differences does K12 have compared to SHA-3? Just fewer rounds, or are there more changes? Something similar to the differences between BLAKE and BLAKE2, maybe?
The parameters NIST specified are just fairly conservative, for one thing (e.g. NIST, IIRC, oddly requested an increase in the number of rounds during the final stages, one of the things people waxed on about.)
Regarding another variant: yes, you can get more compelling results, and the authors of Keccak themselves have also proposed another function based on Keccak called KangarooTwelve[1]. K12 has much better performance and parallelism than SHA-3 does in software. (Of course, if you don't need most of Keccak's general properties, you're probably fine with other hashes anyway.)
As far as I can tell, the increase from 18 to 24 rounds was not NIST's call, but due to the zero-sum distinguishers found on Keccak-f, which made it no longer "hermetic": http://keccak.noekeon.org/NoteZeroSum.pdf
How is SHA-512/256 not vulnerable to length extension attacks? I've implemented the SHA family of hash functions before and the only real difference between the two is block size and truncation of output, but I don't see how that stops length extensions.
Have you tried implementing the length extension attacks?
Truncating from 512 bits to 256 bits hides 256 bits of the state from the attacker, so in order to use a length extension attack they would need to "guess" those bits.
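(A toy sketch may make this concrete. It uses SHA-256 as a stand-in compression function and skips real padding, so it illustrates the structure only; it is not a real hash:)

    import hashlib

    def compress(state: bytes, block: bytes) -> bytes:
        # Stand-in compression function; real SHA-2 uses its own.
        return hashlib.sha256(state + block).digest()

    def toy_hash(msg: bytes, out: int = 32) -> bytes:
        state = b"\x00" * 32                    # fixed IV
        for i in range(0, len(msg), 32):        # no length padding, for brevity
            state = compress(state, msg[i:i + 32].ljust(32, b"\x00"))
        return state[:out]                      # out < 32 models truncation

    secret = b"k" * 32
    block1 = b"amount=100".ljust(32, b"\x00")
    evil = b"&admin=true".ljust(32, b"\x00")

    # Full-width output IS the internal state, so an attacker can resume:
    digest = toy_hash(secret + block1)          # the tag the attacker sees
    forged = compress(digest, evil)             # no knowledge of secret!
    print(forged == toy_hash(secret + block1 + evil))  # True

    # With truncated output (out=16), half the state is withheld, and
    # "resuming" would mean guessing the hidden bytes.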
There are two forms of length extension; in one of them the attacker does know those bits. SHA-512/256 doesn't protect against that. (HMAC does, I believe.)
SHA-512/256 is a variant of SHA-512 where the output is truncated to 256 bits. It does not mean "SHA-512 or SHA-256". Yes, perhaps a different symbol than "/" would be better there...
I think our comments and discussions just add to the general confusion, though, which is why I just plainly recommend SHA-3.
Now, if I were personally working on a project, I would think of SHAKE first, and I would seriously consider KangarooTwelve (especially for tree hashing, which makes K12 better than any other hash for hashing big files).
You keep saying that here, and on Twitter, and people --- not just me --- keep telling you that if you're going to recommend a hash specifically to avoid length extension attacks, 512/256 accomplishes that without requiring the adoption of new hashing code.
I really don't understand the argument that people can safely type "SHA-3" but not "SHA-2 512/256". Would it help if we renamed SHA-2 512/256? We could call it SHA2.5: better than SHA-2, not quite SHA-3. Or SHA5: better than both!
Renaming SHA-2 512/256 sounds a lot easier than getting new hash code implemented in every library.
We should definitely rename it, because until yesterday I didn't know that "SHA-512/256" was an actual thing. I thought it was just shorthand referring to both SHA-512 and SHA-256.
Edit: this comment got 7 upvotes before the editing timeout, so I'm not the only one either.
Having read the article and the comments above yours, and being generally aware that there's some truncation thing related to SHA-256 and SHA-512 somewhere, it still took me until your comment to catch onto that.
This thread is the first time I was ever aware that "SHA-2 512/256" was actually a unique and different algorithm. I've always thought it was just the last 256 bits of SHA-512.
Also, I just looked at dir(hashlib) in a reasonably recent version of Python (3.6).
So - no reference to sha2_512/256 as a unique algorithm there either. Nothing in https://docs.python.org/3/library/hashlib.html either. So, the whole SHA-2 512/256 thing isn't that straightforward.
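(If you want to check your own build: hashlib's constructor list depends on the OpenSSL it's linked against, so whether SHA-512/256 shows up varies:)

    import hashlib
    # Whether this prints True depends on the OpenSSL backing hashlib;
    # recent CPython builds generally expose it as "sha512_256".
    print("sha512_256" in hashlib.algorithms_available)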
With that said, I would hope that anyone who was doing this for a real purpose would at least take 10 minutes to read the wikipedia page. And lots of useful information on crypto.stackexchange.com as well.
"In March 2012, the standard was updated in FIPS PUB 180-4, adding the hash functions SHA-512/224 and SHA-512/256, and describing a method for generating initial values for truncated versions of SHA-512. Additionally, a restriction on padding the input data prior to hash calculation was removed, allowing hash data to be calculated simultaneously with content generation, such as a real-time video or audio feed. Padding the final data block must still occur prior to hash output"
This sounds like more than just truncating SHA-2 512. Also, the various articles on stackexchange state that there is no way to truncate SHA-2 512 and conform to SHA-2 512/256.
Finally, https://eprint.iacr.org/2010/548.pdf sounds like they really wanted you to be able to distinguish SHA-512/256 from a truncated SHA-512:
"In order for users to be able to distinguish between a SHA-512 digest which has been truncated and a SHA512/256 digest, we also offer new initialization constants, analogous to those used in SHA-384."
It's SHA-512 with a different IV (which is why you can't derive a compliant 512/256 from 512 directly) with the leftmost 256 bits taken as output. The rest of that stuff applies to all the SHA-2 hashes, I think.
None of this is security-relevant, including the IVs. You would have morally the same system if you just chopped 32 bytes off a SHA-512 hash.
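(You can see the IV difference directly, assuming a hashlib build that exposes SHA-512/256:)

    import hashlib

    msg = b"hello world"

    # Plain SHA-512, truncated to its leftmost 256 bits:
    truncated = hashlib.sha512(msg).digest()[:32]

    # SHA-512/256 proper: same compression function, different IV
    # (constructor name may vary with the OpenSSL backing hashlib):
    proper = hashlib.new("sha512_256", msg).digest()

    print(truncated == proper)  # False: different IVs, different digests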
> I really don't understand the argument that people can safely type "SHA-3" but not "SHA-2 512/256". Would it help if we renamed SHA-2 512/256?
The name is a serious problem. If you work with this stuff regularly enough to know the difference it might seem obvious, but that's not the case for many would-be implementors and bug-fixers.
The name makes communication problematic enough to not recommend usage.
No, it doesn't. SHA-2 is by an overwhelming margin the most popular hash used in cryptosystems today, and it is overwhelmingly used safely.
Seriously, if you care that much about the name, rename it. Even systems that use SHA-3 use HMAC (SHA-2 is perfectly safe, no matter what variant you use, in HMAC). Name the prefix MAC (an idiosyncratic construction) that uses SHA-2 something specific.
The SHA-3 advocates have already conceded this point. They don't call SHA-3's prefix MAC "prefix MAC"; they call it KMAC, which is a specific construction.
So all you have to accept is that there's SHA2MAC --- which, like KMAC, is not simply the hash function, but instead the optimal MAC construction built on the hash --- and then define it as the prefix MAC built from SHA2-512/256.
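(A sketch of what that definition would amount to; "sha2mac" is this thread's coinage, not a standard name, and the constructor assumes OpenSSL support for SHA-512/256:)

    import hashlib

    def sha2mac(key: bytes, msg: bytes) -> bytes:
        # Prefix MAC: H(key || msg). Only sound here because SHA-512/256
        # truncates its state; the same construction over plain SHA-256
        # or SHA-512 is length-extendable.
        h = hashlib.new("sha512_256")
        h.update(key)
        h.update(msg)
        return h.digest()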
Reading your post I find myself understanding what you are saying... maybe. And that's my point. Communication is unclear.
It's hard enough to follow this topic for many people that want to keep updated. The fact that you have to spell out "please use SHA2-512/256, but not SHA2-512 and not SHA2-256" is mind-numbing.
"It's really SHA2-512/t where t has a value of 256 and that means that it's SHA2-512 truncated down which alleviates length extension attacks." Just typing that sentence gave me a headache and I have no real idea if it's right or not.
Adoption would be easy if communication was easy. As it is, it's a PITA to even have a short conversation on the subject.
I think you're confusing the SHA-2 hash core with a MAC derived from SHA-2. MACs and hashes aren't the same thing. You can build a MAC on a hash --- in fact, you can build a MAC on a hash that would be terribly insecure on its own, and some of those MACs are state-of-the-art; see, for instance, all the polynomial MACs.
That's what I'm trying to articulate. Hashes aren't MACs.
This is partially my fault, for playing loose with SHA-2 512/256 vs "The Prefix MAC built on SHA-2 512/256".
Again: I think we should just call that SHA-2-MAC. It's an unambiguous name that couldn't mean anything other than "truncated SHA-2 512 used as a prefix MAC".
I think the BLAKE2 authors just chose to call it "keyed hashing" and tell people to use it as a MAC. Only somewhere in the paper, where the construction is described, is there an embedded clause: "yes yes, that's a prefix MAC, hush" :)
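(And that's exactly how hashlib exposes it; the key parameter turns BLAKE2 into its built-in MAC mode:)

    import hashlib

    # BLAKE2's "keyed hashing" mode, used as a MAC:
    tag = hashlib.blake2b(b"message", key=b"secret key",
                          digest_size=32).hexdigest()
    print(tag)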
I addressed that point in the post. Unless you replace all the implementations of SHA-2, the deed is done. People will choose SHA-256 and SHA-512 because that's what they've always known.
> without requiring the adoption of new hashing code
I should have made myself clearer: I recommend SHA-3 for new projects. I wouldn't recommend that people switch to SHA-3 if they are already using SHA-2.
I really don't agree. Rijndael and DES are different algorithms. SHA-512/256 is a made-up algorithm that you need to be aware of. Again, I'm talking about "misuse" here: SHA-2 is vulnerable to misuse even if SHA-512/256 is not.
I might be a bit. Although, I don't feel like it's too far from the kind of misuse we've pointed out with nonce re-use in AES-GCM or nonce re-use in DSA/ECDSA.
Anyway that was my point! Just wanted to write it up :)
> (especially for tree hashing, which makes K12 better than any other hash to hash big files).
I am aware that other modern hash functions like BLAKE2 (and maybe BLAKE/Skein?) support tree hashing; is there any reason why K12 would be better for that?
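(For what tree hashing looks like in practice: hashlib's BLAKE2 exposes the tree parameters directly. A minimal two-leaf sketch, along the lines of the example in the Python docs:)

    import hashlib

    FANOUT, DEPTH, LEAF_SIZE, INNER = 2, 2, 4096, 64
    params = dict(fanout=FANOUT, depth=DEPTH, leaf_size=LEAF_SIZE,
                  inner_size=INNER)

    data = bytes(6000)  # toy input split across two leaves

    # The leaves are independent, so a big file can be hashed in parallel:
    left = hashlib.blake2b(data[:LEAF_SIZE], node_offset=0, node_depth=0,
                           last_node=False, **params)
    right = hashlib.blake2b(data[LEAF_SIZE:], node_offset=1, node_depth=0,
                            last_node=True, **params)

    # The root node hashes the concatenated leaf digests:
    root = hashlib.blake2b(node_offset=0, node_depth=1, last_node=True,
                           digest_size=32, **params)
    root.update(left.digest())
    root.update(right.digest())
    print(root.hexdigest())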