If you're going to pick a hash specifically to avoid length-extension attacks, pick SHA-2 512/256, which isn't vulnerable to length extension attacks. Or, for that matter, pick Blake2, which is a slightly less idiosyncratic choice than SHA-3. Everyone --- including the authors of SHA-3 --- is unhappy with the SHA-3 parameters and resulting performance.
SHA-2 512/256 is much faster than SHA-3 and supported by more libraries.
The notion that by recommending SHA-2 512/256 you're setting people up to use prefix MAC SHA-2 512 or SHA-2 256 is kind of silly. You could similarly argue that by telling people to use SHA-3, you're risking that they "fall back" to SHA-2. Either way, you're calling for a specific hash.
The reality is that some people like SHA-3 because it's interesting and because its primitives can be used for a variety of non-hashing applications. That's true! Nobody is saying SHA-3 shouldn't exist. They're just saying: there's no good reason to use it in a modern application.
(If you're not clear on length-extension attacks: they're the reason we use HMAC. HMAC-SHA2 isn't vulnerable to length extension attacks; neither is HMAC-SHA1 or HMAC-MD5 --- both of which, fun fact, can't currently be attacked, despite the weakness of their underlying hash. But if you use SHA-2 512/256, you don't have to use HMAC.)
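(A minimal sketch of the difference, using Python's standard hmac and hashlib modules; the key and message here are made up:)

    import hashlib
    import hmac

    key = b"server-side secret"
    msg = b"user=alice&role=user"

    # HMAC keys the hash and re-hashes the result, so a leaked tag never
    # leaks resumable internal state:
    tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

    # The bare prefix construction H(key || msg) is what length extension
    # breaks for SHA-256 and SHA-512; SHA-512/256 escapes it because its
    # output hides half of the internal state.
    prefix_tag = hashlib.sha256(key + msg).hexdigest()  # extendable!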
To be fair, while I do agree with you, SHA-2 512/256 is an especially confusingly named hash. A _lot_ of people naturally read it as "SHA-2 512 or 256", which it is not; but thing/otherthing is a common enough pattern that the confusion is understandable.
SHA-2 256t or something would be much clearer. As the article notes, changing it now is not trivial, but given the potential for confusion it might be worth doing nonetheless.
Regarding MD5.
I advise people against using MD5 in new designs, even for HMAC-MD5. Not because that construct is broken (it isn't), but because it adds yet another dependency on an algorithm we generally consider too weak to use.
While the portable, pure C implementation of BLAKE2 that is used almost exclusively by libraries (e.g. Python, OpenSSL) is already faster than MD5...
...there are even faster implementations of BLAKE2 around. E.g. Samuel Neves' AVX2 implementation is almost 40% faster on Haswell. libsodium ships a different AVX2 implementation (AFAIK considered experimental by its original author, which I find a bit strange).
For this kind of thing, I think it might be worth doing a "web browser" mode that uses 128-bit vectors and a "web server" mode that uses 256-bit vectors.
Yes-ish. I don't sweat HMAC-MD5, nor HMAC-SHA-1. What really sucks though is when you can't use them (even though for interop reasons you must) because some stupid auditor (typically software) goes "oh no! MD5 found!! alert! alert!".
Make sure it can easily be replaced if needed though. I also use it in contexts where I need some consistent "randomization" (like your hash ring).
The thing is that even though security might not be an issue in your application, you might discover in the future that it is. For example, Python dicts rely on a hash table to store key-value pairs, which doesn't seem like something that would be concerning (you're not encrypting or keeping anything secret), but Python applications then started being targeted by collision attacks. The attacks' goal was to insert many elements that hashed to the same value, which caused DoS. The fix was to use a salt value that is different on every run.
The point is that something that might not seem like a security issue might turn out to be one later on. It doesn't necessarily mean we should always use strong cryptographic hash functions (in many cases they might be too expensive), but at least write code in such a way that we can easily replace the hash with something else if we need to.
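(You can see Python's fix directly; string hashes have been randomized per process since 3.3, via SipHash since 3.4:)

    # Run this twice: the output differs between runs, because Python
    # seeds its string hash randomly per process.
    # Set PYTHONHASHSEED=0 to disable the randomization and compare.
    print(hash("hello"))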
BLAKE2 is faster and has features you might find useful in that sort of scenario. And it doesn't forfeit security, should you decide to use this in an unexpectedly security-sensitive application.
If the use case isn't security critical, and you know for a fact it won't become security critical later, and MD5 is convenient for whatever reason, then just use MD5.
Pardon my frustration, but: Jesus Christ, people. Why do we need to sell everyone on the fancy new thing in a hypothetical situation where it is defined not to matter?
Because now people are carrying an MD5 implementation around with them, and when the next vaguely-hash-shaped problem comes up, they're gonna reach for that without really evaluating, b/c they've used it before, it's at the top of their mental cache, and it's even more convenient now. :/
I guess it's not entirely equivalent but I'm still sorta reeling from a "why should we introduce [something like bcrypt] for password storage when we already have these other perfectly fine hash functions in use in our codebase" convo at a previous project.
No, no, no. Then you add another dependency on MD5. And MD5 is a very slow, complex hash function compared to the alternatives here. Use MurmurHash, CityHash, or any of a number of really fast, non-secure hash functions instead.
A situation where it is defined not to matter is unrealistic. Things you don't think are "security critical" have a tendency to unexpectedly become so; at least that's my impression. Why not use a more secure algorithm?
Exactly this. I don't like it when people try to sell new things even though my use case is perfectly fine with the older solution. I understand the security implications, yet there's more to this topic than just security.
If you're using it as part of an auth mechanism then sure, don't use it. If you're using it for anything else (checksum, verification etc.) where you don't expect any adversarial action then it's fine.
You should not use MD5 for anything. It's completely broken, and that's just asking for trouble if someone swaps out whatever your checksums are checking. The extra computational cost is worth the security. Even if you're hosting something no one cares about, it's still worth it to keep your small audience safe from the random malicious person you thought you'd never see.
I'd much rather have it as a well known meme that "md5 is broken, don't use it. Period." than a list of caveats like "ok for non security" or "ok with hmac".
I've seen many claims from good engineers of the sort "this design is ok because it's not for security" when it absolutely was.
A simpler rule wins, even if it costs a bit now. It'll save someone in a bigger way later.
I think the point they were making was that since we know of weaknesses in certain situations, we should avoid using it even in the safe situations, so that we can eventually drop it from crypto libraries altogether. There are alternatives that we can all move to.
Let's say I make a hash of an asset and put it into a very long-time archive (like 100 years or whatever). What checksum algorithm should I use to be sure it will be readily available in software after 100 years? And to make the choice more realistic, let's assume that the algorithm must be sufficiently fast today (like md5 is).
> Everyone --- including the authors of SHA-3 --- is unhappy with the SHA-3 parameters and resulting performance.
Would another variant of Keccak, not the one that became SHA-3, provide more competitive results? Or does Keccak in general not have compelling properties compared to the alternatives?
Yes, there is one: KangarooTwelve, by the Keccak authors.
The problem is, there's just not a lot of reasons to use it. SHA-2 isn't broken; in fact, there are hash cryptographers who think SHA-2 may never be broken.
> there are hash cryptographers who think SHA-2 may never be broken.
It is a dangerous assertion, especially if it makes people build things that aren't future-proof enough to support more than one hash type.
Otherwise, KangarooTwelve is faster than SHA-2[0], which may be an incentive. But it arrived very late compared to BLAKE2, and as a result it is present in far fewer libraries.
That's not necessarily a dangerous assertion. SHA-2 is theoretically immune to foreseeable improvements in computational power. If you use something like SHA-512, a birthday attack requires searching through roughly 2^256 candidates, a number in the same league as the count of atoms in the observable universe. You can't meaningfully close that gap by scaling up inordinate amounts of computing power.
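(Back-of-the-envelope, using a commonly cited estimate of roughly 10^80 atoms:)

    # Birthday bound for a 512-bit digest: ~2**256 evaluations.
    work = 2 ** 256
    atoms = 10 ** 80  # rough, commonly cited estimate
    print(f"{work:.3e}")  # ~1.158e+77
    print(work / atoms)   # within a few orders of magnitude of the atom count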
If you can't rely on raw computational power to force obsolescence, you need either a major paradigm shift (a la quantum computing) or a clever cryptanalytic attack that bypasses the actual difficulty.
Quantum computers do not currently pose a serious threat to SHA-2. Grover's algorithm offers a quadratic speedup for brute-force search (preimages), not an exponential one. That's impressive, but not enough on its own.
That leaves the last category, which is a clever cryptanalytic attack. This is possible, especially with novel mathematics. But it's not reliable for predicting risk.
So really, the only "future-proofing" we can do for SHA-2 is against vague improvements in the underlying math. In any practical sense of the term, it's not a dangerous assertion to claim SHA-2 may never be broken. As Thomas said, research will certainly continue on cryptographic hash functions regardless; moreover, "improvements in math might happen" is not reliable enough to calibrate forced obsolescence or regular updates to hash standards in the future.
Given the foregoing, some researchers choose to act as though SHA-2 will never be broken, because there's no meaningful way to assert that it probably will (or to predict when or how), and because there are so many more productive areas of research to focus on, where the current algorithms absolutely will be broken in the future. You can't productively work on future-proofing something any further once the space of coherent threats has been reduced to "this might happen in the future somehow", without real specifics.
> "improvements in math might happen" is not reliable enough to calibrate forced obsolescence or regular updates
Definitely: it doesn't make much sense to move away from a hash that doesn't seem in danger of being broken, just because it is old — a clever new attack can be found for a newer hash, too (or for both).
Besides, usually there's a lapse of time between finding a major weakness and a practical collision.
On the other hand, when picking a hash… well, having a faster and less error-prone hash is nice.
Additionally, the primitives used in the SHA-2 core are well-enough understood and accepted that for SHA-2 to be broken, pretty much all the hashes are going to end up broken --- is how the rest of the logic would go.
Yeah, that's Aumasson. He compared it to finding that P=NP; sure it could happen, but it probably won't, and by the time we get that far "future-proofing" suddenly ceases to be a coherent concept for a lot more than just SHA-2.
Would it also be fair to say that novel math could just as easily be discovered that would break SHA-3, and so, switching to SHA-3 would not reduce risk? (Or, worse: might it increase it, since the SHA-3 constructions have been subject to less cryptographic research and so might be more likely to fall to some kind of new math?)
It's arguable that SHA-3 is better understood than SHA-2, at least in the open research community. The rationales behind every component of SHA-3 are well documented, and the design is very conservative, both in structure and number of rounds. More so than SHA-2.
It is in theory possible that a new technique could break SHA-3 and not SHA-2, but the opposite seems far more plausible.
I'm not so sure about that. In the history of cryptography almost everything was broken in the end (though some peculiar stuff took a couple hundred years), but modern cryptography is very, very different from classic cryptography: breaking it is far more difficult, and the intervals at which algorithms fall apart seem to be increasing.
For example, DES held out for about 20-25 years or so, with some cracks showing earlier than that. AES is now ~20 years old, and we haven't discovered a single practical crack in it. SHA-2 is of a similar vintage and no significant issues have been discovered. The previous generation of hashes only resisted ten years or so until the first real issues showed up.
What differences does K12 have compared to SHA-3? Just fewer rounds, or are there more changes? Something similar to the differences between BLAKE and BLAKE2, maybe?
The parameters NIST specified are just fairly conservative, for one thing (e.g. NIST, IIRC, oddly requested an increase in the number of rounds during the final stages, one of the things people waxed on about.)
Regarding another variant: yes, you can get more compelling results, and the authors of Keccak themselves have also proposed another function based on Keccak called KangarooTwelve[1]. K12 has much better performance and parallelism than SHA-3 does in software. (Of course, if you don't need most of Keccak's general properties, you're probably fine with other hashes anyway.)
As far as I can tell, the increase from 18 to 24 rounds was not NIST's call, but due to the zero-sum distinguishers found on Keccak-f, which made it no longer "hermetic": http://keccak.noekeon.org/NoteZeroSum.pdf
How is SHA-512/256 not vulnerable to length extension attacks? I've implemented the SHA family of hash functions before and the only real difference between the two is block size and truncation of output, but I don't see how that stops length extensions.
Have you tried implementing the length extension attacks?
Truncating from 512 bits to 256 bits hides 256 bits of the state from the attacker, so in order to use a length extension attack they would need to "guess" those bits.
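(A toy sketch may make this concrete. It uses SHA-256 as a stand-in compression function and skips real padding, so it illustrates the structure only; it is not a real hash:)

    import hashlib

    def compress(state: bytes, block: bytes) -> bytes:
        # Stand-in compression function; real SHA-2 uses its own.
        return hashlib.sha256(state + block).digest()

    def toy_hash(msg: bytes, out: int = 32) -> bytes:
        state = b"\x00" * 32                    # fixed IV
        for i in range(0, len(msg), 32):        # no length padding, for brevity
            state = compress(state, msg[i:i + 32].ljust(32, b"\x00"))
        return state[:out]                      # out < 32 models truncation

    secret = b"k" * 32
    block1 = b"amount=100".ljust(32, b"\x00")
    evil = b"&admin=true".ljust(32, b"\x00")

    # Full-width output IS the internal state, so an attacker can resume:
    digest = toy_hash(secret + block1)          # the tag the attacker sees
    forged = compress(digest, evil)             # no knowledge of secret!
    print(forged == toy_hash(secret + block1 + evil))  # True

    # With truncated output (out=16), half the state is withheld, and
    # "resuming" would mean guessing the hidden bytes.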
There are two forms of length extension; in one of them the attacker does know those bits. SHA-512/256 doesn't protect against that. (HMAC does, I believe.)
SHA-512/256 is a variant of SHA-512 where the output is truncated to 256 bits. It does not mean "SHA-512 or SHA-256". Yes, perhaps a different symbol than "/" would be better there...
I think our comments and discussions just add to the general confusion, though, which is why I just plainly recommend SHA-3.
Now, if I were personally working on a project, I would think of SHAKE first, and I would seriously consider KangarooTwelve (especially for tree hashing, which makes K12 better than any other hash for hashing big files).
You keep saying that here, and on Twitter, and people --- not just me --- keep telling you that if you're going to recommend a hash specifically to avoid length extension attacks, 512/256 accomplishes that without requiring the adoption of new hashing code.
I really don't understand the argument that people can safely type "SHA-3" but not "SHA-2 512/256". Would it help if we renamed SHA-2 512/256? We could call it SHA2.5: better than SHA-2, not quite SHA-3. Or SHA5: better than both!
Renaming SHA-2 512/256 sounds a lot easier than getting new hash code implemented in every library.
We should definitely rename it, because until yesterday I didn't know that "SHA-512/256" was an actual thing. I thought it was just shorthand referring to both SHA-512 and SHA-256.
Edit: this comment got 7 upvotes before the editing timeout, so I'm not the only one either.
Having read the article and the comments above yours, and being generally aware that there's some truncation thing related to SHA-256 and SHA-512 somewhere, it still took me until your comment to catch onto that.
This thread is the first time I was ever aware that "SHA-2 512/256" was actually a unique and different algorithm. I've always thought it was just the last 256 bits of SHA-512.
Also, I just looked at dir(hashlib) in a reasonably recent version of Python (3.6).
So - no reference to sha2_512/256 as a unique algorithm there either. Nothing in https://docs.python.org/3/library/hashlib.html either. So, the whole SHA-2 512/256 thing isn't that straightforward.
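(If you want to check your own build: hashlib's constructor list depends on the OpenSSL it's linked against, so whether SHA-512/256 shows up varies:)

    import hashlib
    # Whether this prints True depends on the OpenSSL backing hashlib;
    # recent CPython builds generally expose it as "sha512_256".
    print("sha512_256" in hashlib.algorithms_available)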
With that said, I would hope that anyone who was doing this for a real purpose would at least take 10 minutes to read the wikipedia page. And lots of useful information on crypto.stackexchange.com as well.
"In March 2012, the standard was updated in FIPS PUB 180-4, adding the hash functions SHA-512/224 and SHA-512/256, and describing a method for generating initial values for truncated versions of SHA-512. Additionally, a restriction on padding the input data prior to hash calculation was removed, allowing hash data to be calculated simultaneously with content generation, such as a real-time video or audio feed. Padding the final data block must still occur prior to hash output"
This sounds like more than just truncating SHA-2 512. Also, the various articles on stackexchange state that there is no way to truncate SHA-2 512 and conform to SHA-2 512/256.
Finally, https://eprint.iacr.org/2010/548.pdf sounds like they really wanted you to be able to distinguish SHA-512/256 from a truncated SHA-512:
"In order for users to be able to distinguish between a SHA-512 digest which has been truncated and a SHA512/256 digest, we also offer new initialization constants, analogous to those used in SHA-384."
It's SHA-512 with a different IV (which is why you can't derive a compliant 512/256 from 512 directly) with the leftmost 256 bits taken as output. The rest of that stuff applies to all the SHA-2 hashes, I think.
None of this is security-relevant, including the IVs. You would have morally the same system if you just chopped 32 bytes off a SHA-512 hash.
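(You can see the IV difference directly, assuming a hashlib build that exposes SHA-512/256:)

    import hashlib

    msg = b"hello world"

    # Plain SHA-512, truncated to its leftmost 256 bits:
    truncated = hashlib.sha512(msg).digest()[:32]

    # SHA-512/256 proper: same compression function, different IV
    # (constructor name may vary with the OpenSSL backing hashlib):
    proper = hashlib.new("sha512_256", msg).digest()

    print(truncated == proper)  # False: different IVs, different digests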
> I really don't understand the argument that people can safely type "SHA-3" but not "SHA-2 512/256". Would it help if we renamed SHA-2 512/256?
The name is a serious problem. If you work with this stuff regularly enough to know the difference it might seem obvious, but that's not the case for many would-be implementors and bug-fixers.
The name makes communication problematic enough to not recommend usage.
No, it doesn't. SHA-2 is by an overwhelming margin the most popular hash used in cryptosystems today, and it is overwhelmingly used safely.
Seriously, if you care that much about the name, rename it. Even systems that use SHA-3 use HMAC (SHA-2 is perfectly safe, no matter what variant you use, in HMAC). Name the prefix MAC (an idiosyncratic construction) that uses SHA-2 something specific.
The SHA-3 advocates have already conceded this point. They don't call SHA-3's prefix MAC "prefix MAC"; they call it KMAC, which is a specific construction.
So all you have to accept is that there's SHA2MAC --- which, like KMAC, is not simply the hash function, but instead the optimal MAC construction built on the hash --- and then define it as the prefix MAC built from SHA2-512/256.
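(A sketch of what that definition would amount to; "sha2mac" is this thread's coinage, not a standard name, and the constructor assumes OpenSSL support for SHA-512/256:)

    import hashlib

    def sha2mac(key: bytes, msg: bytes) -> bytes:
        # Prefix MAC: H(key || msg). Only sound here because SHA-512/256
        # truncates its state; the same construction over plain SHA-256
        # or SHA-512 is length-extendable.
        h = hashlib.new("sha512_256")
        h.update(key)
        h.update(msg)
        return h.digest()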
Reading your post I find myself understanding what you are saying... maybe. And that's my point. Communication is unclear.
It's hard enough to follow this topic for many people that want to keep updated. The fact that you have to spell out "please use SHA2-512/256, but not SHA2-512 and not SHA2-256" is mind-numbing.
"It's really SHA2-512/t where t has a value of 256 and that means that it's SHA2-512 truncated down which alleviates length extension attacks." Just typing that sentence gave me a headache and I have no real idea if it's right or not.
Adoption would be easy if communication was easy. As it is, it's a PITA to even have a short conversation on the subject.
I think you're confusing the SHA-2 hash core with a MAC derived from SHA-2. MACs and hashes aren't the same thing. You can build a MAC on a hash --- in fact, you can build a MAC on a hash that would be terribly insecure on its own, and some of those MACs are state-of-the-art; see, for instance, all the polynomial MACs.
That's what I'm trying to articulate. Hashes aren't MACs.
This is partially my fault, for playing loose with SHA-2 512/256 vs "The Prefix MAC built on SHA-2 512/256".
Again: I think we should just call that SHA-2-MAC. It's an unambiguous name that couldn't mean anything other than "truncated SHA-2 512 used as a prefix MAC".
I think the BLAKE2 authors just chose to call it "keyed hashing" and tell people to use it as a MAC. Only somewhere in the paper, where the construction is described, is there an embedded clause: "yes yes, that's a prefix MAC, hush" :)
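(And that's exactly how hashlib exposes it; the key parameter turns BLAKE2 into its built-in MAC mode:)

    import hashlib

    # BLAKE2's "keyed hashing" mode, used as a MAC:
    tag = hashlib.blake2b(b"message", key=b"secret key",
                          digest_size=32).hexdigest()
    print(tag)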
I addressed that point in the post. Unless you replace all the implementations of SHA-2, the deed is done. People will choose SHA-256 and SHA-512 because that's what they've always known.
> without requiring the adoption of new hashing code
I should have made myself clearer: I recommend SHA-3 for new projects. I wouldn't recommend that people switch to SHA-3 if they are already using SHA-2.
I really don't agree. Rijndael and DES are different algorithms. SHA-512/256 is a made-up algorithm that you need to be aware of. Again, I'm talking about "misuse" here: SHA-2 is vulnerable to misuse even if SHA-512/256 is not.
I might be a bit. Although, I don't feel like it's too far from the kind of misuse we've pointed out with nonce re-use in AES-GCM or nonce re-use in DSA/ECDSA.
Anyway that was my point! Just wanted to write it up :)
> (especially for tree hashing, which makes K12 better than any other hash to hash big files).
I am aware that other modern hash functions like BLAKE2 (and maybe BLAKE/Skein?) support tree hashing; is there any reason why K12 would be better for that?
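(For what tree hashing looks like in practice: hashlib's BLAKE2 exposes the tree parameters directly. A minimal two-leaf sketch, along the lines of the example in the Python docs:)

    import hashlib

    FANOUT, DEPTH, LEAF_SIZE, INNER = 2, 2, 4096, 64
    params = dict(fanout=FANOUT, depth=DEPTH, leaf_size=LEAF_SIZE,
                  inner_size=INNER)

    data = bytes(6000)  # toy input split across two leaves

    # The leaves are independent, so a big file can be hashed in parallel:
    left = hashlib.blake2b(data[:LEAF_SIZE], node_offset=0, node_depth=0,
                           last_node=False, **params)
    right = hashlib.blake2b(data[LEAF_SIZE:], node_offset=1, node_depth=0,
                            last_node=True, **params)

    # The root node hashes the concatenated leaf digests:
    root = hashlib.blake2b(node_offset=0, node_depth=1, last_node=True,
                           digest_size=32, **params)
    root.update(left.digest())
    root.update(right.digest())
    print(root.hexdigest())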