It's more of a dig on people who claim MD5 is okay because it's in Schneier's book. Not so much a dig on Schneier himself.
And yes. I understand MD5 etag headers are common. I even understand that MD5 still passes the avalanche test -- Robshaw's observations on MD5 in 1996 are just as valid today.
But according to Sasaki & Aoki, //both// collisions and second pre-image calculation are faster than brute force. Maybe MD5 //shouldn't// be used if all you need is avalanche.
It turns out that SHA256 has both collision and second pre-image calculation resistance //as well as// avalanche effect. Maybe use that one instead.
I don't remember the last time I saw anyone use MD5 for cryptography. It's still used a lot for data integrity, because it still works perfectly well for that and it's faster than the SHA family (when there's no hardware implementation).
Though be careful about the use of the phrase "data integrity" -- comparing two files on a file-system by their MD5 hash is probably fine, but comparing the serialization of two PDUs over the wire based on their MD5 hash may be problematic.
I think I understand you to mean that MD5 certainly still exhibits an avalanche effect: changing a single bit in the input changes about half the bits in the output. And if you trust the way you retrieve the hashed data and the hash (say, it's on a local hard drive), then yes, it's certainly acceptable for that use. But collisions and second pre-image generation being faster than brute force are why people generally don't want to use it (MD5) when its use spans trust domains.
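For anyone who wants to see the avalanche claim rather than take it on faith, here is a minimal sketch using Python's standard `hashlib`. The helper names `flip_bit` and `bit_difference` and the sample message are mine, not from any library. It flips one input bit and counts how many output bits change. Note that it says nothing about collision or pre-image resistance, which is the whole point above.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with the bit at position index inverted."""
    out = bytearray(data)
    out[index // 8] ^= 1 << (index % 8)
    return bytes(out)


message = b"The quick brown fox jumps over the lazy dog"
tweaked = flip_bit(message, 0)  # differs from message in exactly one bit

for name in ("md5", "sha256"):
    d1 = hashlib.new(name, message).digest()
    d2 = hashlib.new(name, tweaked).digest()
    total_bits = len(d1) * 8
    changed = bit_difference(d1, d2)
    print(f"{name}: {changed} of {total_bits} output bits changed")
```

Both functions should report roughly half of their output bits changing, which is why avalanche alone is a poor reason to pick one over the other.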
My point is:
a. Don't use MD5 just because Bruce Schneier published a popular book that said it was okay RIGHT BEFORE all the research damning it came out. (Personally... I think Bruce should publish a third edition of this text expressly to remove the bit about MD5 being okay. I cannot tell you how many hours of billable time I've wasted explaining to software engineers that no... MD5 is not recommended for use even though at one time it was considered acceptable. And if you know not to use MD5, you're not the software engineer I'm talking about.)
b. You can use SHA256 and get avalanche, collision resistance AND second pre-image generation resistance. (Pretty sure you also get 1st pre-image generation resistance, but I haven't scanned the literature for that in a while.)
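To make point b concrete: in most languages, swapping MD5 for SHA-256 is a one-line change. A rough sketch with Python's `hashlib` follows; the function name `sha256_file` and the 1 MiB chunk size are my own choices, not anything standard.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()  # the only line that would differ for MD5
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Reading in chunks keeps memory use flat regardless of file size, so the same function works for a config file or a multi-gigabyte image.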
And while I'm thinking about it, let me add these points:
c. There are probably better hashing algorithms than MD5 for use with a hash map/table/tree.
d. If you're interested in how MD5 works, I recommend expanding the scope a bit and studying Merkle-Damgård generally. Why MD5 has problems and other hash functions that make use of the Merkle-Damgård construction don't (or have different problems, or the same problems at different amounts of input) is pretty interesting.
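As a starting point for point d, here is a toy sketch of the Merkle-Damgård structure: pad, split into blocks, and chain a compression function. Everything in it is an assumption made for illustration: the block and state sizes are tiny, the names are mine, and the compression function is borrowed from truncated SHA-256 purely for convenience. It is not MD5 and must not be used for anything real.

```python
import hashlib
import struct

BLOCK_SIZE = 16  # bytes per message block (toy value)
STATE_SIZE = 8   # bytes of chaining state (toy value)
IV = b"\x00" * STATE_SIZE  # fixed initial chaining value


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: (state, block) -> new fixed-size state."""
    return hashlib.sha256(state + block).digest()[:STATE_SIZE]


def md_pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes, then the 64-bit message length in bits."""
    padded = message + b"\x80"
    while (len(padded) + 8) % BLOCK_SIZE != 0:
        padded += b"\x00"
    return padded + struct.pack(">Q", len(message) * 8)


def toy_md_hash(message: bytes, state: bytes = IV) -> bytes:
    """Chain the compression function over every padded block."""
    padded = md_pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[i:i + BLOCK_SIZE])
    return state  # the final chaining state is the digest
```

The detail worth noticing is the last line: the digest is just the final chaining state, which is the structural property behind length-extension behaviour in hashes built this way.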
And yes, if you happen to have an MD5 hardware accelerator or petabytes of data and MD5 hashes already, it's hard to change that overnight.
>comparing the serialization of two PDUs over the wire based on their MD5
I'm not sure what you mean by this, but this:
>people generally don't want to use it (MD5) when its use spans trust domains.
is exactly what I mean by "cryptography". I.e. guarding against intentional tampering. Are there a lot of people using it for this purpose? I don't remember seeing one.
>c. There are probably better hashing algorithms than MD5 for use with a hash map/table/tree.
For sure. Cryptographic functions (even obsolete ones) are almost always overkill and too slow for general data structures. Only use them if you can't find something more suitable for your data.
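To illustrate the kind of thing that is "more suitable", here is a sketch of 32-bit FNV-1a, a small non-cryptographic hash. FNV-1a is my example and was not named in this thread; the `bucket_for` helper is hypothetical. It offers no security properties at all, which is fine when the only goal is spreading keys across buckets.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the result to 32 bits
    return h


def bucket_for(key: str, bucket_count: int) -> int:
    """Map a string key to a bucket index (hypothetical helper)."""
    return fnv1a_32(key.encode("utf-8")) % bucket_count
```

In practice you would normally rely on whatever hash your language's built-in map already uses; the sketch only shows how little work a table hash needs to do compared with a cryptographic one.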
>And yes, if you happen to have an MD5 hardware accelerator or petabytes of data and MD5 hashes already, it's hard to change that overnight.
I was actually talking about SHA-256 acceleration, since I just saw like an hour ago that recent Intel CPUs have it. If your CPU has such instructions, by all means use it instead of software MD5, if all you need is data integrity.
Moving from MD5 to SHA256 means we just about triple the amount of time it takes to generate a hash.
But I suggest there are few applications where this speed improvement justifies the confusion I've seen in junior engineers who believe MD5 is okay because a. they use MD5SUM and b. Bruce Schneier said it was okay.
Don't get me wrong. I trust that //YOU// will know not to depend on an MD5 hash for anything where a bad guy can modify content over the wire. But... I'm going to go out on a limb and guess you're somewhat experienced. I worry about the kids who, without the benefit of experience, re-enact scenes from the cryptography edition of Lord of the Flies.
But... if you know what you're doing... sure... use MD5... There's certainly no way your code will ever be used by a less experienced engineer, right?
I tend to prefer b2sum (BLAKE2), as it's both fast and (at the time of writing this, AFAIK) secure. There are certainly situations where md5 is good enough, but yeah, I feel safer just using blake2 and not having to spend brain cycles thinking whether the hash needs to be cryptographically secure or not, and risk making the wrong judgement.
Tried with a 3.5GB Ubuntu .iso file and the results are similar, though I also tried b3sum and that was even faster (just 1.8s vs. 19s for sha256sum and 6.7s for md5sum). So performance is definitely not an argument for md5.
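If you want to repeat this kind of comparison without the command-line tools, a rough sketch with Python's `hashlib` follows. The function names and the 32 MiB in-memory payload are my own assumptions. It covers MD5, SHA-256 and BLAKE2b only; as far as I know, BLAKE3 is not part of `hashlib`, so it is left out. Timings depend heavily on the CPU and on how the underlying crypto library was built, so don't expect the numbers above.

```python
import hashlib
import time


def time_hash(name: str, data: bytes, chunk_size: int = 1 << 20) -> float:
    """Return the seconds taken to hash data with the named hashlib algorithm."""
    view = memoryview(data)  # lets us slice chunks without copying
    hasher = hashlib.new(name)
    start = time.perf_counter()
    for offset in range(0, len(view), chunk_size):
        hasher.update(view[offset:offset + chunk_size])
    hasher.digest()
    return time.perf_counter() - start


def compare_hashes(data: bytes, names=("md5", "sha256", "blake2b")) -> dict:
    """Time each named algorithm over the same data."""
    return {name: time_hash(name, data) for name in names}


if __name__ == "__main__":
    payload = bytes(32 * 1024 * 1024)  # 32 MiB of zero bytes, held in memory
    for name, seconds in compare_hashes(payload).items():
        print(f"{name:8s} {seconds:.3f} s")
```

Hashing from memory takes disk speed out of the picture, so this measures the hash functions themselves rather than the drive.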