I am likewise perplexed why "cryptographic hash functions are unnecessary in the absence of an attacker" is such a difficult concept for you to grasp.
It is not an "odd angle". It is literally the very purpose for which they were created in the first place: to defend against attacks (preimage, collision)
If there are no attackers, the cryptography buys you nothing and merely makes the system slower.
Again, to go back to the original point: Linus's argument is that cryptographic functions have unique properties that make them specifically useful in non-security contexts. He's wrong. They don't. The non-cryptographic constructions he namedropped then glossed over work fine in these contexts.
I am likewise perplexed why "cryptographic hash functions are unnecessary in the absence of an attacker" is such a difficult concept for you to grasp.
Well, if we're going to be dicks to each other about it I'll try to explain what I think appears to be difficult for you to grasp. :) If you were throwing together something like git in a hurry you'd want a hash that
Lets you not have to think about collisions at all even if:
The collisions are by mere chance
The collisions arise by non-malicious accident
The collisions arise from malicious inputs [obviously, that implies an attacker but it also falls under 'I just don't want to think about collisions']
And right there, you grab the first non-completely-broken, not-too-giant, not-too-slow crypto hash around and get on with whatever else you had in mind. And this, to me, seems like the right call, especially since if collisions did eventually pop up, it'll probably be one generative collision and your entire system won't suddenly implode just because it exists. You'll have some time to fix stuff.
Again, to go back to the original point: Linus's argument is that cryptographic functions have unique properties that make them specifically useful in non-security contexts. He's wrong. They don't. The non-cryptographic constructions he namedropped then glossed over work fine in these contexts.
No argument there. Maybe I misunderstood 'Linus said something wrong' as 'Linus did something horribly wrong' and we're arguing over nothing?
I recently worked on a project where I had to choose a hash function for non-cryptographic error checking. I investigated CRC in detail for it, even wrote two different implementations from scratch.
CRC-X is complicated to use and terrible choice.
First of all, it is not a single algorithm, it is a family of algorithms. For a CRC of size N, you have to also choose a N-bit polynomial, N-bit starting value, and N-bit output XOR value. There is no single standard, there are tens or hundreds of popular options [1]. The optimal polynomial depends on both the CRC size, and the length of the data you are feeding into it [2]. If you choose poor parameters, you will get terrible error detection characteristics (like allowing extra 0 bits at the start of data with no change to the checksum). If you choose good parameters, certain classes of common errors like zeroing out a block of more than N bits will still have much worse characteristics than 1 / 2^N random chance of collision.
Second, implementation in standard libraries is patchy. Most programming languages have some CRC32 implementation - but do not document what parameters they use, or use different notation for the same parameters (forward and reverse), or do not let you change the parameters, or do not let you change the CRC size, or all of these. There is no easy way to get a "standard CRC64 or CRC128" compatible across platforms without putting it together from github snippets and example code yourself.
Third, CRC is fast when implemented in hardware, but not that much faster than SHA-1 or SHA-512 in software. It's only 1.5-2.5x faster [3], and when you're doing one checksum per uploaded file or something, it really does not matter. It's going to be even slower when you don't have a suitable-length CRC available in optimised native code, and have to write it yourself in simple C or pure Python.
The obvious and simple solution is to pick a known popular cryptographic hash like (SHA-X) that is available in the standard library of every programming language under the exact same API and no parameters to configure, truncate its output to the digest size you want, and call it a day. No need to worry about error detection performance on specific error cases like long bursts of zeros, and you get some defense against malicious tampering as a free bonus.
Would MD5 also not satisfy all the availablity and standardization problems you mentioned just as well as SHA-x ? and also meet other non security requirements of git just as well ?
So why SHA-1 over MD5 which probably is move available and order of magnitude faster ? what problems MD5 has outside of its security that SHA-1 does not ?
I'm glad you've at least looked into this, but you have a number of things wrong:
First of all, it is not a single algorithm, it is a family of algorithms
This argument holds for SHA1, but all modern cryptographic hash functions are also families. The SHA3 family has SHA3-224, SHA3-256, SHA3-384, SHA3-512, SHAKE128, and SHAKE256
For a CRC of size N, you have to also choose a N-bit
polynomial, N-bit starting value, and N-bit output XOR
value. There is no single standard, there are tens or
hundreds of popular options
Most programming languages have some CRC32 implementation
- but do not document what parameters they use, or use
different notation for the same parameters (forward and
reverse), or do not let you change the parameters, or do
not let you change the CRC size, or all of these. There is
no easy way to get a "standard CRC64 or CRC128" compatible
across platforms without putting it together from github
snippets and example code yourself.
Here's CRC32C implementations in a handful of popular languages:
And again, I'm not actually recommending git switch to CRC (as a fan of security, I would prefer they use an actual collision resistant hash function). But CRC better meets Linus's stated requirements:
- CRC will be faster (much faster as in ~4X, your numbers seem off to me) in almost all cases
- CRC will *not* fail to detect a bitflip, whereas there's a certain probability a random oracle-based construction will
> I am likewise perplexed why "cryptographic hash functions are unnecessary in the absence of an attacker" is such a difficult concept for you to grasp.
Because you can't seem to tell the difference between "unnecessary" and "shouldn't be done".
If I build a shed then using larger screws on the door might be unnecessary but it only costs me .2% more and I know it won't fall over.
Using a recent SHA function might be overkill in a non-crypto context but it's high-quality and fast. And there's existing libraries for whatever language you want. Why the hell would anyone use CRC128? I've never even heard of it before.
It's a good hash choice, no matter how unnecessary.
Clearly it's not, because it's both several times slower than the CRC family, and was known to have cryptographic flaws before git was even released. "The worst of both worlds" so to speak...
It is not an "odd angle". It is literally the very purpose for which they were created in the first place: to defend against attacks (preimage, collision)
If there are no attackers, the cryptography buys you nothing and merely makes the system slower.
Again, to go back to the original point: Linus's argument is that cryptographic functions have unique properties that make them specifically useful in non-security contexts. He's wrong. They don't. The non-cryptographic constructions he namedropped then glossed over work fine in these contexts.