I am likewise perplexed why "cryptographic hash functions are unnecessary in the...

pvg · on Feb 26, 2017

I am likewise perplexed why "cryptographic hash functions are unnecessary in the absence of an attacker" is such a difficult concept for you to grasp.

Well, if we're going to be dicks to each other about it I'll try to explain what I think appears to be difficult for you to grasp. :) If you were throwing together something like git in a hurry you'd want a hash that

Lets you not have to think about collisions at all even if:

The collisions are by mere chance

The collisions arise by non-malicious accident

The collisions arise from malicious inputs [obviously, that implies an attacker but it also falls under 'I just don't want to think about collisions']

And right there, you grab the first non-completely-broken, not-too-giant, not-too-slow crypto hash around and get on with whatever else you had in mind. And this, to me, seems like the right call, especially since if collisions did eventually pop up, it'll probably be one generative collision and your entire system won't suddenly implode just because it exists. You'll have some time to fix stuff.

Again, to go back to the original point: Linus's argument is that cryptographic functions have unique properties that make them specifically useful in non-security contexts. He's wrong. They don't. The non-cryptographic constructions he namedropped then glossed over work fine in these contexts.

No argument there. Maybe I misunderstood 'Linus said something wrong' as 'Linus did something horribly wrong' and we're arguing over nothing?

ProblemFactory · on Feb 26, 2017

I recently worked on a project where I had to choose a hash function for non-cryptographic error checking. I investigated CRC in detail for it, even wrote two different implementations from scratch.

CRC-X is complicated to use and terrible choice.

First of all, it is not a single algorithm, it is a family of algorithms. For a CRC of size N, you have to also choose a N-bit polynomial, N-bit starting value, and N-bit output XOR value. There is no single standard, there are tens or hundreds of popular options [1]. The optimal polynomial depends on both the CRC size, and the length of the data you are feeding into it [2]. If you choose poor parameters, you will get terrible error detection characteristics (like allowing extra 0 bits at the start of data with no change to the checksum). If you choose good parameters, certain classes of common errors like zeroing out a block of more than N bits will still have much worse characteristics than 1 / 2^N random chance of collision.

Second, implementation in standard libraries is patchy. Most programming languages have some CRC32 implementation - but do not document what parameters they use, or use different notation for the same parameters (forward and reverse), or do not let you change the parameters, or do not let you change the CRC size, or all of these. There is no easy way to get a "standard CRC64 or CRC128" compatible across platforms without putting it together from github snippets and example code yourself.

Third, CRC is fast when implemented in hardware, but not that much faster than SHA-1 or SHA-512 in software. It's only 1.5-2.5x faster [3], and when you're doing one checksum per uploaded file or something, it really does not matter. It's going to be even slower when you don't have a suitable-length CRC available in optimised native code, and have to write it yourself in simple C or pure Python.

The obvious and simple solution is to pick a known popular cryptographic hash like (SHA-X) that is available in the standard library of every programming language under the exact same API and no parameters to configure, truncate its output to the digest size you want, and call it a day. No need to worry about error detection performance on specific error cases like long bursts of zeros, and you get some defense against malicious tampering as a free bonus.

[1] https://en.wikipedia.org/wiki/Cyclic_redundancy_check#Standa...

[2] http://repository.cmu.edu/cgi/viewcontent.cgi?article=1672&c...

[3] https://www.cryptopp.com/benchmarks.html

manquer · on Feb 26, 2017

Would MD5 also not satisfy all the availablity and standardization problems you mentioned just as well as SHA-x ? and also meet other non security requirements of git just as well ?

So why SHA-1 over MD5 which probably is move available and order of magnitude faster ? what problems MD5 has outside of its security that SHA-1 does not ?

pvg · on Feb 26, 2017

So why SHA-1 over MD5 which probably is more available and order of magnitude faster?

This actually ends up not being true. Try

    openssl speed md5
    openssl speed sha1

On a somewhat recent Intel CPU.

bascule · on Feb 26, 2017

I'm glad you've at least looked into this, but you have a number of things wrong:

  First of all, it is not a single algorithm, it is a family of algorithms

This argument holds for SHA1, but all modern cryptographic hash functions are also families. The SHA3 family has SHA3-224, SHA3-256, SHA3-384, SHA3-512, SHAKE128, and SHAKE256

  For a CRC of size N, you have to also choose a N-bit
  polynomial, N-bit starting value, and N-bit output XOR
  value. There is no single standard, there are tens or
  hundreds of popular options

Apparently you've never heard of CRC32C!

http://www.evanjones.ca/crc32c.html

  Most programming languages have some CRC32 implementation
  - but do not document what parameters they use, or use
  different notation for the same parameters (forward and   
  reverse), or do not let you change the parameters, or do   
  not let you change the CRC size, or all of these. There is 
  no easy way to get a "standard CRC64 or CRC128" compatible 
  across platforms without putting it together from github  
  snippets and example code yourself.

Here's CRC32C implementations in a handful of popular languages:

- C: https://software.intel.com/en-us/node/503522

- Java: http://download.java.net/java/jdk9/docs/api/java/util/zip/CR...

- JavaScript: https://www.npmjs.com/package/fast-crc32c https://www.npmjs.com/package/crc32c

- Go: https://github.com/jacobsa/gcloud/blob/master/gcs/gcsutil/cr...

- Python: https://github.com/ludios/pycrc32c

- Ruby: http://www.rubydoc.info/gems/digest-crc/0.4.1/Digest/CRC32c

And again, I'm not actually recommending git switch to CRC (as a fan of security, I would prefer they use an actual collision resistant hash function). But CRC better meets Linus's stated requirements:

  - CRC will be faster (much faster as in ~4X, your numbers seem off to me) in almost all cases
  - CRC will *not* fail to detect a bitflip, whereas there's a certain probability a random oracle-based construction will

Dylan16807 · on Feb 26, 2017

> I am likewise perplexed why "cryptographic hash functions are unnecessary in the absence of an attacker" is such a difficult concept for you to grasp.

Because you can't seem to tell the difference between "unnecessary" and "shouldn't be done".

If I build a shed then using larger screws on the door might be unnecessary but it only costs me .2% more and I know it won't fall over.

Using a recent SHA function might be overkill in a non-crypto context but it's high-quality and fast. And there's existing libraries for whatever language you want. Why the hell would anyone use CRC128? I've never even heard of it before.

It's a good hash choice, no matter how unnecessary.

bascule · on Feb 26, 2017

Clearly it's not, because it's both several times slower than the CRC family, and was known to have cryptographic flaws before git was even released. "The worst of both worlds" so to speak...