Quick link to what I think is the most interesting class in the CLR: https://git...

munificent · on Feb 3, 2015

Since I wrote a string hash code function recently[1], I was interested to see what they use. Like many, they're using[2] the venerable djb hash. That sent me down a rabbit hole where I discovered that Bernstein was 19 when he wrote about it. Wow.

[1]: https://github.com/munificent/wren/blob/master/src/wren_valu...

[2]: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/s...
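For reference, here is a minimal Python sketch of the textbook djb2 function: start at 5381, then compute h = h * 33 + c for each character. It is only the classic form. The linked CLR code is a variant and may differ in details, such as using XOR instead of addition, so don't read this as a transcription of it.

```python
def djb2(s: str) -> int:
    """Classic djb2 string hash: start at 5381, then h = h * 33 + c for each char."""
    h = 5381
    for ch in s:
        # (h << 5) + h is the traditional shift-and-add spelling of h * 33;
        # the mask keeps the result in 32 bits, like unsigned C arithmetic.
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    return h


print(djb2(""))   # 5381: the empty string hashes to the seed
print(djb2("a"))  # 177670 = 5381 * 33 + 97
```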

simfoo · on Feb 3, 2015

From their GetHashCode():

    // We want to ensure we can change our hash function daily.
    // This is perfectly fine as long as you don't persist the
    // value from GetHashCode to disk or count on String A
    // hashing before string B. Those are bugs in your code.
    hash1 ^= ThisAssembly.DailyBuildNumber;

I'd love to hear the story behind this one :D

daeken · on Feb 3, 2015

I don't know the story, but the logic behind it is simple: If you want to guarantee no one depends on GetHashCode staying static between runs of an application, change it all the time.
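The following is a minimal sketch of that idea. It assumes a hypothetical `build_number` argument standing in for ThisAssembly.DailyBuildNumber and uses plain djb2 as the base hash. Within one build, equal strings still hash equally. A value persisted by one build, though, won't match what the next build computes, so code that wrongly relies on it breaks quickly in testing.

```python
def djb2(s: str) -> int:
    h = 5381
    for ch in s:
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    return h


def salted_hash(s: str, build_number: int) -> int:
    # build_number is a hypothetical stand-in for ThisAssembly.DailyBuildNumber.
    return djb2(s) ^ build_number


stored = salted_hash("config-key", build_number=1001)  # "persisted" by one build
today = salted_hash("config-key", build_number=1002)   # recomputed by the next build
print(stored == today)  # False: the persisted value is stale
```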

evincarofautumn · on Feb 3, 2015

But now I can circumvent this hack by xoring with ThisAssembly.DailyBuildNumber again. :)

rplnt · on Feb 4, 2015

I think the story is even simpler than that, as the code in question is wrapped in #if DEBUG.

The shipped product doesn't include this "randomness".

MichaelGG · on Feb 3, 2015

Also, isn't it there for hash-collision protection? Collisions can hurt the runtime of many algorithms, enabling a DoS.

plorkyeran · on Feb 3, 2015

No, it has no effect on hash collisions. Two hashes that are equal before being xored with the daily build number will still be equal afterwards.
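Here is a quick check of that point. It uses the textbook additive djb2 as an illustrative hash (not necessarily the CLR's exact function) and an arbitrary salt. "Aa" and "B@" collide because 65 * 33 + 97 = 66 * 33 + 64 = 2242. XOR with a fixed value is a bijection, so colliding inputs still collide afterwards. XORing with the same value again recovers the original hash.

```python
def djb2(s: str) -> int:
    h = 5381
    for ch in s:
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    return h


SALT = 0x5EED  # arbitrary stand-in for a per-build number

a, b = "Aa", "B@"  # a collision for additive djb2
assert djb2(a) == djb2(b)

# XOR with the same constant maps equal hashes to equal hashes...
print(djb2(a) ^ SALT == djb2(b) ^ SALT)    # True
# ...and XORing with it again undoes the salt entirely.
print((djb2(a) ^ SALT) ^ SALT == djb2(a))  # True
```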

porges · on Feb 3, 2015

It won't help much there as the number is static for a particular build.

munificent · on Feb 3, 2015

The hash randomization for security is above this part of the code.

gus_massa · on Feb 3, 2015

IIRC, a denial-of-service attack appeared a few years ago, probably originally against Phyton, but it was soon ported to other languages.

The idea is that the hash is good enough for normal inputs, but it's not a cryptographic hash, so it's easy to find collisions. An attacker can then send many requests containing strings that all have the same hash value. Each hash table operation becomes O(N) instead of roughly O(1), and everything slows down.
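The sketch below demonstrates the attack. It uses the textbook additive djb2 as a stand-in for a non-cryptographic string hash, not any specific runtime's function, and `ChainedTable` is a toy, hypothetical separate-chaining set. From any starting state, "Aa" and "B@" leave djb2 in the same state. Concatenating k such blocks therefore yields 2^k distinct strings with a single hash value. Inserting n of them costs about n²/2 key comparisons instead of about n.

```python
from itertools import product


def djb2(s: str) -> int:
    h = 5381
    for ch in s:
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    return h


def colliding_strings(k: int) -> list[str]:
    # Each block is "Aa" or "B@"; both map any state h to the same next state,
    # so all 2**k concatenations share one hash value.
    return ["".join(blocks) for blocks in product(["Aa", "B@"], repeat=k)]


class ChainedTable:
    """Toy separate-chaining hash set that counts key comparisons."""

    def __init__(self, buckets: int = 1024):
        self.buckets = [[] for _ in range(buckets)]
        self.comparisons = 0

    def add(self, key: str) -> None:
        chain = self.buckets[djb2(key) % len(self.buckets)]
        for existing in chain:
            self.comparisons += 1
            if existing == key:
                return
        chain.append(key)


keys = colliding_strings(10)  # 1024 distinct strings, one hash value
assert len({djb2(s) for s in keys}) == 1

table = ChainedTable()
for key in keys:
    table.add(key)
print(table.comparisons)  # 0 + 1 + ... + 1023 = 523776: quadratic overall
```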

Using an unpredictable hash calculation makes this attack more difficult.
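A minimal sketch of an unpredictable hash follows. It assumes a hypothetical per-process secret key and uses keyed BLAKE2b from Python's hashlib. Real runtimes typically use a fast keyed hash such as SipHash; CPython, for example, randomizes its string hash with a per-process seed. Within a process the value is stable, as a hash table requires. Without the key, though, an attacker can't precompute colliding strings offline.

```python
import hashlib
import secrets

# Hypothetical per-process secret, chosen once at startup.
_HASH_KEY = secrets.token_bytes(16)


def randomized_hash(s: str, key: bytes = _HASH_KEY) -> int:
    """Keyed 64-bit string hash: unpredictable without knowing `key`."""
    digest = hashlib.blake2b(s.encode("utf-8"), digest_size=8, key=key).digest()
    return int.from_bytes(digest, "little")


# Same input, same process: same value.
assert randomized_hash("hello") == randomized_hash("hello")

# A different key (e.g. another process) gives an unrelated value.
other_key = secrets.token_bytes(16)
print(randomized_hash("hello"), randomized_hash("hello", other_key))
```

Simply seeding djb2's starting value with a random number would not be enough. Collisions like "Aa" vs "B@" hold for additive djb2 from any starting state. This is why a properly keyed hash is used instead.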

omaranto · on Feb 3, 2015

Your typo can be easily fixed by a Python one-liner:

    >>> import math
    >>> (lambda w:w[2:]+w[:2])(''.join(sorted("Phyton",key=lambda c:math.sin(ord(c)^50))))
    'Python'

euroclydon · on Feb 3, 2015

It's inside an #if DEBUG block, so in shipped builds it would not change. Historically, even when they shipped the debug symbols, the assembly itself was built in release mode. Now, I suppose, you could build it in debug yourself.

Aleman360 · on Feb 3, 2015

The Reference Source site is much easier to navigate. Method names are hyperlinks:

http://referencesource.microsoft.com/#mscorlib/system/string...

sekasi · on Feb 4, 2015

Admittedly, I've never been big on C#, but I'm AMAZED by how many repeated string literals there are.

For example: Environment.GetResourceString("ArgumentOutOfRange_Index")

That string appears in multiple places in the class, with the same behavior each time. Wouldn't it make sense to move literals like this into a set of constants, for clarity and to reduce the chance of human error in future additions?

Rapzid · on Feb 4, 2015

It's pretty common in C# to use constants to represent strings. As you said, that cuts down on repetition, and you get IntelliSense too. I'd be curious to know the reasoning for using the literals as well.

mike_hearn · on Feb 4, 2015

And for comparison:

http://grepcode.com/file/repository.grepcode.com/java/root/j...

It's sort of surprising to me how much larger the .NET version is, in terms of code. Virtually all the lines in the Java version are API docs. The .NET version doesn't seem to have them (they must be elsewhere?), but it does have a lot more code, and that code is much lower level.

Not sure what that means, if anything, but it's interesting.

DoggettCK · on Feb 3, 2015

That does answer an unanswered question I had on SO about string hashing. If strings are immutable, why isn't the hash code memoized? Seems like it would make HashSet/Dictionary lookups using string keys much faster.
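The memoization being asked about can be sketched as a Python wrapper class; `CachedHashString` is a hypothetical name made up for illustration. The hash is computed on first use and stored in an extra per-object slot, which is exactly the memory cost raised in the reply. For what it's worth, Java's String and CPython's str both cache their hash internally in this way.

```python
from typing import Optional


class CachedHashString:
    """Hypothetical immutable string wrapper that memoizes its hash code."""

    __slots__ = ("_value", "_hash")

    def __init__(self, value: str):
        self._value = value
        self._hash: Optional[int] = None  # extra per-object field: the memory cost

    def __eq__(self, other: object) -> bool:
        return isinstance(other, CachedHashString) and self._value == other._value

    def __hash__(self) -> int:
        # Safe to cache only because the wrapped string never changes.
        if self._hash is None:
            h = 5381
            for ch in self._value:  # djb2, computed at most once per object
                h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
            self._hash = h
        return self._hash


lookup = {CachedHashString("hello"): 1}
print(lookup[CachedHashString("hello")])  # 1
```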

munificent · on Feb 3, 2015

Presumably because it's not worth the memory hit to store the hash.

DoggettCK · on Feb 3, 2015

That was my assumption, too. They do memoize the length, but I'm sure those bytes add up; I've run into OutOfMemoryExceptions before when building huge numbers of strings.

reddiric · on Feb 3, 2015

Strings know their length in the CLR because they are represented as BSTRs.

http://blogs.msdn.com/b/ericlippert/archive/2011/07/19/strin...

This lets them interoperate with OLE Automation.
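Here is roughly how the BSTR convention is documented for OLE Automation. A 4-byte count of data bytes (excluding the terminator) sits immediately before the UTF-16 characters, followed by a 2-byte null terminator, so the length can be read in O(1). The minimal Python sketch below models that byte layout. The function names are hypothetical, and it models only the BSTR convention, not the CLR's actual in-memory string object.

```python
import struct


def to_bstr_bytes(s: str) -> bytes:
    """Lay out `s` BSTR-style: byte-count prefix, UTF-16LE data, NUL terminator."""
    data = s.encode("utf-16-le")
    prefix = struct.pack("<I", len(data))  # byte count, excluding the terminator
    return prefix + data + b"\x00\x00"     # two-byte NUL terminator


def bstr_length_in_chars(buf: bytes) -> int:
    # O(1): read the prefix instead of scanning for a terminator.
    # Returns UTF-16 code units, which equals len(s) for BMP-only text.
    (nbytes,) = struct.unpack_from("<I", buf, 0)
    return nbytes // 2


buf = to_bstr_bytes("a\0b")
print(len(buf))                   # 12 = 4 (prefix) + 6 (data) + 2 (terminator)
print(bstr_length_in_chars(buf))  # 3: the embedded NUL is just another character
```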

porges · on Feb 4, 2015

You have to store the length: C# strings can contain '\0' (although the hashing code doesn't take this into account!)
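To illustrate the parenthetical, the sketch below assumes the hash loop stops at the first '\0' the way a C-style `while (c != 0)` loop would. It compares that behavior with a loop driven by the stored length, using textbook djb2 in both cases. This isn't a correctness bug for a hash table, since keys are still compared for equality. It just means strings that differ only after an embedded NUL always collide.

```python
def djb2_until_nul(s: str) -> int:
    # Mimics a C-style loop that stops at the first NUL character.
    h = 5381
    for ch in s:
        if ch == "\0":
            break
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    return h


def djb2_full_length(s: str) -> int:
    # Uses the stored length, so embedded NULs and what follows them count.
    h = 5381
    for ch in s:
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    return h


a, b = "key\0one", "key\0two"
print(djb2_until_nul(a) == djb2_until_nul(b))      # True: text after '\0' is ignored
print(djb2_full_length(a) == djb2_full_length(b))  # False
```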