Since I wrote a string hash code function recently[1], I was interested to see what they use. Like many, they're using[2] the venerable djb hash. That sent me down a rabbit hole where I discovered that Bernstein was 19 when he wrote about it. Wow.
I don't know the story, but the logic behind it is simple: If you want to guarantee no one depends on GetHashCode staying static between runs of an application, change it all the time.
IIRC, a few yeas ago appeared a denial of service attack, probably originally for Phyton, but it was ported son to other languages.
The idea is that the hash is good enough for normal list, but it's not a cryptographic hash and it's easy to find collisions. Then you can make a lot of requests with strings that has the same hash value. Now the hash operations are O(N) instead of O(~1) and everything is slower.
Using an unpredictable hash calculation makes this attack more difficult.
It's in a #if DEBUG statement so it would not change. Historically, even if they shipped the debug symbols, the assembly would have been built in release. Now, I suppose you could build it in debug.
The string there is in multiple areas of that class, and the same behavior is displayed for all of them. Wouldn't logic suggest everything such as above would be moved in to a constant repository for clarity and also less potential human error for future additions?
It's pretty common in C# to use Constants to represent strings. Cuts down on the repetition as you said and you get the Intellisense too. I'd be curios to know the reasoning for using the literals as well.
It's sort of surprising to me how much huger the .NET version is, in terms of code. Virtually all the lines in the Java version are API docs. The .NET version doesn't seem to have them (they must be elsewhere?) but it does have a lot more code and that code is much lower level.
Not sure what that means, if anything, but it's interesting.
That does answer an unanswered question I had on SO about string hashing. If strings are immutable, why isn't the hash code memoized? Seems like it would make HashSet/Dictionary lookups using string keys much faster.
That was my assumption, too. They do memoize the length, but I'm sure those bytes add up, having run into OutOfMemoryExceptions building huge amounts of strings before.
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/s...