
What I don’t understand is why they aren’t doing a check on the lengths first.

Strings are so common, it’s insane they don’t optimize this.




In the generated code, there is a length check. The implementation (which will often be inlined, so it may vary contextually) does a length check and, if the lengths are equal, a memequal (tuned platform assembly). If that equality check fails, or the lengths are unequal, it does a runtime.cmpstring (also tuned platform assembly).

So when the strings are actually equal, or are unequal in length, it's pretty much optimal. The bad case is when the strings are equal in length but differ in content, as that can result in two scans of the strings. Even then it's not slow, since the work being done is done quite efficiently, but some of that work is effectively wasted.

On my local Go install (2020 MacBook, go) the difference is between about 2ns and 4ns for an 8-byte string that differs in the last byte. Naturally, it could vary quite a bit based on cache and string size, but it's fast enough not to worry about until profiles show it matters.


This way of doing string comparisons does not seem to be the right one.

The obvious way of comparing strings is to do a comparison of the content for the minimum of their lengths (which should be computed in a branchless way), and that should be done in an optimized assembly loop, for which suitable SIMD instructions are available in most modern CPUs. In many C standard libraries the function memcmp already provides such an optimized implementation, which should be used for the comparison of strings of equal length.

Then, only when the result of that comparison is equal, the two string lengths are compared to provide the final comparison result.

Therefore, when a good memcmp is already available, a string comparison consists of a minimum computation, a memcmp invocation and an optional integer comparison of the lengths.


Well sure, fast enough. But this is exactly why we "need" faster machines... add some CPU, add some cache.

It's a 100% increase. If you're doing a lot of string parsing, it will add up. Probably not very noticeable, but maybe you'll get extra cache misses in a critical path.

If the function shouldn't be used, why create it at all? Or why not emit a compiler error / runtime error?


Wouldn't the normal implementation for == check lengths?

Edit: Never mind, it doesn't: https://github.com/golang/go/blob/d28bf6c9a2ea9b992796738d03...

But checking lengths doesn't really help you: it only tells you when strings are not equal, and you would still have to walk the string to see which one is larger/smaller.


> But checking lengths doesn't really help you: it only tells you when strings are not equal, and you would still have to walk the string to see which one is larger/smaller.

Only for a three-way comparison. If all you care about is equality, differing lengths give a fast path for inequality.


Because string comparison is usually alphabetic. If a string is longer/shorter, that says nothing about its ordering. And even if the lengths are equal, you still need to memcmp the strings to verify they are indeed equal. The length doesn't actually give you new information in this context.

I'm assuming encoding and utf-8 normalization and other similar things are not in scope when answering this.



