I’d argue that any string comparison which does not take into account collation ...

eru · on Oct 27, 2022

For some applications (eg sticking stuff in an ordered data structure), you just need any consistent ordering, but don't care too much about exactly which one.

cookiengineer · on Oct 27, 2022

I've been developing my own version string parser for a couple weeks now, in golang.

It's ridiculous to what lengths you have to go to understand which part of a string comes earlier or later.

Simple example: semantic versioning allows "1.2.3alpha" and also "1.2.3-beta", but which one comes first now...

- Is 1.2.3 > 1.2.3omega?

- Is 1.2.3 > 1.2.3beta?

- Is 1.2.3gamma > 1.2.3?

In the Linux world it gets even funnier cause they invented SONAME fields that reflect breaking API changes instead of forcing packages to comply with semantic versioning syntax. Oftentimes there is a package version of e.g. 0.4.7 that has an SONAME of 12.7 on the filesystem.

Add to that the ~prerelease suffix syntax in Debian based distros which are maintained downstream, and all the +buildid or .commithash or -revision123 suffixes and you've landed in string comparison hell.

When I started I would have never guessed that this is such a complex problem to solve in golang.

crymer11 · on Oct 27, 2022

> semantic versioning allows "1.2.3alpha"

Perhaps I missed it, but I thought [Semantic Versioning](https://semver.org/) required a “-” between the patch number and a prerelease identifier since at least version 1.0.0 (with 1.0.0-beta allowing a “.” instead of a “-”), no?

oefrha · on Oct 27, 2022

Yes, version string comparison is hard because people have all sorts of unstandardized ideas about version strings. Not sure why you seem to believe there’s golang-specific difficulty here.

kibwen · on Oct 27, 2022

> semantic versioning allows "1.2.3alpha" and also "1.2.3-beta"

No, according to the spec, the hyphen is mandatory: "A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version."

cookiengineer · on Oct 27, 2022

You interpreted your own copy/pasted answer wrongly.

MAY does not have the same meaning as MUST. MAY is optional, MUST is mandatory.

> https://semver.org/#spec-item-9

tedunangst · on Oct 27, 2022

You may denote a prerelease version. Or not. But you can't just append any crap you like and call it the prerelease version.

wongarsu · on Oct 27, 2022

The wording is ambiguous, but the BNF later in the spec [1] agrees with your interpretation. Valid version numbers are three numbers separated by dots, optionally followed by a minus and dot-separated pre-release identifiers, and then optionally by a plus and dot-separated build identifiers.

1: https://semver.org/#backusnaur-form-grammar-for-valid-semver...

kibwen · on Oct 27, 2022

No, that is a misreading. The "MAY" indicates that the prerelease identifier itself is optional. However, if you do append one, it must include a leading hyphen.

alcover · on Oct 27, 2022

  - Is 1.2.3 > 1.2.3omega? No
  - Is 1.2.3 > 1.2.3beta?  No
  - Is 1.2.3gamma > 1.2.3? Yes

I don't see ambiguity in your examples.

cookiengineer · on Oct 27, 2022

> - Is 1.2.3gamma > 1.2.3? Yes

I wrote this example, because I knew the answer. And your interpretation (the same as my initial one) is wrong :)

> Pre-release versions have a lower precedence than the associated normal version.

[1] https://semver.org/#spec-item-9

tsimionescu · on Oct 27, 2022

1.2.3gamma is not a pre-release version, it is a malformed version string (assuming SemVer). A proper SemVer is something like [0-9]+[.][0-9]+[.][0-9]+(-[0-9a-zA-Z]+)?([+][0-9a-zA-Z]+)?

> 2. A normal version number MUST take the form X.Y.Z where X, Y, and Z are non-negative integers, and MUST NOT contain leading zeroes. X is the major version, Y is the minor version, and Z is the patch version.
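To make the point about malformed strings concrete, here is a minimal Go sketch of a validity check. The pattern is an assumption written for illustration, not the official SemVer regex: it is a bit stricter than the one quoted above (it rejects leading zeroes in X.Y.Z and allows dot-separated identifiers), but it does not enforce every rule, such as the ban on leading zeroes in numeric pre-release identifiers. The names `semverRe` and `isValidSemver` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// semverRe is a simplified, illustrative pattern (an assumption, not the
// official one): X.Y.Z without leading zeroes, then an optional "-" plus
// dot-separated pre-release identifiers, then an optional "+" plus
// dot-separated build identifiers.
var semverRe = regexp.MustCompile(
	`^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)` +
		`(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?` +
		`(\+[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$`)

// isValidSemver reports whether s matches the simplified pattern above.
func isValidSemver(s string) bool {
	return semverRe.MatchString(s)
}

func main() {
	candidates := []string{"1.2.3", "1.2.3-gamma", "1.2.3+build.5", "1.2.3gamma", "01.2.3"}
	for _, v := range candidates {
		fmt.Printf("%q valid=%v\n", v, isValidSemver(v))
	}
}
```

With this check, "1.2.3gamma" is rejected before any ordering question comes up, while "1.2.3-gamma" is accepted as a pre-release.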

Nezghul · on Oct 27, 2022

Well, if you want to compare strings according to some rules then... write a custom comparator.

sethammons · on Oct 27, 2022

I'd use a capturing regex to get the first three numerals and capture the remaining string. If the remaining string exists, you can simply ignore the expected leading dash, so your malformed semver suffixes will still work, and those can be trivially compared/sorted.

What about Go makes this different? That's how I'd solve this in any language.
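A rough Go sketch of that approach, under stated assumptions: the type and function names (`looseVersion`, `parseLoose`, `compareLoose`) are hypothetical, a version without a suffix sorts above the same version with one (borrowing the SemVer pre-release rule), and suffixes are compared as plain strings. A real tool would likely need more: "+build" metadata, for instance, is lumped in with the suffix here.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// looseRe captures the three leading numbers and whatever follows them.
var looseRe = regexp.MustCompile(`^(\d+)\.(\d+)\.(\d+)(.*)$`)

// looseVersion is a hypothetical type holding the parsed pieces.
type looseVersion struct {
	nums   [3]int
	suffix string // trailing text with one optional leading "-" removed
}

// parseLoose accepts both "1.2.3-beta" and the malformed "1.2.3beta".
func parseLoose(s string) (looseVersion, error) {
	m := looseRe.FindStringSubmatch(s)
	if m == nil {
		return looseVersion{}, fmt.Errorf("no X.Y.Z prefix in %q", s)
	}
	var v looseVersion
	for i := range v.nums {
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return looseVersion{}, err
		}
		v.nums[i] = n
	}
	v.suffix = strings.TrimPrefix(m[4], "-")
	return v, nil
}

// compareLoose returns -1, 0, or 1. Numbers are compared numerically first;
// an empty suffix ranks above any non-empty one (assumption borrowed from
// SemVer); otherwise suffixes are compared as plain strings.
func compareLoose(a, b looseVersion) int {
	for i := range a.nums {
		if a.nums[i] < b.nums[i] {
			return -1
		}
		if a.nums[i] > b.nums[i] {
			return 1
		}
	}
	switch {
	case a.suffix == b.suffix:
		return 0
	case a.suffix == "":
		return 1
	case b.suffix == "":
		return -1
	}
	return strings.Compare(a.suffix, b.suffix)
}

func main() {
	inputs := []string{"1.2.3", "1.2.3gamma", "1.2.3-beta", "1.2.10", "1.2.3alpha"}
	versions := make([]looseVersion, 0, len(inputs))
	for _, s := range inputs {
		v, err := parseLoose(s)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		versions = append(versions, v)
	}
	sort.Slice(versions, func(i, j int) bool {
		return compareLoose(versions[i], versions[j]) < 0
	})
	for _, v := range versions {
		fmt.Println(v.nums, v.suffix)
	}
}
```

Nothing here is specific to Go; the same regex-plus-comparator shape carries over to most languages.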

daviddever23box · on Oct 27, 2022

Agreed - especially when it is potentially unknown what might follow the first three numerals. Any performance hit would be mitigated by the corresponding reduction of the downstream logic.

codedokode · on Oct 27, 2022

Then you can use bytewise comparison instead of alphabetic comparison.

snotrockets · on Oct 27, 2022

Assuming you normalized the strings before.

simiones · on Oct 27, 2022

Why would that matter, if you only want some well-defined order?

snotrockets · on Oct 27, 2022

Because with (Unicode) strings, "\u006e\u0303" is defined to be equal to "\u00f1", for example. If you'd do bytewise comparison, as the above comment suggested, you may not reach the same result ¯\_(ツ)_/¯
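A small Go sketch of that mismatch, for illustration. Go's built-in string comparison is bytewise, so the two canonically equivalent spellings compare as different unless they are normalized first (for example with the `golang.org/x/text/unicode/norm` package, which is not used here to keep the example stdlib-only). The helper name `byteOrder` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// byteOrder compares two strings byte by byte, which is what Go's
// ==, < and strings.Compare do; no Unicode normalization is applied.
func byteOrder(a, b string) int {
	return strings.Compare(a, b)
}

func main() {
	decomposed := "\u006e\u0303" // 'n' + combining tilde, UTF-8 bytes 6e cc 83
	precomposed := "\u00f1"      // single code point ñ, UTF-8 bytes c3 b1

	fmt.Printf("decomposed bytes:  % x\n", decomposed)
	fmt.Printf("precomposed bytes: % x\n", precomposed)
	fmt.Println("bytewise equal:", decomposed == precomposed)
	fmt.Println("bytewise order:", byteOrder(decomposed, precomposed))
}
```

The ordering is still perfectly consistent; it just does not agree with canonical equivalence.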

simiones · on Oct 28, 2022

Whether those two strings are or are not equivalent depends on the context. If we're assuming (as the GP did) a very generic context where we simply want to store arbitrary strings in a sorted data structure, then there is no reason to assume they are supposed to be interpreted as Unicode.

For a simple example, perhaps this is a list of strings that require Unicode normalization to be properly interpreted as human text that you are storing into a TreeMap for efficient retrieval. When you are adding "\u00f1" to the list, you wouldn't want the collection to say that it's already there because it already had "\u006e\u0303".

dhosek · on Oct 28, 2022

Then in that case, treat the string as an array of bytes.