That's always a risk! One of the strengths of the proposal is that if a maintainer slacks on defining a solid API test set, users can submit the tests they think belong. At that point the responsibility is baked in: once a test is added, you either keep it green or bump the version, enforced by the registry.
If a maintainer staunchly refuses to define an API, that's useful information, the kind you can't get with standard SemVer, where the only mechanism is trusting strangers to do the right thing. Which, to be fair, works ok, some of the time.
> you either keep it green or bump the version, enforced by the registry.
`assert(true);` is a thing. I don't think this solution would actually work. Tests might be refactored or improved, and that shouldn't trigger a major release.
Malicious compliance is in fact a useful escape hatch here: a maintainer can release a 1.0 where the entire API is "test 1 + 1 == 2". That, too, is useful information.
But the package registry checks all API tests against the last version and rejects the registration if they change at all. That can be relaxed for non-semantic parts of the test, like a description of what the test means, but none of the code is allowed to change. It would be better if this were based on the AST, so that whitespace tweaks don't trigger a build failure; that's practical to achieve in most languages.
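To sketch what that AST-level check could look like (Python here purely for illustration, the idea isn't language-specific):

```python
# Illustrative only: one way a registry could decide whether an API test's
# code changed, ignoring formatting, by comparing normalized ASTs.
import ast

def normalized(source: str) -> str:
    # ast.dump discards whitespace; include_attributes=False drops line/column info
    return ast.dump(ast.parse(source), include_attributes=False)

def api_test_changed(old_source: str, new_source: str) -> bool:
    return normalized(old_source) != normalized(new_source)

# A whitespace-only tweak is not a change...
old = "def test_add():\n    assert add(2, 2) == 4\n"
new = "def test_add():\n    assert add(2,  2) == 4\n"
assert not api_test_changed(old, new)

# ...but changing the asserted behavior is.
assert api_test_changed(old, "def test_add():\n    assert add(2, 2) == 5\n")
```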
Refactoring an API test isn't worth losing the guarantees a system like this provides, and it's only the API tests which come with any restrictions; maintainers may do as they please with the rest of the test suite. An improved API test has to be provided as a new test. Part of the proposal is that users can refer to API tests in their own code, as a way to determine whether tests they rely on break in a major release, so the tests need unique names, which means they can be rearranged within the file. It also means that if there's a typo in the name, or the name sucks, well, you're stuck with it until the next major release, and even then it lives on in infamy, forever, in the obsolete-API portion of the test suite. Not ideal, but it can't be avoided.
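One hypothetical shape for "refer to API tests in their own code": upstream ships its API tests as an importable module, and a downstream project simply re-runs the ones whose behavior it depends on (module path and test names invented here):

```python
# Hypothetical: if a major release retires or breaks these upstream API tests,
# this downstream suite fails and flags the upgrade.
from upstream_package.api_tests import test_parse_roundtrip, test_defaults_are_utf8

def test_upstream_guarantees_still_hold():
    test_parse_roundtrip()
    test_defaults_are_utf8()
```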
hmm. Interesting. So, most likely, "stable" software will release with a major version somewhere in the hundreds instead of 1.0? Since initial development usually means lots of breaking changes while details are discovered/built, I can't see 1.0 having any useful meaning.
It's not all tests, it's just the API tests. I'm not sure why that was unclear to so many people. You can have hundreds of tests, thousands even; only the API tests are special.
If there's no stable behavior because the software is still at that stage of development, it's 0.x software still. That's true in SemVer as well as this refinement of it.
Contrariwise, if you think software is ready for 1.0 and you can't come up with any tests which demonstrate guaranteed behavior that won't change without a major version bump, then no, it's not ready.
That’s what I’m saying though. At some point, you have to write those tests and there will be bugs. There will be things that aren’t ideal. It’s like the first time you write the CI/CD pipelines and you have to commit 100 stupid things directly to main to test it and make sure it works.
Yes, and this approach gives a clear path to 1.0. One might hope that tests are being written in tandem with the code; in the 0.x epoch those are just tests.
During the ramp-up to 1.0, release candidates and such, some of the tests, the ones which clearly demonstrate the expected behavior of the API, get moved to the API test set. Since it's still zero season, this imposes no restrictions. The tests could have bugs, the code could have bugs, the API tests might need tweaking, the API can still change: all of this is fine.
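Concretely, promotion into the API test set could be as lightweight as a marker. This is a hypothetical pytest marker and a stand-in function, not an existing mechanism:

```python
import pytest

def slugify(text: str) -> str:            # stand-in for the package's export
    return "-".join(text.lower().split())

@pytest.mark.api   # hypothetical marker: promoted, locked from 1.0 until 2.0
def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_slugify_handles_odd_input():      # ordinary test, still free to change
    assert slugify("  Foo   Bar ") == "foo-bar"
```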
Then you release 1.0 and the API tests have to stay green until 2.0. I think we have different estimates of how likely it is that it would make sense to change those tests and not call that a breaking change, because that's what those tests are for. They are a subset of the (often much larger) test suite, designed specifically on the premise that if these behaviors change, it will break the API. I don't think it's hard to write those; I've never found it difficult in my own code. If you can't make a few assertions about the behavior of a package, does it even make sense to describe it as having a stable API? What would that even mean?
A realistic version of this system would have to allow preludes to the API tests, and those can change. Setup might involve internal details, mocks, other things which it isn't prudent to lock in. That theoretically lets maintainers change behavior while pretending they didn't (dumb example: changing the meaning of `lowercase` to `uppercase` and replacing the setup string with digits), but the point of this isn't to stop people from being jerks; that isn't possible.
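Sketched out with a stand-in for the real export, the split might look like this:

```python
# --- prelude: setup the maintainer may edit freely between releases -------
normalize = str.lower        # stand-in for the package's exported function
SAMPLE = "Hello World"       # swap this for "1234" and the locked assertion
                             # below can no longer tell lowercase from
                             # uppercase, which is the loophole above

# --- locked body: frozen by the registry until the next major version -----
def test_normalize_returns_lowercase():
    assert normalize(SAMPLE) == SAMPLE.lower()
```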
There aren't restrictions on what the API tests can be, either. Someone with a philosophical objection to all of this can engage in malicious compliance and write `assert 1 + 1 == 2` to pacify the package manager. A minimal API test set which is still in the spirit of the concept is to assert that the exported/public names in the package are defined. That already provides a hard guarantee, after all, and if it's a subset of the exports, it shows which ones are intended to be stable and which are experimental.
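A minimal-but-honest version of that might look like this in Python (package name and exports are made up):

```python
import importlib

STABLE_EXPORTS = ["parse", "dump", "ValidationError"]    # the guaranteed subset

def test_stable_names_are_defined():
    pkg = importlib.import_module("mypackage")           # hypothetical package
    for name in STABLE_EXPORTS:
        assert hasattr(pkg, name), f"{name} is part of the stable API"
```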
Users can build on that by writing tests which use those exported values, demonstrating expected behavior; there's no need or expectation that every possible corner is covered by the test suite right at 1.0. Maintainers can add those tests to the API set if they agree that the assertions should be invariant.
Part of why I like this idea is that there's a reluctance, which you're showing, to make major version releases. It's stressful for the maintainers and the users. In this system, the broken tests stay in the suite, and get moved out to the version 1 set. Users can assert the tests in the suite which pin down behavior they rely on, and automated tooling can tell them when it's definitely not safe to upgrade (nothing can assure that any upgrade is completely safe, certainly not Scout's-honor SemVer). Making major releases should be less fraught. It's sadly common for package maintainers to make changes which are actually breaking, and claim that's not what happened, so they can keep a 1 at the front of their version string. That's a perverse incentive.
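The tooling side of that check can be nearly trivial; a sketch with invented names:

```python
# Given the API test names a user relies on and the names still carried by a
# candidate major release, report which guarantees were retired.
def retired_guarantees(relied_on: set[str], candidate_api_tests: set[str]) -> set[str]:
    return relied_on - candidate_api_tests

relied_on = {"test_parse_roundtrip", "test_defaults_are_utf8"}
v2_api_tests = {"test_parse_roundtrip", "test_defaults_are_utf16"}
assert retired_guarantees(relied_on, v2_api_tests) == {"test_defaults_are_utf8"}
```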
The worst case scenario which seems to concern you is what? The code is good but the tests are bad? Ok, write a sheepish NEWS.md assuring everyone that nothing is really changing but the test suite, and cut another major version. Laugh about it later.
Making 0.* special doesn't actually fix the issue I'm describing; it just pushes it to 2.0.
Example: now that 1.0 is released, I want to land two massively breaking changes. The team opens two PRs. The first one merges and bumps the version to 2.0; then the next one merges, and it gets bumped to 3.0. That sounds ridiculous.