Hacker News

It's not all tests, it's just the API tests. I'm not sure why that was unclear to so many people. You can have hundreds of tests, thousands even; only the API tests are special.

If there's no stable behavior because the software is still at that stage of development, it's still 0.x software. That's true in SemVer as well as in this refinement of it.

Contrariwise, if you think the software is ready for 1.0 but you can't come up with any tests demonstrating guaranteed behavior that won't change without a major version bump, then no, it's not ready.



That’s what I’m saying though. At some point, you have to write those tests and there will be bugs. There will be things that aren’t ideal. It’s like the first time you write the CI/CD pipelines and you have to commit 100 stupid things directly to main to test it and make sure it works.


Yes, and this approach gives a clear path to 1.0. One might hope that tests are being written in tandem with the code; in the 0.x epoch those are just tests.

During the ramp-up to 1.0, release candidates and such, some of the tests, the ones which evidently demonstrate the expected behavior of the API, get moved to the API testset. Since it's still zero season, this imposes no restrictions. The tests could have bugs, the code could have bugs, the API tests might need tweaking, the API can still change, all of this is fine.

Then you release 1.0 and the API tests have to stay green until 2.0. I think we have different estimates of how likely it is that it would make sense to change those tests and not call that a breaking change, because that's what those tests are for. They are a subset of the (often much larger) test suite, designed specifically on the premise that if these behaviors change, it will break the API. I don't think it's hard to write those, I've never found that difficult in my own code. If you can't make a few assertions about the behavior of a package, does it even make sense to describe it as having a stable API? What would that even mean?
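As a concrete illustration (the package and its `slugify` function are made up for this example), an API test in that locked set might look like:

```python
# Hypothetical example: an API test for an imagined 1.0 package.
# These assertions are the contract: they must stay green until 2.0.

def slugify(title: str) -> str:
    """Stand-in for the package's public function."""
    return "-".join(title.lower().split())

def test_slugify_api():
    # Locked-in behavior: lowercase, hyphen-separated words.
    assert slugify("Hello World") == "hello-world"
    assert slugify("SemVer Rocks") == "semver-rocks"

test_slugify_api()
print("API tests green")
```

If a 1.x release ever turns one of those assertions red, that release was breaking by definition, whatever the changelog says.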

A realistic version of this system would have to allow preludes to the API tests, and those can change. Setup might involve internal details, mocks, or other things which it isn't prudent to lock in. That theoretically lets maintainers change behavior while pretending they didn't (dumb example: changing the meaning of `lowercase` to `uppercase` and replacing the setup string with digits), but the point of this isn't to stop people from being jerks, that isn't possible.
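The prelude/assertion split might be structured like this (all names hypothetical; the prelude section may change freely between releases, the assertions may not):

```python
# Sketch of the prelude/assertion split for API tests.

# --- prelude: free to change (fixtures, mocks, internal setup) ---
def make_fixture():
    return "Hello World"   # could later become a mock, a temp file, etc.

# --- API tests: frozen until the next major version ---
def lowercase(s):          # stand-in for the package's exported function
    return s.lower()

def test_lowercase():
    sample = make_fixture()                     # setup detail, not locked in
    assert lowercase(sample) == "hello world"   # behavior, locked in

test_lowercase()
print("prelude may change; assertion may not")
```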

There aren't restrictions on what the API tests can be, either. Someone with a philosophical objection to all of this can engage in malicious compliance and write "assert 1 + 1 = 2" to pacify the package manager. A minimal API test set which is still in the spirit of the concept is to assert that the exported/public names in the package are defined. That already provides a hard guarantee, after all, and if it's a subset of the exports, that shows which ones are intended to be stable and which ones are experimental.
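That minimal-but-honest version could be as short as this (the package and its export names are invented for the sketch):

```python
# Minimal API test set: assert that the names the maintainers commit
# to as stable actually exist as public exports.
import types

# Stand-in for a real package with two stable exports and one internal.
pkg = types.SimpleNamespace(parse=lambda s: s,
                            dump=lambda o: str(o),
                            _internal=object())

STABLE = ["parse", "dump"]   # exports intended to be stable; the rest are experimental
for name in STABLE:
    assert hasattr(pkg, name), f"missing stable export: {name}"

print("stable exports present")
```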

Users can build on that by writing tests which use those exported values, demonstrating expected behavior; there's no need or expectation that every possible corner is covered by the test suite right at 1.0. Maintainers can add those tests to the API set if they agree that the assertions should be invariant.

Part of why I like this idea is there's a reluctance, which you're showing, to make major version releases. It's stressful for the maintainers and the users. In this system, the broken tests stay in the suite, and get moved out to the version 1 set. Users can assert the tests in the suite which fix behavior they rely on, and automated tooling can tell them if it's definitely not safe to upgrade (nothing can assure that any upgrade is completely safe, certainly not Scout's-honor SemVer). Making major releases should be less fraught. It's sadly common for package maintainers to make changes which are actually breaking, and claim that's not what happened, so they can keep a 1 in the front of their version string. That's a perverse incentive.
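The tooling piece could be as simple as replaying the locked API tests from the installed major version against the upgrade candidate (a rough sketch under invented names, not any existing package manager's feature):

```python
# Rough sketch: run the locked API tests from the currently installed
# major version against a candidate upgrade; any failure means the
# upgrade is definitely not safe for this user.

def run_api_tests(tests, candidate_api):
    """Return the names of locked tests the candidate fails."""
    return [name for name, test in tests.items() if not test(candidate_api)]

# Locked tests from the 1.x API set (hypothetical).
tests_v1 = {
    "lowercase works": lambda api: api["lowercase"]("ABC") == "abc",
}

# A candidate release that keeps the contract.
candidate = {"lowercase": str.lower}

broken = run_api_tests(tests_v1, candidate)
print("safe to upgrade" if not broken else f"breaking: {broken}")
```

As the comment notes, a green run can't prove an upgrade is completely safe; a red run does prove it's unsafe, which is the useful direction.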

The worst case scenario which seems to concern you is what? The code is good but the tests are bad? Ok, write a sheepish NEWS.md assuring everyone that nothing is really changing but the test suite, and cut another major version. Laugh about it later.


Making 0.* special doesn't actually fix the issue I'm describing; it just pushes it to 2.0.

Example: now that 1.0 is released, I want to add two new massively breaking changes. The team opens two PRs. The first one merges and bumps the version to 2.0, then the next one merges, and it gets bumped to 3.0. That sounds ridiculous.



