EffVer: Version your code by the effort required to upgrade

samatman · on Feb 17, 2024

The only issue with SemVer is that it's a social contract. There's an available solution to this: make it a technical contract instead.

Most languages these days have a built-in test suite. They can define "no breaking changes" so that it actually means something. Have a set of tests called API. During a major release cycle, you can add tests, but you can't change the tests you have, and the tests have to keep passing. The package registry can run those tests, and if any fail, you don't get to post a minor version release with that code.

This goes from an underdefined "our API will have no breaking changes" to "this is the guaranteed behavior of the API, and cannot change until the major version number is bumped". If a downstream user of the package sees some behavior they want added to the API contract, they can write a test and submit it as a PR, and that test can go into the next release if the maintainers agree that it's a stable behavior which they don't intend to change.

When you move from e.g. 1.0 to 2.0, the tests which now fail are moved to "1.0 API", but they're never removed. No test which is ever in an "API" testset can ever be removed, the package manager enforces this. Provide some mechanism so users of the package can annotate API tests in packages they use as a part of their own test suite, so that when they upgrade, those test failing is an immediate message about what no longer works. If you only rely on behavior which is in common from 1.0 to 2.0, it should be safe to upgrade.

No more taking people's word for it when they say "no breaking changes", no more bikeshedding about what is or isn't a breaking change, just... tests. End of.

jchanimal · on Feb 18, 2024

I describe this idea of an executable feature spec in my roadmap blog post from earlier this year. I agree it’s a great way to think about it.

https://fireproof.storage/posts/roadmap-to-1.0/

We’d define 1.0 in exactly the way you describe, where we can add tests for 1.1 but not remove them without triggering 2.0

abathur · on Feb 18, 2024

I fiddled around a little with the idea of test-driven versioning a while back. Maybe you'd find it interesting. https://github.com/abathur/tdver

I did draft a git-based implementation (https://github.com/abathur/tdverpy), but it just obviously can't be as compelling as one that was part of a language's native tooling/ecosystem could be.

samatman · on Feb 18, 2024

This is quite similar to what I have in mind, yes. Great minds think alike!

I do think that having an API subset of tests is better than basing the system on all tests. Packages should have as many tests as possible, I frequently write tests which I know will break when I do further work on the code, so that I notice when it happens, and because if it happens accidentally it's probably a bug. Wouldn't want a versioning system to have a side effect of making people reluctant to write a test, because it would commit them to the results. I envision tests migrating from the rest of the suite to the API set over time.

I do like that your system completely specifies the meaning of minor and patch numbers, and wonder if there's a way to tweak my proposal so that it does so as well.

abathur · on Feb 18, 2024

I wouldn't try to insist anything I sketched out is the ~right approach. I was just trying to imagine one path through the possibility space, and then reason a bit about what kinds of information it might be able to convey.

There are almost certainly better paths through, and I suspect the idea isn't broadly usable without different kinds of tests that have different kinds of rules. It's probably not helpful to have to increment your major version just because you use snapshot testing and some dependency update causes a trivial shift in the output.

I also fiddled around a little with an idea I called "earmarks", which are basically just version-bound tests. You could use these to express the idea that, say, test_x shouldn't pass or fail until the version is >= a.b.c.

This would make it easy to deprecate an API today and go ahead and ship a test that requires the API to be present and functioning until the next major release but not after. Or, for example, to make a commitment device that asserts the project will hit some doc/lint/typing goals by some clear point.

Since it's an open-ended mechanism, I imagine something like it is the lower-friction way for a real project to explore applying these concepts without full toolchain support.

bbkane · on Feb 18, 2024

It's a good idea, but I think it still relies on people taking the effort to be responsible, which just doesn't seem to work long term...

samatman · on Feb 18, 2024

That's always a risk! One of the strengths of the proposal is that if a maintainer slacks on defining a solid API testset, users can submit the tests they think belong. At that point the responsibility is baked in: once a test is added, you either keep it green or bump the version, enforced by the registry.

If a maintainer staunchly refuses to define an API, that's useful information, the kind you can't get with standard SemVer, where the only mechanism is trusting strangers to do the right thing. Which, to be fair, works ok, some of the time.

withinboredom · on Feb 18, 2024

> you either keep it green or bump the version, enforced by the registry.

assert(true); is a thing. I don't think this solution would actually work. Tests might be refactored or improved and that shouldn't trigger a major release.

samatman · on Feb 18, 2024

Malicious compliance is in fact a useful escape hatch here, a maintainer can release a 1.0 where the entire API is "test 1 + 1 == 2". That, too, is useful information.

But the package registry checks all API tests against the last version and rejects the registration if they change at all. That can be relaxed for non-semantic parts of the test, like a description of what the test means, but none of the code is allowed to change. It would be better if this were based on the AST, so that whitespace tweaks don't trigger a build failure, that's practical to achieve in most languages.

Refactoring an API test isn't worth losing the guarantees a system like this provides, and it's only the API tests which come with any restrictions, maintainers may do as they please with the rest of the test suite. An improved API test has to be provided as a new test. Part of the proposal is that users can refer to API tests in their own code, as a way to determine if tests they rely on break in a major release, so the tests need unique names, which means they can be rearranged in the file. It also means that if there's a typo in the name, or the name sucks, well, you're stuck with that until the next major release, and even then it goes on to live in infamy, forever, in the obsolete-API portion of the test suite. Not ideal, but it can't be avoided.

withinboredom · on Feb 18, 2024

hmm. Interesting. So, most likely, "stable" software will likely release with a major version somewhere in the hundreds instead of 1.0? Since initial development usually means lots of breaking changes while details are discovered/built, I can't see 1.0 having any useful meaning.

samatman · on Feb 19, 2024

It's not all tests, it's just the API tests. I'm not sure why that was unclear to so many people. You can have hundreds of tests, thousands even, only the API tests are special.

If there's no stable behavior because the software is still at that stage of development, it's 0.x software still. That's true in SemVer as well as this refinement of it.

Contrariwise, if you think software is ready for 1.0 and you can't come up with any tests which display guaranteed behavior which won't change without that major version bump, then no, it's not ready.

withinboredom · on Feb 19, 2024

That’s what I’m saying though. At some point, you have to write those tests and there will be bugs. There will be things that aren’t ideal. It’s like the first time you write the CI/CD pipelines and you have to commit 100 stupid things directly to main to test it and make sure it works.

samatman · on Feb 19, 2024

Yes, and this approach gives a clear path to 1.0. One might hope that tests are being written in tandem with the code, in the 0.x epoch those are just tests.

During the ramp-up to 1.0, release candidates and such, some of the tests, the ones which evidently demonstrate the expected behavior of the API, get moved to the API testset. Since it's still zero season, this imposes no restrictions. The tests could have bugs, the code could have bugs, the API tests might need tweaking, the API can still change, all of this is fine.

Then you release 1.0 and the API tests have to stay green until 2.0. I think we have different estimates of how likely it is that it would make sense to change those tests and not call that a breaking change, because that's what those tests are for. They are a subset of the (often much larger) test suite, designed specifically on the premise that if these behaviors change, it will break the API. I don't think it's hard to write those, I've never found that difficult in my own code. If you can't make a few assertions about the behavior of a package, does it even make sense to describe it as having a stable API? What would that even mean?

A realistic version of this system would have to allow preludes to the API tests, and those can change. Setup might involve internal details, mockups, other things which it isn't prudent to lock in. That theoretically lets maintainers change behavior while pretending they didn't (dumb example, changing the meaning of `lowercase` to `uppercase` and replacing the setup string with digits), but the point of this isn't to stop people from being jerks, that isn't possible.

There aren't restrictions on what the API tests can be, either. Someone with a philosophical objection to all of this can engage in malicious compliance and write "assert 1 + 1 = 2" to pacify the package manager. A minimal API test set which is still in the spirit of the concept is to assert that the exported/public names in the package are defined. That already provides a hard guarantee, after all, and if it's a subset of the exports, that shows which ones are intended to be stable and which ones are experimental.

Users can build on that by writing tests which use those exported values, demonstrating expected behavior, there's no need or expectation that every possible corner is covered by the test suite right at 1.0. Maintainers can add those tests to the API set if they agree that the assertions should be invariant.

Part of why I like this idea is there's a reluctance, which you're showing, to make major version releases. It's stressful for the maintainers and the users. In this system, the broken tests stay in the suite, and get moved out to the version 1 set. Users can assert the tests in the suite which fix behavior they rely on, and automated tooling can tell them if it's definitely not safe to upgrade (nothing can assure that any upgrade is completely safe, certainly not Scout's-honor SemVer). Making major releases should be less fraught. It's sadly common for package maintainers to make changes which are actually breaking, and claim that's not what happened, so they can keep a 1 in the front of their version string. That's a perverse incentive.

The worst case scenario which seems to concern you is what? The code is good but the tests are bad? Ok, write a sheepish NEWS.md assuring everyone that nothing is really changing but the test suite, and cut another major version. Laugh about it later.

withinboredom · on Feb 20, 2024

Making 0.* special, doesn't actually fix the issue I'm describing, but just pushes it to 2.0.

Example: now that 1.0 is released, I want to add two new massively breaking changes. The team opens two PRs. The first one merges and bumps the version to 2.0, then the next one merges, and it gets bumped to 3.0. That sounds ridiculous.

dmurray · on Feb 17, 2024

I don't know why this got downvoted; it's at the very least an interesting proposal. Would love to hear a critique arguing that it's a terrible idea.

Existing package managers could even implement it in a completely backwards- compatible way: if you as a package maintainer don't care for it, you simply never add "API tests".

chacham15 · on Feb 18, 2024

Its an idealistic view which will almost certainly fail. Test suites, as much as we like to hope that they reflect real usage, mostly dont. A simple example: if function a gets changed from o(n) to o(n^2) but otherwise behaves identically, most test suites will still pass, but if a user has that function in its own inner loop you can go from o(n^2) to o(n^2^2) which can definitely break a lot of things (simple example: transaction was holding lock for too long and so the transaction was aborted). Being able to catch the above is a high bar for a test suite which I'm fairly confident most test suites are way below that.

samatman · on Feb 18, 2024

You're allowed to fix bugs introduced by new releases, you know. It's called a patch release?

Is your claim that a release which introduces bad algorithmic complexity requires a major release to fix in semver? Who thinks this?

pgphn · on Feb 18, 2024

Right. Theres a difference between, “oh no, sorry, WE should fix that in the next patch” vs, “oh no, sorry, YOU should accommodate this incompatibility with a change on your side.”

falserum · on Feb 18, 2024

I would not call it terrible, but I got a “silver bullet” vibe, which it is definitely not.

1. For a library there is API and there are implementation details. What if test depended on implementation detail?

2. What if tests had undisputable bug?

3. Test refactoring requires major release now?

4. Realistically test suite will have some execution paths not covered.

I like the idea of running same tests over multiple versions, to observe changes. But I disagree that it would automate semver. (Maybe in very limited subset of cases)

P.S. Not an actual downvoter, but if I would have downvoted, these would have been the reasons.

samatman · on Feb 18, 2024

> For a library there is API and there are implementation details. What if test depended on implementation detail?

If the test is in the API testset, it's API. If it isn't, it's an implementation detail.

> What if tests had undisputable bug?

If it's in the API testset, time for a major version bump. If not, fix it.

> Test refactoring requires major release now?

Only if you're refactoring the API, as defined by the API testset, thereby producing a breaking change.

> Realistically test suite will have some execution paths not covered.

Doesn't matter. If the behavior isn't in the API testset, it's not a part of the API.

> But I disagree that it would automate semver.

The point isn't to automate semver, I'm not even sure what that would mean. It's to define it, in a useful and objective way.

dxdm · on Feb 18, 2024

The point of the criticism is that defining the version and what constitutes a breaking change like this will still leave people with unexpected breakage in the real world. What you've said so far has not really addressed that point straight on, which might be the reason for the comments and downvotes, I presume.

I don't think your proposed scheme needs to be perfect in that regard, but acknowledging the concern and at least putting it in perspective would probably help.

samatman · on Feb 18, 2024

I've no idea what downvotes you're referring to, I'm well into the black on that post. ¯\_(ツ)_/¯

SemVer is just a pinky-swear not to break people's code. In the real world, people's code breaks anyway, and then you get an argument about what's API, and expected behavior, and so on, and so forth.

What I'm proposing is simply to replace the pinky promise with tests. From some of the other comments, I think this point may have been missed: it isn't every test in your test suite, it's the ones marked "API", only.

This is a strict improvement over social-contract SemVer in two ways: one is that the package manager won't let the maintainers break the API tests without a major version bump. The other is that, if you, as a user, are unsure if some behavior is part of the stable API, you can write and submit a test to that package. If that test is accepted, great: that behavior now cannot change without a major version bump, because, again, the package manager will not bundle the package if that test breaks. Furthermore, even on a major version bump, it is instantly clear if that test is still valid, or not, you can just check before upgrading. If they don't accept the PR, you know that it isn't considered part of the API, so you add the test to your own test suite, so that at least you know quickly what broke if they change it.

dxdm · on Feb 18, 2024

> I've no idea what downvotes you're referring to, I'm well into the black on that post.

As you should be, it's a great contribution. I was referring to the downvotes mentioned in a comment further up.

I agree that what you propose is an improvement, but it can be misunderstood to claim that it can prevent _any_ real-world breakage. There will always be aspects not covered by tests and which other people still rely on.

I've had this experience with API contract tests between systems. Despite covering a lot of details and preventing deployments that failed these tests, we would occasionally run into problems where passing changes would break stuff in production. There was always an area of uncodified assumptions, and for a case of tens of different clients, whereas public libraries can have millions. So, I believe this is also applicable to your proposed solution.

You can argue that your solution significantly shrinks this area of uncertainty while also _defining_ it, which helps when reasoning about what you can depend on - and I agree. But it does not eliminate the gap, and this is what people were pointing out.

I was just a little frustrated that the discussion even went there, because I didn't think you were even claiming what they were arguing against. That was happening because I think you left a gap by not addressing it clearly, and wanted to point it out, because people seemed to be taking past each other.

samatman · on Feb 18, 2024

> I've had this experience with API contract tests between systems. Despite covering a lot of details and preventing deployments that failed these tests, we would occasionally run into problems where passing changes would break stuff in production. There was always an area of uncodified assumptions, and for a case of tens of different clients, whereas public libraries can have millions. So, I believe this is also applicable to your proposed solution.

It's only intended as an improvement to practice. SQLite has a test suite which exhaustively tests every single branch of the code, which D. Richard Hipp wrote on contract to one of his clients. It takes more than a day to run. We might all aspire to such a level of professionalism, but realistically, most programs and libraries will fall short of glory here. And while this exceptional test harness has in fact limited the scope of SQLite bugs a great deal, it hasn't eliminated them entirely.

So with a TestVer system, or whatever we want to call it, there will still be breaking updates, there will still be bugs. But it provides a mechanism for defining the invariant behaviors of a major release number.

It's possible some of the early respondents thought that I considered an appropriate response to a minor change which breaks downstream code to be "lol, too bad". That might happen sometimes, minus the lol we may hope, but more often the response should be more proactive: a revert, adding a test which clarifies the new behavior, something.

The best part of this system is that users of a package can write additional tests of that package and submit them as PRs, if there's some behavior they see the package exhibiting which doesn't appear to be in the test suite. This is easier by far than making changes to the package itself, just add the tests to your own suite, if they pass, make a fork of the repo, add those test to their API suite, submit a PR. Whether they accept the patch or not, you have it in your own test suite, so you'll be informed immediately if a later release breaks it.

ambicapter · on Feb 18, 2024

It looks to me like he neatly addressed all concerns that were brought up, even if one does not agree with the solutions he proposes. I don't see any lack of acknowledgment on OP's part.

zamadatix · on Feb 17, 2024

EffVer ignores that different users will experience different amounts of pain, not solving the complaint it has about SemVer. If 99% of your users need to do nothing but 1% of your users are going to need significant effort to migrate (say, retiring a couple version old schema most users never even used) then macro/meso/micro all fail to communicate the expected amount of pain. Similarly, if you take the attitude every minor patched bug could have users then micro isn't communicating anything different than it would have in semver anyways.

If you want to communicate impact it might make more sense to add on to semver in some way with a 2 axis "amount of effort" and "likelihood it impacts you" as say "-b7" or something. That said, start trying to include so much information in the version string and eventually you'll just end up with an compressed version of the release notes and not a version number.

gtirloni · on Feb 17, 2024

There is no replacement for reading changelogs or release notes.

Maybe if people did that for their dependencies, we wouldn't have certain software stacks with thousands of them for a simple helloworld-ish backend.

I'm of the opinion that SemVer or any other version arrangement is not to be trusted blindly. When I see a minor version upgrade, it gives me some hope I can upgrade without much trouble but I've been burned too many times to go in blind like that.

codetrotter · on Feb 18, 2024

> That said, start trying to include so much information in the version string and eventually you'll just end up with a compressed version of the release notes and not a version number.

I hear ya. So what what we should be doing is to make a 4096-dimensional vector based on an embedding created from our release notes. And use that as the release version :D

medstrom · on Feb 18, 2024

What is "based on an embedding"?

KTibow · on Feb 18, 2024

There's a thing called an embedding that comes from an embedding model. You put text in and you get a vector (embedding) out. Since the vector represents the text, it's usually used for stuff like search.

thrwwycbr · on Feb 18, 2024

I have a solution for that.

SocialVer:

- upvote or downvote major release changes

- emojis to communicate the level of upgrade pain

- emojis to communicate the level of disaster after upgrading

dbrueck · on Feb 18, 2024

A long time ago I gave up on trying to convey much meaning in version numbers, and have used YYYYMMDDBB (BB = build number for that day, starting at 0) for well over a decade, and I love it.

There are many 'pros' to this approach: it's stupidly simple and tools can autogenerate it easily, it's trivially sortable, it tells you how long ago the release happened, but above all it intentionally conveys nothing about your perception of the magnitude of changes and therefore is never misleading. The real meaning is conveyed via release notes: high level changes (with emphasis on any breaking changes) followed by a detailed changelog.

I understand the desire to convey more meaning in the version number itself, but every alternative approach I've tried always falls apart in some way and/or becomes more trouble than it's worth, especially when it's a marketing person who wants version numbers to get bigger faster or a "humble" team member who is anxious to call this the 1.0 release.

Stuff like SemVer seems like a good idea initially, but even with a rigorous test suite there are cases where a bug fix or new feature aren't quite as backwards compatible as intended, so trust in the version number only goes so far. Or it tends to give undo emphasis, e.g. in this release you are pushing out several backwards compatible bug fixes and you are finally pulling a feature you deprecated a long time ago. You have good evidence that nobody has used this feature in years, and for all intents and purposes this is a very small patch release, but you instead have to bump the major version, implying that it's a big release.

Something like EffVer is an interesting approach, but when it ends up being inaccurate for you (i.e. when a supposedly painless upgrade is anything but), then all it has done is pour salt on the wound.

rvdginste · on Feb 17, 2024

I still consider semver better. When it is used correctly, the version number gives a clear indication of what kind of changes to expect when upgrading. Obviously this is done to the best of knowledge of the author and might not always be 100% correct.

Either way, the amount of work to do for an upgrade depends on which parts of the product you are using and whether those parts have any changes in the new version. For this reason, most projects also have a changelog which gives you more detailed information about the upgrade. When preparing for an upgrade it is advised to read the changelog.

GuB-42 · on Feb 17, 2024

Isn't it essentially generalized semver?

The more breaking changes are, the more effort it is required to take them into account. Semver only applies to APIs. Effver could apply to UIs too, but for APIs, it would be similar, just not as well defined (because it is more general).

krainboltgreene · on Feb 17, 2024

Any versioning mechanic that allows for a `0.X.Y` has ultimately failed it's users. There are libraries on almost every package manager that have millions of downloads, thousands of production users, but still pretend they are `0.X.Y` as if that means anything. I mean just think about what this sentence:

    zero version still denotes a codebase under development

A human wrote that and said "Yeah, this makes sense to me." All code is under development until it's not.

jacoblambda · on Feb 18, 2024

I don't think that's really necessarily fair.

A major version of zero means pre-release code. i.e. a codebase under active development (with the implicit assumption that there will likely be major breaking changes).

A major version of zero just means "I am not committing to a stable API until 1.0" which is a completely fair stance. I'm not going to write code that's very clearly unstable and in active churn and try to pretend it's stable. I'm also not going to keep around a legacy API at that point yet.

Compare that to a standard bump in major version (i.e. 1.0 to 2.0). In this case there is an expectation of a migration path and in all likelihood a versioned legacy API that'll stick around so that users can slowly migrate across the breaking changes between API versions.

Frankly I'm not going to commit to doing that for 0.X.Y/indev projects.

krainboltgreene · on Feb 19, 2024

> i.e. a codebase under active development (with the implicit assumption that there will likely be major breaking changes).

You have just described all actively written software as "major zero". This is why it's a silly concept.

jacoblambda · on Feb 20, 2024

If you aren't committing to an API or ABI then why are you even releasing with SemVar. If you are releasing a piece of software without a stable interface just release it with date as a version (ex: yyyy.mm.dd).

SemVar makes sense but every piece of software using it doesn't. Not every piece of software makes an interface commitment. Those that do should use SemVar, the rest should just use the date or some monotonically increasing number.

medstrom · on Feb 18, 2024

All my software so far is 0.X.Y. I'm thinking about just dropping the 0.

Though that'd communicate something totally different from EffVer... Bumps in my X are not "macro effort".

bdjsiqoocwk · on Feb 17, 2024

I didn't understand the objections to semver. Can someone give me a specific example of semver failing? I can't think of anything other than the package publisher choosing the increment wrong, and in that case it's not the versioning mechanism's failure, but the publisher's.

cryptonector · on Feb 18, 2024

At first glance, I love the thought.

In reality projects/vendors often make versioning decisions for marketing reasons. If you add a ton of killer features with no backwards incompatibility and trivial upgrade path, you might still bump the major version number even though normally that would denote radical backwards-incompatible change.

The need for marketing versioning will not go away, so maybe what we need is an upgrade quantifier modifier to the version number.

E.g., 8.0.0 can be a major functionality release, and 8.0.0-ez can be a major functionality release that has an easy upgrade path while 8.0.0-hd can be a major release that has a difficult (hd == headache) upgrade path.

Karellen · on Feb 17, 2024

* Not related to or affiliated with The EFF https://www.eff.org/

(I realise that TLAs have a limited namespace and are bound to have multiple meanings in many contexts, but The EFF is quite a prominent and well-established use in the computing/software arena.)

buro9 · on Feb 17, 2024

Isn't the effort relative to what you're currently running?

If the micro version you're running is 100 versions behind, is it still expected to be micro effort?

medstrom · on Feb 18, 2024

People aren't stupid. Yes, if you skip 100 micro versions, it may not be so micro, but people can do arithmetic themselves.

hervem · on Feb 19, 2024

I read effort and versioning, got some PTSD from Scrum remembering how team (max out to 6 p.) never succeed to have an effort scale.

Am4TIfIsER0ppos · on Feb 18, 2024

How much bigger should python version 3 have been then?