> This seems like a problem you can’t solve generically and you always end up ma...

vlovich123 · on March 9, 2024

I suspect that it’s impossible in the sense that the “possible” space will look like a distributed storage solution and the rest will look similar to graceful handoff of new connections to new version + shutdown of old version after some time (with forceful disconnect of sessions hanging around).

stevan · on March 9, 2024

I give two examples of a stateful upgrade in Erlang/OTP in the motivation; neither relies on distributed storage.

vlovich123 · on March 9, 2024

Unfortunately, the Erlang documentation doesn’t really describe the pros/cons of anything, and I’m not an expert in it, so I don’t know what the limitations of the Erlang approach are, but there certainly must be some (e.g. if you have long running sessions and do several upgrades, are you running N versions of the code & eating up RAM because the old sessions aren’t complete?).

As I understand it, Erlang/OTP captures the entire state of the program and it’s a feature of the language and VM to accomplish this. It’s not something you can retrofit into any arbitrary language. For example, your JS app or your Python app or your Rust app won’t be able to do the same easily which means it won’t be robust and it will be error prone. Thus I stand by that there’s no “generic” solution you can bolt onto an arbitrary language.

stevan · on March 9, 2024

> if you have long running sessions and do several upgrades, are you running N versions of the code & eating up RAM because the old sessions aren’t complete?

I believe Erlang supports two versions running alongside each other. They capped it at two because back when this was developed there wasn't enough RAM. Joe Armstrong gave at least one talk where he says he'd have liked to support an arbitrary number of versions and garbage collect them as old sessions complete.
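To make the two-version cap concrete, here is a toy model in Python (not Erlang, and not the real code server API). It assumes the behaviour described in the Erlang reference manual: a module has a "current" and an "old" slot, and loading a third version purges the old one and terminates whatever is still running it. `ModuleSlots`, `load` and `start_session` are hypothetical names made up for this sketch.

```python
class ModuleSlots:
    """Toy model of one module with a 'current' and an 'old' version slot."""

    def __init__(self):
        self.current = None
        self.old = None
        self.sessions = {}  # session name -> version it is still running

    def start_session(self, name):
        # New sessions always start on the current version.
        self.sessions[name] = self.current

    def load(self, version):
        """Load a new version; return the sessions terminated by the purge."""
        killed = []
        if self.old is not None:
            # Only two slots exist, so the old version has to go,
            # along with any session still running it.
            killed = sorted(n for n, v in self.sessions.items() if v == self.old)
            for name in killed:
                del self.sessions[name]
        self.old, self.current = self.current, version
        return killed


slots = ModuleSlots()
slots.load("v1")
slots.start_session("a")    # runs v1
slots.load("v2")            # v1 becomes old, "a" keeps running
slots.start_session("b")    # runs v2
print(slots.load("v3"))     # v1 is purged, so "a" is terminated
```

So under this model the answer to the RAM question is that at most two versions are resident, at the cost of killing sessions that outlive two upgrades.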

> Thus I stand by that there’s no “generic” solution you can bolt onto an arbitrary language.

The main point of the post is centered around Barbara Liskov saying "maybe we need languages that are a little bit more complete now". I'm not interested in the limitations of current languages, I'm interested in the future possibilities.

vlovich123 · on March 9, 2024

There’s no free lunch, and I’m suggesting the trade-offs to support this are not worth it vs. the simpler approach of a graceful drain & upgrade w/ a timeout for long running sessions, if those may exist (+ if you have a lot of large state to migrate, an upgrade could take insanely long to complete). This is because availability will never be 100% anyway in any scenario, and this kind of transition can easily fit within your failure budget.
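A minimal sketch of what drain & upgrade with a timeout could look like, in Python. It uses a simulated clock so it stays deterministic; `Session`, `drain`, and the `finishes_at` field are hypothetical names for illustration, not any real server's API.

```python
class Session:
    """Hypothetical stand-in for a client session pinned to the old version."""

    def __init__(self, name, finishes_at):
        self.name = name
        self.finishes_at = finishes_at  # simulated time at which it ends on its own
        self.forced = False

    def done(self, now):
        return now >= self.finishes_at

    def force_close(self):
        self.forced = True


def drain(sessions, timeout, tick=1.0):
    """Wait for old sessions to finish; force-close whatever is left at the deadline.

    New connections are assumed to be routed to the new version already.
    Returns (names that finished on their own, names that were force-closed).
    """
    now = 0.0
    active = list(sessions)
    finished = []
    while active and now <= timeout:
        finished += [s.name for s in active if s.done(now)]
        active = [s for s in active if not s.done(now)]
        now += tick  # simulated wait; a real server would sleep or await here
    for s in active:
        s.force_close()
    return finished, [s.name for s in active]


short, long_running = Session("short", 2), Session("long", 100)
print(drain([short, long_running], timeout=5))
```

The only state the upgrade has to reason about here is the list of open sessions, which is the appeal of this approach; the cost is that sessions outliving the timeout get disconnected.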

toast0 · on March 9, 2024

> As I understand it, Erlang/OTP captures the entire state of the program and it’s a feature of the language and VM to accomplish this. It’s not something you can retrofit into any arbitrary language. For example, your JS app or your Python app or your Rust app won’t be able to do the same easily which means it won’t be robust and it will be error prone. Thus I stand by that there’s no “generic” solution you can bolt onto an arbitrary language.

I say you can do hotload in any language that supports dlsym/dlopen or eval. I've done it (rather poorly) in Perl and C, and I'm sure others have done it in other languages.

It's a lot nicer in Erlang, so IMHO, if your use case includes long running processes with expensive to construct or transfer state (such as long running sockets), it's worth considering Erlang or something that can do hot loading.
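For the eval flavour of this, a rough sketch in Python rather than Perl or C: the process keeps its state in one place and swaps the handler function by evaluating new source with the built-in `exec`. `HotLoader`, `handle`, and the `V1`/`V2` strings are hypothetical names for illustration, and the sketch assumes the state layout does not change between versions, which is exactly the part that gets hard in practice.

```python
class HotLoader:
    """Keeps long-lived state while the handler code is swapped at runtime."""

    def __init__(self, source):
        self.state = {"count": 0}  # survives every reload
        self.load(source)

    def load(self, source):
        namespace = {}
        exec(source, namespace)  # evaluate the new code; it must define handle(state, msg)
        self.handle = namespace["handle"]

    def send(self, msg):
        return self.handle(self.state, msg)


V1 = """
def handle(state, msg):
    state["count"] += 1
    return ("v1", state["count"])
"""

V2 = """
def handle(state, msg):
    state["count"] += 1
    return ("v2", state["count"])
"""

loader = HotLoader(V1)
print(loader.send("hello"))   # handled by v1
loader.load(V2)               # hot swap, state is untouched
print(loader.send("again"))   # handled by v2, count carries over
```

Nothing here coordinates in-flight calls or migrates state to a new shape; those are the pieces a runtime like Erlang/OTP gives you structure for.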

gregors · on March 9, 2024

Don't know if you care this much or not, but figured I'd link this Elixir talk that goes into details regarding hot upgrades.

https://www.youtube.com/watch?v=IeUF48vSxwI

vlovich123 · on March 9, 2024

That’s a great link, thanks! It really makes it clear that a) state changes aren’t automatically correct (there’s both a manual and an automated piece, and either can go wrong) b) while the language makes it possible, there’s still a lot of manual work involved & footguns (e.g. if you have a contended resource held while something is being migrated, you’re going to experience degraded availability for other sessions to the point of downtime).