Renaming fields doesn't matter -- if the client and server disagree about the name of a key, it doesn't matter, because the transport layer uses the tag number, not the name. (Compare this to JSON, where changing the name does break clients.)
Adding fields also doesn't matter, but this is where it starts to get tricky. If a client sends a request without a field that the server expects (i.e., an old version of the client), the message will parse OK, but the server could still say "hey actually that's required, buh bye". If you do this, you lose the backwards compatibility. So don't do that.
Removing fields is something you can basically never do if you want compatibility. You can rename them to deprecated_whatever, though, and see what code still uses them by the fact that they no longer compile. (Binaries using that field can still exist and will continue to work, of course. But at least you can have a transition period where people writing new code will think "hmm, this is probably going away" and won't depend on that field.)
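A minimal sketch of what that looks like in a .proto file (hypothetical message and field names), shown as two successive revisions of the same definition rather than two messages in one file:

    syntax = "proto3";

    // Revision 1: the wire format only ever sees the tag numbers (1, 2).
    message User {
      string name = 1;
      string home_address = 2;
    }

    // Revision 2 (the same file, later): renaming is invisible on the wire, and
    // the deprecated_ prefix breaks compilation for code still using
    // home_address, which is exactly how you find the remaining users.
    message User {
      string display_name = 1;             // renamed, same tag -- old clients unaffected
      string deprecated_home_address = 2;  // renamed to flag it as going away
    }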
(There are also some additional mechanical details that the protocol buffer documentation talks about. Maybe an int32 isn't big enough so you want an int64. Old clients can still talk to a server that has changed the type of an int32 field to an int64, but eventually it's going to lose data because it didn't allocate enough storage to manipulate an actual 64-bit value. But it does give you time when you think "a year from now this will be bigger than 2^32". You can change the definition today and eventually update all the clients.)
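And the widening case the protobuf docs describe, again sketched as two revisions of a hypothetical message:

    // Revision 1
    message Counter {
      int32 hits = 1;
    }

    // Revision 2: int32 and int64 share the same varint wire encoding, so old
    // and new binaries keep interoperating; a not-yet-updated int32 reader just
    // truncates values that no longer fit in 32 bits, which is why you want to
    // make this change well before the values actually get that big.
    message Counter {
      int64 hits = 1;
    }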
I think with care and the occasional update of a client, though, you can pretty easily keep things compatible forever. This is wayyyyy easier with SPAs than any other sort of API consumer, because you have control over updating the client.
Often you are adding new features, which is the easiest case. You add a new field or RPC, and just start using it.
You can usually structure a change in semantics as a new feature, which makes the cases for which protobuffers excel even more common in practice. For example, say you have clients that depend on the ordering of results from a Lookup() call. You think the sorting is unnecessary and slow, and you want to change the semantics without breaking clients that depend on the ordering. You can just add a new RPC, FastLookup(), and start using that. Clients that use the new RPC will be faster, but old clients will continue to work using the old method. You can update all of those (check your monitoring; you probably have a grpc_server_handled_total metric for every method), and after everything's updated, you can safely remove the code that implements Lookup() (either delete it entirely, or to really do it right, return codes.Unimplemented).
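A hedged sketch of that migration at the .proto level (the service and message names follow the example above and are otherwise invented):

    syntax = "proto3";

    message LookupRequest  { string key = 1; }
    message LookupResponse { repeated string results = 1; }

    service Directory {
      // Old RPC: results are sorted. Keep it until grpc_server_handled_total
      // shows no remaining callers; the server-side implementation can then be
      // deleted or, to really do it right, just return UNIMPLEMENTED
      // (codes.Unimplemented in Go).
      rpc Lookup(LookupRequest) returns (LookupResponse);

      // New RPC: same data, no ordering guarantee, so the server skips the sort.
      rpc FastLookup(LookupRequest) returns (LookupResponse);
    }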
I think if you aim for incremental progress, it's pretty easy to achieve with the right tools. It's harder, but possible, even for public APIs where you can't update the client. But where you can update the client and all you have to worry about is browser cache? Easiest possible case ;)
You can also remove a field by reserving the tag number and optionally the name. New binaries that still reference the field will fail to compile, but previously persisted values can still be read back by old binaries. Reserving also ensures no one reuses the same tag number later and reads back bad data.
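Roughly (tag number and field name invented for the sketch):

    message User {
      // Field 2 ("home_address") used to live here. Reserving both the tag and
      // the name makes protoc reject any future field that tries to reuse
      // either, so new code can't accidentally read old persisted data as the
      // wrong thing.
      reserved 2;
      reserved "home_address";

      string display_name = 1;
    }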
Indeed, this 30-plus-year-old problem is easily fixable by following the protocol buffer guidelines -- or those of any similar technology, like ASN.1 if you like antics. Protos are just quite popular and therefore a safe commodity bet.
Ideally, version your APIs, though for some internal APIs you definitely want to be able to make changes without bumping versions constantly.
Be backwards and forwards compatible as much as possible. Rolling deployments and other factors virtually guarantee mismatched versions in both directions. Many things, like adding new fields or removing old ones, can be staged in such a way that nothing breaks. Protobuf/gRPC offers some form of backwards and forwards compatibility (at the data model layer, of course) if you adhere to certain basic invariants.
Almost any change can be staged over time; how long you want to keep compatibility between versions is up to you.
Oh, probably most important: keep clear data model separations wherever you can. Between storage and API, and API and in-memory state management. It’s a lot of work, but it pays dividends.
Versioning your APIs gives you a built-in metric for how often you make breaking changes. Not everyone likes seeing this, but if you can enforce it, I think it creates a psychological incentive to exercise discipline about API changes.
Not sure if this is standard, but what I usually do is bump the version only on a breaking change. So if I add a field, the version stays the same. If I remove a field, or fundamentally change the output, then I bump it up.
Yes, especially if "you" is a company or engineering department. It can be difficult to figure out how best to deprecate a micro-service endpoint if you aren't quite sure which other services are using it and logging is lacking.
I may be misremembering but I actually remember GDocs telling me to reload once, so presumably they have some way of forcing reloading if they need it (I think it let me keep working locally until I reloaded, then all my stuff would be saved and synced like normal).
When you need to make a backward incompatible change, do it in three steps:
1. Add new stuff to the API in a backwards compatible manner (without removing or changing old stuff). When you need to add fields, it is usually safe to do it within existing functions. When you need to modify fields, I recommend simply copying and pasting your API function and exposing the new one with a number suffix, e.g. get_messages_2 (see the sketch after this list).
2. Update the client code to use new versions of functions and to stop using old versions.
3. Once the new client is deployed, wait a while longer and then remove old versions of functions like get_messages_1
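A minimal sketch of what steps 1 and 3 look like with a hypothetical HTTP API (Express; the payload shapes are made up); the same idea works for RPC method names:

    import express from "express";

    const app = express();

    // Step 1: the old function keeps its exact shape...
    app.get("/get_messages", (_req, res) => {
      res.json(["hello", "world"]); // old shape: bare array of strings
    });

    // ...and the new shape ships alongside it under a suffixed name.
    app.get("/get_messages_2", (_req, res) => {
      res.json({ messages: [{ id: 1, text: "hello" }] }); // new, richer shape
    });

    // Step 3, later: once the updated client has been deployed for a while,
    // delete the /get_messages handler above.

    app.listen(3000);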
You can also use a single global version which will allow your client to detect when it has gone stale and reload.
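One hedged way to wire that up on the client (the header name and the build-time constant are assumptions, not a standard):

    // Baked into the bundle at build time, e.g. via the bundler's define/env support.
    const BUILD_VERSION = "2024-05-01.3";

    async function apiFetch(path: string, init?: RequestInit): Promise<Response> {
      const res = await fetch(path, init);
      // The server attaches its current global version to every response.
      const serverVersion = res.headers.get("x-app-version");
      if (serverVersion && serverVersion !== BUILD_VERSION) {
        // Stale client: pick up the new bundle. A gentler variant would flag
        // the UI and reload at the next safe moment instead of immediately.
        window.location.reload();
      }
      return res;
    }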
What does this have to do with SPAs specifically? Any of the usual suspects for versioning APIs (URL paths, headers, etc) should suffice, right? The SPA is just another client.
SPAs as opposed to mostly-static web pages. I think all "rich clients" are implicitly included, although the "refresh" story does change for native ones.
We use API versioning. Works like a charm! If you don't currently have API versioning, just put the breaking changes behind /v2/<endpoint> and start versioning.
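For example (a hypothetical Express setup, payloads invented), the breaking change only exists under the new prefix while /v1 keeps serving the old shape:

    import express from "express";

    const v1 = express.Router();
    v1.get("/orders", (_req, res) => res.json([{ id: 1, total_cents: 1299 }]));

    const v2 = express.Router();
    // Breaking change: totals are now decimal strings instead of integer cents.
    v2.get("/orders", (_req, res) => res.json([{ id: 1, total: "12.99" }]));

    const app = express();
    app.use("/v1", v1);
    app.use("/v2", v2);
    app.listen(3000);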
I guess the hard thing is that you lose versioning granularity at that level. If you made a mistake with your v2 rollout and the fix is breaking, do you bump to v3 or just add the corrected field? Do you leave the bug in the response so that users can decide to depend on it? You can't delete. What would users think about bumping from v3 to v12 in a really short amount of time while the API is in flux?
Stripe’s date versioning is the best implementation I’ve seen yet. The worst I’ve seen is using custom MIME types. Version number in the URL is naive but intuitive.
I've never come across your scenario. I would say if you need to fix v2 with a breaking change, then v2 must be broken to begin with. All that being said, going to v3 is not a problem either. I have a few APIs where we went up to v7 after 2 years of operation without issues.
I don't know if most SPA frameworks offer this sort of functionality, but with intercooler.js you can send response headers to trigger client-side events (or, if you are feeling hacky, to evaluate raw javascript.)
This can be used to trigger a full browser refresh when the application topology has changed dramatically.
How do you even handle SPA updates themselves? Apps may reference static resources (JavaScript, CSS, images) which get updated with a new version. Those resources could be clobbered by a deploy, or be uglified, or be versioned. Do you keep the old static resources in place forever, or for a period of time? Or force a reload? Or do you just let clients break and make users reload?
Yeah, the tooling around this is pretty bad. For example, you might release a container that has index.html and main.abc1234.js. index.html has to be parsed before the JS bundle is requested. If you do a release in that intermediate time, the javascript will 404 and your site won't load because in the new container, the only bundle that exists is main.def2345.js.
I think people additionally assume this never happens, because their error reporting code lives in the javascript bundle and they never get error reports ;)
The correct solution is probably to keep a few old versions of the Javascript bundle around, so that in-flight requests succeed even as you update the container hosting the app. I do not know of a tool that does this, but the edge case I describe above worries me, so I might write one someday.
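A hedged sketch of the idea (Express; the directory layout is invented): serve the current release first, and fall back to an archive directory that accumulates the hashed bundles from the last few releases instead of being wiped on every deploy.

    import express from "express";

    const app = express();

    // Current release: index.html plus main.<hash>.js from the latest build.
    app.use(express.static("releases/current"));

    // Fallback: at deploy time, the hashed bundles from the previous few
    // releases are copied in here, so an index.html served moments before the
    // deploy can still fetch the main.abc1234.js it references.
    app.use(express.static("releases/archive"));

    app.listen(8080);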
Yeah, and if you do dynamic loading of resources (images, templates, etc), I think the window of 404ing is much longer than just the index.html parse time — essentially, the span of a user session...