This is premature optimization. A very small percentage of APIs return so much data that you need to worry about stream parsing. The vast majority of APIs in the world, including probably yours, return structures that are easy to manage in memory.
If you render this `{"MyMessageType":{...}}` structure, people who live in statically typed languages will hate you. At best you've created a union type with a truly vast number of possible cases. In practice you've forced this to be read as `Map<String, JsonNode>` and dynamically interpreted.
Unless your problem domain routinely streams gigabytes of data across an API, don't do this. And if it does stream gigabytes of data across an API, figure out how to chunk the data and still don't do this.
I don't know about other languages, but this is perfectly fine in Rust. It is actually the default serialization for enums (unions), and it supports streaming parsing.
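For anyone doubting it, here's roughly what that looks like with serde (the crates are real, the message type and fields are made up for illustration):

```rust
use serde::{Deserialize, Serialize};

// serde's default ("externally tagged") representation: the variant
// name becomes the single key wrapping the payload.
#[derive(Serialize, Deserialize)]
enum Message {
    ContentA { field_a: String },
    ContentB { field_b: u64 },
}

fn main() {
    let msg = Message::ContentA { field_a: "hello".into() };
    // Prints: {"ContentA":{"field_a":"hello"}}
    println!("{}", serde_json::to_string(&msg).unwrap());
}
```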
I'm not sure why you are forced to read it into a `Map<String, JsonNode>` unless you are doing your parsing without type information, in which case any solution is going to require a map.
How does { "type": "huge union of types" } help you here? You still have to read the whole thing as a generic Map just to find the type key and then dynamically unmarshall to your real object.
Like what even is the return type for `parseJSONBlobToOneOfFiftyMessageTypes(String json)`? Guess you could return a generic `APIMessage` and a type object so the caller can cast it, but now you're back to playing dynamic games.
Coming from TypeScript, having an explicit discriminator field (like "type") is a lot easier to deal with[1] (which to be fair is compatible with the last example given in the OP where they have a "type" field and a generic "data" payload field).
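For what it's worth, serde can speak that dialect too: its internally tagged representation is essentially the TypeScript discriminated-union shape (types made up again):

```rust
use serde::{Deserialize, Serialize};

// Internally tagged: the discriminator sits next to the data,
// e.g. {"type":"ContentA","field_a":"hello"}
#[derive(Serialize, Deserialize)]
#[serde(tag = "type")]
enum Message {
    ContentA { field_a: String },
    ContentB { field_b: u64 },
}
```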
I've run into business-relevant slowdowns due to this issue. I live in a statically typed language. Your "at best" is actually very straightforward. Also, sum types exist.
The problem is that by the time you're streaming gigabytes of data, if you've done it the wrong way, you now have to add a new API revision for the right format and update your whole ecosystem - if the protocol is even in your control. If you'd done (or specced) it right from the start, you'd only have to adjust the clients at that point, and it would have been just as easy up until then.
1. I live in a statically typed language, and honestly what's proposed in the article is not such a burden. I'd much prefer to receive the object type data via the Content-Type header or equivalent, but this will do as well.
2. If you're sending me gigabytes of JSON, something has gone horribly wrong. Split it up, or send me the result in a more suitable format.
Adding a ton of structural noise is not "equivalent". This pattern will be repeated throughout your hierarchy, and your clients will have to deal with something like this at every level:
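```json
{
  "RequestType": {
    "header": { "HeaderType": { "auth": { "AuthType": { "token": "..." } } } },
    "body": { "BodyType": { "content": { "ContentType": { "value": "..." } } } }
  }
}
```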
I think making a previously-static node polymorphic in a way that doesn't drop any required field is actually quite rare. But also, I just think the first one is better.
I mean, I don't actually believe you that all those levels are polymorphic. :-)
IME, we have one, at most two polymorphic fields per message. And for an example chosen deliberately to be annoying, I still like the deeper tree better; at least it advertises that something unusual is going on.
That's not a union. Both `{}` and `{ contentA: a, contentB: b }` inhabit that type, so it doesn't represent "a value of type `Message` is either a value of `ContentAStructure` or a value of `ContentBStructure`".
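Spelled out in Rust (hypothetical types), the distinction is:

```rust
struct ContentAStructure { /* ... */ }
struct ContentBStructure { /* ... */ }

// An actual union: a Message is exactly one of these,
// never both, never neither.
enum Message {
    ContentA(ContentAStructure),
    ContentB(ContentBStructure),
}

// What { contentA?: ..., contentB?: ... } really encodes:
// zero, one, or both fields may be present.
struct LooseMessage {
    content_a: Option<ContentAStructure>,
    content_b: Option<ContentBStructure>,
}
```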
I came in knowing I'd be against it, and came out convinced.
The argument is just solid: it's easier to make things correct as well as faster this way.
It sounds like the point is to put a member's type before its data in the stream. But that can't be a huge drain on serializers. JSON isn't supposed to be incredibly performant; it's supposed to be human-readable. And with multi-core I would expect serializers to read ahead anyway.
There was a thing called Reverse Hungarian that put an abbreviation of the type in the variable name. That had some value when working on very small screens without much scrolling.
It's even more human-readable this way though (IMO), because it saves you from having to pull a type field out of the soup of a log message. With the key-identified format, the type is guaranteed to come directly before the data.
> The type of every JSON object field should be uniquely determined by its field name.
Fields of variant/oneof/union type are really useful, so I don't agree with this. The problem regarding JSON serialization of type-tagged fields and streaming parsers is very well observed though.
I suppose one nice thing about adding a sibling "type" key is that it gives you the option to make a non-variant field a variant later on (by allowing the type field to have a default if it's missing).
You can do union types with the same mechanism: `{"foo": FooTypedValue}` or `{"bar": BarTypedValue}`. No variant, admittedly, but there's no making open-ended JSON objects fast no matter what you do. The problematic thing with `"type"` is that it forces the parser to treat your entire message as a variant.
That's not a variant, it's just two optional fields: one called "foo" and one called "bar". A validating parser then has to know that they are semantically linked. Admittedly this is what Protocol Buffers does in the binary wire format, because oneofs are just a frontend hack (IIRC the latest value decoded gets kept), but it's not the same.
This is how I usually write JSON. `{ "type": foo, … }` is more verbose and can lead to issues if you forget to check the type, and moreover the key-as-type style makes it easier to generate some complex types.
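A sketch of why the key-as-type form is hard to misuse, again with serde and invented types: the parser can only hand you a value that is already exactly one case, and the match has to cover all of them.

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct FooTypedValue { x: i32 }

#[derive(Deserialize)]
struct BarTypedValue { y: String }

// Externally tagged: the input is either {"foo":{...}} or {"bar":{...}}.
#[derive(Deserialize)]
#[serde(rename_all = "lowercase")]
enum Message {
    Foo(FooTypedValue),
    Bar(BarTypedValue),
}

fn handle(json: &str) -> Result<(), serde_json::Error> {
    // The single key picks the variant during parsing, and the match
    // below won't compile unless every case is handled.
    match serde_json::from_str::<Message>(json)? {
        Message::Foo(f) => println!("foo: {}", f.x),
        Message::Bar(b) => println!("bar: {}", b.y),
    }
    Ok(())
}
```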