It's staggering to me that people keep making these "rich" data formats without sum types. At least to me, the "ors" are just as important as the "ands" in domain modeling. Apart from that, while you can always sort of fake it with a bunch of optional fields, I believe you need a native encoding of tagged unions if you want to avoid bloating your messages.
The Ion data model doesn't describe a schema or type system. It's a data structure where values are of a known type. In the binary format, values are preceded by a type id; in the text format, the syntax declares the type: "" for string, {} for struct. The data model doesn't declare what types a value could have, only the type it does have.
Doesn't it trivially have "sum types" since it's just arbitrary self-describing data? i.e. nobody is stopping you from passing around objects in such a way:
{a:1}
{a:{b:2}}
{a:4}
{a:{b:4}}
There's no static type layer over top of this, so it's inherently up to interpretation and whatever type system you want to use to describe this data, to be able to express that the values of `a` can be of type `number | {b: number}`
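To make that concrete, here is a minimal Python sketch of a consumer applying its own interpretation to the four values above. The documents are rewritten as JSON (quoted field names) so the standard library can parse them, and `classify_a` plus the labels it returns are made up for illustration; nothing in JSON or Ion provides them.

```python
import json

def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def classify_a(doc: str) -> str:
    """Return 'number' or 'struct' for the `a` field (hypothetical labels)."""
    value = json.loads(doc)["a"]
    if _is_number(value):
        return "number"
    if isinstance(value, dict) and _is_number(value.get("b")):
        return "struct"
    # The data format accepted this value; only the consumer's rule rejects it.
    raise ValueError(f"unexpected shape for a: {value!r}")

docs = ['{"a": 1}', '{"a": {"b": 2}}', '{"a": 4}', '{"a": {"b": 4}}']
kinds = [classify_a(d) for d in docs]
```

The `number | {b: number}` rule lives entirely in `classify_a`, not in the data.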
Yeah, that's the problem.
I mean, hey, why json? We could just use unstructured plaintext for everything and now we are free to do everything. But obviously that has its own drawbacks.
Having built-in support for sumtypes means better and more ergonomic support from libraries, it means there is one standard and not different ways to encode things and it also means better performance and tooling.
The point is that there's no reason to single out sumtypes here. Insofar as ions/json has support for arrays/objects/strings/numbers, it has exactly the same support for sumtypes, as in the example I showed above. Here is a list of "sumtype" `string | number | object`:
In the same sense "1e-12" is not a number, it's a string. Yes, it's a string that encodes a number in a certain notation, but for all the tooling, the IDE, the libraries, etc. it will stay a string.
Sum types =/= union types. Sum types are also called 'tagged' or 'discriminable' unions because they have some way to discriminate between them. That is, if you have an element a of type A, a is not part of the sum type A + B because it's missing a tag.
[5,"hello",3] has the type list (int ∪ string), not list (int + string). You can emulate the latter by manually adding a tag, but native support is much preferable.
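A small Python sketch of "manually adding a tag", assuming a `{"tag": ..., "value": ...}` convention that is purely hypothetical (neither JSON nor Ion defines it). It uses two integer variants, because that is where a union and a sum differ most visibly: once the tags are stripped, the two values are indistinguishable.

```python
import json

def encode(tag: str, value):
    """Wrap a value with a discriminator (hypothetical convention)."""
    return {"tag": tag, "value": value}

def decode(obj):
    """Return (tag, value); reject anything that lacks the tag wrapper."""
    if not isinstance(obj, dict) or set(obj) != {"tag", "value"}:
        raise ValueError(f"not a tagged value: {obj!r}")
    return obj["tag"], obj["value"]

# Two variants that both carry an int: int + int, not int ∪ int.
readings = [encode("celsius", 5), encode("fahrenheit", 5)]
wire = json.dumps(readings)

decoded = [decode(o) for o in json.loads(wire)]
# Dropping the tags collapses the sum into a plain union of ints.
untagged = [value for _, value in decoded]
```

A bare `5` is rejected by `decode`, which mirrors the point that an element of A is not an element of A + B without its tag.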
I know the differences between untagged and tagged unions, I'm trying to provide a minimal example without distracting details but sure we can talk about tagged unions. Here is a list of tagged unions, so I once again point out that sum types are "supported" in JSON/ions just as much as any other data type:
There is no such thing in JSON or Ions as defining this "X" schema somewhere. So I may as well say that your [A,B,...] is a list[Any].
Now, I wouldn't actually call it a list of any, I would say you proved my point for me. Your example is functionally the same as mine. I would give this example:
`[A, B, ...]`
and say that that is a list of sum types. You may say "no no no! Only now is it a list of sum types!":
`data X = A | B
[A, B, ...]`
But my point is that there is no JSON/Ion equivalent of your `data X = A | B`. Everyone in this comment tree is confusing the data itself with out-of-band schema over that data. "Sumtype" is nothing more than a fiction, or a schema. Saying that JSON/Ions don't support sumtypes is like saying JSON doesn't support "NonNegativeInteger" type. Sure it does! Here are some: 1, 2, 3, 10. What tooling or type system you use outside of the data itself to enforce constraints on the data types is orthogonal to the data format itself.
> But my point is that there is no JSON/Ion equivalent of your `data X = A | B`
No one disagrees - it's just that we complain about this. We _want_ to have such an equivalent.
> Saying that JSON/Ions don't support sumtypes is like saying JSON doesn't support "NonNegativeInteger" type.
Correct. But your conclusion is wrong. You seem to assume that no one has a problem with the fact that JSON doesn't support a "NonNegativeInteger" type. But I at least would happily use a format that explicitly supports that.
I mean, look at Ion. JSON doesn't support the concept of (restricted) integers, but Ion extends JSON and offers this type. That's great, because it means if a library reads an integer field, it can map it to an integer and knows that there are constraints.
This is a _very_ relevant issue. Many json libraries in the past have had bugs or could be ddos-ed by feeding them json with large numbers, since the json spec does not constrain the size of numbers.
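As a rough Python sketch of the guard a consumer ends up writing by hand when the format itself puts no bound on numbers: the signed 64-bit range and the name `parse_int64` are assumptions chosen for illustration, not something any spec mandates.

```python
import json

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

def parse_int64(text: str) -> int:
    """Parse a JSON number and insist that it is an integer in int64 range."""
    value = json.loads(text)
    # json.loads maps integer literals to Python int (arbitrary precision)
    # and literals with a fraction or exponent to float.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"not an integer: {value!r}")
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError("integer outside signed 64-bit range")
    return value
```

Every consumer has to agree on this rule out of band, which is the complaint.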
In that sense, Ion could have _also_ added support for "NonNegativeInteger" or sumtypes, or other specific types, but they haven't. And since sumtypes are very fundamental, we complain about it more than we would complain about the lack of "NonNegativeInteger".
data interchange formats try to encode as little backwards incompatible information as possible. in this case, it would be the restriction that something is a sum type when it could have multiple fields set in the future. another example is protobuf moving to all fields being optional by default.
as for the wire format, a variant struct where you've only instantiated a single field will encode down to just about the minimum amount of information required.
Avro went the opposite way to most and just makes the concept of an optional field implementable via a union with null
Non union fields can even be upgraded to unions later
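For reference, a sketch of what that looks like as Avro schema JSON, held in Python dicts. The record and field names are invented; the `["null", "string"]` union form with a null default is how Avro schemas spell an optional field. No Avro library is used here, so this only shows the schema shapes, not actual schema resolution.

```python
import json

# Version 1 of a hypothetical record: `email` is a required string.
schema_v1 = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "email", "type": "string"}],
}

# Version 2: the same field widened to a union with null, i.e. optional.
# The default must match the first branch of the union, hence "null" first.
schema_v2 = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "email", "type": ["null", "string"], "default": None}],
}

def is_optional(field: dict) -> bool:
    """True if the field's type is a union that includes null."""
    return isinstance(field["type"], list) and "null" in field["type"]

schema_v2_json = json.dumps(schema_v2)
```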
Personally I find the protobufs "everything is optional!" behaviour fucking insane and awful to deal with, but it is true to the semantics of its underlying wire format.
One can always choose not to use (native) sumtypes if they are interested in extreme performance or compatibility.
But logically speaking, it is _good_ that it's a restriction that a sumtype can't just turn into a multiple-fields type. Because while my software (as the consumer) might still be able to deserialize it, the assumption that only one field is set would be broken and my logic would now potentially be broken. Much better if that happens at deserialization time than later on when I find out that my data is incorrect/corrupt.
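A minimal Python sketch of failing at deserialization time, assuming a struct-with-one-field encoding of the variant. The variant names and `decode_variant` are hypothetical; with native sum types a library would do this check for you.

```python
import json

VARIANTS = {"circle", "square"}  # hypothetical variant names

def decode_variant(text: str):
    """Return (variant, payload), rejecting anything but exactly one known field."""
    obj = json.loads(text)
    if not isinstance(obj, dict):
        raise ValueError("expected a struct")
    keys = set(obj)
    if len(keys) != 1 or not keys <= VARIANTS:
        # Zero fields, several fields, or an unknown field: fail right here
        # instead of letting business logic run on a broken assumption.
        raise ValueError(f"expected exactly one of {sorted(VARIANTS)}, got {sorted(keys)}")
    (name,) = keys
    return name, obj[name]
```

For example, `decode_variant('{"circle": 2}')` yields the pair `("circle", 2)`, while a message with both `circle` and `square` set raises before any downstream code sees it.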
Well, there are already sumtypes, just only specific builtin ones, not custom ones. E.g. booleans are sumtypes (true | false). Everything else that is nullable is also a sumtype (e.g. number | null).
I think it should be pretty obvious how these are helpful and why they are needed, no?
Protobuf supports sum types in the higher-level generated descriptors and languages -- on the wire they're just encoded as, well... oneof a number of possible options.
Avro had unions in version 1.0 [0], which is from 2012.
Capnproto had unions back in 2013 [1]. That's from the v0.1 days, or maybe even earlier.
Protobuf has had oneof support for about 7 years. They were added in version 2.6.0, from 2014-08-15 [2]. That's still 6 years after the initial public release in 2008, though, so this is maybe what you were thinking of? I don't know too many people who were using protobuf in those days outside of Google, though.
And yes, I definitely am primarily thinking of protobuf, as I struggled with this back with version 2.5. I had the (apparently mistaken) impression that Avro and Cap'n Proto (which I think actually first came out in this timeframe) were about on par.