This is premature optimization. A very small percentage of APIs return so much data that you need to worry about stream parsing. The vast majority of APIs in the world, including probably yours, return structures that are easy to manage in memory.
If you render this `{"MyMessageType":{...}}` structure, people who live in statically typed languages will hate you. At best you've created a union type with a truly vast number of possible cases. In practice you've forced this to be read as `Map<String, JsonNode>` and dynamically interpreted.
Unless your problem domain routinely streams gigabytes of data across an API, don't do this. And if it does stream gigabytes of data across an API, figure out how to chunk the data and still don't do this.
I don't know about other languages, but this is perfectly fine in Rust. It is actually the default serialization for enums (unions), and it supports streaming parsing.
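For anyone doubting it, here's roughly what that looks like with serde (the crates are real, the message type and fields are made up for illustration):

```rust
use serde::{Deserialize, Serialize};

// serde's default ("externally tagged") representation: the variant
// name becomes the single key wrapping the payload.
#[derive(Serialize, Deserialize)]
enum Message {
    ContentA { field_a: String },
    ContentB { field_b: u64 },
}

fn main() {
    let msg = Message::ContentA { field_a: "hello".into() };
    // Prints: {"ContentA":{"field_a":"hello"}}
    println!("{}", serde_json::to_string(&msg).unwrap());
}
```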
I'm not sure why you are forced to read it into a `Map<String, JsonNode>` unless you are doing your parsing without type information, in which case any solution is going to require a map.
How does { "type": "huge union of types" } help you here? You still have to read the whole thing as a generic Map just to find the type key and then dynamically unmarshall to your real object.
Like what even is the return type for `parseJSONBlobToOneOfFiftyMessageTypes(String json)`? Guess you could return a generic `APIMessage` and a type object so the caller can cast it, but now you're back to playing dynamic games.
Coming from TypeScript, having an explicit discriminator field (like "type") is a lot easier to deal with[1] (which to be fair is compatible with the last example given in the OP where they have a "type" field and a generic "data" payload field).
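For what it's worth, serde can speak that dialect too: its internally tagged representation is essentially the TypeScript discriminated-union shape (types made up again):

```rust
use serde::{Deserialize, Serialize};

// Internally tagged: the discriminator sits next to the data,
// e.g. {"type":"ContentA","field_a":"hello"}
#[derive(Serialize, Deserialize)]
#[serde(tag = "type")]
enum Message {
    ContentA { field_a: String },
    ContentB { field_b: u64 },
}
```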
I've run into business-relevant slowdowns due to this issue. I live in a statically typed language. Your "at best" is actually very straightforward. Also, sum types exist.
The problem is that by the time you're streaming gigabytes of data, if you've done it the wrong way, you now have to add a new API revision for the right format and update your whole ecosystem - if the protocol is even in your control. If you'd done (or specced) it right from the start, you'd only have to adjust the clients at that point, and it would have been just as easy up until then.
1. I live in a statically typed language, and honestly what's proposed in the article is not such a burden. I'd much prefer to receive the object type data via the Content-Type header or equivalent, but this will do as well.
2. If you're sending me gigabytes of JSON, something has gone horribly wrong. Split it up, or send me the result in a more suitable format.
Adding a ton of structural noise is not "equivalent". This pattern will be repeated throughout your hierarchy, and your clients will have to deal with something like this at every level:
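```json
{
  "RequestType": {
    "header": { "HeaderType": { "auth": { "AuthType": { "token": "..." } } } },
    "body": { "BodyType": { "content": { "ContentType": { "value": "..." } } } }
  }
}
```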
I think making a previously-static node polymorphic in a way that doesn't drop any required field is actually quite rare. But also, I just think the first one is better.
I mean, I don't actually believe you that all those levels are polymorphic. :-)
IME, we have one, at most two polymorphic fields per message. And for an example chosen deliberately to be annoying, I still like the deeper tree better; at least it advertises that something unusual is going on.
That's not a union. Both `{}` and `{ contentA: a, contentB: b }` inhabit that type, so it doesn't represent "a value of type `Message` is either a value of `ContentAStructure` or a value of `ContentBStructure`".
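Spelled out in Rust (hypothetical types), the distinction is:

```rust
struct ContentAStructure { /* ... */ }
struct ContentBStructure { /* ... */ }

// An actual union: a Message is exactly one of these,
// never both, never neither.
enum Message {
    ContentA(ContentAStructure),
    ContentB(ContentBStructure),
}

// What { contentA?: ..., contentB?: ... } really encodes:
// zero, one, or both fields may be present.
struct LooseMessage {
    content_a: Option<ContentAStructure>,
    content_b: Option<ContentBStructure>,
}
```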
I came in knowing I'd be against it, and came out convinced.
The argument is just solid: it's easier to make things correct as well as faster this way.
It sounds like the point is to put a member's type before its data in the stream. But that can't be a huge drain on serializers. JSON isn't supposed to be incredibly performant; it's supposed to be human-readable. And with multi-core I would expect serializers to read ahead anyway.
There was a thing called Reverse Hungarian that put an abbreviation of the type in the variable name. That had some value when working on very small screens without much scrolling.
It's even more human-readable this way though (IMO), because it saves you from having to pull a type field out of the soup of a log message. With the key-identified format, the type is guaranteed to come directly before the data.
> The type of every JSON object field should be uniquely determined by its field name.
Fields of variant/oneof/union type are really useful, so I don't agree with this. The problem regarding JSON serialization of type-tagged fields and streaming parsers is very well observed though.
I suppose one nice thing about adding a sibling "type" key is that it gives you the option to make a non-variant field a variant later on (by allowing the type field to have a default if it's missing).
You can do union types with the same mechanism: `{"foo": FooTypedValue}` or `{"bar": BarTypedValue}`. No variant, admittedly, but there's no making open-ended JSON objects fast no matter what you do. The problematic thing with `"type"` is that it forces the parser to treat your entire message as a variant.
That's not a variant, it's just two optional fields: one called "foo" and one called "bar". A validating parser then has to know that they are semantically linked. Admittedly this is what Protocol Buffers does in the binary wire format, because oneofs are just a frontend hack (IIRC the latest value decoded gets kept), but it's not the same.
This is how I usually write JSON. `{ "type": foo, … }` is more verbose and can lead to issues if you forget to check the type, and moreover the key-as-type style makes it easier to generate some complex types.
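A sketch of why the key-as-type form is hard to misuse, again with serde and invented types: the parser can only hand you a value that is already exactly one case, and the match has to cover all of them.

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct FooTypedValue { x: i32 }

#[derive(Deserialize)]
struct BarTypedValue { y: String }

// Externally tagged: the input is either {"foo":{...}} or {"bar":{...}}.
#[derive(Deserialize)]
#[serde(rename_all = "lowercase")]
enum Message {
    Foo(FooTypedValue),
    Bar(BarTypedValue),
}

fn handle(json: &str) -> Result<(), serde_json::Error> {
    // The single key picks the variant during parsing, and the match
    // below won't compile unless every case is handled.
    match serde_json::from_str::<Message>(json)? {
        Message::Foo(f) => println!("foo: {}", f.x),
        Message::Bar(b) => println!("bar: {}", b.y),
    }
    Ok(())
}
```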