I'm wondering whether introducing a system message like "convert the resulting json to yaml and return the yaml only" would adversely affect the optimization done for these models. The reason is that YAML uses significantly fewer tokens than JSON, which could pay off on the output side, where data-type specifications and comments usually aren't needed. From my understanding, specifying functions in JSON now uses fewer tokens, but I believe the response still consumes the usual amount.
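To make the size difference concrete, here is a small stdlib-only sketch comparing a compact JSON serialization against the same (made-up) payload hand-written as YAML. Character count is only a rough proxy for token count; a real comparison would run both strings through the model's tokenizer (e.g. tiktoken for OpenAI models).

```python
import json

# Hypothetical function-call result; field names are invented for illustration.
data = {
    "city": "Berlin",
    "forecast": [
        {"day": "mon", "high": 21, "low": 12},
        {"day": "tue", "high": 19, "low": 11},
    ],
}

as_json = json.dumps(data)  # even compact JSON keeps quotes, braces, commas

# The same payload hand-written as YAML: no quotes around keys
# or simple string scalars, and far less punctuation overall.
as_yaml = """\
city: Berlin
forecast:
- {day: mon, high: 21, low: 12}
- {day: tue, high: 21, low: 11}
"""

# Rough size comparison (characters, not tokens).
print(len(as_json), len(as_yaml))
```

On punctuation-heavy structured data the YAML form is consistently shorter, though the exact token savings depend on how the tokenizer splits the punctuation.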
I think one should not underestimate the impact the output format can have on downstream performance. From a modelling perspective, it is unclear whether asking or fine-tuning the model to generate JSON (or YAML) output is really lossless with respect to the raw reasoning powers of the model (e.g. it may perform worse on tasks when asked or trained to always respond in JSON).
I am sure they ran tests on this internally, but I wonder what the concrete effects are, especially comparing different output formats like JSON, YAML, different function calling conventions and/or forms of tool discovery.
That's what I'm doing. I ask ChatGPT to return inline YAML (no tokens wasted on line breaks), then parse the YAML output into JSON once I receive it. A bit awkward, but it cuts costs in half.
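The parse-back step is straightforward if a YAML library is available. A minimal sketch, assuming PyYAML is installed and using a made-up model reply (the field names are hypothetical):

```python
import json

import yaml  # PyYAML; assumed installed (pip install pyyaml)

# Hypothetical inline (flow-style) YAML reply from the model.
reply = "{city: Berlin, high: 21, low: 12, conditions: [sunny, windy]}"

# safe_load handles flow-style YAML, including unquoted keys and strings,
# without executing arbitrary YAML tags.
parsed = yaml.safe_load(reply)

# Re-serialize to JSON for the rest of the pipeline.
as_json = json.dumps(parsed)
print(as_json)
```

Using `safe_load` rather than `load` matters here: the model's output is untrusted text, and `safe_load` refuses the YAML tags that can instantiate arbitrary Python objects.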