As someone that's written a whole lot of code parsing both complex XML and JSON, I'd go with a more restrictive JSON format over a more idiomatically correct and elegant (from the data perspective) XML format any day. Complex XML sucks for storing structured data unless it's as restrictive as a JSON document, and then...
The simple use case for XML is always easy, but then it always ends up looking like this:
<step>prepare fruit
<step>prepare <ing variety="bartlett anjou comice">pear slices</ing> from a <ing state="unprepped">pear</ing>
<step>wash</step>
<step>trim
<step>remove stem</step>
<step>peel</step>
</step>
<step>
<step thickness=".25mm">slice</step>
</step>
</step>
<step>...
Plain text is great and all as a display format but it sucks even more than XML to parse as a data format.
You can make JSON that's just as stupid as XML, but XML invites a lot of complexity for a little more expressiveness, especially if you have people hand-writing it. If you need to, you can always put flatter XML markup in JSON fields to avoid the large-scale recursive structural insanity when parsing.
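To make "flatter XML markup in JSON fields" concrete, here's a rough sketch of one way it could look. The field names (`text`, `substeps`, `thickness`) are made up for illustration and aren't from any real recipe schema. The step hierarchy lives in JSON, and XML only shows up as short inline fragments that get parsed one at a time.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical format: nesting is plain JSON, inline markup is XML.
RECIPE_JSON = """
{
  "steps": [
    {
      "text": "prepare <ing variety='bartlett anjou comice'>pear slices</ing> from a <ing state='unprepped'>pear</ing>",
      "substeps": [
        {"text": "wash"},
        {"text": "trim", "substeps": [{"text": "remove stem"}, {"text": "peel"}]},
        {"text": "slice", "thickness": ".25mm"}
      ]
    }
  ]
}
"""

def parse_text(text):
    """Split one step's text field into plain text and its <ing> elements."""
    # Wrap the fragment so mixed text and tags parse as a single element.
    root = ET.fromstring(f"<t>{text}</t>")
    plain = "".join(root.itertext())
    ings = [(ing.text, dict(ing.attrib)) for ing in root.iter("ing")]
    return plain, ings

def walk(steps, depth=0):
    """Yield (depth, plain_text, ingredients) for every step, depth-first."""
    for step in steps:
        plain, ings = parse_text(step["text"])
        yield depth, plain, ings
        yield from walk(step.get("substeps", []), depth + 1)

recipe = json.loads(RECIPE_JSON)
rows = list(walk(recipe["steps"]))
for depth, plain, ings in rows:
    print("  " * depth + plain, ings)
```

The recursion is over ordinary dicts and lists, so the XML parser never has to deal with steps nested inside mixed content.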
My gut feeling is that there's something going on here with dueling priorities between (A) the best editing experience with a plain text editor vs. (B) the clearest storage format. This leads to things like "too much inlining" or "too much duplication".
In contrast, imagine relaxing the everything-in-notepad requirement and having a renderer that can easily display cross-referenced materials in a readable way. Or, a step beyond that, an editor which also gives you "jump to definition" etc.
That change permits a much more internally-consistent XML file, such as one where "materials" and "steps" are separate sections, and any step can reference a material that is being used as input or output, with something like <mat_ref id="sliced_uncooked_apples"/>.
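A minimal sketch of how such a file might be resolved, assuming a layout I'm inventing here: only `<mat_ref id="..."/>` comes from the comment above, while `<recipe>`, `<materials>`, `<material>`, `<input>`, `<output>` and `<text>` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: materials and steps are separate sections,
# and steps point at materials by id instead of inlining them.
RECIPE_XML = """
<recipe>
  <materials>
    <material id="apple">apple</material>
    <material id="sliced_uncooked_apples">sliced uncooked apples</material>
  </materials>
  <steps>
    <step id="slice">
      <input><mat_ref id="apple"/></input>
      <output><mat_ref id="sliced_uncooked_apples"/></output>
      <text>cut into slices</text>
    </step>
  </steps>
</recipe>
"""

def referenced(step, role, materials):
    """Look up the material names a step references under <input> or <output>."""
    # A KeyError here means the step references an undefined material.
    return [materials[ref.get("id")] for ref in step.findall(f"{role}/mat_ref")]

def resolve(xml_text):
    """Return each step with its material references replaced by names."""
    root = ET.fromstring(xml_text)
    materials = {m.get("id"): m.text for m in root.iter("material")}
    return [
        {
            "id": step.get("id"),
            "text": step.findtext("text"),
            "inputs": referenced(step, "input", materials),
            "outputs": referenced(step, "output", materials),
        }
        for step in root.iter("step")
    ]

resolved = resolve(RECIPE_XML)
print(resolved)
```

A renderer or editor would do the same lookup to show the material inline, which is what makes the file unpleasant in notepad but easy to keep consistent.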
I see the value in using XML for simple markup and standardized entities/references, but the flexibility makes navigating whole documents more cumbersome. I think using it inline is a good idea, but above the paragraph level I don’t see the benefit of using it at all. Even in the supremely consistent world of open doc XML, parsing is a bear of a task. For something like this, which requires a fraction of the complexity, it should either force more internal structure (XML markup in JSON fields representing ingredient lists, etc.) or just decide it’s for presentation only and go with HTML or RTF.
I probably also have a different perspective on both of these topics than most. I’ve done a lot of automated document work, and I was also a chef, so I’ve got a more structured, less prosaic approach to recipes.
I was reading through this and caught myself thinking "man, if you want people to read your recipe then just write it", and for that plain text or some minimal markup still works wonders...
That's great for writing recipes for someone to read as recipes, but it's not very useful if you're trying to create a collection of structured data from recipes.
This is one of the golden applications of LLMs. You can see the variety of structured formats proposed in comments, the different use cases, and honestly it seems like a bad idea to privilege any single format. Instead, you as a data consumer can use LLMs to parse common language recipes into the structured format most appropriate to your needs. DAG or linear? JSON or XML? You decide!
If your primary use case is displaying individual recipes, that makes good sense. Less so if you need reliable calculations at a larger scale. For example, if I were making planning software for a catering company, they’d want to know how many cases of onions they need this week for the 9 events with different menus. I don’t trust LLMs for that level of accounting yet. Hopefully soon!
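To show the kind of accounting I mean, here's a toy sketch. Every recipe, quantity, event and case size below is invented for illustration. The point is only that once the data is structured, the onion total is exact integer arithmetic rather than something you have to trust a model to get right.

```python
from collections import Counter

# All names and numbers here are hypothetical.
# Grams of each ingredient per serving, kept as integers to avoid rounding.
RECIPES = {
    "french onion soup": {"onion": 500, "beef stock": 300},
    "onion tart": {"onion": 200, "flour": 100},
}

# Servings of each dish per event.
EVENTS = [
    {"name": "wedding", "menu": {"french onion soup": 120, "onion tart": 120}},
    {"name": "office lunch", "menu": {"onion tart": 40}},
]

CASE_SIZE_G = {"onion": 20_000}  # grams per case

def total_grams(events, recipes):
    """Sum ingredient quantities across every dish on every event menu."""
    totals = Counter()
    for event in events:
        for dish, servings in event["menu"].items():
            for ingredient, grams in recipes[dish].items():
                totals[ingredient] += grams * servings
    return totals

def cases_needed(grams, case_size):
    """Round up to whole cases using integer arithmetic."""
    return -(-grams // case_size)

totals = total_grams(EVENTS, RECIPES)
onion_cases = cases_needed(totals["onion"], CASE_SIZE_G["onion"])
print(totals["onion"], "g of onions,", onion_cases, "cases")
```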
It is not clear why that would be better than an XML-type arrangement like
<step>cut your <ing>apple</ing> into slices</step>
Or even just plain text.