Hacker News

LLMs clearly struggle when presented with JSON, especially large amounts of it.

There's nothing stopping your endpoints from returning data in some other format. LLMs actually seem to excel with XML, for instance. Or you could just use a template to render the data as narrative text.
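For instance, a minimal sketch of both options in Python (the `employee` record and the template wording are made up for illustration):

```python
from xml.sax.saxutils import escape

# Hypothetical endpoint payload; field names are invented for the example.
employee = {"name": "Ada Lovelace", "role": "programmer", "tenure_years": 3}

# Option 1: serialize the record as XML instead of JSON.
def to_xml(record: dict, root: str = "employee") -> str:
    fields = "".join(f"<{k}>{escape(str(v))}</{k}>" for k, v in record.items())
    return f"<{root}>{fields}</{root}>"

# Option 2: render the same record as narrative text via a template.
TEMPLATE = "{name} is a {role} who has been with the company for {tenure_years} years."

def to_narrative(record: dict) -> str:
    return TEMPLATE.format(**record)

print(to_xml(employee))
print(to_narrative(employee))
```

Either string can be dropped straight into the prompt in place of the raw JSON payload.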




I'm consistently surprised that people don't default to XML for LLMs, given that XML comes with built-in semantic context. Convert the XML to JSON deterministically when you need to feed it into other pipelines.
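A minimal sketch of that deterministic XML-to-JSON step using only the Python standard library (simplified: it ignores attributes and repeated sibling tags, and the payload is made up):

```python
import json
import xml.etree.ElementTree as ET

def xml_to_dict(elem):
    """Recursively convert an element tree into plain dicts/strings.
    Simplified: ignores attributes and repeated sibling tags."""
    children = list(elem)
    if not children:
        return elem.text or ""
    return {child.tag: xml_to_dict(child) for child in children}

xml_payload = "<employee><name>Ada</name><dept><id>7</id><name>R&amp;D</name></dept></employee>"
root = ET.fromstring(xml_payload)
print(json.dumps({root.tag: xml_to_dict(root)}))
# → {"employee": {"name": "Ada", "dept": {"id": "7", "name": "R&D"}}}
```

So the model works with the XML, and downstream consumers still get JSON.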


Any reason for this, for my own learning? Was XML more prevalent in the training data? Is there something about XML that makes it easier for the LLM to work with?

XML seems more text-heavy, i.e. more tokens. However, maybe the extra context helps?


It's in the official OpenAI prompting guidelines: https://cookbook.openai.com/examples/gpt4-1_prompting_guide#...

But it's also evident to anyone who has used these models. It's not unique to OpenAI either; this bias is prevalent in every model I've tested, from GPT-3 to the latest offerings from every single frontier model provider.

As to why, I'd guess it's because XML bakes semantic meaning into its tags, so it's easier for the model to understand the structure of the data. <employee>...</employee> is a lot easier to understand than { "employee": { ... } }.

I would guess that the models largely ignore the angle brackets and focus on the tag names, which have unique tokens and are thus easier to pair up than the curly braces that look the same throughout JSON. Just speculation on my part, though.

And this only applies to the input. Earlier models struggled to reliably output JSON, so they've since been both fine-tuned and wrapped in constrained-decoding formatters that reliably force clean JSON output.


I've seen the suggestion that it's because they've been trained on a lot of HTML, but the GPT docs suggest Markdown as the default choice, and Markdown is relatively less common in the wild.


We've been using Markdown tables to return data to the LLM, with some success.
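Roughly like this, a small sketch of the kind of helper involved (the records and field names are invented for the example):

```python
def to_markdown_table(rows: list[dict]) -> str:
    """Render a list of dicts as a Markdown table.
    The keys of the first row define the columns."""
    cols = list(rows[0])
    lines = [
        "| " + " | ".join(cols) + " |",
        "| " + " | ".join("---" for _ in cols) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row.get(c, "")) for c in cols) + " |")
    return "\n".join(lines)

data = [
    {"name": "Ada", "role": "programmer"},
    {"name": "Grace", "role": "admiral"},
]
print(to_markdown_table(data))
```

The resulting table goes into the prompt in place of the raw JSON list.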



