Rather than converting all of the structured data into unstructured text, we give the language model the database schema and a few sample rows, and ask it to generate a SQL query that answers the question. The alternative you suggest is to convert all of the structured data to text and then train an LLM to answer queries directly. That approach, however, faces several challenges:

Linearizing structured data into text risks losing its organization into rows and columns, which can obscure the relationships that the structure encodes.

Large tables contain a great deal of information, so training or fine-tuning a model on them requires a very large number of tokens, a cost that may be prohibitive for many practitioners with limited resources.
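
To make the contrast concrete, here is a minimal sketch of the schema-plus-sample-rows prompt next to a full linearization of the same table. It is an illustration under stated assumptions, not our actual pipeline: SQLite is assumed as the database, the function names `build_text_to_sql_prompt` and `linearize_table`, the prompt wording, and the `orders` table are all hypothetical, and character count is used only as a rough proxy for token count.

```python
import sqlite3


def build_text_to_sql_prompt(conn, question, sample_rows=3):
    """Assemble a prompt from each table's schema and a few sample rows.

    Hypothetical helper: the prompt layout is an assumption for illustration.
    """
    cur = conn.cursor()
    # sqlite_master stores the CREATE TABLE statement for every table.
    tables = cur.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    parts = []
    for name, ddl in tables:
        parts.append(ddl)
        rows = cur.execute(
            f'SELECT * FROM "{name}" LIMIT ?', (sample_rows,)
        ).fetchall()
        columns = [d[0] for d in cur.description]
        parts.append(f"-- {len(rows)} sample rows from {name}:")
        parts.append(" | ".join(columns))
        parts.extend(" | ".join(str(v) for v in row) for row in rows)
    parts.append(f"-- Question: {question}")
    parts.append("-- Write a SQL query that answers the question.")
    return "\n".join(parts)


def linearize_table(conn, table):
    """Turn every row of a table into a sentence-like string (the alternative)."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    return "\n".join(
        "; ".join(f"{c} is {v}" for c, v in zip(columns, row)) for row in cur
    )


# Toy database with 1000 rows, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"customer_{i}", float(i)) for i in range(1000)],
)

prompt = build_text_to_sql_prompt(conn, "What is the total revenue?")
full_text = linearize_table(conn, "orders")

print(prompt)
print(len(prompt), "characters in the prompt;", len(full_text), "when fully linearized")
```

The prompt stays roughly the same size no matter how many rows the table holds, because only the schema and a fixed number of sample rows are included, whereas the linearized text grows with every row. The query returned by the model would then be executed against the database to obtain the answer.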