So, if I understand the approach correctly: we're essentially doing very advanced feature engineering with LLMs. We find that direct classification by LLMs performs worse than LLM feature engineering followed by decision trees. Am I right?
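To make sure I'm reading it right, here's a minimal sketch of the pipeline as I understand it, assuming a frozen sentence encoder for the embedding step and a toy text classification task (the model name and data are placeholders):

```python
# Sketch of the two-stage pipeline: LLM as feature extractor, tree as classifier.
from sentence_transformers import SentenceTransformer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

texts = ["great product", "terrible service", "works as expected", "broke in a week"]
labels = [1, 0, 1, 0]

# Step 1: "feature engineering" with an LLM -- a frozen encoder turns
# each text into a fixed-length embedding vector.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
X = encoder.encode(texts)

# Step 2: a decision tree learns the task-specific mapping on top of
# those general-purpose features.
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.5, random_state=0)
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X_train, y_train)
print(accuracy_score(y_test, tree.predict(X_test)))
```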
The finding surprises me. I would expect modern LLMs to be powerful enough to do well at the task. Given how much the data is processed before the decision trees, I wouldn't expect decision trees to add much. I can see value in this approach if you're unable to optimize the LLM. But, if you can, I think end-to-end training with a pre-trained LLM is likely to work better.
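For contrast, end-to-end fine-tuning would look roughly like this. A hedged sketch using Hugging Face Transformers; the model name, data, and hyperparameters are all illustrative assumptions:

```python
# One fine-tuning step: gradients flow through the classification head
# *and* the pre-trained encoder, so every layer adapts to the task.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["great product", "broke in a week"]  # placeholder data
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```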
Perhaps the reason this approach works well is that the LLM gives you good general-purpose language processing while the decision tree learns the specifics of the dataset, and that combination is more powerful than either component alone.
It’s the same reason LLMs don’t perform well on tabular data. (They can do fine, but usually not as well as other models.)
Performing feature engineering with LLMs and then storing the embeddings in a vector database also allows you to reuse the embeddings for multiple tasks (e.g., clustering, nearest-neighbor search).
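Something like this, for instance. The choice of FAISS as the vector store is my assumption; any vector database would do:

```python
# Embed once, reuse the same vectors for several downstream tasks.
import numpy as np
import faiss
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

texts = ["refund request", "login problem", "billing question", "password reset"]
X = SentenceTransformer("all-MiniLM-L6-v2").encode(texts).astype("float32")

# Task 1: nearest-neighbor search over the stored embeddings.
index = faiss.IndexFlatL2(X.shape[1])
index.add(X)
_, neighbors = index.search(X[:1], 2)  # two nearest neighbors of the first text

# Task 2: clustering the very same vectors, no re-embedding needed.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(neighbors, clusters)
```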
In practice, almost no one uses plain decision trees, since random forests or gradient-boosted trees perform better and are more robust.
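The swap is usually a one-liner in scikit-learn. A quick sketch on synthetic data to illustrate the comparison; the dataset and hyperparameters are arbitrary:

```python
# Compare a single tree against the ensemble methods that typically replace it.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for clf in (
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(n_estimators=200, random_state=0),
    GradientBoostingClassifier(random_state=0),
):
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{type(clf).__name__}: {score:.3f}")
```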