Question: why build this when you can use LLMs to extract the data in the most appropriate format to begin with? Isn't this a bit redundant? Perhaps it makes sense in the short term because of cost, but in the long run this problem can be solved generically with LLMs.