I don’t understand how this can work. Given the probabilistic nature of LLMs, the more steps you have, the more chances something goes off. What good is a dashboard if you can’t be sure it wasn’t partially hallucinated?
> What good is a dashboard if you can’t be sure it wasn’t partially hallucinated?
A lot of the time the dashboard's contents don't actually matter anyway; it just needs to look pretty...
On a serious note, the systems being built now will eventually be "correct enough most of the time" and that will be good enough (read: cheaper than doing it any other way).
> On a serious note, the systems being built now will eventually be "correct enough most of the time"
I don’t believe this would work. File a “good enough” tax return one year and enjoy a hefty fine 5 years later. Or constantly deal with customers not understanding why one amount is in the dashboard and another is in their warehouse.
The probability of error increases rapidly when you start layering one probabilistic component onto another. Four 99%-reliable components sequenced one after another have a combined error rate of about 4% (1 - 0.99^4 ≈ 3.9%).
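A quick sanity check of that arithmetic, as a minimal sketch (the 99% per-step reliability is just the assumed figure from above):

    # error rate of a pipeline of n independent steps,
    # each succeeding with probability p
    def pipeline_error(p: float, n: int) -> float:
        return 1 - p ** n

    print(pipeline_error(0.99, 4))   # ~0.039, i.e. the ~4% above
    print(pipeline_error(0.99, 20))  # ~0.182: a 20-step chain fails ~18% of the time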
The probabilistic nature means nothing on its own. An LLM that can solve your deterministic task will easily assign ~100% probability to the correct answer (or 99%; the noise floor can be truncated with a sampler). If it doesn't do that and your reply is unstable, it can't solve the task confidently. That happens to all LLMs on a sufficiently complex task, but it's not related to their probabilistic nature.
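On the sampler point, here's a minimal sketch of truncating that noise floor with top-p (nucleus) filtering. It assumes you already have a next-token probability vector; the names are illustrative, not any particular library's API:

    import numpy as np

    def top_p_filter(probs: np.ndarray, top_p: float = 0.9) -> np.ndarray:
        # sort tokens by probability, keep the smallest set whose
        # cumulative mass reaches top_p, and zero out the rest
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1
        filtered = np.zeros_like(probs)
        kept = order[:cutoff]
        filtered[kept] = probs[kept]
        return filtered / filtered.sum()  # renormalize over kept tokens

    # a confident model: 0.97 on the right token, noise on the rest
    probs = np.array([0.97, 0.01, 0.01, 0.005, 0.005])
    print(top_p_filter(probs, 0.9))  # -> [1. 0. 0. 0. 0.]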
Of course, that still doesn't mean you should do that. If you want to maximize the model's performance, offload as much distracting stuff as possible to code.
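In practice that means letting the model pick the intent and letting ordinary deterministic code do the math. A toy sketch; call_llm here is a hypothetical stand-in, not a real API:

    def call_llm(prompt: str) -> str:
        # hypothetical stand-in for an actual model call
        return "revenue"

    def handle_query(query: str, rows: list[dict]) -> float:
        # the model only classifies which metric is being asked for
        metric = call_llm(f"Pick one of [revenue, units] for: {query}")
        # the aggregation is plain code, so the total can't be hallucinated
        return sum(row[metric] for row in rows)

    rows = [{"revenue": 120.0, "units": 3}, {"revenue": 80.0, "units": 2}]
    print(handle_query("total sales this week?", rows))  # 200.0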