
I think this is my favorite part of the LLM hype train: the butterfly effect of dependence on an undependable stochastic system propagates errors up the chain until the whole system is worthless.

"I think it got 98% of the information correct..." how do you know how much is correct without doing the whole thing properly yourself?

The two options are:

- Do the whole thing yourself to validate

- Skim 40% of it, 'seems right to me', accept the slop and send it off to the next sucker to plug into his agent.

I think the funny part is that humans are not exempt from similar mistakes, but a human making those mistakes again and again would get fired. Meanwhile an agent that you accept to get only 98% of things right is meeting expectations.



This depends on the type of work being done. Sometimes the cost of verification is much lower than the cost of doing the work, sometimes it's about the same, and sometimes it's much higher. Here's some recent discussion [0]

[0] https://www.jasonwei.net/blog/asymmetry-of-verification-and-...


> Meanwhile an agent that you accept to get only 98% of things right is meeting expectations.

Well yeah, because the agent is so much cheaper and faster than a human that you can eat the cost of the mistakes and everything that comes with them and still come out way ahead. No, of course that doesn't work in aircraft manufacturing or medicine or coding or many other scenarios that get tossed around on HN, but it does work in a lot of others.


It definitely would work in coding. Most software companies can only dream of a 2% defect rate; reality is probably closer to 98%, which is why we have so much organisational overhead around finding and fixing human error in software.


What does a software product with a 98% defect rate even look like? Even 2% seems like a lot: one in 50 interactions failing, or one in 50 data writes producing corruption.


> how do you know how much is correct

Because it's a budget. Verifying the entries is _much_ cheaper than finding them all in a giant PDF in the first place.
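
As a concrete sketch of why that verification is cheap (the entries and total here are hypothetical, and not every PDF states its own totals):

    # Hypothetical extracted budget entries, plus the total the PDF itself states.
    entries = {"travel": 12_500.00, "software": 8_200.00, "training": 4_300.00}
    stated_total = 25_000.00  # copied from the document's summary line

    # Cheap verification pass: the line items should sum to the stated total.
    # This catches most extraction errors without re-reading the whole PDF.
    if abs(sum(entries.values()) - stated_total) > 0.01:
        print("mismatch -- extraction needs a human look")

A check like that runs in microseconds; redoing the extraction from scratch takes hours.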

> the butterfly effect of dependence on an undependable stochastic system

We've been using stochastic systems for a long time. We know perfectly well how to deal with them.

> Meanwhile an agent that you accept to get only 98% of things right is meeting expectations.

There are very few tasks humans complete at a 98% success rate either. If you think "build a spreadsheet from a PDF" comes anywhere close to that, you've never done that task. We're barely able to recognize objects in their default orientation at a 98% success rate. (And in many cases, deep networks now outperform humans at object recognition.)

The task of engineering has always been to manage error rates and risk, not to achieve perfection. "butterfly effect" is a cheap rhetorical distraction, not a criticism.


There are in fact lots of tasks people complete at a 99.99% success rate on the first iteration, or 99.999% after self- and peer-checking their work.

Perhaps more importantly, checking is a continual process: errors are identified as they are made and corrected while the work is still in context, rather than being found later by someone completely devoid of context, a task humans are notably bad at.

Lastly, it's important to note the difference between an overarching task and the many subtasks it comprises.

Something that fails each subtask 2% of the time has a miserable ~18% failure rate on an overarching task comprising 10 subtasks; by 20 subtasks it has failed 1 in 3 attempts. Worse, a failing human knows they don't know the answer, while the failing AI produces not only wrong answers but convincing lies.
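
The compounding is plain probability, assuming the subtasks fail independently (a quick sketch):

    # Probability that at least one of n independent subtasks fails,
    # given a per-subtask failure rate p.
    def overall_failure_rate(p: float, n: int) -> float:
        return 1 - (1 - p) ** n

    print(overall_failure_rate(0.02, 10))  # ~0.183, i.e. ~18%
    print(overall_failure_rate(0.02, 20))  # ~0.332, i.e. ~1 in 3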

Failing to distinguish human failure from AI failure, in either the nature or the degree of the errors, is a failure of analysis.


> There are in fact lots of tasks people complete immediately at 99.99% success rate at first iteration or 99.999% after self and peer checking work

This is so absurd that I wonder if you're trolling. Humans don't even have a 99.99% success rate at breathing, let alone at any cognitive task.


> Humans don't even have a 99.99% success rate in breathing

Will you please elaborate a little on this?


Humans cough or otherwise have to clear their airways about 1 in every 1,000 breaths, which is a 99.9% success rate.


Thank you for following up


That’s quite good given the complexity and fragility of the system and the chaotic nature of the environment.


> I think the funny part is that humans are not exempt from similar mistakes, but a human making those mistakes again and again would get fired. Meanwhile an agent that you accept to get only 98% of things right is meeting expectations.

My rule is that if you submit code/whatever and it has problems, you are responsible for them no matter how you "wrote" it. Put another way, "The LLM made a mistake" is not a valid excuse, nor is "That's what the LLM spit out" a valid response to "why did you write this code this way?".

LLMs are tools, tools used by humans. The human kicking off an agent, or rather submitting the final work, is still on the hook for what they submit.


"a human making those mistakes again and again would get fired"

You must be really desperate for anti-AI arguments if this is the one you're going with. Employees make mistakes all day every day and they don't get fired. Companies don't give a shit as long as the cost of the mistakes is less than the cost of hiring someone new.


I wonder if you can establish some kind of confidence interval by passing the data through the model x number of times. I guess it mostly depends on subjective vs. objective correctness, as well as correctness within a context the model may or may not know about. Either way, it sounds like more corporate drudgery.
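
A minimal sketch of that idea (self-consistency voting rather than a true confidence interval; query_model here is a hypothetical stand-in for whatever LLM call you're making):

    from collections import Counter

    def query_model(prompt: str) -> str:
        # Hypothetical placeholder for the actual LLM call.
        raise NotImplementedError

    def agreement_score(prompt: str, runs: int = 5) -> tuple[str, float]:
        # Ask the same question several times; the share of runs agreeing
        # with the modal answer is a rough confidence signal.
        answers = [query_model(prompt) for _ in range(runs)]
        best, count = Counter(answers).most_common(1)[0]
        return best, count / runs

    # answer, confidence = agreement_score("Extract the Q3 travel total")
    # if confidence < 0.8: flag the row for human review

High agreement doesn't prove correctness (the model can be confidently wrong x times in a row), but low agreement is a cheap signal that a human should take a look.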



