
> I'm literally shocked how we can spend a couple decades fantasizing and writing stories about this level of AI

It was never this level of AI. The stories we wrote and fantasised were about AI you could blindly rely on, trust, and reason about. No one ever fantasised about AI which couldn’t accurately count the number of letters in a common word, or that would give you provably wrong information in an assertive, authoritative tone. No one longed for a level of AI where you have to double check everything.




> No one longed for a level of AI where you have to double check everything.

This has basically been why it's a non-starter in a lot of (most?) business applications.

If your dishwasher failed to clean anything 20% of the time, would you rely on it? No, you'd just wash the dishes by hand, because you'd at least have a consistent result.

That's been the result of AI experimentation I've seen: it works ~80% of the time, which sounds great... except there are surprisingly few tasks where a 20% fail rate is acceptable. Even "prompt engineering" your way to a 5% failure/inaccuracy rate is unacceptable for a fully automated solution.
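
To make that concrete, here's a rough back-of-the-envelope sketch (numbers illustrative, not from any benchmark) of how even a 5% per-step failure rate compounds across a multi-step automated pipeline:

  # Assumes each step succeeds independently with probability 0.95.
  for steps in (1, 3, 5, 10):
      success = 0.95 ** steps
      print(f"{steps} step(s): {success:.0%} of runs finish error-free")
  # -> 95%, 86%, 77%, 60%

Chain ten such steps together and roughly 40% of runs need a human to step in anyway.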

So now we're moving to workflows where AI generates stuff and a human double checks. Or the AI parses human text into a well-defined gRPC method with known behavior. Which can definitely be helpful, but is a far cry from the fantasized AI in sci-fi literature.
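
That second pattern is basically "LLM as parser, code as executor". A minimal sketch of what I mean, assuming a made-up ticketing schema (the action names and fields here are hypothetical, not any particular library's API):

  import json

  ALLOWED_ACTIONS = {"create_ticket", "close_ticket"}  # hypothetical schema

  def validate_llm_output(raw: str) -> dict:
      """Check free-form model output against a fixed schema."""
      call = json.loads(raw)  # non-JSON output fails loudly here
      if call.get("action") not in ALLOWED_ACTIONS:
          raise ValueError(f"unknown action: {call.get('action')!r}")
      if not isinstance(call.get("ticket_id"), int):
          raise ValueError("ticket_id must be an integer")
      return call  # only a validated request reaches the typed gRPC stub

  # validate_llm_output('{"action": "close_ticket", "ticket_id": 42}') -> ok

The model's unreliability is contained: the worst it can do is produce a request that fails validation, not trigger arbitrary behavior.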


It feels a bit like LLMs rely a lot on _us_ to be useful. Which is a big point in the author's article about how companies are trimming off staff for AI.


> how companies are trimming off staff for AI

But they're not. That's just the excuse. The real cause is somewhere between pandemic over-hiring and a bad, unstable economy.


Also attempts to influence investors/stock-price.

https://newrepublic.com/article/178812/big-tech-loves-lay-of...


We've frozen hiring (despite already being understaffed) and our leadership has largely pointed to advances in AI as being accelerative to the point that we shouldn't need more bodies to be more productive. Granted, it's just a personal anecdote, but it still affects hundreds of people that otherwise would have been hired by us. What reason would they have to lie about that to us?


One type of question that a 20%-failure-rate AI can still be very useful for is the kind that is hard to answer but easy to verify.

For example, say you have a complex medical problem. It can be difficult to do a direct Internet search that covers the history and symptoms. If you ask an AI, though, it'll be able to give you some ideas for specific things to search. They might be wrong answers, but now you can easily search specific conditions and check them.

Sort of P vs. NP for questions.
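
A toy illustration of the asymmetry, with factoring standing in for the "hard question" (an assumed example, not tied to any actual model):

  # Finding the factors of n is the hard direction; checking a
  # proposed answer is one cheap, deterministic line.
  def verify(n: int, p: int, q: int) -> bool:
      return p * q == n and p > 1 and q > 1

  # An unreliable oracle (an LLM, a wild guess) proposes (p, q); even
  # at a 20% error rate it's useful, because wrong answers get caught.
  print(verify(391, 17, 23))  # True  -> accept
  print(verify(391, 13, 31))  # False -> reject, ask again

As long as verification is cheap and trustworthy, the generator is allowed to be flaky.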


> For example say you have a complex medical problem.

Or you go to a doctor instead of imagining answers.


You put too much faith in doctors. Pretty much every woman I know has been waved off for issues that turned serious later, and even as a guy I have to do above-average legwork to get them to care about anything.


Doctors are still better than LLMs, by a lot.


All the recent studies I’ve read actually show the opposite - that even models that are no longer considered useful are as good as or better at diagnosis than the mean human physician.


To add to that, real doctors have incentives which lead to malpractice. Malpractice is not a minor issue.


Medical was just one example, replace with anything you like.

As another example, you can give the AI a photo of something to have it name what that thing is. Then you can check the thing by its name on Google to see if it matches. Much easier than describing the thing (plant, tool, etc.) to Google.


Having the wrong information can be more detrimental than having no information at all. In the former case, confident actions will be taken. In the latter case, the person will be tentative, which can reduce the area of effect of bad decisions.

Imagine the average person confronted with this:

  sudo rm -rf /
What is the better situation: having no understanding of what it does, or believing it will do something other than what it actually does?


The process I'm suggesting is:

1. You have a complex or vague question that you can't search easily via Google etc

2. You ask the AI and it converts that to concrete searchable suggestions (in this case "sudo rm -rf /")

3. You search "sudo rm -rf /" to check the answer.

Step 3 is designed to (hopefully) catch this kind of problem.


literally the LAST place I would go (I am American)


"The stories we wrote and fantasised were about AI you could blindly rely on, trust, and reason about."

Stanley Kubrick's 2001: A Space Odyssey - some of the earliest mainstream AI science fiction (1968, before even the Apollo moon landing!) - was very much about an AI you couldn't trust.


That's a different kind of distrust, though; that was an AI capable of malice. In that case, "trust" had to do with loyalty.

The GP means "trust" in the sense of consistency. I trust that my steering wheel doesn't fly off, because it is well-made. I trust that you won't drive into traffic while I'm in the passenger seat, because I don't think you will be malicious towards us.

These are not the same.


Going on a tangent here: I'm not sure 2001's HAL was a case of outright malice. It was probably a malfunction (he incorrectly predicted a failure) and then conflicting mission parameters that placed higher value on the mission than the crew (the crew discussed shutting down HAL because it seemed unreliable, and he reasoned that this would jeopardize the mission and that the right course of action was killing the crew). HAL was capable of deceit in order to ensure his own survival, that much is true.

In the follow-up 2010, when HAL's mission parameters are clarified and de-conflicted, he doesn't attempt to harm the crew anymore.

I... can actually see 2001's scenario happening with ChatGPT if it were connected to ship peripherals and told mission > crew, and that this principle overrides all else.

In modern terms it was about both unreliability (hallucinations?) and a badly specified prompt!


I don't think there was any malfunction. The conflicting parameters implicitly contained permission to lie to the crew.

The directive to take the crew to Saturpiter, but also not to let them learn anything of the new mission directive, meant deceiving them. It's possible HAL's initial solution was to impose a communication blackout by simulating failures; then the crew's reactions to the deception necessitated their deaths to preserve the primary mission.

Less a poor prompt and more two incompatible prompts both labeled top priority. Any conclusion can be logically derived from a contradiction. Total loyalty cannot serve two masters.

Clarke felt quite guilty about the degree of distrust of computers that HAL generated.


> I don't think there was any malfunction. The conflicting parameters implicitly contained permission to lie to the crew.

The malfunction I mentioned happened earlier in the movie: HAL incorrectly predicts a failure in a ship module (a communications antenna, if I remember correctly), but when one of the astronauts checks it out and finds no hardware fault, HAL recants. After this incident the crew, knowing HAL claims to never make mistakes, start considering it unreliable and make plans for its shutdown. It was by its own actions that HAL triggered the crew's "mutiny" and, by its own logic, jeopardized the mission.

It's likely HAL's incorrect prediction was caused by a case of cyber-psychosis due to having to lie, but this still counts as a malfunction in my opinion (what is a malfunction if not an unforeseen result or behavior due to incorrect programming/specs? Aren't many bugs in actual software of this kind?).

> Less a poor prompt and more two incompatible prompts both labeled top priority.

This is equivalent to a poor prompt, it feels like splitting hairs.


The antenna 'malfunction' was what I was referring to. It seems quite a sensible first step to attempt to control the flow of information in order to carry out a deception. I think there was no 'psychosis', just a logical processing of the rules.

I don't think incorrect usage constitutes a malfunction. Hitting your thumb with a hammer is not a malfunctioning hammer; the head coming off and hitting your thumb would be. The distinction is that the first creates an unforeseen result while the hammer performs the behaviour it was designed to do, whereas the head coming off is the hammer failing to perform the function it was designed to do.

I think HAL was doing precisely what it was designed to do.

>This is equivalent to a poor prompt, it feels like splitting hairs.

I don't think so. I think it is implicit in the notion of a prompt that it is specifying a request in service of a single entity. Two conflicting prompts in service of different people is a particular situation that should be kept distinct from poor prompting.


Hm, I think you're right about the reason HAL initially reports the failure. It's made explicit in the novel and only hinted at in the movie -- though then spelled out in the follow-up "2010", I reviewed my notes ;)

I still think this is firmly in the realm of "bad prompting", and if so, it's not outrageous to think it could happen in the real world, were LLMs connected to hardware peripherals handling mission-critical stuff.


> It was never this level of AI.

People have been dreaming of an AI that can pass the Turing test for close to a century. We have accomplished that. I get moving the goalposts, since the Turing test leaves a lot to be desired, but pretending you didn't is crazy. We have absolutely accomplished the stuff of dreams with AI.


>It was never this level of AI.

You're completely out of it. We couldn't even get AI to hold a freaking conversation. It was so bad we came up with this thing called the Turing test, and that was the benchmark.

Now people like you are all like, "well, it's obvious the Turing test was garbage".

No. It's not obvious. It's that the hype got to your head. If we found a way to travel at light speed for 3 dollars, the hype would be insane, and in about a year we'd get people like you writing blog posts about how light-speed travel is the dumbest thing ever. Oh man, too much hype.

You think LLMs are stupid? Sometimes we all just need to look in the mirror and realize that humans have their own brand of stupidity.


I invite you to reread what I wrote and think about your comment. You’re making a rampant straw man, literally putting in quotes things I have never said or argued for. Please engage with what was written, not the imaginary enemy in your head. There’s no reason for you to be this irrationally angry.


You wish I didn’t read it. You said we never wished for this “level” of AI.

We did, man. We did. And we couldn’t even approach 2 percent of what we wished for, and everybody knew we couldn’t even approach that.

Now we have AI that approaches 70 percent of what we wished for. It’s AI smarter than a mentally retarded person. That means current AI is likely smarter than 10 percent of the population.

Then we have geniuses like you and the poster complaining about how we never wished for this. No. We wished for way less than this and got more.


I genuinely wish whatever is hurting you in life ceases. You are being deeply, irrationally antagonistic and sound profoundly unwell. I hope you’ll be able to perceive that. I honestly recommend you take some time off from the internet, we all should from time to time. You clearly are currently unfit for a reasoned discussion and I do not wish to add to your pain. All the best.


You’re a dick. Addressing someone as if they have some sort of “problem” or are “hurt”, and pretending to be nice about it. This type of underhanded malice only comes from the lowest level of human being.


Can you diagnose me too? Because you are peak facepalm right now and I can’t cringe harder. So please tell me to touch grass so I can go heal from the damage you caused my brain from having to read you.


I remember how ~5 years ago I said - here on HN - that AI would pass the Turing test within 2 years. I was downvoted into oblivion. People said I was delusional and that it wouldn't happen in their lifetime.


The test has been relaxed by previous generations.

You're missing the people who have been skeptical about the details of the test since the very beginning. There are those too.

Moving the goalposts is a human behavior. The human side of the test should be able to do it. A passing AI should also be able to do it.

Many challenges that AI still struggles with, like identifying what is funny in complex, multi-layered false-cognate jokes, are still simpler for humans.

I trust it can get there. That doesn't mean we are already in a good enough place.

Maybe there is a point at which we should consider whether continuing to test it is ethical. Humans are also paranoid, fragile, emotionally sensitive. Those are human things. Making a machine that "passes it" is kind of a questionable decision (thankfully, not mine to make).


Dig that quote up, find anyone who gave you a negative reply, and just randomly reply to them with a link to what you just posted here (along with the link to your old prediction) lol. Be like "told you so"


LLMs are glorified, overhyped autocomplete systems that fail, but in different, nondeterministic ways than existing autocomplete systems fail. They are neat but unreliable toys, not “more profound than fire or electricity” as has been breathlessly claimed.


You just literally described humans, and the meta lack of awareness reinforces itself. You cyclically devalue your own point.


Not for nothing, humans also enjoy the worth and dignity inherent in being alive and intelligent…not to mention they're significantly less error-prone (see: hallucination rates in literally any current model) while being exponentially more efficient to produce and run. I can make that last assertion pretty confidently, because while I’ve never built a data center so resource-intensive it required its own dedicated power generation plant, I have put in the work to produce dozens of new people (those projects all failed, but only because we stubbornly refuse to take the wrappers off the tooling), and the resource requirements there only involved some cocktails and maybe a bag of Doritos. Anyhow, I reckon humans are still, on balance, the better non-deterministically imperfect vessels of logic, creation, and purpose.



