A boy saw 17 doctors over 3 years for pain. ChatGPT found the right diagnosis (today.com)
109 points by birriel on Sept 12, 2023 | 147 comments



Maybe I'm just getting old, but my reaction to this article had nothing to do with LLMs but was instead that this is an indictment of the state of medicine in America. At each diagnostic milestone it seemed obvious that the conclusion didn't adequately explain the totality of the boy's condition and the search needed to continue for the real culprit. Instead it seems like each one of these diagnosticians felt they'd found the "real problem" and pushed ahead with treatment as such.

Now I'm not saying that back in the day all doctors were Gregory House or holders of forgotten knowledge, but I have to think that the lower pressure to make medicine behave like a business and deeper ties one would form with their patients would've benefitted this family. For this to have gone on for three years without getting escalated to the attention of one of the real-life Dr. Houses out there is appalling. I feel like ChatGPT got the answer right because it allowed her to input all of the minor observations from the charts; perhaps had the doctors done likewise they'd have connected the dots as well.

When I was growing up and in my early 20s you went to the doctor and they got you talking about your life, teasing out the observations you had on your own life and addressed things from there. Then about 15 years ago everything changed and doctor visits became this twisted form of speed dating where you never see the same person twice and any question or answer that takes longer than ten seconds to express is taboo. I hate it, and I can't believe that we keep moving along like it's acceptable.


> Instead it seems like each one of these diagnosticians felt they'd found the "real problem" and pushed ahead with treatment as such.

More likely that they heard hoofbeats and were content to use the effectiveness of their treatment plan as another diagnostic signal.

> When I was growing up and in my early 20s you went to the doctor and they got you talking about your life, teasing out the observations you had on your own life and addressed things from there

The history and physical are a critical part of every exam. But much less so at an urgent care where they intentionally disavow long term health to focus explicitly on acute issues.


Actually, how things usually worked out in House is likely realistic. I'm not talking about the simple cases, but the main cases in the show, which took multiple false starts and wrong treatments to get to the final conclusion. In real life the patient is either dead by then or, in non-critical cases, it takes months or years.


Not just the US. Here in the UK, I recently went to a follow-up appointment regarding a non-healing ulcer and the NHS doctor took the dressing off and said "It is healed" despite the fact that:

1) the ulcer is deep enough that you can see the muscle below the skin fat layer (she said it no longer needed dressing!)

2) it has looked that way with nearly no change at all for nearly 3 months[1] (expected heal time is roughly 7 weeks)

3) there is a large area around the wound that is still red and inflamed (ulcers are not even regarded as healing until the inflammation is gone)

I've had to go for private care and I am currently scheduled in for minor surgery to see if a foreign object caused the ulcer and is still present, preventing it from healing. Probably no surprise to you, dear reader, that I am pursuing a case of medical negligence against the doctor.

[1] When I pointed this out to her, she said she "didn't have a crystal ball, so [she doesn't] know what will happen in the next week or month". Literal nonsense, a non sequitur: I was informing her about the past, not asking about the future.


Go back further when there were family doctors who knew generations of their patients' families and that's my memory as a kid. When I was in grade school, Doc Carpenter mentioned treating family members I'd never met because they were 3 generations older!

It was shocking to me that I heard about family mythos from my kindly old doctor. Luckily it was followed up with a lollipop to soothe me.


It stems from a low supply of doctors. There aren't enough doctors because if there were more doctors, they'd make less money.


More like hospitals refuse to self-fund residency slots. It's pretty bad when the number of doctors entering practice is fundamentally gated by residency positions funded by Congress.


https://blog.petrieflom.law.harvard.edu/2022/03/15/ama-scope...

The lack of residency positions is the result of lobbying by the AMA. What is the American Medical Association?

In their words:

"The American Medical Association, founded in 1847, represents more than 190 state societies and medical specialty associations, including internal medicine, family physicians, obstetricians and gynecologists, pediatric and emergency medicine. The AMA is the largest association of physicians—both MDs and DOs—and medical students in the U.S. Our mission is to “promote the art and science of medicine and the betterment of public health.”

It's a doctors' old boys club, and you can join if you're a doctor.

Here's more evidence:

https://med.fsu.edu/sites/default/files/news-publications/pr...

The low supply of doctors is not a free market phenomenon. It is the result of deliberate cartel policies of the AMA for self interest.

Keep in mind an oversupply of doctors is generally a good thing for society. Not a good thing for doctors who want to be super rich.


I’m more inclined to go with the fact that medical school is extremely expensive and the process is long, laborious, and difficult but sure, ok.


Doesn't explain why medical costs and doctor salaries in the US are the highest in the entire freaking world and the quality of care is lowest among 1st world countries.

Don't go with your inclination. Go with evidence and logic.

I posted a reply to another person with this:

https://news.ycombinator.com/item?id=37476974

It's not well known but the root cause is cartel like policies of the AMA. Becoming a doctor (only in the US) is one of the most gate-kept professions in the world.


I’m not white-knighting for-profit healthcare in the US, but making unqualified conspiratorial comments naturally raises suspicion. You could have led with this, especially if it’s not well known. I can admit I only had part of the picture; high cost and length of training is a factor, but I’ll buy that it’s a symptom of deliberate manipulation. Thanks for the links.


Generally, very knowledgeable people know about this stuff; call it 40-60 percent of the crowd on HN. I assumed there were enough people who knew about it to just say it outright.

Still not something I would call well-known though.

This country has plenty of stuff going on that falls into the "conspiracy category" but we now know is definitively true. Plenty. Everything Snowden revealed, the fact that the Bush administration manufactured the evidence to push the entire country to go to war with Iraq. Plenty. I wouldn't turn up my nose when someone says something that seems "conspiratorial", given how much shit out there has been verified definitively.

I get it though, stuff like area 51 captured UFOs will inevitably raise an eyebrow... it's hard to tell what's real and what's crackpot bs.


Not sure how increasing the supply of doctors would help in this case? If doctors begin to earn way less money than they do now, that would create an incentive to find other ways to make money, thereby increasing deceptive practices of pushing certain medicines/medical procedures to prop up sales rather than cure diseases.

I personally think money has no business (pun intended) in medicine. I would go as far as to say that if a trillionaire parent has a sick child with a disease that requires a trillion dollars to cure, the procedure should not go through, even though that may seem immoral. Which is to say that at a governmental/societal level, yes, money should absolutely be considered and allocations should be discussed. But when it comes to individuals, money shouldn't be considered at all.


>Not sure how increasing the supply of doctors would help in this case? If doctors begin to earn way less money than they do now, that would create an incentive to find other ways to make money, thereby increasing deceptive practices of pushing certain medicines/medical procedures to prop up sales rather than cure diseases.

You don't increase it to the point of desperation where they resort to malpractice. You increase the supply until cost and supply come in line with other 1st world countries, where doctors have reasonable salaries.

Additionally, the low quality of care in the US actually comes from overwork. Doctors are inundated with patients, and no amount of money can increase the productivity of a single person. To do the work effectively, the industry actually needs more people.

The overabundance of patients is what's causing the apathy the GP is witnessing above. It's easy for a doctor to feel empathy and deliver quality care for a low number of patients. For 400 patients a day? They could give a shit. Pretty soon they become desensitized and the whole thing becomes a numbers game.


I don't necessarily understand your example. Is it not more moral to let the child live than to have them die, regardless of the amount of money it takes? And is that not also making your point about not needing to consider money when trying to cure someone?


Unfortunately it’s the same in other healthcare systems - in public UK healthcare you get 10-minute appointments with a doctor, covering everything from history to diagnosis to treatment plan.


Somehow we need to get to the point that patients expect doctors to use AI to assist their diagnosis.

I think at the moment using AI would lead to the perception that the doctor is incompetent.

But I would always rather a doctor who consulted AI and then overlaid their personal experience and expertise to rule in and out what the AI suggests.

The idea that any/every doctor knows everything, or even could know everything, is so false and wrong and out of date.

There is probably a business opportunity in this somehow, for "AI first doctors" - their target customers being people who want the best human advice PLUS the best AI advice.


> Somehow we need to get to the point that patients expect doctors

Yes, and I think malpractice suits can work well here. If a doctor misdiagnoses a patient, leading to an injury, s/he needs to be able to document that an AI system was consulted and the AI-generated diagnosis was checked or considered. Maybe there's a good reason to discard the AI's suggestion, but you had better at least have asked the AI, or you're going to be slapped with a big malpractice penalty.


The irony is that the lawyers profiting are doing so because they have not been replaced by AI


Not that some aren't trying! https://www.nytimes.com/2023/06/08/nyregion/lawyer-chatgpt-s...

> In a cringe-inducing court hearing, a lawyer who relied on A.I. to craft a motion full of made-up case law said he “did not comprehend” that the chat bot could lead him astray.

> For nearly two hours Thursday, Mr. Schwartz was grilled by a judge in a hearing ordered after the disclosure that the lawyer had created a legal brief for a case in Federal District Court that was filled with fake judicial opinions and legal citations, all generated by ChatGPT. The judge, P. Kevin Castel, said he would now consider whether to impose sanctions on Mr. Schwartz and his partner, Peter LoDuca, whose name was on the brief.


There should, I would hope, be more to social considerations than litigation.


It's going to be interesting to see this play out with the AMA et al. The raging cynic in me thinks this is going to get very serious, very quickly. There are a lot of lifestyles at risk. On the other hand it will soon be possible to supply an endless parade of mothers offering politically unimpeachable testimony and demanding freedom to employ these tools.

I know who wins that, but there will be many pounds of flesh rent on the field in the meantime.


I’d be amazed if one or more of the large EMR vendors wasn’t already developing some sort of diagnostic co-pilot. They along with insurers have unique access to critical data for such applications and both have motivations… having worked for a major insurer, however, I don’t believe they have the technical competence or willingness to see longer-term value from true improvements in diagnostics vs. short-run thinking. (But I really hope I’m wrong there).


Yeah I feel the same. As soon as someone comes out with a medical AI service that gets traction, the AMA is going to go nuclear and get legislation to ban medical diagnosis AI.


The best they can do is something like a surgeon general's warning. I don't see an outright ban.


They will certainly attempt to employ IP barriers to the necessary data until they can figure out how to monetize and direct profits into the correct pockets. "Safety" will be the imperative on which they hang their cloaks, in addition to the conventional IP claims.


> at the moment using AI would lead to the perception that the doctor is incompetent

There are numerous examples, in popular culture, of geniuses working with AI assistants. This shouldn't be a difficult bridge to sell. Particularly if the patient can witness the whole conversation, possibly even feel involved in it.


Honestly, I think that's the opposite direction for human consideration.

I think patients would like doctors or nurses or nurse practitioners or whatever their insurance pays for . . . to pay attention to them and actually listen and care. Maybe AIs "do it better" in that case.

If computer systems care more about us than other human beings, the problem is not the computer systems.


> I think at the moment using AI would lead to the perception that the doctor is incompetent.

Then it's the patients that are stupid. I've seen my GP check some very specific vaccines / test procedures on the main gov body website, and I was like "hey, at least he's not making things up".

General medicine is not tailored for specific cases so AI is more than filling a need. It may be doing the job in the future.


The goal should not be to expect doctors to use AI to assist.

The goal is to get AI to be superior to doctors such that AI replaces doctors.

AI should augment your skill such that the doctor is no longer needed. The problem with the US medical system is that they hold your life hostage, then rip you off. Make no mistake: people attribute this to complexity and other bullshit, but the money ends up in the pockets of administrators and doctors.

You want to fix the system? Attack the root. Replace doctors.


It’ll have to be better than today’s spicy autocomplete technology to do that, though. ChatGPT will never be in the driver’s seat; it can’t be trusted.


Yeah, definitely. If anything, ChatGPT is the precursor to the thing that will eventually replace doctors.

But at the same time it's also the precursor to the thing that replaces programmers.


> at the same time it's also the precursor to the thing that replaces programmers.

It seems far more likely that the successors will create more programmers, not unlike the evolution of the elevator. Newer elevator designs didn't replace the elevator operator. They allowed anyone to become the elevator operator.


By programming I mean something separate from typing English and asking something to spit out code. If my manager asks me to code an app, is he programming or am I?

>Similar to the evolution of the elevator. Newer elevators didn't replace the elevator operator. They allowed anyone to become the elevator operator.

By "replace" I mean replace someones job. Similar to how the elevator made elevator operators unemployed.


> If my manager asks me to code an app, is he programming or am I?

In the right context, I would say that it is possible that the boss is programming you and that you are, ultimately, both programmers. But even if we only want to think of programming in terms of computers, that is not a concern when it comes to a ChatGPT-like thing. It will almost certainly be a computer.

> By "replace" I mean replace someones job. Similar to how the elevator made elevator operators unemployed.

The job is still there, but with an effectively infinite supply of workers having joined the market, that pushed the price for the work to zero. That left anyone wanting to be paid more out of the market.

In the same vein, if your employer found someone willing to do your work for a lower wage, fired you, and hired them instead, it seems fair to say that you were replaced, but was the profession replaced?

Anyway, fun analysis aside, understood.


  sed 's/doctor/programmer/g'


Mission accomplished, but only if everyone can use it, and no one is in a position to gatekeep anyone else's access.

Which will never happen. Controlling access to power/information is too important.


Yeah, that's the downside of course. I still think US doctors, and the entire US medical system, unlike other occupations, deserve to be replaced.

But the free market is the one that makes the rules here. If it can be replaced, it will, whether it deserves to be replaced is irrelevant.


Why would you expect "we have a better solution but we'll charge less for it" to be the outcome in a free market? That's just leaving money on the table.


Additionally, keep in mind this diagnosis was not made by a machine designed to be a "doctor".

The doctor part is a side effect. An emergent property. I can just imagine Google running a powerful AI in the future, using ads to fund the whole thing, and the thing just completely demolishing the medical diagnosis industry as an emergent side effect.


That's possible. Like they did to dedicated GPS/map hardware. Not sure Google's that bold any more, but somebody could.


You need a moat to keep that going for very long. Doctors have a regulated market in which to build moats, but as this is said to be a free market, what moat-suitable ground might there be?


Get yours certified by the AMA and have them make competitors illegal.


That would require a regulated market. The scenario was based on a hypothetical free market.


Because of the extreme increase in competition.


Obvious: open source, piracy, and copying. A phenomenon of the free market combined with products constructed out of pure information.


Not sure what's ChatGPT-specific here; patients discovering their own diseases has happened a lot since the internet has gone mainstream. Google has been the most prolific doctor of the 21st century. The only comparison given is

> But both seem to work better than the average symptom checker or Google as a diagnostic tool. “It’s a super high-powered medical search engine,” Beam says.

Which is, arguably, more of an indictment of how bad Google's search capabilities are in 2023 than anything else.


Bizarre take.

  Not sure what's ChatGPT-specific here
They literally used ChatGPT, specifically.

You may as well hear the story of William Kamkwamba building a wind turbine for his village and say "Not sure what's African-village specific here. People have been building wind turbines all over the world with various resources."


Assuming they meant “couldn’t you already do this with Google” I think the big difference is the context length. You can describe way more symptoms, and include a lot more data, than you ever could in a single search query. Even if Google worked properly, in its current form it’s just not capable of parsing pages of info and drawing conclusions beyond keywords.


A more apt comparison would be seeing one of the many "we rewrote our Python app in Go and improved performance by 100x" articles, and pointing out that most of the improvement comes from the rewrite and not the language.

Similarly, the real moral of the article is that doctors miss things and doing independent research, whether that's ChatGPT, Google, Bing, Yandex, or reading research papers yourself is sometimes necessary.


Interesting. The way I interpret what you're saying is "Obviously ChatGPT can make an accurate medical diagnosis" in the same way as "Obviously a Python app can be rewritten in Go".

Go has been around for quite a while now. ChatGPT, relatively, has not. If Go were still quite new and people were unsure of it, then your article headline actually does make sense.

For people such as myself, it is not obvious that ChatGPT could do such things, hence the headline is useful context.


No, that's absolutely not the same as Python rewritten in Go. It's not just that most ordinary people don't have access to arcane medical knowledge, which is hidden behind paywalls or simply not available to the public; arriving at a diagnosis that 17 doctors missed, from just MRI notes, is not a trivial thing that can be done by mere "independent research". It's absolutely disingenuous to suggest otherwise.


It's not bizarre at all. Tons of people are in denial about the capabilities of ChatGPT. The current strategy is to position it as nothing more than an advanced search engine.

Clearly ChatGPT has flaws. Clearly it's more than a search engine. And clearly there's a large contingent of people who don't want to think it's anything more than a search engine. That stance probably was bizarre when the hype first started, but now these people in denial are a dime a dozen, as cliche as the people they want to bring down.


Google is like going to an individual doctor which this mother was doing already.

ChatGPT synthesized its result from multiple sources and communicated that - no source existed which understood the connections between various symptoms. This is exactly what the mother was quoted as saying at one point - every doctor was looking at things from their specialization and not holistically.


> ChatGPT synthesized its result from multiple sources and communicated that - no source existed which understood the connections between various symptoms.

Is this true? Or is the source that had all of the symptoms buried under WebMD/MayoClinic SEO spam?


Health info on Google was captured by the likes of WebMD and Healthline 20 years ago and they've just been farming their SEO position ever since - absolutely no innovation, just regurgitating medical literature 101.

But it really feels like Google's entire ecosystem is collapsing rapidly as SEO bots cannibalize it. Ad-supported content is a local optimum that LLMs will thankfully destroy.


> ... patients discovering their own diseases has happened a lot since the internet has gone mainstream...

With, of course, a corresponding increase in patients thinking they've discovered their own diseases.

For every "ChatGPT diagnosed my rare illness" story there's a "the Internet helped me find a bunch of people who are certain I've got Morgellons and chronic Lyme".


Whatever happened to "expert systems" from the 80's, 90's? They were supposedly going to be able to solve this exact problem — medical diagnosis.

Perhaps this was the camp of AI that thought you could just build a decision tree (top down) ... that appears to have failed?

Instead the bottom-up (neural networked) AI seems to be winning out.

I wonder though if a ChatGPT would have been possible in 1985, with no rich internet to feed it data. I wonder if it would have been possible even with the internet of 2000?


I’ve written several 1980s-style expert systems.

In practice they are basically simplified programming languages. If this, then that. The application of rules can lead to surprising emergent behavior. You might be able to find a pattern in it but you’re more likely to get wrong results and need to update the rule set.

There is also the problem of inputs. You end up needing to be an “expert” on the system to get good results.
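
A toy version in Python, with rules invented for illustration (real systems had hundreds of rules, plus things like certainty factors):

  # Forward-chaining rule engine: apply if-this-then-that rules
  # until no new facts can be derived.
  facts = {"leg_pain", "gait_change"}
  rules = [
      ({"leg_pain", "gait_change"}, "suspect_neurological"),
      ({"suspect_neurological", "abnormal_mri"}, "refer_neurosurgery"),
  ]

  changed = True
  while changed:
      changed = False
      for conditions, conclusion in rules:
          # fire a rule when all its conditions are known facts
          if conditions <= facts and conclusion not in facts:
              facts.add(conclusion)
              changed = True

  print(facts)

The surprising emergent behavior I mentioned comes from rules chaining in ways no single rule author anticipated.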


The ones I've worked with weren't solely just if->then->else trees. There were also things like weightings, probabilities, risks, etc.


Yeah, I should have clarified. They can provide a way to define the “shape” of things and give answers or choose branches based on that.

That’s what I was thinking about when I mentioned emergent behavior.

The “fuzziness” of expert systems fascinates me. :)


Expert systems are not dead.

We use them at work for the diagnosis of mechanical parts rather than human bodies. We don't call them expert systems but they are exactly that. There is a database of potential failures, repairs and verifications, and it will suggest a troubleshooting procedure taking cost into account. It means that it may try unlikely causes first if they are easy to check. It also has simple learning capabilities.
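
The cost-aware ordering is roughly this (a sketch with invented numbers, not our actual data):

  # Rank candidate causes by prior probability per unit of checking cost,
  # so a cheap check for an unlikely cause can still come first.
  causes = [
      # (cause, prior probability it's the culprit, cost to check)
      ("loose_connector", 0.05, 1.0),   # unlikely, but trivial to inspect
      ("cracked_housing", 0.35, 15.0),
      ("worn_bearing",    0.60, 40.0),  # likely, but teardown is expensive
  ]
  for name, p, cost in sorted(causes, key=lambda c: c[1] / c[2], reverse=True):
      print(f"check {name}: p={p}, cost={cost}")
  # -> loose_connector is checked first despite being the least likely cause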

The problem with expert systems is that they need a curated and complete database. That is possible with mechanical parts, as they are well specified and failure modes are well documented. The top-down approach comes naturally because that's how components in a mechanical system are designed. By comparison, human bodies are a complex mess.

As for ChatGPT, it is a pure product of technical advances. Neural networks are an old idea: the theory was known in the 60s, and we had implementations in the 80s. The problem is that with little data and limited computing power, they were little more than toys. They have been made practical in the last few years only because of the "rich internet" on one side and lots of powerful GPUs on the other. And the way I understand it, recent advances in deep learning are not really fundamental research, more like trying a bunch of stuff and seeing what sticks, again made possible by lots of data and computing power.


> Instead the bottom-up (neural networked) AI seems to be winning out.

What do you mean by ... winning out? It seems to have made one novel diagnosis, which is great for the individual in question. But how many times has it diagnosed something completely absurd? What's the ratio of signal to noise here?

If it's anything like trying to use it to solve difficult engineering problems it must have hallucinated turbocancer hundreds of times over.


>What do you mean by ... winning out?

Symbolic AI has been less performant than NN AI for decades now, and by a large margin: language, speech, images, etc.

>It seems to have made one novel diagnosis

Ok, so let's get this out of the way: GPT-4 is good at diagnosis.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10425828/

>Six patients 65 years or older (2 women and 4 men) were included in the analysis. The accuracy of the primary diagnoses made by GPT-4, clinicians, and Isabel DDx Companion was 4 of 6 patients (66.7%), 2 of 6 patients (33.3%), and 0 patients, respectively. If including differential diagnoses, the accuracy was 5 of 6 (83.3%) for GPT-4, 3 of 6 (50.0%) for clinicians, and 2 of 6 (33.3%) for Isabel DDx Companion


When used in conjunction with an actual expert, ChatGPT’s statistical search can be very powerful. It’s essentially performing a statistical search over the space of all diseases it knows about and returning (to a first-order approximation) the most reasonable diagnosis. What a doctor should do is to perform tests to rule the suggestions out.

The point isn’t to replace doctors but to empower them. They still need to come up with tests to rule out conditions.

We also have an algorithm for the tests: information gain. If you think of diagnosis like a game of Guess Who, we need to come up with tests which can narrow down the space of possible conditions. (It’s a bit more complicated of course since tests have a margin of error too.)
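
A sketch of that in Python, with the disease priors and test characteristics invented purely for illustration:

  import math

  def entropy(dist):
      return -sum(p * math.log2(p) for p in dist.values() if p > 0)

  def posterior(priors, likelihood, positive):
      # Bayes update: P(d | result) proportional to P(result | d) * P(d)
      post = {d: (likelihood[d] if positive else 1 - likelihood[d]) * p
              for d, p in priors.items()}
      z = sum(post.values())
      return {d: v / z for d, v in post.items()}

  def information_gain(priors, likelihood):
      # expected reduction in entropy from observing the test result
      p_pos = sum(likelihood[d] * p for d, p in priors.items())
      return entropy(priors) - (
          p_pos * entropy(posterior(priors, likelihood, True))
          + (1 - p_pos) * entropy(posterior(priors, likelihood, False)))

  priors = {"tethered_cord": 0.2, "chiari": 0.3, "muscle_strain": 0.5}
  test = {"tethered_cord": 0.9, "chiari": 0.4, "muscle_strain": 0.1}  # P(+ | d)
  print(information_gain(priors, test))  # run the test that maximizes this

Each round you run the test with the highest expected gain, update the priors, and repeat: that's the Guess Who loop.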


The human body is incredibly complex so most expert-derived models are less accurate than purely statistical methods. There are ways to marry the two, like Bayesian networks, but they take much longer to develop and aren't typically more accurate.

The thing is, a model doesn't have to be accurate to be useful. Expert-derived systems tend to operate closer to how doctors understand the problem, making them more interpretable.


I don't think anything close to the compute needed to train ChatGPT or any of the other LLMs would have been possible in 1985 or 2000.


A decision tree is basically how doctors work, at least how they're trained to work. I think for a machine to be significantly better, for it to be impressive, it stands to reason it would work differently.


Not really. A programmatic decision tree will actually try out all the steps rather than glossing over the ones it thinks are not relevant. It also would not forget to test branches.

If all a doctor is doing is following a flow chart, you can be damn sure that a machine programmed with the same flow chart would do it more effectively.
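
To make it concrete, a flow chart is just data plus a loop (a toy sketch, with the questions invented):

  # A diagnostic flow chart as an explicit decision tree; the machine
  # never skips or forgets a branch it was programmed to ask about.
  tree = {
      "q": "is the pain localized?",
      "yes": {"q": "is imaging abnormal?",
              "yes": "refer to a specialist",
              "no": "re-examine in two weeks"},
      "no": "order a broader workup",
  }

  def walk(node, answers):
      while isinstance(node, dict):
          node = node["yes" if answers[node["q"]] else "no"]
      return node

  print(walk(tree, {"is the pain localized?": True,
                    "is imaging abnormal?": False}))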


GP mentioned flowcharts in the pedagogy of medicine which is an easily verifiable statement. Anyone willing to page through Harrison's Principles of Internal Medicine will be able to identify the decision tree figures. They are depicted on a pale green field.

Not the first in the tome, but pg 240[2] contains a great example of such a decision tree.

1. https://archive.org/details/harrisons-principles-of-internal...

2. https://archive.org/details/harrisons-principles-of-internal...


> If all a doctor is doing is following a flow chart

I don't think humans do that in the same way as code would. And why would they? That's exactly how they're more valuable!


If there was always (or, let's be honest, ever) time and resources to test every branch, then that would be a fair comparison.


That’s the point. Time and parallelization is cheap for machines.

I can see a short-term future where “doctors” are replaced with non-doctor medical personnel who gather data. Blood draws, temperature checks, collecting stool samples, photographing lesions: these are all easier with two hands. Analyzing the data will be done for the masses by machine.

And of course, because of how the USA healthcare system works, it will somehow manage to be even more expensive.


A robot doesn't have more resources, but it effectively has unlimited time to consider every branch.


The patient they're diagnosing, however, might not.


I wrote "consider" because that doesn't need to involve the patient.

But also the patient usually has plenty of time to answer additional questions. You can cover a ton of ground in less than an hour.


“I’ve considered all the options for three months. You have a rare cancer that will kill you in… uh oh.”

Patient doesn’t have to be sitting in the exam room to run out of time.


What I meant is that the computer can do three months of considering in less than five minutes real time. That's the benefit of tossing raw processing power at things.


It’s not fair to the doctor, but it’s better for the patient.


This is an ideal use case for ChatGPT: synthesize a huge body of knowledge and come up with some suggestions in an environment where a wrong answer doesn’t really hurt (since it’s either an answer she heard already, or a doctor shoots it down).

I think we can safely bet that doctors will be using these systems to generate suggestions (human in the loop) very soon, but making the diagnosis reliable enough that you’d take the human out of the loop could be extremely challenging.


> She scheduled an appointment with a new neurosurgeon and told her she suspected Alex had tethered cord syndrome. The doctor looked at his MRI images and knew exactly what was wrong with Alex

Doesn't this kind of summarize the article? She went to 17 doctors who did not know about the illness. ChatGPT pointed her to a competent individual who immediately diagnosed it. My understanding is that ChatGPT's diagnosis is wrong because tethered cord syndrome typically occurs after a surgery to correct a spina bifida, which is what the neurosurgeon diagnosed.


Without knowing the MRI report it’s hard to interpret this. If the radiologist had missed the spina bifida then all the other doctors would rely on this report. Then it’s all a single point of failure: a radiologist's report. May be other things, but I suspect the report is the root cause of the issues here.


> “I went line by line of everything that was in his (MRI notes) and plugged it into ChatGPT,” she says. “I put the note in there about ... how he wouldn’t sit crisscross applesauce. To me, that was a huge trigger (that) a structural thing could be wrong.”

>She eventually found tethered cord syndrome and joined a Facebook group for families of children with it. Their stories sounded like Alex's. She scheduled an appointment with a new neurosurgeon and told her she suspected Alex had tethered cord syndrome. The doctor looked at his MRI images and knew exactly what was wrong with Alex.

Notice how it gets very vague about the timing and specificity. It seems not that ChatGPT diagnosed tethered cord as tethered cord, but that it happened to come up in one of the many chat sessions, which led to the mother doing a deeper investigation of it.


A friend got the radiologist report for a musculoskeletal issue. Pasted into early GPT-4 and got a list of things to do, prognosis etc. I ran it by my dad (orthopaedic surgeon) and my friend asked his doc what to do.

Matched almost exactly. My dad said it was very impressive. Great software.


One comment in the story struck me -- doctors have become so specialized now that no single doctor is able to see the big picture anymore.


I've used ChatGPT for medical diagnoses support, and I thought it was fantastic.

I was most impressed that I could simply cut and paste lab results from a PDF, with its wonky formatting, and it still interpreted everything correctly.


LLMs are amazing. Truly amazing stuff. Let's not screw this up.


Articles breathlessly getting excited about anecdotal cases are a good way of doing just that.

(It's good this article is on HN, but I don't like this reporting)


I agree, it reads like the doctors didn't do their job properly.


>I agree, it reads like the doctors didn't do their job properly.

I'm sorry, but this is exactly what occurred. Doctors failed to do their job. That mother paid money and wasted time and got nothing. That is a unilateral and categorical failure of all the doctors involved.

Are they refunding her money for failing? Are they refunding her insurance company? No?

Then it's not just about doing their job properly. It's about theft. This is no different from a hospital charging 100k for 1 day of care.

Tell it like it is, doctors aren't doing their job properly.


That's what I wrote(?)

Edit: no, you don't know what I'm implying, as I'm implying nothing here. Relax.


You wrote:

"I agree, it reads like the doctors didn't do their job properly."

Implying that doctors DID do their job properly and that the article reads in a sensationalist style, as if they didn't.

I am saying you are completely wrong. The article is spot on on the failure of the doctors here.


It will be a shame if AI doesn’t become more mainstream in healthcare. AI and AR have the potential to disrupt the medical field.

I am sure we are not getting rid of actual doctors anytime soon, but having AI as a “co-pilot” would be immensely beneficial.


Says more about doctors, really. Even dealing with something as banal as back pain, it took years and finally a visit to a physical therapist to get the issue properly identified and resolved.


According to the article, it is difficult to diagnose babies/kids/toddlers because "[...] they can’t speak”.

The article also mentions that the kid didn't have the symptoms or physical manifestations of spina bifida, complicating the diagnosis.

This case reads as a genuinely difficult case to diagnose.


I wonder how https://www.drgupta.ai/ would have performed...


I wonder how many people used ChatGPT to find out their disease, got absurd results for their symptoms, and just shrugged it off.


As much as I want to bash ChatGPT, this is actually a good use for it.

It’s been possible to google this stuff for a long time but you need to know how to use a search engine. And the results have slowly been getting worse over time.

ChatGPT adds another dimension to search. It’s not an oracle of truth but it can point you to things you wouldn't have thought of.



Does your dichotomy matter? When the system that curates the supposedly existing sources is impenetrable, they're as good as fiction.

If it does matter then what might you propose? Hint: to be viable it must involve some professional class interest making bank.


I had a similar experience where ICU doctors and nurses missed the diagnosis, but I put the info into ChatGPT and it found the right diagnosis. I just wish I had tried it 12 hours earlier -- it would have saved a lot of suffering.



How about the other cases that ChatGPT misdiagnosed? We don't see them in the media because, you know...


> How about the other cases that ChatGPT misdiagnosed?

She didn't consult ChatGPT and then start feeding her kid Tide pods. She took the diagnosis to a doctor. This is fine. This is productive. One could argue it wastes doctors' time, but in this case, it clearly didn't; our traditional random-search system did.


We don't see it because ChatGPT is garbage in terms of reliability.

This article is in the media because doctors claim they're good and effective. But when an unreliable product beats doctors, it turns their claim on its head.

This article isn't saying anything about ChatGPT we don't already know. It's saying something damning about doctors.


According to this study, they are as reliable as doctors (in an emergency department context)[1]. Also, Google has developed Med-PaLM 2, which they also say matches human doctors. So yeah, they may be unreliable. But they are as unreliable as humans.

[1]https://www.sciencedirect.com/science/article/pii/S019606442...


> But they are as unreliable as humans.

Humans can be held to account by other humans if investigations are necessary to explain their own decisions transparently.

LLMs such as Med-PaLM and ChatGPT (fundamentally, all LLMs) are non-deterministic and unpredictable in whatever they give as their output, which is much more unreliable. They cannot reason, or transparently explain why they decide to output their responses; they shuffle and reword their sentences to make themselves appear coherent to the untrained eye, but when scrutinised by a professional, the output is mostly garbage.

You would not seriously trust an AI to pilot a plane end to end without any human pilots on board or a FSD system to drive you from A to B whilst you sit at the back seat.


> You would not seriously trust... a FSD system to drive you from A to B whilst you sit at the back seat.

You might not, but I took a Cruise just the other day to a friend's house. Didn't even die or get into any accidents! (San Francisco)


> You would not seriously trust an AI to pilot a plane end to end without any human pilots on board or a FSD system to drive you from A to B whilst you sit at the back seat.

Well, people are doing exactly that with Waymo and Cruise, but with cars.


I wouldn't bet on that study. ChatGPT hallucinates. It can beat doctors but at the same time confidently present wrong information.

Doctors at least make a best effort.


I don't care about best effort. I care about getting the correct diagnosis. GPT-4 can already offer insightful information, even if it's just by augmenting the doctor's diagnostic efforts. Not using it is just dumb.


I hate doctors, but at the same time you have to look at reality. Not every decision needs to be data-driven; the qualitative aspects of hallucinations are very real and should not be ignored. I'm sure you know the hallucinations that pop out of ChatGPT can get wild. Definitely use it, but do so with caution.


> Doctors at least make a best effort.

lol. Just skim the CFS thread for counterexamples


Eh, best effort in terms of not killing you and harming you too much... that's what I meant.

With ChatGPT hallucinations, anything goes; you know that machine has no boundaries, so that has to be taken into account.


She went to 17 doctors that misdiagnosed the cause, so my question is: how about the doctors that couldn't find the right cause of the pain? You seem to be giving them a free pass.


Exactly. We know that the AI bros won't tell us that, since it spoils their narrative.

For each guess in its diagnosis that is correct, there are around 1,000 incorrect suggestions that it makes, which is why it is untrustworthy in its diagnoses; and for every answer it gives, it lacks transparent reasoning, making it close to dangerous to rely on for medical advice.


Anybody have a good starting prompt to prime ChatGPT to answer medical questions or diagnoses?


And how much bad medical information has it given?


ChatGPT 4 won’t answer medical questions for me.


Ask it to do a role play as a doctor, or maybe just rephrase your question so it's less of a request for advice and instead a discussion of scientific data/facts. It should be easy to get around this guardrail.
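
For example, a rephrasing along these lines (hypothetical wording, not a tested prompt):

  I'm reading a radiology report for a case discussion. Given the
  following findings and symptoms, what differential diagnoses does
  the medical literature typically consider? [paste the report]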


This is amazing and we are only getting started.


This is so terrifying and true.


Let's see how the AI haters here are going to spin this one...


It is a worthwhile practice to open one’s mind to disagreeable views to find where their disagreement lies. If you were to spin this against ChatGPT, would you really have nothing to say?


Yes, it's very important to have an open mind. I'm very curious how the AI haters will spin ChatGPT diagnosing a kid, when 17 doctors could not, as anything but a win for AI.


I’m curious what you find so disagreeable. What arguments do you think are the strongest against AI in this context and why do you disagree?


One anecdote is not data. You need controlled trials that include both correct and incorrect diagnoses, not just a headline.


Wrong. One anecdote is one data point. If one person witnesses a murder, that is a data point. The court doesn't just dismiss the data point because they need a sample size of 1,000 witnesses for any statement to be meaningful. This article says something meaningful about AI, and it also says something damning about doctors.

Also there have been studies: https://www.sciencedirect.com/science/article/abs/pii/S01960...


Wrong again. This is a case of reliability, and just because there is one anecdote of the 'correct' diagnosis hardly means it is reliable overall. In fact, you wouldn't use these AI systems and trust their judgement at face value without the involvement of human doctors, because by default you don't trust their outputs. This is on top of the fact that these AI systems are unexplainable black boxes which cannot transparently reason about and explain their own decisions.

It also appears that this 'research' study is, quite frankly, a surface-level one with an extremely low sample size of 30, rather than the 1,000 sample size you suggested in your example.

So I can only dismiss that study, since it is not sufficient to provide conclusive evidence of reliability.


You are dismissing the study not because you need 1000 samples. You are dismissing it because you're biased.

30 witnesses to a murder is highly reliable. Any judge or court who denies that is biased.

17 doctors who couldn't diagnose a simple issue for three years is a demonstration of total unreliability.


> You are dismissing the study not because you need 1000 samples. You are dismissing it because you're biased.

Biased to what? You gave ONE study with a very small sample size of 30, which is a far cry from the 1,000 sample size you suggested. It is weak evidence and can be dismissed.

That hardly means that the next study which another group attempts to reproduce is conclusive 'evidence' that it is more reliable than doctors.

> 17 doctors who couldn't diagnose a simple issue for three years is a demonstration of total unreliability.

So this ONE anecdote immediately means that ChatGPT is far more reliable than human doctors and can be used for medical advice? Out of 1,000 cases, how many times does ChatGPT get its response incorrect?


>Biased to what? You gave ONE study with a very small sample size of 30, which is a far cry from the 1,000 sample size you suggested.

Go read what I said again, because you completely misinterpreted it. I said a 1,000 sample size isn't needed by a judge. A judge can convict a murderer off of one witness.

>That hardly means that the next study which another group attempts to reproduce is conclusive 'evidence' that it is more reliable than doctors.

This is what gets me. Do some people really go through reality only deeming things that have been under the lens of a scientific study to be real? Is the sky blue? Are you alive? Do you need a scientific study to prove it to you? Are you so enamored with science that the only way you can speculate or debate a topic is if the scientific method has been applied rigorously?

Bro, just use ChatGPT a couple of times. You'll find its reliability is roughly above 50% for queries and questions no other machine on the face of the earth can answer. It beats humans consistently at speed, and in many instances where creativity is required.

>So this ONE anecdote immediately means that ChatGPT is far more reliable than human doctors and can be used for medical advice? Out of 1,000 cases, how many times does ChatGPT get its response incorrect?

No, but a sample size of 30, and beating out 17 doctors who spent three years analyzing one child in mere seconds, is reasonable evidence that ChatGPT is the precursor to the thing that will eat the entire medical diagnosis industry alive.

Right now the machine is better only in certain instances. The delta from zero to now signifies another delta in the impending future.


> Go read what I said again, because you completely misinterpreted it. I said a 1,000 sample size isn't needed by a judge. A judge can convict a murderer off of one witness.

That is not how this works.

A sample size of 1 or 30 in ONE study in the medical industry is hardly convincing to physicians and other medical professionals as conclusive proof that an experiment works reliably for others.

Thus, you have been comparing apples to oranges all this time; research standards in clinical practice do not transfer to whatever you are trying to compare from legal practice.

You need multiple peer-reviewed and reproducible studies which verify the claims set out in the research paper, and a larger sample size is favoured over an extremely small one. Not just 'past cases' or 'case studies'. Those only show that someone has done it within the disclosed limitations, and in this case they do NOT conclusively answer the reliability question.

Hence, it only scratches the surface and can be dismissed as insufficient evidence that does not prove reliability.

> This is what gets me. Do some people really go through reality only deeming things that have been under the lens of a scientific study to be real? Is the sky blue? Are you alive? Do you need a scientific study to prove it to you? Are you so enamored with science that the only way you can speculate or debate a topic is if the scientific method has been applied rigorously?

As I have already explained, you have just confused yourself with a nonsensical comparison right from the start. Had you actually read the paper you pasted instead of the title, you would have realized that the non-deterministic nature of LLMs is a limitation that would require further studies and methodologies (including larger sample sizes) to conclusively prove that ChatGPT is far more reliable than human doctors, even enough to fully replace them.

So framing my point as if I am attempting to prove a tautology isn't what this is. LLMs are non-deterministic black-box models that require rigorous evaluations and experiments, and suggesting one surface-level study with a limited sample size as proof that LLMs like ChatGPT are more reliable than doctors is beyond ludicrous, especially using that shallow research paper as an example.

> No, but a sample size of 30, and beating out 17 doctors who spent three years analyzing one child in mere seconds, is reasonable evidence that ChatGPT is the precursor to the thing that will eat the entire medical diagnosis industry alive.

Again, one example isn't sufficient evidence to justify jumping to such wild conclusions as 'eating the entire medical diagnosis industry alive'.

You have already admitted it is not even reliable or transparent enough to be used for medical advice, especially with my question left unanswered. I'll be more specific: for every case where it gives the correct diagnosis, how many times does ChatGPT get its response incorrect out of 1,000 cases?

ONE case study which scratches the surface, with a product that is an opaque, non-deterministic tool, is something that physicians and clinicians are extremely skeptical of as a full replacement. Not only can it not fully replace them, it will always require a human doctor to check its diagnosis regardless, especially when something goes wrong.


Bro. You are not addressing the dichotomy. Why is it that in some instances you need stats and huge sample sizes, and in other instances you need only one witness to convict someone of murder?

All you're doing is regurgitating the same bs everyone knows about the nature of science, statistics and sample sizes, all of which is blindingly obvious to anyone.

I am presenting to you a valid dilemma that your big brain is skipping over because of your blind loyalty to science. You do not need a scientific study to tell you the sky is blue. A court doesn't need a thousand witnesses to convict a murderer. Have you ever wondered why?

Do you need me to give you 1000 IQ questions to verify that you are an intelligent being? Do you need to give your mother or father those same tests to verify their intelligence? You have common sense right? You can talk to your mom and basically bypass science to verify her intelligence right?

Why the hell, all of a sudden, do you need a rigorous scientific study to verify the intellectual abilities of ChatGPT? You're so smart that you can know your mother is intelligent without assessing her with 5000 IQ questions, but for ChatGPT you suddenly need the raw power of scientific rigor to come to any conclusion? You can't just talk to it yourself and make your own conclusion?

Bro. You're irrational. ChatGPT beat 17 doctors in seconds and it doesn't even faze you. But your mom, who has likely never taken an IQ test, doesn't need a single test to verify her intelligence.

Go above the level of scientific rigor. Einstein didn't need sample sizes and statistics to speculate about black holes and general relativity. The verification came later, but the math and the nature of reality were formulated and judged correct through the common sense I described above. This was way before anything Einstein proposed was verified by "stats".

Do you not have the ability to bypass statistical data and formulate conclusions without it? Looks like no.


> Bro. You are not addressing the dichotomy. Why is it that in some instances you need stats and huge sample sizes, and in other instances you need only one witness to convict someone of murder?

Again, you're continuing to compare apples and oranges from different professions and applying them here, which is nonsensical. The unpredictable behaviour of LLMs like ChatGPT tells us not only that they are non-deterministic but that they cannot be trusted at all; in that context they always require humans to check them, and establishing how reliable they are requires many more experiments and scientific methods from others to attest to it.

This is exactly why medical professionals in this case will laugh at your question. For clinicians to use one study as the truth, with a low sample of 1 or 30, as the basis for whether a medical device is reliable, especially an AI as a medical device, is beyond ridiculous.

> All you're doing is regurgitating the same bs everyone knows about the nature of science, statistics and sample sizes, all of which is blindingly obvious to anyone.

So why aren't you able to understand that, then? Your mistake was to begin by comparing the research methods of the legal profession and the medical profession, and to use that flawed analogy here to claim that 'reliability' means the same thing in both. With that, I'm afraid, you only confused yourself.

> I am presenting to you a valid dilemma...

Which, again, is irrelevant and beside the point. Everything after that comes back to what I've already said about transparent explainability: it is known that LLMs and AI models such as ChatGPT cannot reason about or explain their decisions transparently, and thus need examination and further reproducibility by others due to their unpredictable behaviour.

Such a system used for medical advice is, quite frankly, unsatisfactory to physicians and other medical professionals. Just because it worked for someone does not mean it is reliable and works for everyone else. Hence me asking you for a significantly larger sample size and for more clinical research papers in which ChatGPT is used.

> Bro. You're irrational. ChatGPT beat 17 doctors in seconds and it doesn't even faze you. But your mom, who has likely never taken an IQ test, doesn't need a single test to verify her intelligence.

I'm not the one drawing wild conclusions about reliability from one study and then suggesting that we can replace all doctors with ChatGPT just because one anecdote shows it was correct for one patient, as if that means it is also correct for others; given its unpredictability, that in itself is beyond illogical. You clearly are doing just that.

As long as it is a black-box AI model, physicians and medical professionals will always scrutinise its unpredictable nature and reliability, rather than trusting whatever diagnosis it gives as the truth.

> Go above the level of scientific rigor. Einstein didn't need sample sizes and statistics to speculate about black holes and general relativity. The verification came later, but the math and the nature of reality were formulated and judged correct through the common sense I described above. This was way before anything Einstein proposed was verified by "stats".

What does Einstein speculating about his equations have to do with showing that a non-deterministic AI system is also reliable? There are different methods of showing this reliability, as I have explained already, and bringing that up is an irrelevant distraction.

> Do you not have the ability to bypass statistical data and formulate conclusions without it? Looks like no.

The entire point IS conclusively showing reliability, and the ONE paper with a low sample size that you used as the basis of that claim is laughably insufficient for clinicians to draw an overall conclusion that ChatGPT is more reliable than doctors, to the point where it is safe for medical advice or could completely replace doctors (which isn't going to happen anyway).


> Again, you're continuing to compare apples and oranges from different professions and applying them here, which is nonsensical. The unpredictable behaviour of LLMs like ChatGPT tells us not only that they are non-deterministic but that they cannot be trusted at all; in that context they always require humans to check them, and establishing how reliable they are requires many more experiments and scientific methods from others to attest to it.

Where's the science on this? I need reams and reams of hard data and several scientific papers to prove this, because nothing exists in reality until there's a scientific paper written about it.

No, I'm kidding, don't actually give me science on this. Everything you said is a conclusion easily arrived at with just intuition, experience and common sense. You violate your own principles every time you make a statement without a citation to a rigorous, long-winded scientific paper.

You realize witnesses are non-deterministic too? Yet a judge only needs one to convict a murderer. Non-determinism doesn't mean jack in this convo.

You talk about medical professionals laughing in my face: do you mean the 17 professionals mentioned in the article who for three years failed to diagnose a simple issue? You think anybody cares if they're laughing?

>What does Einstein speculating about his equations have to do with showing that a non-deterministic AI system is also reliable? There are different methods of showing this reliability, as I have explained already, and bringing that up is an irrelevant distraction.

It's relevant. You're just excited, thinking the conversation is going in some strange direction where ultimate statistical rigor is the only valid topic of conversation.

I bring up Einstein to show that we can discuss widely believed and highly esteemed ideas that have zero statistical verification, and that doing so is valid from the standpoint of scientists and "professionals".

I'm saying we don't need that level of rigor to talk about things that involve common sense.

Science has weaknesses. The first is that it's fucking slow and expensive. The second is that a fundamental point of science is that nothing can be proven true. Statistics cannot prove anything. In the end you're still speculating with science.

>The entire point IS conclusively showing reliability, and the ONE low-sample-size paper you used as the basis of that claim is laughably insufficient for clinicians to conclude that ChatGPT is more reliable than doctors, let alone that it is safe for medical advice or for completely replacing doctors (which isn't going to happen anyway).

And I'm saying your entire point is wrong. My point is right. You need to follow my point which is this:

I can come to very real conclusions about chatGPT and about LLMs without resorting to science and statistical samples to verify statements, in the same way you can make conclusions about your mom and her status as an intelligent being.

Also, I never said chatGPT is overall more reliable than doctors. I think of it as the precursor to the thing that will replace them. That's a highly reasonable speculation that can be made with zero science needed.

The anecdotal data of 17 doctors failing here is valid supporting evidence for that speculation.


> Where's the science on this? I need reams and reams of hard data and several scientific papers to prove it, because nothing exists in reality until there's a scientific paper written about it.

You tell me, since I've already asked you to find another paper with a larger sample size, yet clearly you're struggling again to find one after judging the paper you used by its headline rather than actually reading it and its limitations.

> No, I'm kidding, don't actually give me science on this. Everything you said is a conclusion easily arrived at with just intuition, experience, and common sense. You violate your own principles every time you make a statement without a citation to a rigorous, long-winded scientific paper.

Perhaps you need to look up what the whole point of explainability in LLMs is, and why clinicians and physicians refer to these systems as untrustworthy black boxes whose output cannot be trusted and still needs human medical professionals to check it.

> You realize witnesses are non-deterministic too? Yet a judge only needs one to convict a murderer. Non-determinism doesn't mean jack in this convo.

Except that the difference is that humans can be held to account and can transparently explain themselves when something goes wrong. An AI cannot transparently reason or explain itself beyond repeating and rewording its own response, and it can't identify its own errors even when you point them out.

Non-determinism in LLMs is completely relevant given the opaqueness of how LLMs make decisions. If an AI misdiagnoses a patient and lacks the transparent reasoning to show why it went wrong, that tells clinicians it is untrustworthy. Showing that 17 doctors couldn't diagnose a patient and ChatGPT could in ONE case does not make it 'reliable'.

Clinicians look for larger sample sizes in trials before making a judgement on the overall error rate and reliability of a medical device.
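To make that concrete, here is a quick back-of-the-envelope sketch (hypothetical success counts, standard Wilson score interval; not from the paper under discussion) of how little one successful case constrains an accuracy estimate:

    import math

    def wilson_interval(successes, n, z=1.96):
        # 95% Wilson score interval for a binomial proportion
        p = successes / n
        denom = 1 + z * z / n
        center = (p + z * z / (2 * n)) / denom
        half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
        return (center - half, center + half)

    print(wilson_interval(1, 1))      # ~(0.21, 1.00): one hit is consistent with almost any accuracy
    print(wilson_interval(900, 1000)) # ~(0.88, 0.92): a trial-sized sample actually pins it down

One success in one attempt is statistically consistent with anything from roughly 21% to 100% accuracy, which is exactly why a single case cannot ground a reliability claim.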

Everything beyond what you mentioned is irrelevant.

> You talk about medical professionals laughing in my face. Do you mean the 17 professionals mentioned in the article who failed for 3 years to diagnose a simple issue? You think anybody cares if they laugh?

I'm still laughing at you for showing ONE clinical example and proclaiming it conclusive proof that LLMs can be used for medical advice and can completely replace all doctors. You realize that they can give an incorrect diagnosis at random? The still-unanswered question is how effective it is over a large number of cases, i.e. trials. Not one.

> Science has weaknesses. The first is that it's fucking slow and expensive. The second is that a fundamental point of science is that nothing can be proven true. Statistics cannot prove anything. In the end you're still speculating with science.

Once again, as you have already admitted, one anecdote does not show that something is reliable. This is exactly why medical trials exist: to test how reliable a system is, instead of releasing something untested on the strength of a single paper, which, based on your own assumptions, is what you seem to believe should happen.

> And I'm saying your entire point is wrong. My point is right. You need to follow my point which is this:

Nope. You believe your opinion is 'right' based on ONE anecdote and a single study that barely scratches the surface. Deep neural networks, which LLMs are built on, have been black-box systems from the beginning; their diagnoses are unexplainable to clinicians, and isolated examples like these are unconvincing to them. Again, what about the number of cases, over a larger sample, where it gives the incorrect diagnosis rather than the correct one?

Do you not realize why ChatGPT and others carry a disclaimer that they CANNOT be used for giving medical advice?

> Also, I never said chatGPT is overall more reliable than doctors. I think of it as the precursor to the thing that will replace them. That's a highly reasonable speculation that can be made with zero science needed.

Given that an LLM frequently hallucinates and is an opaque system, it will always need human doctors to check that its decisions are not incorrect. In light of that, fully replacing doctors with opaque AI systems is wild speculation, and even if it ever happens, people will trust humans more than an unattended AI or a hypothetical AI-only system in which no one is held to account when the AI makes a mistake.

> The anecdotal data of 17 doctors failing here is valid supporting evidence for that speculation.

One case study of ChatGPT getting one diagnosis right does not tell us how reliable it is over a larger sample, including the many other cases where it gets the diagnosis wrong, which is what clinicians look at to judge its effectiveness.


First, your insistence on scientific rigor is laudable but, quite frankly, limited in scope. We're on the cusp of a new era, and your demand for reams of data misses the point: it's not just about what we can prove right now; it's about the trajectory we're on. And let me tell you, that trajectory is heading towards AI surpassing human capability, whether you like it or not.

You talk about LLMs like ChatGPT being "black boxes," implying that's a reason they can't replace humans. Let me clue you in: medicine was a black box for centuries! And yet, we didn't sit around waiting for the perfect solution; we innovated, learned, and improved. Why shouldn't we expect the same trajectory for AI? Machine learning models are already becoming more explainable, and they'll only get better.

On the topic of accountability, you act as though it's an exclusively human trait. Let me burst that bubble for you. Accountability can be programmed, designed, and regulated into an AI system. Humans wrote the laws that hold people accountable; who's to say we can't draft a new legal framework for AI? The goal isn't to mimic human accountability but to surpass it, creating a system that not only learns from its mistakes but also minimizes them to an extent that humans cannot.

You dismiss the non-determinism of AI as a fatal flaw. But isn't that a human trait, too? How many times have medical professionals changed their "expert opinions" based on new evidence? The fact is, non-determinism exists everywhere, but what AI has the potential to offer is a level of data analysis and rapid adaptation that humans can't match.

As for the anecdote about the 17 doctors? Don't trivialize that. It's not just a point of failure for those specific doctors; it's a symptom of a flawed and fallible system. To argue that AI can't replace doctors because of one paper or anecdote is to entirely miss the point: we're not talking about the technology of today but of the technology of tomorrow. AI is on a path to becoming more reliable, more accountable, and more efficient than human medical professionals.

So yes, my point is that AI doesn't just have the potential to supplement human roles; it has the potential to replace them. Not today, maybe not tomorrow, but eventually. And it's not because AI is perfect; it's because it has the potential to be better, to continually improve in ways and at speeds that humans can't match.

We're not just dabbling in speculation here; we're tapping into a future that's hurtling toward us. If you're not prepared for it, you're not just standing in the way of progress; you're standing on the tracks. Prepare to get run over.

I can now get into a car driven by AI and go wherever I want. Two years ago people like you were saying it's a pipe dream. You need a certain level of brain power, an IQ of 90+, to realize that even though this anecdotal snippet of progress isn't scientifically rigorous, it's a data point as strong as 17 doctors failing in front of chatGPT. It allows us to speculate realistically without the need for science.


> First, your insistence on scientific rigor is laudable but, quite frankly, limited in scope. We're on the cusp of a new era, and your demand for reams of data misses the point:

This is a question of reliability, which requires an abundance of evidence across many parameters, including a larger sample size, and my question on that remains unanswered. Showing me one data point does not remotely establish that LLMs are reliable for this use-case, especially for medical professionals.

> it's not just about what we can prove right now; it's about the trajectory we're on. And let me tell you, that trajectory is heading towards AI surpassing human capability, whether you like it or not.

Serious high-risk use-cases (legal, financial, medical, transportation, etc.) all require a reliability case to earn human trust. That takes extensive evidence and research showing the system working reliably; you have shown only one data point, which professionals cannot use to reach any conclusion on reliability at all.

> You talk about LLMs like ChatGPT being "black boxes," implying that's a reason they can't replace humans.

We're talking about clinicians: a high-risk profession in which, as I have already explained, it is almost certain that LLMs cannot fully replace humans. As long as a human needs to check their outputs, that will remain the case by default.

> medicine was a black box for centuries! And yet, we didn't sit around waiting for the perfect solution; we innovated, learned, and improved. Why shouldn't we expect the same trajectory for AI? Machine learning models are already becoming more explainable, and they'll only get better.

That isn't the point. Clinicians have used other tools that are far more transparent than deep neural networks / LLMs, and the massive disadvantage of LLMs has always been their inability to transparently show their decision process and explain themselves.

There is a significant difference between the explainability of an LLM and that of typical machine learning methods which don't use neural networks, and it has been known for decades that clinicians have very low trust in such systems running unattended. Hence the back-pedalling disclaimers about never using these systems for medical, financial, or legal advice.
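As a rough illustration of that transparency gap, here is a minimal sketch (made-up toy symptom features, with scikit-learn's LogisticRegression standing in for a classical non-neural model):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy, invented features: [fever, joint_pain, fatigue] -> condition yes/no
    X = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 1],
                  [0, 0, 0], [1, 0, 0], [0, 1, 0]])
    y = np.array([1, 1, 1, 0, 0, 0])

    clf = LogisticRegression().fit(X, y)

    # Each coefficient is a readable, auditable per-symptom weight,
    # the kind of decision trace an LLM does not expose.
    for name, w in zip(["fever", "joint_pain", "fatigue"], clf.coef_[0]):
        print(f"{name}: {w:+.2f}")

A clinician can read those weights off directly; there is no comparable artefact to read off a billion-parameter language model.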

> Accountability can be programmed, designed, and regulated into an AI system....

Like what? So-called 'guardrails', which are broken through all the time? At least with human doctors, even when something goes wrong, there is always someone held to account who can explain exactly what the issue was and what happened.

The fact that these AI systems still require a human to supervise them defeats the point of trusting them to fully replace all human doctors, given their frequent failure to explain themselves transparently whenever one needs to understand their decisions.

> You dismiss the non-determinism of AI as a fatal flaw. But isn't that a human trait, too? How many times have medical professionals changed their "expert opinions" based on new evidence? The fact is, non-determinism exists everywhere, but what AI has the potential to offer is a level of data analysis and rapid adaptation that humans can't match.

It is a fatal flaw, made worse by a poor match between the AI system and the intended use-case, and not every problem can be solved with an LLM, including social problems that need human interaction. Humans can reason about and explain their decision process; LLMs have no concept of such a thing, even if their creators claim they do.

This is fundamental and by design in LLMs and related systems. Everything beyond that is speculative or even science fiction.
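For what it's worth, the non-determinism itself isn't mysterious; it falls straight out of how tokens are sampled. A minimal sketch with a toy vocabulary and invented scores:

    import math, random

    def sample_token(logits, temperature=0.8):
        # Softmax with temperature: higher temperature flattens the
        # distribution, so identical input yields varying output.
        scaled = [l / temperature for l in logits]
        m = max(scaled)
        weights = [math.exp(s - m) for s in scaled]
        return random.choices(range(len(logits)), weights=weights)[0]

    vocab = ["diagnosis A", "diagnosis B", "diagnosis C"]  # toy vocabulary
    logits = [2.0, 1.5, 0.5]                               # invented model scores
    print([vocab[sample_token(logits)] for _ in range(5)])
    # Same input, different answers across runs: non-determinism by design.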

> I can now get into a car driven by AI and go wherever I want. Two years ago people like you were saying it's a pipe dream. You need a certain level of brain power, an IQ of 90+, to realize that even though this anecdotal snippet of progress isn't scientifically rigorous, it's a data point as strong as 17 doctors failing in front of chatGPT. It allows us to speculate realistically without the need for science.

Self-driving cars that drive as well as or better than a human in all conditions are a science-fiction pipe dream (yes, they are). The designers of such autonomous systems already know this, regulators trust them even less, and no system without human intervention is allowed on the roads.

Reliability accounting covers the worst case (failures, near misses, etc.), and it makes zero sense, and is irresponsible, for regulators and professionals to take one data point of the system working, dismiss the hundreds of failures, and conclude that the AI system is reliable in all cases.


> So this ONE anecdote immediately means that ChatGPT is far more reliable than human doctors and can be used for medical advice? How many cases are there when ChatGPT gets its response incorrect out of 1,000 cases?

You couldn't find a clearer case of straw man. No one said ChatGPT is more reliable than human doctors. No one said anything about incorrect responses.

The fact of the matter is that ChatGPT diagnosed a disease which 17(!!!) doctors missed. Even if it gives incorrect responses in 999/1000 cases, it is still worth including ChatGPT in the process, since it's so cheap.
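Back-of-the-envelope, with purely hypothetical prices:

    # All numbers are assumptions, for illustration only.
    queries = 1000
    cost_per_query = 0.01             # dollars per query, assumed
    cost_per_specialist_visit = 150.0 # dollars per visit, assumed

    print(queries * cost_per_query)        # 10.0: a thousand attempts for ~$10
    print(17 * cost_per_specialist_visit)  # 2550.0: the 17 visits that failed here

Even at a 1-in-1,000 hit rate, the cheap screen costs a rounding error next to the consultations that went nowhere.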


> You couldn't find a clearer case of straw man. No one said ChatGPT is more reliable than human doctors. No one said anything about incorrect responses.

Both the shallow research paper and that anecdote are centred on the question of reliability. Hence, it doesn't discredit my point or my question.

So how does one anecdotal data point show that ChatGPT is more reliable than human doctors? If that is not the point of these celebrations and mentions that both of you are making, then I can only assume it is far from the case, unless you have a direct answer to that question?

> The fact of the matter is that ChatGPT diagnosed a disease which 17(!!!) doctors missed. Even if it gives incorrect responses in 999/1000 cases, it is still worth including ChatGPT in the process, since it's so cheap.

So obviously, as expected, you still need human doctors regardless, since LLMs are opaque black boxes that lack transparent reasoning and produce unpredictable outputs. So does that mean it can be used for medical advice or not?

Overall, this comes down squarely to whether the output of an LLM can be trusted, and the counterpoint is that for every 'correct' diagnosis the LLM makes, there are incorrect and vacuous responses which, on top of its non-deterministic outputs, come with a lack of transparent explainability beyond repeating what it was trained on to convince less expert users.


> So how does one anecdotal data point show that ChatGPT is more reliable than human doctors? If that is not the point of these celebrations and mentions that both of you are making, then I can only assume it is far from the case, unless you have a direct answer to that question?

No one is saying chatGPT is more reliable than doctors. Please keep this discussion grounded in reality.

> So how does one anecdotal data point show that ChatGPT is more reliable than human doctors? If that is not the point of these celebrations and mentions that both of you are making, then I can only assume it is far from the case, unless you have a direct answer to that question?

It doesn't. It shows chatGPT beating 17 doctors in this case. That has NOTHING to do with reliability and that assumption is a big leap of logic you, and only you, are making.

> So obviously, as expected, you still need human doctors regardless, since LLMs are opaque black boxes that lack transparent reasoning and produce unpredictable outputs. So does that mean it can be used for medical advice or not?

Yes, of course. That doesn't mean chatGPT couldn't be incorporated into doctors' workflows and provide tangible value. No one is saying chatGPT should replace doctors.

Stop arguing against made-up castles in the air. You are only fooling yourself.


What explains people's knee-jerk reaction to something as awesome as this? It's weird reading the comments here.


Because every time a positive AI article is posted, people come out of the woodwork trying to paint the current state of AI in the most negative light possible.


It's early, but I don't see any comments like that so far.


I feel bad for them.


What an irresponsible article.

> “It’s a super high-powered medical search engine,” Beam says.

NO IT IS NOT. THAT IS NOT WHAT IT IS AT ALL. How dare they publish this for a general audience.

I'm sure Today will also publish all the stories of mothers who kill their kids because they used ChatGPT the way those infamous lawyers did a few months ago.

Shameful.



