> Where's the science on this? I need reams and reams of hard data and several scientific papers to prove this, because nothing exists in reality until there's a scientific paper written about it.
You tell me, since I've already asked you to find another paper with a larger sample size, yet you're clearly struggling again to find one after judging the paper you used by its headline rather than actually reading it and its limitations.
> No, I'm kidding, don't actually give me science on this. Everything you said is a conclusion easily arrived at with just intuition, experience, and common sense. You violate your own principles every time you make a statement without a citation to a rigorous, long-winded scientific paper.
Perhaps you need to look up what the whole point of explainability in LLMs is, and why clinicians and physicians refer to these systems as untrustworthy black-box systems whose output cannot be trusted and still needs human medical professionals to check it.
> You realize witnesses are non-deterministic too? Yet a judge only needs one to convict a murderer. Non-determinism doesn't mean jack in this convo.
Except that the difference is that humans can be held to account and transparently explain themselves when something goes wrong. An AI cannot transparently reason or explain itself beyond repeating and rewording its own response, and it can't find its own errors even when you point them out.
Non-determinism in LLMs is completely relevant because of the opacity of how LLMs make their decisions. If an AI misdiagnoses a patient and lacks the transparent reasoning to show why it went wrong, that tells us it is untrustworthy to clinicians. Showing that 17 doctors couldn't diagnose a patient and ChatGPT could in ONE case does not mean it is 'reliable'.
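To make that concrete, here is a minimal sketch of how temperature sampling makes the same input produce different outputs. Toy numbers and made-up "diagnosis" labels, not any vendor's actual decoder:

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Sample one token index from a softmax over logits, as LLM decoders do."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return random.choices(range(len(logits)), weights=weights, k=1)[0]

# Fixed scores for the SAME prompt: the model itself never changes.
tokens = ["diagnosis_A", "diagnosis_B", "diagnosis_C"]
logits = [2.0, 1.6, 0.5]

for run in range(5):
    print(f"run {run}: {tokens[sample_next_token(logits)]}")
```

Same prompt, same weights, and the answer can still change from run to run, with nothing in the output explaining why one token won.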
Clinicians are interested in larger sample sizes in trials before making a judgement on the overall error rate and on how reliable and effective a medical device is.
Everything beyond what you mentioned is irrelevant.
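And to put a number on why one case answers nothing: the exact (Clopper-Pearson) confidence interval for a run of all successes has a simple closed form. A rough sketch, illustrative numbers only:

```python
def all_success_lower_bound(n, alpha=0.05):
    """Clopper-Pearson lower bound on accuracy when all n observed trials succeed.

    For x == n the exact bound reduces to the closed form (alpha / 2) ** (1 / n).
    """
    return (alpha / 2) ** (1.0 / n)

for n in [1, 10, 100, 1000]:
    lo = all_success_lower_bound(n)
    print(f"{n:>4} flawless case(s): true accuracy could still be as low as {lo:.1%}")
```

One flawless case is statistically compatible with a true accuracy of 2.5%. That is exactly why trials exist.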
> You talk about medical professionals laughing in my face. Do you mean the 17 professionals mentioned in the article who for 10 years failed to diagnose a simple issue? You think anybody cares about them laughing?
I'm still laughing at you for showing ONE clinical example and proclaiming it conclusive proof that LLMs can be used for medical advice and can completely replace all doctors. You realize that they can give an incorrect diagnosis at random? The still-unanswered question is how effective it is over a large number of cases, i.e. in trials. Not one.
> Science has weaknesses. The first aspect of it that's weak is it's fucking slow and expensive. The second is that a fundamental point of science is that nothing can be proven to be true. Statistics does not have the ability to prove anything. In the end you're still speculating with science.
Once again, as you have already admitted, one anecdote does not show that something is reliable. That is the whole point of medical trials: they test how reliable a system is, instead of releasing something untested on the strength of a single paper, which is what you seem to believe should happen, based on your own assumptions.
> And I'm saying your entire point is wrong. My point is right. You need to follow my point which is this:
Nope. You believe your opinion is 'right' on the strength of ONE anecdote and a single study that barely scratches the surface. Whereas, ever since the beginning of the deep neural networks that LLMs are built on, they have fundamentally been black-box systems: clinicians using them for diagnosis cannot explain their output, and showing them these isolated examples is unconvincing. Again, what about the number of cases, over a larger sample, in which it gives the incorrect diagnosis rather than the correct one?
Do you not realize why ChatGPT and the others carry a disclaimer that they CANNOT be used for giving medical advice?
> Also I never said ChatGPT is overall more reliable than doctors. I think of it as the precursor to the thing that will replace them. That's a highly reasonable speculation that can be made with zero science needed.
Given that LLMs frequently hallucinate and are opaque systems, they will always need human doctors to check that their decisions are not incorrect. In light of that fact, fully replacing doctors with opaque AI systems is wild speculation, and even if it ever happens, people will trust humans more than an unattended AI system, or a hypothetical AI-only system in which no one is held to account when the AI makes a mistake.
> The anecdotal data of 17 doctors failing here is valid supporting evidence for that speculation.
One case study of ChatGPT getting one diagnosis right does not tell us how reliable it is across a larger sample, nor how many other cases it gets wrong at scale, which is what clinicians look for to establish effectiveness.
First, your insistence on scientific rigor is laudable but, quite frankly, limited in scope. We're on the cusp of a new era, and your demand for reams of data misses the point: it's not just about what we can prove right now; it's about the trajectory we're on. And let me tell you, that trajectory is heading towards AI surpassing human capability, whether you like it or not.
You talk about LLMs like ChatGPT being "black boxes," implying that's a reason they can't replace humans. Let me clue you in: medicine was a black box for centuries! And yet, we didn't sit around waiting for the perfect solution; we innovated, learned, and improved. Why shouldn't we expect the same trajectory for AI? Machine learning models are already becoming more explainable, and they'll only get better.
On the topic of accountability, you act as though it's an exclusively human trait. Let me burst that bubble for you. Accountability can be programmed, designed, and regulated into an AI system. Humans wrote the laws that hold people accountable; who's to say we can't draft a new legal framework for AI? The goal isn't to mimic human accountability but to surpass it, creating a system that not only learns from its mistakes but also minimizes them to an extent that humans cannot.
You dismiss the non-determinism of AI as a fatal flaw. But isn't that a human trait, too? How many times have medical professionals changed their "expert opinions" based on new evidence? The fact is, non-determinism exists everywhere, but what AI has the potential to offer is a level of data analysis and rapid adaptation that humans can't match.
As for the anecdote about the 17 doctors? Don't trivialize that. It's not just a point of failure for those specific doctors; it's a symptom of a flawed and fallible system. To argue that AI can't replace doctors because of one paper or anecdote is to entirely miss the point: we're not talking about the technology of today but of the technology of tomorrow. AI is on a path to becoming more reliable, more accountable, and more efficient than human medical professionals.
So yes, my point is that AI doesn't just have the potential to supplement human roles; it has the potential to replace them. Not today, maybe not tomorrow, but eventually. And it's not because AI is perfect; it's because it has the potential to be better, to continually improve in ways and at speeds that humans can't match.
We're not just dabbling in speculation here; we're tapping into a future that's hurtling toward us. If you're not prepared for it, you're not just standing in the way of progress; you're standing on the tracks. Prepare to get run over.
I can now get into a car driven by AI and go wherever I want. Two years ago people like you were saying it's a pipe dream. You need a certain level of brain power, an IQ of 90+, to realize that even though this anecdotal snippet of progress isn't scientifically rigorous, it's a data point as strong as 17 doctors failing in front of ChatGPT. It allows us to speculate realistically without the need for science.
> First, your insistence on scientific rigor is laudable but, quite frankly, limited in scope. We're on the cusp of a new era, and your demand for reams of data misses the point:
This is a question of reliability, which requires an abundance of evidence across many parameters, including a larger sample size, about which my question remains unanswered. Showing me one data point does not remotely establish that LLMs are reliable for this use-case, especially to medical professionals.
> it's not just about what we can prove right now; it's about the trajectory we're on. And let me tell you, that trajectory is heading towards AI surpassing human capability, whether you like it or not.
Serious high-risk use-cases (legal, financial, medical, transportation, etc.) all require a reliability case to earn human trust. That takes extensive evidence and research showing the system working reliably, whereas you have shown only one data point, from which professionals cannot draw any conclusion about reliability at all.
> You talk about LLMs like ChatGPT being "black boxes," implying that's a reason they can't replace humans.
We're talking about clinicians: a high-risk profession in which, as I have already explained, it is almost certain that LLMs cannot fully replace them all. As long as a human needs to check their outputs, that will remain the case by default.
> medicine was a black box for centuries! And yet, we didn't sit around waiting for the perfect solution; we innovated, learned, and improved. Why shouldn't we expect the same trajectory for AI? Machine learning models are already becoming more explainable, and they'll only get better.
That isn't the point. Clinicians have used other tools that are far more transparent than deep neural networks and LLMs, and the massive disadvantage of LLMs has always been their inability to transparently show their decision process and explain themselves.
There is a significant difference between the explainability of an LLM and that of typical machine learning methods that don't use neural networks, and it has been known for decades that clinicians have very low trust in using such opaque systems unattended, or at all; hence the back-pedalling disclaimers telling you never to use these systems for medical, financial, or legal advice.
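For contrast, here is a rough sketch (assuming scikit-learn and its bundled breast-cancer dataset, purely as a stand-in for clinical data) of the kind of artifact classic ML hands a clinician and an LLM does not:

```python
# A classic model can print the exact rule path behind every prediction;
# an LLM has no equivalent artifact.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()  # stand-in clinical dataset
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data.data, data.target)

# Every decision is an auditable threshold a clinician can read and contest.
print(export_text(tree, feature_names=list(data.feature_names)))
```

Every threshold in that printout can be read, audited, and contested by a domain expert. There is no equivalent printout for a transformer with billions of weights.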
> Accountability can be programmed, designed, and regulated into an AI system....
Like what? So-called 'guardrails', which people have been shown to break through all the time? At least with human doctors, when something goes wrong, there is always someone held to account who can explain exactly what the issue was and what happened.
The fact that these AI systems still require a human to supervise them defeats the point of trusting them to fully replace all human doctors, given their frequent failure to explain themselves transparently whenever someone needs to understand their decisions.
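And to be clear about what 'programmed-in' accountability usually amounts to in practice, here is a toy illustration, a hypothetical keyword filter of my own invention and not any vendor's real system, showing how brittle bolt-on filtering is:

```python
# Toy guardrail: block prompts containing flagged phrases.
BLOCKED_PHRASES = ["medical advice", "diagnose"]

def naive_guardrail(prompt: str) -> bool:
    """Return True if the prompt should be blocked."""
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)

print(naive_guardrail("Please diagnose my chest pain"))       # True: caught
print(naive_guardrail("What illness best fits chest pain?"))  # False: same intent, slips through
```

The second prompt carries the same intent and sails straight through. That is filtering, not accountability.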
> You dismiss the non-determinism of AI as a fatal flaw. But isn't that a human trait, too? How many times have medical professionals changed their "expert opinions" based on new evidence? The fact is, non-determinism exists everywhere, but what AI has the potential to offer is a level of data analysis and rapid adaptation that humans can't match.
It is a fatal flaw, made worse by picking the wrong AI system for the intended use-case, and not every problem can be solved with an LLM, including social problems that need human interaction. Whereas humans are able to reason about and explain their decision process, LLMs have no concept of such a thing, even if their creators claim otherwise.
This is fundamental to LLMs and related systems, by design. Everything beyond that is speculation or even science fiction.
> I can now get into a car driven by AI and go wherever I want. Two years ago people like you were saying it's a pipe dream. You need a certain level of brain power, an IQ of 90+, to realize that even though this anecdotal snippet of progress isn't scientifically rigorous, it's a data point as strong as 17 doctors failing in front of ChatGPT. It allows us to speculate realistically without the need for science.
Self-driving cars that are meant to drive as well as or better than a human in all conditions are a science-fiction pipe dream (yes, they are). The designers of such autonomous systems already know this, and regulators trust them even less: they do not allow any system with no human intervention onto the roads.
The worst case is what reliability accounting covers (including failures, near misses, etc.), and it makes zero sense, and would be irresponsible, for regulators and professionals to take one data point of the system working, dismiss the hundreds of failures, and conclude that the AI system is reliable in all cases.
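The flip side of the earlier confidence-interval sketch is the standard 'rule of three' back-of-envelope: after n failure-free trials, the 95% upper bound on the true failure rate is roughly 3/n. A quick sketch, illustrative targets only:

```python
def clean_trials_needed(target_failure_rate):
    """Rule of three: ~3 / p failure-free trials to bound the failure rate at p."""
    return int(round(3.0 / target_failure_rate))

for p in [0.1, 0.01, 0.001, 0.0001]:
    print(f"to claim a failure rate below {p:.2%}: ~{clean_trials_needed(p):,} failure-free trials")
```

To responsibly claim a one-in-ten-thousand failure rate you need on the order of 30,000 clean trials. Not one news story.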