> You are dismissing the study not because you need 1000 samples. You are dismissing it because you're biased.
Biased toward what? You gave ONE study with a very small sample size of 30, which is a far cry from the 1,000 sample size you suggested. That is weak evidence and can be dismissed.
And it hardly means that the next study, in which another group attempts to reproduce it, would be conclusive 'evidence' that it is more reliable than doctors.
> 17 doctors who couldn't diagnose a simple issue for 10 years is a demonstration of total unreliability.
So this ONE anecdote immediately means that ChatGPT is far more reliable than human doctors and can be used for medical advice? Out of 1,000 cases, how many does ChatGPT get wrong?
>Biased toward what? You gave ONE study with a very small sample size of 30, which is a far cry from the 1,000 sample size you suggested.
Go read what I said again, because you completely misinterpreted it. I said a sample size of 1,000 isn't needed by a judge. A judge can convict a murderer off of one witness.
>And it hardly means that the next study, in which another group attempts to reproduce it, would be conclusive 'evidence' that it is more reliable than doctors.
This is what gets me. Do some people really go through reality only deeming things that have been under the lens of a scientific study to be real? Is the sky blue? Are you alive? Do you need a scientific study to prove it to you? Are you so enamored with science that the only way you can speculate or debate a topic is if the scientific method has been applied rigorously?
Bro, just use chatGPT a couple of times. You'll find its reliability is roughly 50% or better for queries and questions no other machine on the face of the earth can answer. It beats humans consistently on speed, and in many instances where creativity is required.
>So this ONE anecdote immediately means that ChatGPT is far more reliable than human doctors and can be used for medical advice? Out of 1,000 cases, how many does ChatGPT get wrong?
No, but a sample size of 30, plus beating in mere seconds 17 doctors who spent a decade analyzing one child, is reasonable evidence that chatGPT is the precursor to the thing that will eat the entire medical diagnosis industry alive.
Right now the machine is better only in certain instances. But the delta from zero to now signals another delta in the near future.
> Go read what I said again, because you completely misinterpreted it. I said a sample size of 1,000 isn't needed by a judge. A judge can convict a murderer off of one witness.
That is not how this works.
A sample size of 1 or 30 in ONE study is hardly enough to convince physicians and other medical professionals that an intervention works reliably for others.
Thus, you have been comparing apples to oranges all this time: the standards of clinical research do not apply to whatever you are trying to compare them with in legal practice.
You need multiple peer-reviewed, reproducible studies that verify the claims set out in the research paper, and a larger sample size is favoured over an extremely small one. Not just 'past cases' or 'case studies'. Those only show that someone has done it within the disclosed limitations, and in this case they do NOT conclusively answer the reliability question.
Hence, it scratches the surface and can be dismissed as insufficient evidence of reliability.
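To make the sample-size point concrete, here is a minimal sketch (in Python; the 21/30 hit rate is a hypothetical placeholder, not a figure from the paper) of how wide the 95% confidence interval on an observed accuracy is at n = 30:

```python
# Minimal sketch: the 95% Wilson score interval for a binomial proportion,
# showing how little n = 30 pins down. The 21/30 figure is hypothetical.
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Return the (low, high) 95% Wilson score interval for successes/n."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

lo, hi = wilson_interval(successes=21, n=30)  # hypothetical 21/30 correct
print(f"95% CI: {lo:.2f} to {hi:.2f}")        # roughly 0.52 to 0.83
```

Even a seemingly strong 70% observed accuracy is statistically compatible with anything from roughly 52% to 83%, which is exactly why one small study settles nothing.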
> This is what gets me. Do some people really go through reality only deeming things that have been under the lens of a scientific study to be real? Is the sky blue? Are you alive? Do you need a scientific study to prove it to you? Are you so enamored with science that the only way you can speculate or debate a topic is if the scientific method has been applied rigorously?
As I have already explained, you have just confused yourself with a nonsensical comparison right from the start. Had you actually read the paper you pasted instead of just its title, you would have realized that the non-deterministic nature of LLMs is a limitation requiring further studies and methodologies (including larger sample sizes) to conclusively prove that ChatGPT is far more reliable than human doctors - let alone reliable enough to fully replace them.
So framing my point as if I were trying to prove a tautology misses it entirely. LLMs are non-deterministic black-box models that require rigorous evaluation and experiments, and offering one surface-level study with a limited sample size as proof that LLMs like ChatGPT are more reliable than doctors is beyond ludicrous, especially using that shallow research paper as the example.
> No, but a sample size of 30, plus beating in mere seconds 17 doctors who spent a decade analyzing one child, is reasonable evidence that chatGPT is the precursor to the thing that will eat the entire medical diagnosis industry alive.
Again: one example isn't sufficient evidence to justify jumping to such wild conclusions about 'eating the entire medical diagnosis industry alive'.
You have already admitted it is not reliable or transparent enough to be used for medical advice, and my question remains unanswered. I'll be more specific: for every case where it gives the correct diagnosis, how many cases out of 1,000 does ChatGPT get wrong?
ONE surface-level case study of a product that is an opaque, non-deterministic tool is something physicians and clinicians are extremely skeptical of as grounds for full replacement. Not only can it not fully replace them, it will always require a human doctor to check its diagnosis regardless, especially when something goes wrong.
Bro. You are not addressing the dichotomy. Why is it that in some instances you need stats and huge sample sizes, and in other instances you need only one witness to convict someone of murder?
All you're doing is regurgitating the same bs everyone knows about the nature of science, statistics and sample sizes, all of which is blindingly obvious to anyone.
I am presenting to you a valid dilemma that your big brain is skipping over because of your blind loyalty to science. You do not need a scientific study to tell you the sky is blue. A court doesn't need a thousand witnesses to convict a murderer. Have you ever wondered why?
Do you need me to give you 1,000 IQ questions to verify that you are an intelligent being? Do you need to give your mother or father those same tests to verify their intelligence? You have common sense, right? You can talk to your mom and basically bypass science to verify her intelligence, right?
Why the hell, all of a sudden, do you need a rigorous scientific study to verify the intellectual abilities of chatGPT? You're so smart that you can know your mother is intelligent without assessing her with 5,000 IQ questions, but for chatGPT you suddenly need the raw power of scientific rigor to come to any conclusion? You can't just talk to it yourself and draw your own conclusion?
Bro. You're irrational. chatGPT beat 17 doctors in seconds and it doesn't even faze you. But your mom, who has likely never taken an IQ test, doesn't need a single test for you to verify her intelligence.
Go above the level of scientific rigor. Einstein didn't need sample sizes and statistics to speculate about black holes and general relativity. The verification came later, but the math and the nature of reality were formulated and judged correct through the common sense I described above. This was way before anything Einstein proposed was verified by "stats".
Do you not have the ability to bypass statistical data and formulate conclusions without it? Looks like no.
> Bro. You are not addressing the dichotomy. Why is it that in some instances you need stats and huge sample sizes, and in other instances you need only one witness to convict someone of murder?
Again, you're continuing to compare apples and oranges from different professions and applying it here, which is nonsensical. The unpredictable behaviour of LLMs like ChatGPT tells us that not only are they non-deterministic, they cannot be trusted at all; in that context, a human always needs to check them, and establishing how reliable they are requires many more experiments and scientific methods from others to attest to it.
This is exactly why medical professionals will laugh at your question in this case. For clinicians to take one study with a low sample of 1 or 30 as the basis for whether a medical device is reliable, especially an AI as a medical device, is beyond ridiculous.
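To be concrete about what non-determinism means in practice, here is a minimal sketch (assuming the pre-1.0 `openai` Python SDK; the model name and the prompt are illustrative only): the same question, asked twice with sampling enabled, can come back with two different diagnoses.

```python
# Minimal sketch: the same prompt can yield different answers run to run.
# Assumes the pre-1.0 `openai` Python SDK with OPENAI_API_KEY set in the
# environment; the model name and prompt are illustrative placeholders.
import openai

def ask(prompt: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # sampling enabled: outputs vary between runs
    )
    return resp["choices"][0]["message"]["content"]

question = "A child has chronic pain, fatigue and gait problems. Likely diagnosis?"
print(ask(question))
print(ask(question))  # may differ from the first answer
```

That variability is exactly what a trial with a large sample size is meant to measure, and what a single anecdote cannot.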
> All you're doing is regurgitating the same bs everyone knows about the nature of science, statistics and sample sizes, all of which is blindingly obvious to anyone.
So why aren't you able to understand it, then? Your mistake was to begin by comparing the research methods of the legal profession and the medical profession, and to use that flawed analogy here to claim that 'reliability' means the same thing in both. I'm afraid you have only confused yourself.
> I am presenting to you a valid dilemma...
Which, again, is irrelevant and beside the point. Everything after that comes back to what I've already said about transparent explainability: it is known that LLMs and AI models such as ChatGPT cannot reason or explain their decisions transparently, and thus need examination and further reproduction by others due to their unpredictable behaviour.
Such a system used for medical advice is, quite frankly, unsatisfactory for physicians and other medical professionals. Just because it worked for someone does not mean it is reliable and works for everyone else. Hence my asking you for a significantly larger sample size and more clinical research papers in which ChatGPT is used.
> Bro. You're irrational. chatGPT beat 17 doctors in seconds and it doesn't even faze you. But your mom, who has likely never taken an IQ test, doesn't need a single test for you to verify her intelligence.
I'm not the one drawing wild conclusions about reliability from one study, then suggesting we can replace all doctors with ChatGPT because one anecdote where it was correct for one person means it will be correct for others, which, given its unpredictability, is beyond illogical. You clearly are doing just that.
As long as it is a black-box AI model, physicians and medical professionals will always scrutinise its unpredictable nature and its reliability, rather than trusting whatever diagnosis it gives as the truth, as you do.
> Go above the level of scientific rigor. Einstein didn't need sample sizes and statistics to speculate about black holes and general relativity. The verification came later, but the math and the nature of reality were formulated and judged correct through the common sense I described above. This was way before anything Einstein proposed was verified by "stats".
What does Einstein speculating about his equations have to do with showing that a non-deterministic AI system is reliable? There are different methods for showing this reliability, as I have explained already, and bringing that up is an irrelevant distraction.
> Do you not have the ability to bypass statistical data and formulate conclusions without it? Looks like no.
The entire point IS conclusively showing reliability, and using ONE paper with a low sample size as the basis of that claim is laughably insufficient for clinicians to conclude that ChatGPT is more reliable than doctors, let alone that it is safe for medical advice or for completely replacing doctors (which isn't going to happen anyway).
> Again, you're continuing to compare apples and oranges from different professions and applying it here, which is nonsensical. The unpredictable behaviour of LLMs like ChatGPT tells us that not only are they non-deterministic, they cannot be trusted at all; in that context, a human always needs to check them, and establishing how reliable they are requires many more experiments and scientific methods from others to attest to it.
Where's the science on this? I need reams and reams of hard data and several scientific papers to prove it, because nothing exists in reality until there's a scientific paper written about it.
No, I'm kidding, don't actually give me science on this. Everything you said is a conclusion easily arrived at with just intuition, experience and common sense. You violate your own principles every time you make a statement without a citation to a rigorous, long-winded scientific paper.
You realize witnesses are non-deterministic too? Yet a judge only needs one to convict a murderer. Non-determinism doesn't mean jack in this convo.
You talk about medical professionals laughing in my face. Do you mean the 17 professionals mentioned in the article who for 10 years failed to diagnose a simple issue? You think anybody cares if they laugh?
>What does Einstein speculating about his equations have to do with showing that a non-deterministic AI system is reliable? There are different methods for showing this reliability, as I have explained already, and bringing that up is an irrelevant distraction.
It's relevant. You're just excited, thinking the conversation is going in some strange direction where ultimate statistical rigor is the only valid topic of conversation.
I bring up Einstein to show you we can talk about widely believed and highly esteemed topics that have zero statistical verification, and that this is valid from the standpoint of scientists and "professionals".
I'm saying we don't need that level of rigor to talk about things that involve common sense.
Science has weaknesses. The first aspect of it that's weak is that it's fucking slow and expensive. The second is that a fundamental point of science is that nothing can be proven true. Statistics does not have the ability to prove anything. In the end you're still speculating with science.
>The entire point IS conclusively showing reliability, and using ONE paper with a low sample size as the basis of that claim is laughably insufficient for clinicians to conclude that ChatGPT is more reliable than doctors, let alone that it is safe for medical advice or for completely replacing doctors (which isn't going to happen anyway).
And I'm saying your entire point is wrong. My point is right. You need to follow my point, which is this:
I can come to very real conclusions about chatGPT and about LLMs without resorting to science and statistical samples to verify statements, in the same way you can make conclusions about your mom and her status as an intelligent being.
Also, I never said chatGPT is overall more reliable than doctors. I think of it as the precursor to the thing that will replace them. That's a highly reasonable speculation that can be made with zero science needed.
The anecdotal data of 17 doctors failing here is valid supporting evidence for that speculation.
> Where's the science on this? I need reams and reams of hard data and several scientific papers to prove it, because nothing exists in reality until there's a scientific paper written about it.
You tell me, since I've already asked you to find another paper with a larger sample size, yet clearly you're struggling to find one, having judged the paper you used by its headline rather than actually reading it and its limitations.
> No, I'm kidding, don't actually give me science on this. Everything you said is a conclusion easily arrived at with just intuition, experience and common sense. You violate your own principles every time you make a statement without a citation to a rigorous, long-winded scientific paper.
Perhaps you need to look up what the whole point of explainability in LLMs is, and why clinicians and physicians refer to these systems as untrustworthy black boxes whose output cannot be trusted and still needs human medical professionals to check it.
> You realize witnesses are non-deterministic too? Yet a judge only needs one to convict a murderer. Non-determinism doesn't mean jack in this convo.
Except that the difference is that humans can be held to account and can transparently explain themselves when something goes wrong. An AI cannot transparently reason or explain itself; it can only repeat and reword its own response, and it can't recognise its own errors even when you point them out.
Non-determinism in LLMs is completely relevant because of the opaqueness of how LLMs make their decisions. If an AI misdiagnoses a patient and lacks the transparent reasoning to show why it went wrong, that tells clinicians it is untrustworthy. Showing that 17 doctors couldn't diagnose a patient and ChatGPT could, in ONE case, does not mean it is 'reliable'.
Clinicians want larger sample sizes in trials before making a judgement on the overall error rate, reliability and effectiveness of a medical device.
Everything beyond that is irrelevant.
> You talk about medical professionals laughing in my face. Do you mean the 17 professionals mentioned in the article who for 10 years failed to diagnose a simple issue? You think anybody cares if they laugh?
I'm still laughing at you for showing ONE clinical example and proclaiming it conclusive proof that LLMs can be used for medical advice and can completely replace all doctors. You realize they can give an incorrect diagnosis at random? The still-unanswered question is how effective it is over a large number of cases, i.e. in trials. Not one case.
> Science has weaknesses. The first aspect of it that's weak is that it's fucking slow and expensive. The second is that a fundamental point of science is that nothing can be proven true. Statistics does not have the ability to prove anything. In the end you're still speculating with science.
Once again, as you have already admitted, one anecdote does not show that something is reliable. This is the point of medical trials: to test how reliable a system is, instead of releasing something untested on the strength of a single paper, which, because of your own assumptions, is what you seem to believe should happen.
> And I'm saying your entire point is wrong. My point is right. You need to follow my point, which is this:
Nope. You believe your opinion is 'right' on the strength of ONE anecdote and a single study that scratches the surface. Whereas, since the beginning of the deep neural networks that LLMs are based on, these have fundamentally been black-box systems: clinicians using them for diagnosis cannot explain their outputs, and showing such isolated examples is unconvincing to them. Again, what about the number of cases in a larger sample where it gives an incorrect diagnosis rather than a correct one?
Do you not realize why ChatGPT and others carry a disclaimer that they CANNOT be used for giving medical advice?
> Also, I never said chatGPT is overall more reliable than doctors. I think of it as the precursor to the thing that will replace them. That's a highly reasonable speculation that can be made with zero science needed.
Given that LLMs frequently hallucinate and are opaque systems, they will always need human doctors to check that their decisions are not incorrect. With that fact in mind, fully replacing doctors with opaque AI systems is wild speculation, and even if it happens, people will trust humans more than an unattended AI or a hypothetical AI-only system in which no one is held to account when the AI makes a mistake.
> The anecdotal data of 17 doctors failing here is valid supporting evidence for that speculation.
One case study of ChatGPT getting one diagnosis right does not tell us how reliable it is across a larger sample, including the many other cases where it gets the diagnosis wrong, which is what clinicians look for to gauge its effectiveness.
First, your insistence on scientific rigor is laudable but, quite frankly, limited in scope. We're on the cusp of a new era, and your demand for reams of data misses the point: it's not just about what we can prove right now; it's about the trajectory we're on. And let me tell you, that trajectory is heading towards AI surpassing human capability, whether you like it or not.
You talk about LLMs like ChatGPT being "black boxes," implying that's a reason they can't replace humans. Let me clue you in: medicine was a black box for centuries! And yet, we didn't sit around waiting for the perfect solution; we innovated, learned, and improved. Why shouldn't we expect the same trajectory for AI? Machine learning models are already becoming more explainable, and they'll only get better.
On the topic of accountability, you act as though it's an exclusively human trait. Let me burst that bubble for you. Accountability can be programmed, designed, and regulated into an AI system. Humans wrote the laws that hold people accountable; who's to say we can't draft a new legal framework for AI? The goal isn't to mimic human accountability but to surpass it, creating a system that not only learns from its mistakes but also minimizes them to an extent that humans cannot.
You dismiss the non-determinism of AI as a fatal flaw. But isn't that a human trait, too? How many times have medical professionals changed their "expert opinions" based on new evidence? The fact is, non-determinism exists everywhere, but what AI has the potential to offer is a level of data analysis and rapid adaptation that humans can't match.
As for the anecdote about the 17 doctors? Don't trivialize that. It's not just a point of failure for those specific doctors; it's a symptom of a flawed and fallible system. To argue that AI can't replace doctors because of one paper or anecdote is to entirely miss the point: we're not talking about the technology of today but of the technology of tomorrow. AI is on a path to becoming more reliable, more accountable, and more efficient than human medical professionals.
So yes, my point is that AI doesn't just have the potential to supplement human roles; it has the potential to replace them. Not today, maybe not tomorrow, but eventually. And it's not because AI is perfect; it's because it has the potential to be better, to continually improve in ways and at speeds that humans can't match.
We're not just dabbling in speculation here; we're tapping into a future that's hurtling toward us. If you're not prepared for it, you're not just standing in the way of progress; you're standing on the tracks. Prepare to get run over.
I can now get into a car driven by AI and go wherever I want. Two years ago, people like you were saying that was a pipe dream. You need a certain level of brainpower, an IQ of 90+, to realize that even though this anecdotal snippet of progress isn't scientifically rigorous, it's a data point as strong as 17 doctors failing in front of chatGPT. It allows us to speculate realistically without the need for science.
> First, your insistence on scientific rigor is laudable but, quite frankly, limited in scope. We're on the cusp of a new era, and your demand for reams of data misses the point:
This is a question of reliability, which requires an abundance of evidence across many parameters, including a larger sample size, and my question remains unanswered. Showing me one data point does not remotely establish that LLMs are reliable for this use-case, especially for medical professionals.
> it's not just about what we can prove right now; it's about the trajectory we're on. And let me tell you, that trajectory is heading towards AI surpassing human capability, whether you like it or not.
Serious high-risk use-cases (legal, financial, medical, transportation, etc.) all require a reliability case to earn the trust of humans. That needs extensive evidence, research, etc. of the system working reliably, whereas you have shown only one data point, which professionals cannot use to draw any conclusion on reliability at all.
> You talk about LLMs like ChatGPT being "black boxes," implying that's a reason they can't replace humans.
We're talking about clinicians: a high-risk profession which, as I have already explained, it is almost certain LLMs cannot fully replace. As long as a human needs to check their outputs, that will remain the case, by default.
> medicine was a black box for centuries! And yet, we didn't sit around waiting for the perfect solution; we innovated, learned, and improved. Why shouldn't we expect the same trajectory for AI? Machine learning models are already becoming more explainable, and they'll only get better.
That isn't the point. Clinicians have used other tools which are far more transparent than deep neural networks / LLMs, and the massive disadvantage of LLMs has always been their inability to transparently show their decision process and explain themselves.
There is a significant difference between the explainability of an LLM and that of typical machine-learning methods which don't use neural networks, and it has been known for decades that clinicians have very low trust in using such systems unattended, or at all; hence the back-pedalling disclaimers about never using these systems for medical, financial or legal advice, etc.
> Accountability can be programmed, designed, and regulated into an AI system....
Like what? So-called 'guardrails', which are found to be broken through all the time? At least with human doctors, if something goes wrong there is always someone held to account to explain exactly what the issue was and what happened.
The fact that these AI systems still require human supervision defeats the point of trusting them to fully replace all human doctors, given their frequent failure to explain themselves transparently whenever one needs to understand their decisions.
> You dismiss the non-determinism of AI as a fatal flaw. But isn't that a human trait, too? How many times have medical professionals changed their "expert opinions" based on new evidence? The fact is, non-determinism exists everywhere, but what AI has the potential to offer is a level of data analysis and rapid adaptation that humans can't match.
It is a fatal flaw, made worse by the choice of AI system for the intended use-case, and not every problem can be solved with an LLM, including social problems that need human interaction. Whereas humans are able to reason and explain their decision process, LLMs have no concept of such a thing, even if their creators claim otherwise.
This is fundamental to, and by design in, LLMs and related systems. Everything beyond that is speculation or even science fiction.
> I can now get into a car driven by AI and go wherever I want. Two years ago, people like you were saying that was a pipe dream. You need a certain level of brainpower, an IQ of 90+, to realize that even though this anecdotal snippet of progress isn't scientifically rigorous, it's a data point as strong as 17 doctors failing in front of chatGPT. It allows us to speculate realistically without the need for science.
Self-driving cars that are meant to drive as well as or better than a human in all conditions are a science-fiction pipe dream (yes, they are). The designers of such autonomous systems already know this, and regulators have little trust in them and do not allow any system without human intervention on the roads.
Reliability has to account for the worst case (failures, near misses, etc.), and it makes zero sense, and is irresponsible, for regulators and professionals to take one data point of the system working, dismiss the hundreds of failures, and conclude that the AI system is reliable in all cases.
> So this ONE anecdote immediately means that ChatGPT is far more reliable than human doctors and can be used for medical advice? Out of 1,000 cases, how many does ChatGPT get wrong?
You couldn't find a clearer case of straw man. No one said ChatGPT is more reliable than human doctors. No one said anything about incorrect responses.
The fact of the matter is that ChatGPT diagnosed a disease which 17(!!!) doctors missed. Even if it gives incorrect responses in 999/1,000 cases, it is still worth including ChatGPT in this process, since it's so cheap.
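To put the cost argument in concrete terms, here's a toy expected-value sketch; every number in it is a made-up placeholder for illustration, not real data:

```python
# Toy expected-value sketch for the "999/1000 wrong but still cheap" claim.
# All figures below are hypothetical placeholders, not real data.
COST_PER_QUERY = 0.01         # assumed cost of one ChatGPT query, in dollars
COST_PER_FALSE_LEAD = 50.00   # assumed cost of a doctor ruling out a wrong suggestion
VALUE_OF_ONE_HIT = 100_000.0  # assumed value of catching a diagnosis doctors missed
HIT_RATE = 1 / 1000           # the worst-case rate conceded above

ev = (HIT_RATE * VALUE_OF_ONE_HIT
      - COST_PER_QUERY
      - (1 - HIT_RATE) * COST_PER_FALSE_LEAD)
print(f"Expected value per case: ${ev:.2f}")  # about $50 with these numbers
```

Whether the screen pays off comes down to those assumed numbers, not to whether the model is right more often than a doctor.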
> You couldn't find a clearer case of straw man. No one said ChatGPT is more reliable than human doctors. No one said anything about incorrect responses.
Both the shallow research paper and that anecdote are focused on the question of reliability. Hence, it doesn't discredit my point or my question.
So how does one anecdotal data point show that ChatGPT is more reliable than human doctors? If that is not the point of all these celebrations, mentions, etc. that both of you are engaged in, then I can assume it is far from the case, unless you have a direct answer to that question?
> The fact of the matter is that ChatGPT diagnosed a disease which 17(!!!) doctors missed. Even if it gives incorrect responses in 999/1,000 cases, it is still worth including ChatGPT in this process, since it's so cheap.
So obviously, as expected, you still need human doctors regardless, as LLMs are opaque black boxes that lack transparent reasoning and produce unpredictable outputs. So can it be used for medical advice or not?
Overall, this is clearly a question of whether the output of an LLM can be trusted, and the counterpoint is that for every 'correct' diagnosis an LLM makes, there are incorrect and vacuous responses which, on top of its non-deterministic outputs, come with a lack of transparent explainability beyond repeating what it was trained on to convince less expert users.
> So how does one anecdotal data point show that ChatGPT is more reliable than human doctors? If that is not the point of all these celebrations, mentions, etc. that both of you are engaged in, then I can assume it is far from the case, unless you have a direct answer to that question?
No one is saying chatGPT is more reliable than doctors. Please keep this discussion grounded in reality.
> So how does one anecdotal data point show that ChatGPT is more reliable than human doctors? If that is not the point of all these celebrations, mentions, etc. that both of you are engaged in, then I can assume it is far from the case, unless you have a direct answer to that question?
It doesn't. It shows chatGPT beating 17 doctors in this case. That has NOTHING to do with reliability, and that assumption is a big leap of logic that you, and only you, are making.
> So obviously, as expected, you still need human doctors regardless, as LLMs are opaque black boxes that lack transparent reasoning and produce unpredictable outputs. So can it be used for medical advice or not?
Yes, of course. That doesn't mean chatGPT couldn't be incorporated into doctors' workflows and provide tangible value. No one is saying chatGPT should replace doctors.
Stop arguing against made-up castles in the air. You are only fooling yourself.