> First, your insistence on scientific rigor is laudable but, quite frankly, limited in scope. We're on the cusp of a new era, and your demand for reams of data misses the point:
This is a question of reliability, which requires an abundance of evidence across many parameters, including a much larger sample size, and my question on that remains unanswered. Showing me one data point does not remotely establish that LLMs are reliable for this use case, especially for medical professionals.
> it's not just about what we can prove right now; it's about the trajectory we're on. And let me tell you, that trajectory is heading towards AI surpassing human capability, whether you like it or not.
Serious high-risk use cases (legal, financial, medical, transportation, etc.) all require a reliability case to earn human trust. That takes extensive evidence and research showing the system working reliably, whereas you have shown only one data point, which professionals cannot use to draw any conclusion about reliability at all.
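To make the sample-size point concrete, here is a rough sketch (Python, with invented counts) of the exact confidence interval you get on a success rate from a single observed success versus many independent trials; the numbers are purely illustrative, not measurements of any particular system:

```python
# Clopper-Pearson (exact) confidence interval for a success probability.
# All counts below are hypothetical, chosen only to illustrate the argument.
from scipy.stats import beta

def clopper_pearson(successes, trials, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion."""
    lo = 0.0 if successes == 0 else beta.ppf(alpha / 2, successes, trials - successes + 1)
    hi = 1.0 if successes == trials else beta.ppf(1 - alpha / 2, successes + 1, trials - successes)
    return lo, hi

# One observed success ("the demo worked once"): the interval spans almost everything.
print(clopper_pearson(1, 1))       # roughly (0.025, 1.0)

# 950 successes in 1000 independent trials: a statement a professional can assess.
print(clopper_pearson(950, 1000))  # roughly (0.935, 0.963)
```

A single success is statistically compatible with a system that fails more than 95% of the time; only volume and breadth of evidence narrows that down.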
> You talk about LLMs like ChatGPT being "black boxes," implying that's a reason they can't replace humans.
We're talking about clinicians: a high-risk profession that, as I have already explained, LLMs almost certainly cannot fully replace. As long as a human needs to check their outputs, that will remain the case by default.
> medicine was a black box for centuries! And yet, we didn't sit around waiting for the perfect solution; we innovated, learned, and improved. Why shouldn't we expect the same trajectory for AI? Machine learning models are already becoming more explainable, and they'll only get better.
That isn't the point. Clinicians have used other tools that are far more transparent than deep neural networks / LLMs, and the massive disadvantage of LLMs has always been their inability to transparently show their decision process and explain themselves.
There is a significant difference between the explainability of an LLM and that of typical machine learning methods that don't use neural networks, and it has been known for decades that clinicians have very low trust in using such systems unattended, or at all, hence the back-pedalling disclaimers telling people never to use these systems for medical, financial, or legal advice.
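To illustrate that gap, here is a minimal sketch using scikit-learn and a public toy dataset (not any real clinical data): a classical model like a shallow decision tree can print the exact thresholds and branches behind every prediction, the kind of audit trail a clinician can actually inspect; an LLM exposes nothing comparable.

```python
# A shallow decision tree on a public toy dataset: every split and threshold
# behind a prediction can be printed and audited. Illustrative only; this is
# not a clinical model.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# The full, reproducible decision logic, as human-readable rules.
print(export_text(tree, feature_names=list(data.feature_names)))
```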
> Accountability can be programmed, designed, and regulated into an AI system....
Like what? So-called 'guardrails' that are found to be broken all the time? At least with human doctors, even when something goes wrong, there is always someone who is held to account and can explain exactly what the issue was and what happened.
The fact that these AI systems still require a human to supervise them defeats the point of trusting them to fully replace human doctors, given their frequent failure to explain themselves transparently whenever someone needs to understand their decisions.
> You dismiss the non-determinism of AI as a fatal flaw. But isn't that a human trait, too? How many times have medical professionals changed their "expert opinions" based on new evidence? The fact is, non-determinism exists everywhere, but what AI has the potential to offer is a level of data analysis and rapid adaptation that humans can't match.
It is a fatal flaw, made worse by the choice of AI system for the intended use case, and not every problem can be solved with an LLM, including social problems that need human interaction. Humans are able to reason about and explain their decision process; LLMs have no concept of such a thing, whatever their own creators claim.
It is fundamental and by design for LLMs and related systems. Everything beyond that is speculative or even science fiction.
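That "by design" point is visible in the decoding step itself. A rough sketch below (plain NumPy, with a made-up three-word vocabulary and made-up logits) shows how sampling the next token from a probability distribution means the same input can produce different outputs, which is how deployed chat services typically run:

```python
# Next-token sampling, the core of LLM decoding. Vocabulary and logits are
# invented for illustration; real models do this over tens of thousands of tokens.
import numpy as np

vocab = ["benign", "malignant", "inconclusive"]
logits = np.array([2.0, 1.6, 0.5])
temperature = 1.0

probs = np.exp(logits / temperature)
probs /= probs.sum()

rng = np.random.default_rng()          # no fixed seed, as in a typical deployed service
for _ in range(3):
    print(rng.choice(vocab, p=probs))  # repeated runs can disagree on the same input
```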
> I can now get into a car driven by AI and go wherever I want. 2 years ago people like you were saying it's a pipe dream. You need a certain level of brain power an IQ of 90+ to realize that despite the fact that this anecdotal snippet of progress isn't scientifically rigorous it's a datapoint as strong as 17 doctors failing in front of chatGPT. It allows us to speculate realistically without the need for science.
Self-driving cars that are meant to drive as well as, or better than, a human in all conditions are a science fiction pipe dream (yes, they are). The designers of such autonomous systems already know this, and the regulators trust them even less, allowing no system with zero human intervention onto the roads.
Reliability has to account for the worst case (failures, near misses, and so on). It makes zero sense, and is irresponsible, for regulators and professionals to take one data point of the system working, dismiss the hundreds of failures, and then conclude that the AI system is reliable in all cases.
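As a back-of-the-envelope illustration (all counts invented), counting only the successful run while ignoring the logged failures and near misses inverts the reliability picture entirely:

```python
# Hypothetical run counts, purely to illustrate the cherry-picking problem.
successes, failures, near_misses = 1, 200, 100

cherry_picked = successes / successes                      # 1.0 -> "look, it works"
honest = successes / (successes + failures + near_misses)  # ~0.003 -> it does not
print(f"cherry-picked estimate:       {cherry_picked:.3f}")
print(f"estimate over all recordings: {honest:.3f}")
```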