It's also impossible to imagine correctly. Back in 2009, I was completely convinced that within 10 years we'd be able to walk into a car dealership and buy a brand-new vehicle with no steering wheel, because the self-driving AI would be just that good. It seemed reasonable at the time on the basis of the DARPA Grand Challenge results, but even 13 years later, it didn't happen.
I think this is the crucial point: as in all AI applications, the way it handles corner cases will decide its impact on the job market. And corner cases are exactly where AI has consistently performed badly.
Just as a reminder: after the spectacular results on ImageNet, highly respected AI researchers were predicting the end of radiology as a profession. It turns out that even when a state-of-the-art CV classification model is run on a scan, you still need a radiologist to look at the image in basically the same way as before.
If you write large-scale applications with the help of a system like ChatGPT, you will still need to create accurate test coverage and an understanding of the problem essentially equivalent to that of the people writing the code themselves. Whether all of this actually leads to large enough productivity increases depends on how error-prone the AI-generated code turns out to be, and given that it takes a lot more time to dig into an unfamiliar codebase than into one you've written yourself, I think it's anything but obvious that this will have a huge overall impact on the industry. But obviously I might be biased here, since I have a stake in the game.
There are also excellent mathematical reasons why that happened.
Self-driving cars are an impossibly complex problem.
Statistics are statistics.
Predicting the minority class correctly 99% of the time isn't good enough for autonomous driving. A car has to brake for little Suzie 100% of the time.
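To put rough numbers on that intuition, here's a quick back-of-envelope sketch; every figure in it (mileage, event rate, fleet size) is an assumption for illustration, not a real driving statistic:

```python
# Back-of-envelope: why 99% on the rare "must brake now" case isn't enough.
# All numbers here are illustrative assumptions, not measured driving statistics.

miles_per_car_per_year = 10_000    # assumed annual mileage per vehicle
critical_events_per_mile = 0.001   # assumed rate of safety-critical events
recall = 0.99                      # fraction of those events handled correctly

events_per_car = miles_per_car_per_year * critical_events_per_mile  # ~10 per year
missed_per_car = events_per_car * (1 - recall)                      # ~0.1 per year

fleet_size = 1_000_000
print(f"Missed critical events per year, fleet-wide: {missed_per_car * fleet_size:,.0f}")
# -> ~100,000 potentially catastrophic misses per year at "99% correct"
```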
However, generating 1,000 lines of code for a CRUD app that's 99% bug-free?
That's a helluva lot better than I can do.
As with all things, the solution is to watch what the domain experts do.
The equivalent is closer to a CRUD app that serves 99% of requests correctly, which is nowhere near good enough to use.
But even if we do go with 99% bug free for the sake of argument, the usefulness depends on the type of bug. How harmful is it? How easy is it to detect?
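As a rough sketch of what those percentages mean at scale (the 1,000-line figure comes from the comment above; the request volume is an assumed number for illustration):

```python
# Back-of-envelope: what "99%" means for generated code and for a CRUD service.
# The request volume below is an illustrative assumption.

lines_generated = 1_000
bug_free_rate = 0.99
buggy_lines = lines_generated * (1 - bug_free_rate)
print(f"Buggy lines in a {lines_generated}-line app: ~{buggy_lines:.0f}")
# -> ~10 defects, each of which someone still has to find and understand

requests_per_day = 1_000_000      # assumed traffic for a modest production service
success_rate = 0.99
failed_requests = requests_per_day * (1 - success_rate)
print(f"Failed requests per day at 99% correctness: ~{failed_requests:,.0f}")
# -> ~10,000 failures a day, which nobody would ship
```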
I had my wife (a physician) ask ChatGPT medical questions, and it was almost always subtly but dangerously and confidently wrong. It looked fine to me, but it took an expert to spot the flaws. And frequently, finding the problems required specialist knowledge that a physician outside my wife's specialty wouldn't even have.
If you need a senior engineer to read and understand every line of code this thing spits out, I don't see it providing more than advanced autocomplete in real-world use (which, to be fair, could be quite helpful).
It frequently takes more time to read and really comprehend a junior engineer's PR than it would have taken to just do it myself. The only reason I don't is mentoring.
Just because your prediction was wrong doesn't mean we aren't leaps and bounds ahead of where we were. That seems to be the crux of people's argument: because it's not perfect yet, it's not impressive.
Hm, well, that's not the impression I want to create. I certainly think any human intelligence task can be equaled by an AI at some point; I just feel uncertain about any specific timescale.
And GPT-3 et al. have a lot of knowledge, even if they mess up certain expert-level details. Rather than comparing against domain experts, my anchor point here is the sort of mistakes that novelists, scriptwriters, and journalists make when writing about any given topic.
How good will ChatGPT be in 5 years?
It's scary to imagine.