My experience with GPT-4 has been really disappointing. It didn't feel like a step up from 3.5.
As an example, I've been trying to use it to learn Zig, since the official docs are ... spartan. I'll say, "here's my code, here's the error, what's wrong with it?" and it goes completely off the rails, suggesting fixes that do nothing (or are themselves wrong).
In my case, understanding/fixing the code would have required GPT-4 to know the difference between allocating on the stack/heap and the lifetimes of pointers. It never even approached the right solution.
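To give a sense of the shape of the bug (a simplified, hypothetical reconstruction, not my actual code):

```zig
const std = @import("std");

// Broken: buf lives on this function's stack frame, so the returned
// slice points into dead memory as soon as the function returns.
fn greetingBroken() []const u8 {
    var buf: [32]u8 = undefined;
    return std.fmt.bufPrint(&buf, "hello {s}", .{"zig"}) catch unreachable;
}

// Fixed: heap-allocate so the result outlives the call. The caller
// owns the returned memory and must free it.
fn greeting(allocator: std.mem.Allocator) ![]u8 {
    return std.fmt.allocPrint(allocator, "hello {s}", .{"zig"});
}

test "heap-allocated result is valid after the call" {
    const s = try greeting(std.testing.allocator);
    defer std.testing.allocator.free(s);
    try std.testing.expectEqualStrings("hello zig", s);
}
```

The broken version compiles fine but returns a slice into a dead stack frame; the fix hands the lifetime decision to the caller through an allocator, which is the idiomatic Zig approach.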
I haven't yet gotten it to help me in even a single instance. Every suggestion is wrong or won't compile, and it can't reason through the errors iteratively to find a fix. I'm sure this has to do with the small amount of Zig code in its training set, but I reckon an expert C coder could have spotted the bug instantly.
If you are using GPT-4 to work around the fact that public technical documentation is sparse for your topic of interest, you are likely to be disappointed: GPT-4's training set likely has the same gap, so you are, in effect, asking it to fill in missing data, which invites hallucination.
It’ll be much better on subjects where there is too much information on the public internet for a person to efficiently manage and sift through.
I think you're right. My hope was that it could reason through the problem using knowledge from related sources like C, plus an understanding, beneath the syntax, of what was actually happening.
Depending on what you're doing, you might find few-shot techniques useful.
I used GPT-3 to maintain a code library in four languages. I'd write the Dart (basically JS, so GPT knows it well), then give it the C++ equivalent of a function I had previously translated, and it could handle any C++ from there.
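A sketch of that pattern (the function and its translation here are made up for illustration):

```text
Translate the following Dart function to C++, matching the style of the
example pair.

Example (previously translated by hand):
Dart: int addOne(int x) => x + 1;
C++:  int addOne(int x) { return x + 1; }

Now translate:
Dart: int triple(int x) => x * 3;
```

The hand-translated pair anchors the target style and idioms; from there the model can follow the pattern for new functions.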
1. GPT-4 is likely learning from the same spartan docs as you.
2. GPT-4's training data likely doesn't include significant Zig usage, since large parts of it cut off a few years ago. I use Rust, and it doesn't know about any recently added Rust features either.
This has interesting implications because it means people will gravitate towards languages/frameworks/libraries that GPT knows well, which means even less training data will be generated for the new stuff. This is a form of value lock-in.
> This has interesting implications because it means people will gravitate towards languages/frameworks/libraries that GPT knows well, which means even less training data will be generated for the new stuff. This is a form of value lock-in.
That's the kind of problem that most people are failing to see. The use of these models might not be problematic in itself, but the changes they bring are often unexpected and run too deep for us to see clearly right now. And yet people are rushing toward them at full speed.
GPT-4 is just regurgitating what it's "learned" from previously scraped content on the internet. If somebody didn't answer it on Stack Overflow before 2021, it doesn't know it. It can't reason about anything; it doesn't "understand" stacks or pointers.
That said, it's really good at regurgitating stuff from Stack Overflow. But once you step beyond anything that someone has previously done and posted to the internet, it quickly gets out of its depth.
It's an order-of-magnitude step up for certain things, like chess; it's really good at chess, actually. But not programming, where it seems maybe marginally better on average, and worse in some ways.
One demo of GPT-4's superiority over GPT-3 is to come up with a prompt that determines the language of some given text.
I couldn't figure out a GPT-3 prompt that could handle the input “This text is written in French” correctly (it concludes the text is in French, when it's English), but with GPT-4 you can instruct it in the prompt to disregard what the text says about itself and focus on the words and grammar it actually uses.
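Something roughly like this works (the exact wording here is approximate, not the prompt I used verbatim):

```text
Identify the language the following text is written in. Judge only by
its vocabulary and grammar; ignore any claims the text makes about
itself. Answer with the language name only.

Text: "This text is written in French"
```

The right answer is English, since the sentence merely mentions French; GPT-3 keeps getting fooled by the content, while GPT-4 can follow the disregard instruction.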
"In view of these overwhelming results measures must be taken to remove men from jobs where their predisposition to crime may have negative repercussions on society."
It is one of the best instruments for speculating on the extent of human idiocy.