> You do realize that the code you see regurgitated is most likely some permuted variant of a question/answer on Stack Overflow or a pull request on Github, right?
What do you think coders do?
Did they learn to code by themselves, without ever looking at any preexisting code, or what?
I've seen enough programmers who can't program, but most of them can at least be taught to program. (The look on their faces when they finally start to get it… It's one of the best things.) Most people working on the Linux kernel can actually program.
Most software engineering work is just plumbing existing libraries together, according to a specification given by a non-programmer. The hard part is translating the business requirements into something that a computer could understand; the exact syntax can be handled by a search engine, or a predictive text algorithm.
ChatGPT can't write a kernel device driver, and it can't act as a no-code tool for non-programmers. Those are the hard parts.
> “To tapping machine with hammer, 10s.; to knowing where to tap it, £10; total, £10. 10s.”
Now, to be fair, the code is probably not totally correct: there are probably parts still missing or wrong, and there might even be compilation errors or other problems.
But here's the important part: you can tell it which errors or problems you've observed, and ChatGPT will fix them for you. Exactly like what a programmer does.
And sure, it cannot yet do this at scale, such as in implementing a huge kernel driver like a GPU driver.
But at this rate, give it a few years and an improved version might just be able to do anything a programmer does, perhaps even autonomously if we allow it to interact with a computer like a human does.
> Look at what I just did with ChatGPT in 30 seconds (and I did not cherry-pick, these were the first answers I got!):
Weird flex, as that code is like 90% boilerplate[1]. Everyone was freaking out about Copilot and no one seriously ended up using it because it just generates buggy (or copyrighted) code. It can't even handle writing unit tests with decent coverage (which is arguably the most repetitive/boring software engineering task).
> Weird flex, as that code is like 90% boilerplate[1].
Isn't 90% of code boilerplate anyway?
Also, didn't ChatGPT generate more than just the boilerplate?
Didn't it interpret what I wanted and generate the code for computing the factorial, as well as modify the boilerplate (e.g. the kernel module name, printed messages, function names, the module description, ...) so that it matches what the kernel module is supposed to do? Which is exactly what a human would do?
Aren't you also missing the fact that I gave it a 2-sentence instruction and it "understood" exactly what to do, and then did it? Like a human programmer would do?
Which, in sum, is totally the opposite of what you were claiming?
> Everyone was freaking out about Copilot and no one seriously ended up using it because it just generates buggy (or copyrighted) code.
Don't most programmers also generate buggy code at first? Don't they iterate until the code works, like what ChatGPT does if you give it feedback about the bugs and problems you've encountered?
Also, Copilot and ChatGPT have different levels of capability; don't assume that just because Copilot can't do something, ChatGPT can't either. ChatGPT is clearly a big step forward, as you can clearly see from how everyone is freaking out about it.
Finally, don't assume that these models are never going to improve, ever again.
I mean, I've seen people claiming to use it and that it has significantly accelerated their work. On what are you basing the conclusion that it has no serious use?
and the remaining 10 lines will be written by Copilot with minimal assistance.
It's like having a stupid but diligent assistant who's happy to copy&paste&adapt parts of code.
I can't claim that I often use fully generated Copilot functions. Sometimes I do, often with significant rework, but that's because, as I said, I'm very picky.
I paid GitHub $100 already and don't regret it.
Though I think Copilot still has plenty of features ahead of it.
For example, finding obvious issues in the code, like typos, would be very useful.
Another issue with Copilot is that it only generates new code. Imagine that I need to edit 10 similar lines: I edit one line and I'd like Copilot to offer the remaining edits.
Also, the UI is lacking. Say it generates 10 lines but I only like the first one; now I have to add all 10 lines and then delete 9 of them.
> Please write me a small Linux kernel driver that calculates the factorial of a number when a user program writes an integer to /dev/factorial. The kernel driver outputs the answer to /dev/factorial as well.
> In computing, a device driver is a computer program that operates or controls a particular type of device that is attached to a computer or automaton.
I'm not disputing it can do that – plugging together well-known APIs and well-known programming problems. That's practically just style transfer, something we know these systems are fairly good at.
But given the spec for an unknown device – even quite a simple one – ChatGPT can't produce a device driver for it. How about this?
> An HP CalcPad 200 Calculator and Numeric Keypad behaves as a USB keyboard does. It has VID 0x040B and PID 0x2367. Please write me a small Linux kernel driver that allows me to use this keypad as a MouseKeys-style mouse device. If there's anything you don't understand, let me know.
I doubt any amount of prompt engineering would produce a satisfactory result – but if you did the hard part, and explained how it should do this? Well… maybe it'd be able to give a satisfactory output. But at that point, you're just programming in a high-level, hard-to-model language.
It's not a case of scale. Sure, a very large model might be able to do this particular problem – but only because it'd have memorised code for a USB keyboard driver, and code for a MouseKeys implementation… and, heck, probably code for a MouseKeys kernel driver from somebody's hobby project.
GPT language models don't understand things: they're just very good at guessing. I've been an expert, and a schoolchild; I know how good you can get at guessing without any kind of understanding, and I know enough about what understanding feels like to know how it's different. (There is no metric you can't game by sufficiently-advanced guessing, but you'll never notice an original discovery even if you do accidentally stumble upon one.)
Aside from the boilerplate, which it got mostly right as far as I can tell, the actual logic is hilariously wrong. Moreover, Linux kernel development really isn't just writing stand-alone, self-contained chardev drivers which calculate n!. I would be more impressed if you used ChatGPT to guide you through reverse engineering a piece of hardware and implementing a driver for it.
> Aside from the boilerplate, which it got mostly right as far as I can tell, the actual logic is hilariously wrong.
Please do tell, how is it hilariously wrong?
It seems to have written a factorial function just like it should, it implemented the logic to read the integer from /dev/factorial when a user-space program writes to it, and then it writes the result back to /dev/factorial, and it also returns the number of bytes written correctly.
Which was the entire point of the exercise. Also note that ChatGPT itself said it was just a sample and it might be incomplete.
I noticed it has a bug, because it reads `len` bytes instead of `sizeof(int)` bytes, but a programmer could have made the same mistake.
I would also use a fixed-size unsigned integer rather than simply `int` (as it can invoke UB on overflow). You can ask ChatGPT "what is wrong with this code?" and it can spit out the same arguments I'm making. In fact, it detected an infinite-loop bug in a piece of complex code I had just written, and indeed, the code had an infinite-loop bug.
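For illustration, here is roughly what those two fixes would look like together; this is my own sketch (names and structure included), not ChatGPT's output:

```c
/*
 * Minimal sketch of the two fixes discussed above, assuming the same
 * dev_write() prototype ChatGPT generated. Illustrative only.
 */
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/types.h>
#include <linux/uaccess.h>

static u64 result; /* fixed-width unsigned: overflow wraps instead of being UB */

static u64 factorial(u32 n)
{
        u64 f = 1;

        while (n > 1)
                f *= n--;
        return f;
}

static ssize_t dev_write(struct file *filp, const char __user *buffer,
                         size_t len, loff_t *offset)
{
        u32 input;

        if (len < sizeof(input))        /* don't read past what userspace gave us */
                return -EINVAL;

        /* copy exactly sizeof(input) bytes, not len bytes */
        if (copy_from_user(&input, buffer, sizeof(input)))
                return -EFAULT;

        result = factorial(input);
        return len;                     /* report the whole write as consumed */
}
```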
Perhaps some additional logic to handle reading multiple integers and writing multiple answers could be written, but that would be a further iteration of the code, not the initial one that I would write.
If that is hilariously wrong, then I would also be hilariously wrong. And I'm not just some random web developer, I actually wrote Linux kernel code professionally for years (although, that was a very long time ago).
So, maybe it got some details wrong, but I could conceivably also get those details wrong until I tried to compile/run the code and see what was wrong.
> I would be more impressed if you used ChatGPT to guide you through reverse engineering a piece of hardware and implementing a driver for it.
Yes, I would be more impressed with that as well. Perhaps someone will do that sometime. Even if not with ChatGPT, perhaps with a future version of it or a similar model.
I'd expect the `copy_to_user` call to be inside `dev_read`, so that the userspace program can read the result with a further `read()` call, instead of the buffer it gave to `write()` being mutated (that would probably not even work unless you used `write()` directly in your code instead of e.g. `fwrite()`).
Also as you noted, the logic related to handling `len` vs `sizeof(int)` is... curious.
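For concreteness, a minimal sketch of the shape I'm describing, assuming `dev_write()` has already stored the answer in a module-level `result` (a name I'm inventing here), not ChatGPT's output:

```c
/*
 * Minimal sketch: dev_write() stores the answer (in the invented
 * module-level `result`) and dev_read() hands it back via copy_to_user(),
 * so userspace does write() followed by read().
 */
static u64 result;

static ssize_t dev_read(struct file *filp, char __user *buffer,
                        size_t len, loff_t *offset)
{
        if (len < sizeof(result))
                return -EINVAL;         /* userspace buffer too small */

        if (copy_to_user(buffer, &result, sizeof(result)))
                return -EFAULT;

        return sizeof(result);
}
```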
While I find some ChatGPT examples impressive, this one isn't very enlightening. The character device structure and the factorial itself are fine, but those are likely in the training set, as there are various tutorials online. The combination of the factorial function with the character device, though, is pretty awful (though I could imagine a novice doing it like this).
Both the OP and this driver kind of fit into this pattern: it looks like someone asked a novice programmer to do something using X and Y, so they read a few sample snippets and tutorials on X and Y and tried to put something together as best they could, but the "plumbing" between X and Y (the part that wasn't in the tutorials) doesn't really work.
> I'd expect the `copy_to_user` call to be inside `dev_read`, so that the userspace program can read the result with a further `read()` call, instead of the buffer it gave to `write()` being mutated (that would probably not even work unless you used `write()` directly in your code instead of e.g. `fwrite()`).
Yes, you're probably right. It's been many years since I've written Linux kernel code and also, I've never implemented such a driver, so I missed that.
I asked ChatGPT "what is wrong with this Linux kernel code?" (and fed it the dev_write() function), and this is what it answered (on the first try):
> There are a few issues with this Linux kernel code:
> 1. The function dev_write() is declared as returning ssize_t, but the return value at the end of the function is sizeof(int). ssize_t and size_t are different types, so this could cause a type mismatch.
This is an interesting observation that you and I both missed.
> 2. The factorial() function is not defined in this code, so it is unclear what it does or where it is declared.
This is because I didn't feed it the factorial function nor the rest of the code: it was in a different chat session. One could guess what it does based on the name, but to be fair, it's not clear why such a function would be in the Linux kernel unless I gave it the full context.
> 3. The copy_from_user() and copy_to_user() functions are used to copy data between user space and kernel space. In this code, buffer is a pointer to user space, but input and result are local variables in kernel space. This means that copy_from_user() will copy data from user space into kernel space, but then copy_to_user() will copy data back from kernel space into the same buffer in user space, overwriting the original data. This is likely not the intended behavior.
I think this is basically the bug that you pointed out. At this point, it would be clear to me that the result would have to go back in dev_read() instead.
> 4. The len parameter is not used in this function, so it is unclear why it is included as an argument.
lol. Actually, it is being used, just incorrectly... and it wouldn't be used at all if we made the correction I mentioned. So yes, this is somewhat hilariously wrong.
And it's clear why it has to be included as an argument, but to be fair, ChatGPT has no context whatsoever except knowing it's Linux kernel code (because I used a different chat session), so it's conceivable that it couldn't figure out why the function prototype has to be that way.
> Overall, this code is not well-written and would benefit from further clarification and revision.
It was you who wrote it, dumb-ass! :)
(Interestingly, ChatGPT can be compared to someone with Alzheimer's in the sense that it can't remember conversations that happened in different chat sessions).
> > 1. The function dev_write() is declared as returning ssize_t, but the return value at the end of the function is sizeof(int). ssize_t and size_t are different types, so this could cause a type mismatch.
> This is an interesting observation that you and I both missed.
Hah. Call me when you find an architecture where ints use over half the addressable memory.
> Hah. Call me when you find an architecture where ints use over half the addressable memory.
I mean, I get your point if it's a joke :) But I think the AI was just pointing out that you'd get a compiler warning because of the type mismatch in signedness (is this even a word?).
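For what it's worth, this is roughly the mismatch in question; the return statement is my reconstruction of what ChatGPT produced, not a verbatim quote:

```c
/*
 * Roughly the mismatch being discussed; the body is a reconstruction,
 * not ChatGPT's verbatim output.
 */
static ssize_t dev_write(struct file *filp, const char __user *buf,
                         size_t len, loff_t *off)
{
        /*
         * sizeof(int) has type size_t (unsigned) while the function returns
         * ssize_t (signed). The value obviously fits, so the implicit
         * conversion is well-defined and harmless; at worst a pedantic
         * sign-conversion warning would flag it.
         */
        return sizeof(int);
}
```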
It copies from user, and then copies to user, back into the const buffer it got. I don't even know if this would compile, but certainly nobody would expect the buffer they just wrote to a file to suddenly end up rewritten with a "reply", even when using a special file.
I found that logic very funny.
Edit: To elaborate, the actual code to do this properly would have to allocate some memory when the file was opened; this memory would hold the reply once a question was written. Then, when the answer was read, it would respond with whatever was in there. Finally, when the file was closed, it would have to deallocate that memory.
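Roughly, the proper version would look something like this sketch; all names are mine and error handling is minimal:

```c
/*
 * Rough sketch of the per-open lifecycle described above: allocate state on
 * open(), fill it on write(), return it on read(), free it on release().
 * Names are illustrative; this is not ChatGPT's output.
 */
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/uaccess.h>

struct factorial_state {
        u64  reply;
        bool ready;
};

static int dev_open(struct inode *inode, struct file *filp)
{
        struct factorial_state *st = kzalloc(sizeof(*st), GFP_KERNEL);

        if (!st)
                return -ENOMEM;
        filp->private_data = st;        /* per-open state, not a global */
        return 0;
}

static ssize_t dev_write(struct file *filp, const char __user *buf,
                         size_t len, loff_t *off)
{
        struct factorial_state *st = filp->private_data;
        u32 input;

        if (len < sizeof(input) || copy_from_user(&input, buf, sizeof(input)))
                return -EFAULT;

        st->reply = factorial(input);   /* factorial() as in the earlier sketch */
        st->ready = true;
        return len;
}

static ssize_t dev_read(struct file *filp, char __user *buf,
                        size_t len, loff_t *off)
{
        struct factorial_state *st = filp->private_data;

        if (!st->ready || len < sizeof(st->reply))
                return 0;               /* nothing to report / buffer too small */

        if (copy_to_user(buf, &st->reply, sizeof(st->reply)))
                return -EFAULT;

        st->ready = false;
        return sizeof(st->reply);
}

static int dev_release(struct inode *inode, struct file *filp)
{
        kfree(filp->private_data);      /* clean up the per-open state */
        return 0;
}
```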
This code is quite far from accurate. The issue I have with the concept is that even if this bot were 99% accurate, C is not a "try it and see" language. If you apply "try it and see" techniques to C, you just get subtle undefined behaviour which you might not notice until someone uses it to remotely execute code on your machine. Really, I am not sure that even humans can be trusted to write C, but at the very least I am well aware of all the instances where C is a minefield and can pay extra attention to those. If you simply take the plethora of information out there about C and apply it to kernel development, you won't automatically build these intuitions, no matter how much code you read, because an enormous amount of the C out there is hideously wrong in many ways.
Final edit: Once you start looking at the details, it has managed to implement a bunch of buffer overflows to boot.
> Fortunately, a human will know to fix that broken 4-space indentation and that brace placement before inclusion in the Linux kernel repository.
What's your point, that ChatGPT wouldn't know how to do that, especially if the kernel maintainers gave it such feedback?
I thought it was clear that it can in fact do that (sometimes by asking clarifying questions, like a human would).
I think some of the major things missing in ChatGPT are the ability to interact with a computer directly (including the compiler and checkpatch.pl, and the ability to use files for information storage instead of a limited N-token context), as well as the ability to interact with humans by itself (e.g. via email).
And sure, it would still have very limited capabilities in many ways, don't get me wrong, as I don't think it could replace a programmer at this point.
> The hard part is translating the business requirements into something that a computer could understand
No, that's actually the easy part.
The hard part is to translate the requirements into something that a computer can understand and, at the same time, a human can also easily understand. Because otherwise, it's a one-off program that can't really be changed afterwards.
And that is the real challenge I'm curious about: how good will ChatGPT be at helping me not only with small, trivial issues, but with the big ones?
"ChatGPT, here's this small 500k line repository. I need to change the business requirements and make it so that when a user fails to login 3 times in a row, a captcha is being shown before they can attempt again."
And if ChatGPT then gives me a 10k-line diff, that would be terrible. It should minimize the diff while still keeping it understandable. THAT is what I would love to see.
No need to feed it. I had it converting between the C=128's BASIC 7 and the C=64's BASIC 2 without any additional context. Did a better job than 14-year-old me had done back in the day, too.