> Aside from the boilerplate, which it got mostly right as far as I can tell, the actual logic is hilariously wrong.
Please do tell, how is it hilariously wrong?
It seems to have written a factorial function just like it should, it implemented the logic to read the integer from /dev/factorial when a user-space program writes to it, and then it writes the result back to /dev/factorial, and it also returns the number of bytes written correctly.
Which was the entire point of the exercise. Also note that ChatGPT itself said it was just a sample and it might be incomplete.
I noticed it has a bug, because it reads `len` bytes instead of `sizeof(int)` bytes, but a programmer could have made the same mistake.
I would also use a fixed-size unsigned integer rather than plain `int` (since signed overflow invokes UB). You can ask ChatGPT "what is wrong with this code?" and it will spit out the same arguments I'm making. In fact, it detected an infinite-loop bug in a piece of complex code which I had just written and indeed, it had an infinite-loop bug.
Perhaps some additional logic to handle reading multiple integers and writing multiple answers could be written, but that would be a further iteration of the code, not the initial one that I would write.
If that is hilariously wrong, then I would also be hilariously wrong. And I'm not just some random web developer, I actually wrote Linux kernel code professionally for years (although, that was a very long time ago).
So, maybe it got some details wrong, but I could conceivably also get those details wrong until I tried to compile/run the code and see what was wrong.
> I would be more impressed if you used ChatGPT to guide you through reverse engineering a piece of hardware and implementing a driver for it.
Yes, I would be more impressed with that as well. Perhaps someone will do that sometime. Even if not with ChatGPT, perhaps with a future version of it or a similar model.
I'd expect the `copy_to_user` call to be inside `dev_read`, so that the userspace program can read the result with a further `read()` call, instead of the buffer it gave to `write()` being mutated (that would probably not even work unless you used `write()` directly in your code instead of e.g. `fwrite()`).
Also as you noted, the logic related to handling `len` vs `sizeof(int)` is... curious.
While I find some ChatGPT examples impressive, this one isn't very enlightening. The block device structure and the factorial itself are fine, but those are likely in the training set as there are various tutorials online. But the combination of the factorial function into the block device is pretty awful (though I could imagine a novice doing it like this).
Both the OP and this driver kind of fit into this pattern: it looks like someone asked a novice programmer to do something using X and Y, so they read a few code samples and tutorials on X and Y and tried to put something together as best as they could, but the "plumbing" between X and Y (the part that wasn't in the tutorials) doesn't really work.
> I'd expect the `copy_to_user` call to be inside `dev_read`, so that the userspace program can read the result with a further `read()` call, instead of the buffer it gave to `write()` being mutated (that would probably not even work unless you used `write()` directly in your code instead of e.g. `fwrite()`).
Yes, you're probably right. It's been many years since I've written Linux kernel code and also, I've never implemented such a driver, so I missed that.
I asked ChatGPT: what is wrong with this Linux kernel code (and fed it the dev_write() function), and this is what it answered (on the first try):
> There are a few issues with this Linux kernel code:
> 1. The function dev_write() is declared as returning ssize_t, but the return value at the end of the function is sizeof(int). ssize_t and size_t are different types, so this could cause a type mismatch.
This is an interesting observation that you and I both missed.
> 2. The factorial() function is not defined in this code, so it is unclear what it does or where it is declared.
This is because I didn't feed it the factorial function nor the rest of the code: it was in a different chat session. One could guess what it does based on the name, but to be fair, it's not clear why such a function would be in the Linux kernel unless I gave it the full context.
> 3. The copy_from_user() and copy_to_user() functions are used to copy data between user space and kernel space. In this code, buffer is a pointer to user space, but input and result are local variables in kernel space. This means that copy_from_user() will copy data from user space into kernel space, but then copy_to_user() will copy data back from kernel space into the same buffer in user space, overwriting the original data. This is likely not the intended behavior.
I think this is basically the bug that you pointed out. At this point, it would be clear to me that the result would have to go back in dev_read() instead.
> 4. The len parameter is not used in this function, so it is unclear why it is included as an argument.
lol. Actually, it is being used, incorrectly... and it wouldn't be used at all if we made the correction that I mentioned. So yes, this is somewhat hilariously wrong.
And it's clear why it has to be included as an argument, but to be fair, ChatGPT has no context whatsoever except knowing it's Linux kernel code (because I used a different chat session), so it's conceivable that it couldn't figure out why the function prototype has to be that way.
> Overall, this code is not well-written and would benefit from further clarification and revision.
It was you who wrote it, dumb-ass! :)
(Interestingly, ChatGPT can be compared to someone with Alzheimer's in the sense that it can't remember conversations that happened in different chat sessions.)
> > 1. The function dev_write() is declared as returning ssize_t, but the return value at the end of the function is sizeof(int). ssize_t and size_t are different types, so this could cause a type mismatch.
> This is an interesting observation that you and I both missed.
Hah. Call me when you find an architecture where ints use over half the addressable memory.
> Hah. Call me when you find an architecture where ints use over half the addressable memory.
I mean, I get your point if it's a joke :) But I think the AI was just pointing out that you'd get a compiler warning because of the type mismatch in signedness (is this even a word?).
It copies from user, and then copies to user, back into the `const` buffer it got. I don't even know if this would compile, and certainly nobody would expect the buffer they just wrote to a file to suddenly end up rewritten with a "reply", even when using a special file.
I found that logic very funny.
Edit: To elaborate, the actual code to do this properly would have to allocate some memory when the file was opened; this memory would hold the reply once a question was written. Then, when the answer was read, it would have to respond with what's in there. Finally, when the file was closed, it would have to deallocate.
This code is quite far from accurate. The issue I have with the concept is that even if this bot were 99% accurate, C is not a "try it and see" language. If you apply "try it and see" techniques to C, you just get subtle undefined behaviour which you might not notice until someone uses it to remotely execute code on your machine. Really, I am not so sure that even humans can be trusted to write C, but at the very least I am well aware of all the instances where C is a minefield and can pay extra attention to those. If you simply look at the plethora of information out there to learn C and apply it to kernel development, you won't automatically build these intuitions no matter how much code you read, because an enormous amount of C out there is hideously wrong in many ways.
Final edit: Once you start looking at the details, it has managed to implement a bunch of buffer overflows to boot.