People publish stuff on the Internet for various reasons. Often they do want to share the information, but even so, they want to be *attributed* as its authors.
> If you didn't want others to read your information you shouldn't have published it on the internet. That's all they're doing at the end of the day, reading it. They're not publishing it as their own, they just used publicly available data to train a model.
There is some nuance here that you either fail to notice or pretend not to see :D I can't copy-paste a computer program and resell it without a license. I can't say "Oh, I just read the bits, learned from them, and based on this knowledge I created my own computer program that looks exactly the same, except the author name is different in the »About...« section." Clearly, some criterion has to be used to distinguish reading-learning-creating from simply copying...
What if, instead of copy-pasting the digital code, you print it onto a film, pass light through the film onto ants so that the exposed ants are killed, wait for the surviving ants to wander off, and then use the dead ants as another film to somehow convert the pattern back into digital data? You can now argue you didn't copy anything: you taught the ants, the ants learned, and they created a new program. But you will fool no one. AI models don't actually learn; they are a different way the data is stored. I think when a court decides whether a use is fair and sufficiently transformative, it looks at how much effort was put into the transformation: a lot of effort went into creating the AI, but once it exists, the effort put into any single work is nearly null - just electricity, bandwidth, and storage.
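To make the "storage, not learning" claim concrete, here's a minimal sketch - a deliberately overfit, character-level Markov chain, nothing like a real LLM, with every name and string invented for the demo. "Trained" on a single program, its transition table is effectively just a re-encoding of that program, so generating from it replays the input verbatim:

```python
from collections import defaultdict
import random

# Toy illustration of the "storage, not learning" claim: a
# character-level Markov chain "trained" on one tiny program.
# (All names and values here are invented for the demo.)
TRAINING_CODE = 'def greet(name):\n    print("Hello, " + name)\n'
ORDER = 6  # context length; long enough that every context in this
           # tiny corpus is unique, i.e. the model is fully overfit

# "Training": record which character follows each 6-char context.
transitions = defaultdict(list)
for i in range(len(TRAINING_CODE) - ORDER):
    context = TRAINING_CODE[i:i + ORDER]
    transitions[context].append(TRAINING_CODE[i + ORDER])

# "Generation": seed with the opening context and sample forward.
context = TRAINING_CODE[:ORDER]
output = context
while context in transitions:
    output += random.choice(transitions[context])
    context = output[-ORDER:]

# Every context maps to exactly one successor, so "sampling" is
# deterministic: the "new" program is the old one, verbatim.
print(output == TRAINING_CODE)  # True
```

A real model trained on billions of documents mixes its sources far more than this degenerate extreme, which is exactly where the disagreement lies - but the sketch shows what "the weights are just another encoding of the data" looks like at the limit.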
> There is some nuance here that you either fail to notice or pretend not to see :D
I could say the same for you:
> I can't copy-paste a computer program and resell it without a license. I can't say "Oh, I just read the bits, learned from them, and based on this knowledge I created my own computer program that looks exactly the same, except the author name is different in the »About...« section."
Nobody uses LLMs to copy others' code. Nobody wants a carbon copy of someone else's software; if that's what they wanted, they would just use that software. Maybe someone out there does, but that's not the point of LLMs and it's not why people use them.
I use LLMs to write code for me sometimes. I am quite sure that nobody in history has ever written that exact code. It's not copied from anyone; it's written specifically to solve the given task. I'm sure it's similar to a lot of code out there - it's not often we write truly novel stuff. But there's nothing wrong with that. Most websites are pretty similar. Most apps are pretty similar. Developers all over the world write the same-ish code every day.
And if you don't want anyone to copy your precious code, then don't publish it. That's the most ironic thing about all this: you put your code on the internet for everyone to see, and then you make a big deal about the possibility of an LLM reproducing it in response to a prompt?
Bro, if I wanted your code I could go to your public GitHub repo and actually copy it; I don't need an LLM to do that for me. Don't publish it if you're so worried about being copied.