Hacker News

I don't understand this outrage. People put things on the internet for all to see but now you're mad someone saw it and made use of it.

If you didn't want others to read your information, you shouldn't have published it on the internet. That's all they're doing at the end of the day: reading it. They're not publishing it as their own; they just used publicly available data to train a model.

It's much the same as if I read an article and then tell someone about it. If I'm allowed to learn from your article, then why isn't OpenAI?

Also, the terms are just for liability. Nobody gives a shit what you use ChatGPT for; the only thing those terms do is prevent you from turning around and suing OpenAI after it blows up in your face.





And I feel like the difference between, say:

- Paying for a ticket/dvd/stream to see a Ghibli movie

- Training a model on their work without compensating them, then enabling everyone to copy their art style with ~zero effort, flooding the market and diluting the value of their work, and making money in the process

should be rather obvious. My only hypothesis so far is that a lot of the people in here have a vested interest in not understanding the outrage, so they don't.


I don't have any vested interest. I'm just a guy. Don't work on AI, don't use AI much in my day job.

I just legitimately think the outrage is unreasonable. It is completely infeasible for AI companies to provide any meaningful amount of compensation to all the data sources they use.

Alternatively they could just not use any of the data, in which case we wouldn't have as good LLMs and other than that the world would be exactly the same. These data owners don't notice any difference. Using their data doesn't harm them in any way.

You seem to be arguing that enabling people to mimic Ghibli's art style somehow harms them; I don't see how it does at all. People have been able to mimic it already, so what's the difference? More people can? Does that make a difference to Ghibli? I mean, can you point to some concrete negative effects that this phenomenon has had on Studio Ghibli?

I don't think you can. And I think that proves my point. Anything can be mimicked. People can play covers of songs, paint their own versions of famous paintings, copy Louis Vuitton bag designs, whatever they want. The effort it takes is irrelevant.

You don't even have to train AI on Studio Ghibli's art to mimic it. You could just train it on other material, and then the user could feed it Studio Ghibli art and tell it to mimic the style. The specific training data itself is irrelevant; it's the volume of data that trains the models. Even if they specifically avoided training on Studio Ghibli's art, there would likely be basically no difference. It wouldn't be worth paying them for it.


You ever see warning labels on products? That's because putting them in terms and conditions isn't enough to avoid liability in a products liability case.

Well it certainly helps

Who told you that? ChatGPT?

No, not really. It does not help. It evidences that they knew the risks were present but did not take adequate action to warn the consumers of their products, who OpenAI knows do not read the entirety of the TOS.

Again, this is exactly why you see warning labels on products. Prohibiting certain uses in small text, buried in the TOS, is not a warning.


Quite the over-simplified straw man you have there.

Quite the lack of substance you have there.

People publish stuff on the Internet for various reasons. Sometimes they want to share information, but if the information is shared, they want to be *attributed* as the authors.

> If you didn't want others to read your information you shouldn't have published it on the internet. That's all they're doing at the end of the day, reading it. They're not publishing it as their own, they just used publicly available data to train a model.

There is some nuance here that you fail to notice, or you pretend you don't see it :D I can't copy-paste a computer program and resell it without a license. I can't say "Oh, I've just read the bits, learned from them, and based on this knowledge I created my own computer program that looks exactly the same except the author name is different in the »About...« section" - clearly, some criterion has to be used to differentiate reading-learning-creating from simply copying...

What if, instead of copy-pasting the digital code, you printed it onto a film, passed light through the film onto a colony of ants, let the light kill the exposed ants, waited for the rest to wander off, and then used the dead ants as another film to convert the pattern back into digital data? You could now argue that you didn't copy anything: you taught the ants, the ants learned, and they created a new program. But you would fool no one. AI models don't actually learn; they are a different way of storing the data. I think when a court decides whether a use is fair and transformative enough, it looks at how much effort was put into the transformation: a lot of effort went into creating the AI, but once it exists, the effort put into any single work is nearly null - just the electricity, bandwidth, and storage.


> There is some nuance here that you fail to notice or you pretend you don't see it :D

I could say the same for you:

> I can't copy-paste a computer program and resell it without a license. I can't say "Oh I've just read the bits, learned from it and based on this knowledge I created my own computer program that looks exactly the same except the author name is different in the »About...« section"

Nobody uses LLMs to copy others' code. Nobody wants a carbon copy of someone else's software; if that's what they wanted, they would just use that software. I mean, maybe someone does, but that's not the point of LLMs and it's not why people use them.

I use LLMs to write code for me sometimes. I am quite sure that nobody in history has ever written that exact code. It's not copied from anyone; it's written specifically to solve the given task. I'm sure it's similar to a lot of code out there - it's not often we write truly novel stuff. But there's nothing wrong with that. Most websites are pretty similar. Most apps are pretty similar. Developers all over the world write the same-ish code every day.

And if you don't want anyone to copy your precious code, then don't publish it. That's the most ironic thing about all this: you put your code on the internet for everyone to see, and then you make a big deal about the possibility of an LLM copying it as a response to a prompt?

Bro, if I wanted your code I could go to your public GitHub repo and actually copy it; I don't need an LLM to do that for me. Don't publish it if you're so worried about being copied.



