NYT will lose: Copyright only protects the actual text. LLMs have weights, not e...

kopecs · 2025-02-05T20:11:38 1738786298

> Copyright only protects the actual text. LLMs have weights, not exact copies.

Following this logic a lossily compressed image is completely unprotected by copyright.

> In any case, saying "if I put in some input and get copyrighted output" is tantamount to copyright violations; if I use a generative tool and generate copyrighted info is it the tool's fault?

Do you not think this is obviously fact-specific? If I gzip a bunch of (copyrighted) files, that obviously doesn't make distributing them non-infringing. If I now replace the combination of tool = ungzip and input = files with tool = (ungzip and files) and input = (selection mechanism over files), do you think that distributing the tool in the second case is not infringement? I don't mean to say that either of these is precisely the same as the LLM case, but I think your argument is clearly overbroad.

> OpenAI at most broke an EULA or some technicality on copyright w.r.t. local ephemeral copies. What's the damage to the NYT though?

One obvious damage claim (if you are skeptical of market harm with respect to newspaper/online subscription sales) is that they were entitled to the FMV of licensing costs of the articles, which is not so hard to value: OpenAI has entered such agreements with AP and others. [0]

[0]: https://apnews.com/article/openai-chatgpt-associated-press-a...

dkjaudyeqooe · 2025-02-05T20:11:42 1738786302

Wrong. I can sample a sound off a record, convert it to any format, manipulate it until it's unrecognizable and I'll still have to pay royalties to the original copyright holder.

Even a translation of original text into another language is copyright infringement.

The real question is if LLMs are fair use, and on the basis of the standard tests for fair use, it seems quite doubtful.

dragonwriter · 2025-02-05T20:21:24 1738786884

> Copyright only protects the actual text.

Copyright protects against both derived works and copies in any form, including lossy or inaccurate copies that do not reach the originality level to be derived works, not just “exact copies”.

But that doesn't really matter here, because OpenAI isn't being sued for producing and distributing an LLM (against a mere LLM distributor, NYT would have a much weaker case); they are being sued for providing a service which takes in copyrighted works and spits out copies, both exact and not, that are well within the established coverage of what is a copyright violation that does not fall within exceptions like fair use. And when they control the whole path between original and copy, the path in between is largely immaterial.

It's not an “is training AI on copyright protected works fair use” case, it's an “is producing copies well within the established parameters of commercial copyright violation rendered fair use by sticking an LLM in the middle of the process as part of the mechanism of copying” case.

otterley · 2025-02-05T20:02:35 1738785755

To train the model, OpenAI had to make a copy of NYT's works. (Running a scraper to dump websites onto your local storage is making a copy.) NYT's first theory is that the act of copying is a prima facie copyright violation.