
>Hmm, this is an interesting framing of the lawsuit.

First, it's not a "framing" of the lawsuit. A lawsuit is a number of claims made by one party against the other. In the two California cases, there were no decisions made on claims relating to LLM outputs. In the NYT case, there are claims relating to LLM outputs.

Yes, it could also be about training. But the discovery pertains to the outputs, which is the issue in this case. So even if you apply the holding that training is fair use, which I don't see as likely in the district courts of the Second Circuit, you still don't get the result the person I responded to suggested: that this should all be moot because of two decisions in two different California cases, which are not binding precedent in the Second Circuit and which would not dispose of all of the NYT's claims anyway.

>So then this case should also be about training. The question then is: did OpenAI intend to have these models be able to regurgitate large amounts of content? Or is it yet another emergent property that nobody anticipated?

Intent is not a required element of copyright infringement, so you'd be wrong there. Plaintiffs can use intent as evidence of willful infringement, which entitles them to a damages multiplier in statutory damages cases, and this is one. So OpenAI can't avoid liability based on their intent or lack thereof. They can, at best, use 'intent' to argue that the NYT is not entitled to heightened damages.

>So this might come down to intent.

It's always amusing to see people apply completely made-up rationales to legal cases based upon their own personal feelings about technologies while completely disregarding, let's say, 100 years of legal jurisprudence.





Oh I'm totally an armchair lawyer, so my ruminations were not grounded in law or legal precedent :-) I do have some background on the patent side of things, where independent reinvention is also not a defense against infringement, but not so much in copyright, so this was educational.

However, has there been any case where the infringement was not only unintentional, but also unexpected?

That is, if you look at cases of unintentional infringement, these are typically cases where the act of reproducing the content was intentional, but there was a lack of awareness or confusion about the copyright protections of that content. (This paper was useful for background: https://www.law.uci.edu/faculty/full-time/reese/reese_innoce...)

But I could not find a case where the act of copying itself was unintentional.

In this case, looking at how LLM training works and what LLMs do, it is surprising that a model could reproduce its training content verbatim. The fact that it reproduced those outputs is undeniable, but how do existing law and jurisprudence apply to an unprecedented case like this, where the reproduction happened through some magic black box that nobody can decipher?


These are interesting questions, but they are not legal questions. Intent is not an element of infringement. It is only an element of willful infringement. Therefore it can never be used as a defense against infringement on its own.

>The fact that it reproduced those outputs is undeniable, but how does existing law and jurisprudence apply to an unprecedented case like this where the reproduction was through some magic black box that nobody can decipher?

People love to ponder this, but ponder how the law should actually handle it: "Yes, your honor, our business has a magical black box that violates the law, we're just not sure how! Therefore we can't be liable." How does that even make sense? On what principle should that apply here and not elsewhere? Can your magic black box murder? Defame?


> On what principle should that apply here and not elsewhere? Can your magic black box murder? Defame?

Good questions, and I think relevant to the current point. We're already seeing cases like that pop up with the libel suits and the recent, tragic AI-assisted suicides.

It's very clear that these models were not designed to be "suicide-ideation machines", yet that turned out to be one of the things they do! In these cases the questions are definitely not going to be about whether the AI labs intended these outcomes, but whether they took sufficient precautions to anticipate and prevent such outcomes.

One possible defense for the AI labs could be "these machines have an unprecedented, possibly unlimited, range of capabilities, and we could not reasonably have anticipated this."

A smoking gun would be an email or report outlining just such a threat that they dismissed (which may well exist, given what I hear about these labs' "move fast, break people" approach to safety). But without that, it seems like a reasonable defense.

While that argument may not work for this or other cases, I think it will pop up as these models do more and more unexpected things, and the courts will have to grapple with it eventually.



