
First sentence of second paragraph of the lawsuit: “Defendants’ unlawful use of The Times’s work to create artificial intelligence products that compete with it threatens The Times’s ability to provide that service.” First sentence of p7: “The Times objected after it discovered that Defendants were using Times content without permission to develop their models and tools.”

I think it’s ultimately about whether training on copyrighted content is legal or not.

Here are some other quotes from the lawsuit that approach it from a different angle: “These tools also wrongly attribute false information to The Times.” “By providing Times content without The Times’s permission or authorization, Defendants’ tools undermine and damage The Times’s relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue.”

Even if the first argument fails, a win on the second still means you can’t train on copyrighted content unless it is possible to do so without the model ultimately quoting that content or attributing anything to its author. My (uneducated) guess is that’s not possible.



> I think it’s ultimately about whether training on copyrighted content is legal or not.

It is.

The bulk of the complaint is a narrative; it's meant to be a persuasive story that puts OpenAI in a bad light. You don't really get to the specific causes of action until page 60 (paragraphs 158-180). Here is a sample of the specific allegations that make up the elements of each cause of action:

160. By building training datasets containing millions of copies of Times Works, including by scraping copyrighted Times Works from The Times’s websites and reproducing such works from third-party datasets, the OpenAI Defendants have directly infringed The Times’s exclusive rights in its copyrighted works.

161. By storing, processing, and reproducing the training datasets containing millions of copies of Times Works to train the GPT models on Microsoft’s supercomputing platform, Microsoft and the OpenAI Defendants have jointly directly infringed The Times’s exclusive rights in its copyrighted works.

162. On information and belief, by storing, processing, and reproducing the GPT models trained on Times Works, which GPT models themselves have memorized, on Microsoft’s supercomputing platform, Microsoft and the OpenAI Defendants have jointly directly infringed The Times’s exclusive rights in its copyrighted works.

163. By disseminating generative output containing copies and derivatives of Times Works through the ChatGPT offerings, the OpenAI Defendants have directly infringed The Times’s exclusive rights in its copyrighted works.


IMO the first argument is invalid; the second one, however, is completely valid.


> "Defendants’ tools undermine and damage The Times’s relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue."

News flash: you can read newspaper articles at the library.


Yes, and libraries pay for that access. They also don't obfuscate the origin or remove the advertising. Don't equate libraries with what OpenAI does.


> News flash: you can read newspaper articles at the library.

Reading an article != selling a product that redistributes the article.


And it's no coincidence that the NYTimes isn't suing OpenAI for reading newspaper articles at the library...



