First sentence of second paragraph of the lawsuit: “Defendants’ unlawful use of The Times’s work to create artificial intelligence products that compete with it threatens The Times’s ability to provide that service.” First sentence of p7: “The Times objected after it discovered that Defendants were using Times content without permission to develop their models and tools.”
I think it’s ultimately about whether training on copyrighted content is legal or not.
Here are some other quotes from the lawsuit that approach it from a different angle: “These tools also wrongly attribute false information to The Times.” “By providing Times content without The Times’s permission or authorization, Defendants’ tools undermine and damage The Times’s relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue.”
Even if the first argument fails and the second one wins, it still boils down to not being able to train on copyrighted content, unless it is possible to train on copyrighted data without ultimately quoting that content or attributing anything to its author. My (uneducated) guess is that's not possible.
> I think it’s ultimately about whether training on copyrighted content is legal or not.
It is.
The bulk of the complaint is a narrative; it's meant to be a persuasive story that puts OpenAI in a bad light. You don't really get to the specific causes of action until page 60 (paragraphs 158-180). Here is a sample of the specific allegations that make up the elements of each cause of action:
160. By building training datasets containing millions of copies of Times Works, including by scraping copyrighted Times Works from The Times’s websites and reproducing such works from third-party datasets, the OpenAI Defendants have directly infringed The Times’s exclusive rights in its copyrighted works.
161. By storing, processing, and reproducing the training datasets containing millions of copies of Times Works to train the GPT models on Microsoft’s supercomputing platform, Microsoft and the OpenAI Defendants have jointly directly infringed The Times’s exclusive rights in its copyrighted works.
162. On information and belief, by storing, processing, and reproducing the GPT models trained on Times Works, which GPT models themselves have memorized, on Microsoft’s supercomputing platform, Microsoft and the OpenAI Defendants have jointly directly infringed The Times’s exclusive rights in its copyrighted works.
163. By disseminating generative output containing copies and derivatives of Times Works through the ChatGPT offerings, the OpenAI Defendants have directly infringed The Times’s exclusive rights in its copyrighted works.
> "Defendants’ tools undermine and damage The Times’s relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue."
News flash: you can read newspaper articles at the library.