There are at least two documented cases of the major AI companies downloading millions of books off torrents. Anthropic is in litigation over it right now, and Meta was in the news about it. I would be surprised if it's not all of them.
Hence the "usually". Poisoning your sources with potentially illegally acquired content is a separate problem from the legal status of the compiled system's output. I mean, if you steal the book you use as inspiration for your own book, would the author or bookseller of the stolen book then have any rights to your work? This is a fundamental problem, not one where the specific failings of individual companies matter.