
Can you compress the internet including copyrighted material and then sell access to it?

At what percentage of lossy compression does it become infringement?




> Can you compress the internet including copyrighted material and then sell access to it?

Define access?

If you mean sending out the compressed copy, generally no, at least for anything people would normally call compression.

If you want to run a search engine, then you should be fine.

> At what percentage of lossy compression does it become infringement?

It would have to be very very lossy.

But some AI stuff is. For example, there are image models with fewer parameters than source images. By and large, those cannot store enough data to infringe. (Copying can creep in with images that appear in multiple versions in the training data, but that's a small sliver of the data.)
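A rough back-of-envelope version of the capacity argument, using made-up round numbers (the parameter count, bits per parameter, and image count below are hypothetical, not figures for any real model). It divides the model's total storage evenly across the training images, which is a generous upper bound since real models can't spend every bit on memorization:

```python
def bits_per_image(num_params: int, bits_per_param: int, num_images: int) -> float:
    """Upper bound on storage capacity per training image.

    Hypothetically assumes every bit of every parameter could be spent
    memorizing images, which overstates what a real model can store.
    """
    total_bits = num_params * bits_per_param
    return total_bits / num_images


# Hypothetical round numbers, not taken from any real model.
params = 1_000_000_000   # 1e9 parameters
bits = 16                # e.g. 16-bit weights
images = 2_000_000_000   # 2e9 training images

capacity = bits_per_image(params, bits, images)
print(f"{capacity:.1f} bits per image")  # prints "8.0 bits per image"
```

Eight bits is a single byte per image, while even a 64x64 8-bit grayscale thumbnail is 4,096 bytes uncompressed. An image duplicated many times in the training set effectively gets a bigger share of that budget, which is where the copying mentioned above creeps in.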


Commercial audio generation models were caught reproducing parts of copyrighted music in a distorted and low-quality form. This is not "learning", just "imitating".

Also, as I understand it, they didn't even buy the music CDs they trained on; they got the music somewhere else. Why don't the organizations that prosecute people for downloading a movie look into whether it's OK to build a business on illegal copies of copyrighted works?


I said "some" for a reason.


When you identify where the infringing party has stored the source material in their artifact.{zip,pdf,safetensor,connectome,etc}. In ML, this discovery stage is called "mechanistic interpretability", and in humans it's called "illegal."


It's not that clear-cut. Since they're talking about taking lossy compression to the limit, there are ways to go so lossy that you're no longer infringing even if you can point to exactly where it's stored.

Like CliffsNotes.



