Hacker News

Which part?


The part that says you shouldn't take outputs from their models to build datasets for training competitor models.

Outputs from models that they trained on stolen ebooks, unpaid reddit data, data scraped from millions of websites without credit, etc. Sort of like stealing a bike and then getting mad that it got stolen again later, because it was clearly rightfully yours.

https://i.pinimg.com/originals/d7/72/22/d77222df469b50e3b4cd...


I get your point but your analogy doesn’t quite work.


Yeah, it's more like stealing a million bikes, putting all the parts into a pile, and custom-assembling them on request.


Still not exactly right. Stealing bikes deprives owners of them, while scraping data doesn’t.


How about torrenting the entirety of the world's filmography, using that content to make clip compilations on YouTube, then filing copyright strikes and demonetizing videos that contain those clips?

In a sense, it's almost patent trolling.


For anyone wondering, it's here: https://openai.com/policies/terms-of-use:

>use output from the Services to develop models that compete with OpenAI;

Well, I can still use ChatGPT labeling for many other purposes anyway.


There’s some room for interpretation here. Are small sentiment analysis models competing with a large general-purpose generative model? OpenAI doesn’t provide the former.

I see competing models as the likes of LLaMA, Falcon, etc., which would fall under the terms in my interpretation.
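To make the "labeling" use case concrete, here's a minimal sketch of the distillation idea being debated: collect sentiment labels from a large model, then train a tiny standalone classifier on them. The `llm_labeled` examples below are made up for illustration (in practice they'd come from a labeling API), and the classifier is a from-scratch Naive Bayes, not any particular library's implementation:

```python
# Hypothetical sketch: distilling LLM-assigned sentiment labels into a small
# standalone Naive Bayes classifier. The labeled pairs below are placeholders
# standing in for model outputs; nothing here is a real API.
import math
from collections import Counter

# Pretend these (text, label) pairs were produced by prompting a large model.
llm_labeled = [
    ("great product works well", "pos"),
    ("love it highly recommend", "pos"),
    ("terrible quality broke fast", "neg"),
    ("waste of money very bad", "neg"),
]

def train_nb(data):
    """Count word frequencies per label and label frequencies."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    label_counts = Counter()
    for text, label in data:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def predict(model, text):
    """Pick the label with the highest add-one-smoothed log probability."""
    word_counts, label_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in text.split():
            score += math.log((word_counts[label][w] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

model = train_nb(llm_labeled)
print(predict(model, "love this great product"))  # → pos
```

Whether a toy model like this "competes with OpenAI" is exactly the interpretive question above: it does nothing a general-purpose chat model is sold for, yet it was built from that model's outputs.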



