Hacker News

Which part?


The part that says you shouldn't take outputs from their models to build datasets for training competitor models.

Outputs from models that they trained on stolen ebooks, unpaid reddit data, data scraped from millions of websites without credit, etc. Sort of like stealing a bike and then getting mad that it got stolen again later, because it was clearly rightfully yours.

https://i.pinimg.com/originals/d7/72/22/d77222df469b50e3b4cd...


I get your point but your analogy doesn’t quite work.


Yeah, it's more like stealing a million bikes, putting all the parts into a pile, and custom-assembling them on request.


Still not exactly right. Stealing bikes deprives owners of them, while scraping data doesn’t.


How about torrenting the entirety of the world's filmography, using that content to make clip compilations on YouTube, then filing copyright strikes and demonetizing videos that contain those clips?

In a sense, it's almost patent trolling.


For anyone wondering, it's here: https://openai.com/policies/terms-of-use:

>use output from the Services to develop models that compete with OpenAI;

Well, I can still use ChatGPT labeling for many other purposes anyway.


There’s some room for interpretation here. Are small sentiment analysis models competing with a large general-purpose generative model? OpenAI doesn’t provide the former.

I see competing models as the likes of LLaMA, Falcon, etc., which would fall under the terms in my interpretation.
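To make the "labeling" use case concrete, here's a minimal sketch of the distillation idea being debated: collect sentiment labels from a large model, then train a tiny standalone classifier on them. The `llm_labeled` examples below are made up for illustration (in practice they'd come from a labeling API), and the classifier is a from-scratch Naive Bayes, not any particular library's implementation:

```python
# Hypothetical sketch: distilling LLM-assigned sentiment labels into a small
# standalone Naive Bayes classifier. The labeled pairs below are placeholders
# standing in for model outputs; nothing here is a real API.
import math
from collections import Counter

# Pretend these (text, label) pairs were produced by prompting a large model.
llm_labeled = [
    ("great product works well", "pos"),
    ("love it highly recommend", "pos"),
    ("terrible quality broke fast", "neg"),
    ("waste of money very bad", "neg"),
]

def train_nb(data):
    """Count word frequencies per label and label frequencies."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    label_counts = Counter()
    for text, label in data:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def predict(model, text):
    """Pick the label with the highest add-one-smoothed log probability."""
    word_counts, label_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in text.split():
            score += math.log((word_counts[label][w] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

model = train_nb(llm_labeled)
print(predict(model, "love this great product"))  # → pos
```

Whether a toy model like this "competes with OpenAI" is exactly the interpretive question above: it does nothing a general-purpose chat model is sold for, yet it was built from that model's outputs.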



