It's a good example of how ridiculous the AI training situation is.
They claim fair use lets them scrape whatever data they want, yet you're not allowed to use AI output — even though that output isn't subject to copyright protection at all, since it lacks a human author.
And GitHub especially. They already host an enormous corpus licensed under MIT and equivalent licenses, which would have permitted this AI training. All they had to do was train only on the code they were allowed to use, maybe put up an attribution page listing the repos involved, and nobody would have minded, because the licensing amounts to explicit opt-in given ahead of time.
But no. They couldn't be bothered with even that little respect.
I wonder how GPLv3 and CC BY-SA licenses should be treated when training AIs like this. The model is software, and if it isn't sufficiently different from the source, it's a derivative work, isn't it?