I'm not really sure what I think about this. How responsible should Microsoft be for someone's badly licensed code on their platform? If they somehow had the ability to ban projects using stolen snippets of code, I don't think I'd dare to host my hobby projects there.
If you can't trust that the code in a project is compatible with the license of the project then the only option I see is that copilot cannot exist.
I love free software and whatnot, but I have a feeling this situation would've been quite different if copilot was made by the free software community and accidentally trained on some non-free code.
> I'm not really sure what I think about this. How responsible should Microsoft be for someone's badly licensed code on their platform?
That's a massive undersell of Microsoft/GitHub's responsibility.
It seems as though they did approximately zero work to verify that any of the code wasn't infringing. Things they could have tried but apparently didn't:
1) Ask developers to opt in to Copilot scanning of their repositories, and alongside that, have them certify that they hold the copyright to every line of code in the repository.
2) Use a training dataset of only public repositories released under pre-identified, suitable licenses from established groups, e.g. BSD-licensed code from the various BSD operating systems.
3) Seek out examples from the standard libraries of other programming languages that have suitable licenses.
It seems like they did nothing and just hoped. I can't see how anyone would rely on this thing in a commercial context after it's been shown to do this over and over. The well has been poisoned.
> I love free software and whatnot, but I have a feeling this situation would've been quite different if copilot was made by the free software community and accidentally trained on some non-free code.
Precisely. Would it be okay for me to publish some code as GPL because my buddy gave it to me and promised that it was totally legit and I could use it and it definitely wasn't copy-pasted from one of the Windows source leaks?
> If you can't trust that the code in a project is compatible with the license of the project then the only option I see is that copilot cannot exist.
It might be possible to feed it only manually vetted inputs, but yes: as it currently stands, Copilot appears to be little more than a massive copyright-infringement engine.
> Precisely. Would it be okay for me to publish some code as GPL because my buddy gave it to me and promised that it was totally legit and I could use it and it definitely wasn't copy-pasted from one of the Windows source leaks?
But where do you draw the line? What if you accidentally came up with the same or a similar solution to something in Windows? The code might not come from your friend either; it could be the result of N steps of copy-pasting, reworking, reformatting, refactoring, etc.
> But where do you draw the line? What if you accidentally came up with the same or a similar solution to something in Windows?
Yes, I agree that it's unclear how to handle that in the general case, at scale. But cases like the OP's make me think we could worry about the grey area after we've dealt with the blatant copies.
> The code might not come from your friend either; it could be the result of N steps of copy-pasting, reworking, reformatting, refactoring, etc.
Well, my personal tendency would be to apply the same standard to Microsoft that they would apply to us. How many steps of removal are needed before copying MS proprietary code becomes okay?
> Yes, I agree that it's unclear how to handle that in the general case, at scale. But cases like the OP's make me think we could worry about the grey area after we've dealt with the blatant copies.
The way I see it, Copilot's output is already in the grey zone. As with other models like this, there are no snippets stored in the model. For example, I can generate code that looks similar to the cs_transpose function in Lua if I nudge it a bit. To me this seems equivalent to someone remembering exactly how a function works (to some extent) and being able to write it in whatever language without copy-pasting.
So the output, as far as I understand it, is very grey. Maybe there's something about the training stage worth discussing, but as I mentioned earlier, I'm not sure what else you can do other than check the license of the code or avoid creating Copilot in the first place.
> How responsible should Microsoft be for someone's badly licensed code on their platform?
That's actually a very real problem that huge amounts of money have been spent on. The same legal problem comes up on sites like YouTube around fair use and copyright. As for fair use, it doesn't apply here; see:
Regardless, platforms are partially responsible for the content their users upload to them. Most try to absolve themselves of this responsibility through their terms of service, but legally that's just not possible.
Personally, I'm an advocate for fair use, but I'm also an advocate for strong copyright laws and their enforcement. In the short time the internet has been available to most people in the world, a habit has formed of stealing others' work and claiming it as your own. Quite often this is for some financial gain.