
Presumably because they trained Copilot, without permission, on billions of lines of code, much of it licensed, and Copilot has a tendency to regurgitate that code verbatim, without said license.



For a specific example, some variation of "fast inverse square root" will usually get you the exact GPL-licensed code from Quake III, comments included.


Do you mean the same code that has its own Wikipedia page where the exact code is written, comments included, and has probably been copy-pasted into hundreds of other projects?

https://en.m.wikipedia.org/wiki/Fast_inverse_square_root


You mean this code?

https://archive.softwareheritage.org/browse/content/sha1_git...

Do you see that notice at the top of the file? It says:

===

This file is part of Quake III Arena source code.

Quake III Arena source code is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

===

But because it's been laundered by Microsoft, you think it's okay to steal free software and make it proprietary?


How is it made proprietary? The Quake III Arena code is no more proprietary now than if it were stored on GitHub's proprietary web servers. Copilot is just a fancy code index that sometimes returns the original code and other times gives you a modified copy.


Because, as you say, it provides original or modified code but doesn't provide provenance or license information. It's copyright laundering. After decades of fighting the community in the courts over shit like this, Microsoft just turns around and says, well, it's okay when we do it? Foh.


The problem is that you have to obey the code's license even if you take just a snippet, and Copilot does not reproduce the correct license.


"The algorithm was often misattributed to John Carmack, but in fact the code is based on an unpublished paper by William Kahan and K.C. Ng circulated in May 1986"


That code didn't originate from Quake.


The point is that it's charging for a product that was trained on open source code. What you're saying agrees with that, but your triumphant tone seems to imply the opposite. Which did you mean?


Yes, that code. I was replying to a comment claiming that

> Copilot has a tendency to regurgitate [code] verbatim, without said license.

and I think that is a pretty good example.


> that Copilot has a tendency to regurgitate verbatim, without said license.

A "tendency" is overstating it. I'm not aware of any example that would have been likely to occur if the author wasn't specifically trying to get the regurgitated code.



