> I have a lot of open source code that I love to share for things like education or private stuff, but if you want to use it for something real, you need to hire me. If you can suck all the code without even I noticing it, that's not fair
Co-pilot aside, that's already how it works today. If you make something open source, I can use your code to power my business, and I'm under no obligation to hire you. It's great when companies give back to open source, either by supporting the projects they depend on, or by open sourcing their own internal projects, but it's not obligatory.
If you don't want people to independently profit from your code, don't release it under a license that allows commercial use
It sounds like the person you're responding to already releases their code under a non-commercial license. The problem with Copilot is that it may allow commercial enterprises to avoid such a license by copying the code verbatim from their repositories, possibly without any party involved knowing that it's happened.
But this is only for snippets right? Which I think is the issue: it has never been tested in court. Basically if you put:
/* web user management */
And Copilot comes up with complete user management lifted out of another repo, with all pages, db structures and logic but the copyrights stripped, then yes. But, as I understand it, that is not what it does. You will need to slowly tell it every tiny part of how user management is to be implemented, and for those snippets it copies code. But when you are done, there might be snippets from 100s of different repositories potentially. I think it is hard to show that breaks copyright, as many people already come up with roughly the same stuff 1000s of times/day all over the world.
> But when you are done, there might be snippets from 100s of different repositories potentially.
Potentially, but not necessarily. It's possible also that if there is only one close match for the logic required, it may produce verbatim something it's already seen. GPT is known to do this for sufficiently precise inputs.
> I have a lot of open source code that I love to share for things like education or private stuff, but if you want to use it for something real, you need to hire me.
implies that they have code which they are sharing under that proviso. Do you read it differently?
You are right about the technical distinction of open source from source-available. I think that the GGP (and myself) were both using it colloquially as a shorthand for source-available.
Is Copilot trained on source-available code? If not, then whatever restrictions you may want to apply to your source-available code aren’t relevant. The debate is about copyleft.
GPL is an open source license. Please read the ancestors to understand what’s being discussed here.
Edit: I was clearly making a distinction between open source (as in covered by an OSI-approved open source license) and only source-available, rather than treating source-available as a superset of open source.
"Open source doesn't just mean access to the source code. The distribution terms of open-source software must comply with the following criteria: ... The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research."
If there's no license, then it's not open source. This is a term with a standard meaning, and it doesn't just mean that the source is available for reading
Still, if you come across some published source code that does not appear to be licensed and does not specifically define itself as being "Open Source" as defined by the "Open Source Initiative", copyright law applies and you're not allowed to just take it and use it.
GitHub specifically uses the words "source code from publicly available sources" when talking about what they used to train their model on.
As far as I'm aware public code repos aren't by default "Open Source" as defined by the "Open Source Initiative".
Sorry, I was specifically responding to how my parent views open source.
I agree that Copilot was probably trained in part on public code that isn't open source -- GitHub's claim (not saying I agree) is that they don't need a license to train on code.
In this post's comment section alone many use the term "open source" but really mean to say "public-source", others use the two terms interchangeably even when they seem to be aware of the distinction, and then there are people who seem to think that by making your GitHub repo public it becomes OSD-spec "open source" and with that free to use.
It's just so confusing and easy to misinterpret each other's true meaning.
Thanks for making me aware of the existence of that OSD OSS spec btw! Came across the (recovered) blog post where the term was first announced http://www.catb.org/~esr/open-source.html.
You have said this multiple times on this thread now, but in addition to the OSI getting to unilaterally define the technical definition of "open source" being controversial even within the software engineering community, you really need to be looking at the definition of words "descriptively", and most people seem to put even "shared source" (look but don't touch) models as subsets of the class "open source".
Regardless: Copilot doesn't only pick up "open source" code... it picks up any code that has been published under any reason including the large amounts of code that is on GitHub without any license at all or which was literally stolen and leaked onto GitHub.
Meanwhile, even open source licenses have restrictions, whether they be "you can't use my work without agreeing to contribute back your work to the collective", various forms of automatic patent grants and associated retaliation clauses, or merely "you have to credit me", a simple limitation almost all open source software comes with which Copilot launders away.
> the OSI getting to unilaterally define the technical definition of "open source" being controversial even within the software engineering community
I don't think this is controversial? The OSI defined the term when they introduced it, in the late 90s. When Microsoft came out with "shared source" there was a huge amount of pushback from people saying "don't think that this is open source" (ex: https://www.linuxjournal.com/article/5496)
> Copilot doesn't only pick up "open source" code... it picks up any code that has been published under any reason
I agree. I'm not defending Copilot, and I think the legal questions here are interesting and tricky. My pushback here and throughout this page has been when people say non-commercial licenses are open source -- this thread started with kuon saying "I have a lot of open source code that I love to share for things like education or private stuff, but if you want to use it for something real, you need to hire me"
> it picks up any code that has been published under any reason including the large amounts of code that is on GitHub without any license at all or which was literally stolen and leaked onto GitHub.
You are stating a rather arbitrary assumption of yours as a fact, unless you have concrete sources or evidence.
This is the gist of it. I do not agree with OSI definition of open source but I won't argue about it here.
There are many OSI approved licenses with restrictions on use, like "your software must also be open source" or "contribute back your changes" or "give me attribution"...
> My definition of "open source" is that the source code is publicly available. That's it.
If I defined "open source" to mean that changes to the source code must be released publicly, it's going to be pretty hard for me to talk to all the other people who already use "open source" to mean something else. There is already a standard term for the source code being publicly available: "source available": https://en.wikipedia.org/wiki/Source-available_software. You're also welcome to invent and attempt to popularize any alternative term you want, but using idiosyncratic definitions makes discussion less clear.
> Just because that organization got their hands on a premium domain name doesn't mean they get to decide what that term means.
The OSI didn't just claim "opensource.org" -- the folks behind it coined (https://opensource.com/article/18/2/coining-term-open-source...), introduced, and popularized the term "open source" over two decades ago. From the beginning they have used the same definition, which was derived from the Debian project's Free Software Guidelines.
They are also not the only ones who use the term that way. Wikipedia has "Licenses which only permit non-commercial redistribution or modification of the source code for personal use only are generally not considered as open-source licenses" -- https://en.wikipedia.org/wiki/Open_source