
I think it's best if we sidestep these big conceptual questions about what cognition or creativity really are. It's hard to find agreement, and perhaps it is not necessary to do so.

My position is that if a person hired in a company can currently use Google, Stack Overflow and GitHub to help develop their custom scripts, and no moral or copyright issues are infringed (ie, you don't try to say you came up with it on your own, and you use only enough that it is clearly fair use), then I think an AI should be able to assist in that task. There is no need to complicate things by legislating what the AI is doing and what Google is doing, as they are very similar things and in fact even use similar methods.



I would agree with you if the AI was genuinely assisting with that task, but it isn't.

It's taking inputs, ignoring their licenses, permuting them in ways that are not understandable to the user, and then outputting them.

That's an entirely different task than the user reading SO or using Google and then writing their own code, because the "AI" is not capable of writing its own code at that level.

Relying on this tool means ignoring the license of code that you're copying, without even knowing that you're doing it.


> That's an entirely different task than the user reading SO or using Google and then writing their own code, because the "AI" is not capable of writing its own code at that level.

I would say it's a very similar task. If I need to remember how to use a certain function, I can Google for documentation and examples, or I can tell Copilot what I want to do. The fact that the solution was presented by Copilot or a SO thread is, in my view, irrelevant. And to compound on that, I doubt anyone checking SO truly knows where that answer came from. The person could simply be reproducing a snippet from somebody else, you have no way of knowing if it was licensed.

I don't think this is bad either. Even our current shitty copyright laws protect that kind of use. I shouldn't have to worry whether my little prime number generator uses an algorithm first created by John Carmack or Microsoft. Programming has evolved rapidly in great part because we can all use other people's work and use it to improve ours. Of course you shouldn't just copy and paste everything and call it a day, but that's hardly what Copilot enables anyway.


You really seem to be ignoring the core issue by focusing on SO though. Everything on SO is fair game, but code on GitHub is under a variety of licenses, and when Copilot regurgitates it, no matter how complex and inscrutable the process is that leads it to do so, it may be causing the user of Copilot to misuse that code because it doesn't even give them the opportunity to know where it came from or what license it was released to the public under.


Again, how does that differ from Stack Overflow? Do you go and check whether a given reply belongs to a licensed project?

Also, please consider that there is a toggle that allows you to block Copilot from using public code.


> Do you go and check whether a given reply belongs to a licensed project?

All SO questions, answers and comments are CC BY-SA. The terms of the site say that anyone submitting this content agrees that it's licensed that way, and when you visit the site you agree that you are provided with the content under that license. It's not necessary for you to check whether the submitter had the right to offer it under that license; that's their problem. The same goes for any content offered to you under a given license on any platform. I don't understand what your question has to do with the conversation.

The problem with Copilot, and I really can't believe this has to be restated over and over again, is that it takes code from projects with various licenses, and outputs it in your editor in various transformed-or-not-transformed ways (the fact that the transformation is extremely complex doesn't change anything), and gives you no way to know where the code came from, how it was licensed or how it has been transformed. So, despite the fact that if you use it enough you are virtually guaranteed to use code in contravention of its license, you cannot even know which projects you have stolen code from or which licenses' terms you are breaking.

> Also, please consider that there is a toggle that allows you to block Copilot from using public code.

Great. I'm sure its utility doesn't go down at all if you turn that toggle off...


> All SO questions, answers and comments are CC BY-SA. The terms of the site say that anyone submitting this content agrees that it's licensed that way, and when you visit the site you agree that you are provided with the content under that license.

Have you ever read GitHub's conditions to know whether they also have the right to use your code that way, no matter how you decide to license it? I feel that you are overly focused on the legal part here, which I'm sure was handled by Microsoft's lawyers. I'm more interested in the question of principle.

No matter what the terms of use at SO say, anyone can give you an answer that is a copy of some code they don't own. You may consider that immoral, but I don't, not at the scope SO is used for. In addition, the vast majority of cases at SO and Copilot are not about complex functions being stolen, it's about some dumb code you would have found in 2 minutes of googling. What I'm trying to argue here is that if we are all cool with SO and think it's useful, there is no fundamental difference here. We never cared too much about licenses for boilerplate code, and I think we shouldn't start now.


> Have you ever read GitHub's conditions to know whether they also have the right to use your code that way, no matter how you decide to license it? I feel that you are overly focused on the legal part here, which I'm sure was handled by Microsoft's lawyers. I'm more interested in the question of principle.

I have, and there is not. Neither could there be — in many cases the person uploading code to GitHub is not the copyright holder — they are just doing something permitted under the license — and for a large open source project there could be thousands of copyright holders. A random person mirroring some source code to GitHub is in no position to negotiate different license terms on behalf of the copyright holder(s).

> No matter what the terms of use at SO say, anyone can give you an answer that is a copy of some code they don't own. You may consider that immoral, but I don't, not at the scope SO is used for. In addition, the vast majority of cases at SO and Copilot are not about complex functions being stolen, it's about some dumb code you would have found in 2 minutes of googling. What I'm trying to argue here is that if we are all cool with SO and think it's useful, there is no fundamental difference here. We never cared too much about licenses for boilerplate code, and I think we shouldn't start now.

I don't understand why you think a person writing an answer on SO and a computer program outputting some permutation of its inputs into your editor are the same thing. The person writing an SO answer is intelligent and capable of conceptual understanding, the computer regurgitating code without regard to its license is not.


>> Have you ever read GitHub's conditions to know whether they also have the right to use your code that way, no matter how you decide to license it?

> I have, and there is not.

At least one IP lawyer strongly disagrees, suggesting anything you host on GitHub is fair game [1].

[1] https://fossa.com/blog/analyzing-legal-implications-github-c...

> The person writing an SO answer is intelligent and capable of conceptual understanding, the computer regurgitating code without regard to its license is not.

From a copyright perspective, that is irrelevant. In fact I would think Copilot has more incentives to not infringe than a random SO user, who is very unlikely to be sued. I already argued in another post that in my view, from any perspective, it is also irrelevant whether it's a person or AI doing the same work Copilot does.


> At least one IP lawyer strongly disagrees, suggesting anything you host on GitHub is fair game [1].

The question is whether Copilot's users can use the regurgitated code without following the license terms, not whether Copilot was allowed to train their model on it. I agree it's likely fine for them to train the model, but the use of Copilot would seem to be a legal minefield.

A little thought makes it clear that an affirmative answer would be absurd. This would mean that using a simple tool (let's say `cat`) to make a copy of some code and subsequently ignoring its license terms is infringement, but if the software used to make the copy is more complex (or perhaps if it has the "AI" label stuck to it!) the same actions are not infringement.


If I write a script and train it on the Windows source code, do you think MS will like it if I use that script on Wine? I am sure MS will say that the license does not allow it and that the script's transformations are not original, so the GPL and similar licenses should be respected by Microsoft too.

> My position is that if a person hired in a company can currently use Google, Stack Overflow and GitHub to help develop their custom scripts, and no moral or copyright issues are infringed (ie, you don't try to say you came up with it on your own, and you use only enough that it is clearly fair use),

Only a judge can determine whether it is actually fair use. If you by chance copied some super clever and unique code into your code base, then I am sure a judge will not say it is fair use. It has been shown that Copilot will do this (though MS said they put some IF-ELSE checks in the AI to keep the plagiarism from being detected, by removing obvious results and maybe obfuscating things more).

Maybe the Stack Overflow license allows you to copy and paste answers into your code, but code on GitHub has a repo-specific license that you need to respect.

If MS trained the model on all their private repos too and made the model free software, then many would not have these issues. Or they could keep the model proprietary and train it only on MS repos and on BSD and similarly licensed repos.


You are saying that the AI should be treated the same way as a person would regarding its 'output'. I disagree. This is a conceptual disagreement and you cannot just sweep under the rug "what cognition or creativity really are".

In the end, when in several (2-5) years we start seeing structural unemployment emerging because of AI deployments, this will be resolved by the legal system, most likely by some sort of partial prohibition of training/monetizing such systems.


I think I still have not understood your argument. Are you saying that you are afraid that AIs will become too powerful and cause unemployment, and therefore we should regulate them now before they do so?

Many people are worried about this, which is why there is a lot of debate about minimum income programs. However, at present, what Copilot is doing is similar to what Google does, and it is certainly not going to replace devs any time soon. Personally, I think we should exploit technology to its fullest, and the only reason we can have this conversation is that, in the past, we didn't give too much consideration to the mailmen, secretaries, delivery workers and everyone else who got displaced by our use of the internet and similar technologies. We merely adapted to better exploit them.


I am not saying (in that last comment) what should happen, I am saying what will happen. Past automation in terms of impact is nothing compared to what's coming and people and lawmakers will react accordingly - not in favor of the automators.



