Hacker News

Microsoft should just train it on all their proprietary code instead. See how sanguine they are about it then.


They avoided answering this question at all costs.

Because it exposes their direct hypocrisy here: it's fair use for OSS, but not for their own code.

The questions here are very important, and it's no surprise GitHub avoided answering anything about Copilot's legality:

https://sfconservancy.org/GiveUpGitHub/


Who said they haven't?

For something to show up verbatim in the output of a text AI model, it needs to appear in the input many times.

I wonder if the problem is not Copilot, but many people using this person's code without license or credit, and Copilot being trained on those copies as well. Copilot may just be exposing a problem rather than creating one.

I don't know much about AI, and I don't use Copilot.


Microsoft has a public statement that they don't use proprietary code, only public code with public licenses. They have a lot of companies as customers who use GitHub, and they also use a lot of third-party code in their own products.


Even BSD et al. have attribution requirements, so the code that can be used with no obligations at all must be a vanishingly small amount. Methinks the people who run GitHub (who have apparently decided to abandon the core business for the latest fun project) aren't being entirely upfront.


I thought they said they used all public repos, without regard to the license they are under, which could be a proprietary EULA.


With the amount of resources that Microsoft has, how hard can it be for them to exclude proprietary code that other people have stolen? I’d bet it is easy for them, but they won’t do it. Because they don’t care, because who is gonna take them on?

Will they “accidentally” include proprietary code from, say, Oracle? Nope. They’ll make sure of it. But Joe Random? Sure.


There's exactly no way they have.


I'm curious how you could possibly know that for sure.


Because Microsoft is known to be extremely protective of their code. There is just no way they would expose their internal code to being straight up decoded from the model when they can just train the model on the huge public data of GitHub.


As a thought experiment: what do we all suppose would be the impact on Microsoft if they deliberately made public the proprietary source code for all of their publicly available commercial products and efforts (including licensed software and services; excluding private contracts and research), but the rest of their intellectual property and trade secrets remained private?

Since I’m posing the question, here’s my guess:

- Their stock would take at least a short term hit because it’s an unconventional and uncharacteristic move

- The code would reveal more about their strategic interests to competitors than they’d like, but probably nothing revelatory

- It might confirm or reinforce some negative perceptions of their business practices

- It might dispel some too

- It may reduce some competitive advantage amongst enormous businesses, and may elevate some very large firms to potential competitors

- It would provide little to no new advantage to smaller players who aren’t already in reach of competing with them and/or don’t have the resources to capitalize on access to the code

- It would probably significantly improve public perception of the company and its future intentions, at least among developers and the broader tech community

In other words, a wash. Overall business impact would be roughly neutral. The code has more strategic than technical value, and few could leverage that technical value into any kind of revenue center with growth potential. Any disadvantage would be negated by the public goodwill it generated.

Maybe my take is naive though! Maybe it would really hurt Microsoft long term if suddenly everyone can fork Windows 11, or steal ideas for their idiosyncratic office suite, or get really clever about how to get funded to go head to head with Azure armed with code everyone else can access too.


I think Microsoft, their ISVs, and everyone would benefit a lot if Windows were "open source" in the narrow sense - viewable source code, with a license to compile and use only to the extent that you already own the requisite Windows license(s).

Pirating Windows is already utterly trivial with KMS activation, so it's not like they'd lose anything there.


Some day, I think the Windows source will be public, at least for reference purposes.

They already have one open source part I know of, the new conhost[0].

[0] https://github.com/microsoft/terminal


If they’d open source their software, I wouldn’t have to wait two months until they finally release the PDBs for the kernel after every 2XH1 / 2XH2 update.

It’s so annoying that they are sooooo slow at this and we have to keep our users from upgrading after every release.


If they released all the source, I'd be able to run the nice drawing app from Windows Ink Workspace again, unkilling the app they want dead.


Open-source licenses might need to be revised to disallow this sort of thing, e.g. by saying that anything trained on GNU-licensed data must also carry that license.



