How the TensorFlow team handles open source support

jordigh · on May 7, 2017

I don't like Google's CLA. This article spins it to make it sound like the CLA is there to make sure the code can be used under the Apache license, but what a CLA really does is shift blame away from Google to all of the external contributors. The article even expands the CLA acronym incorrectly, to make it sound like it's about the code instead of being about the contributor, and the rights they're giving up for Google.

CLAs are one-way, for covering Google's ass. They do not benefit the contributors.

http://ebb.org/bkuhn/blog/2014/06/09/do-not-need-cla.html

DannyBee · on May 7, 2017

"I don't like Google's CLA. "

This much is apparent :)

"This article spins it to make it sound like the CLA is there to make sure the code can be used under the Apache license"

Which it is.

", but what a CLA really does is shift blame away from Google to all of the external contributors."

Well, no, actually, it doesn't shift "blame" at all (and you aren't clear on "blame for what").

In fact, its main goal is to protect the end users and the project.

"CLAs are one-way, for covering Google's ass. They do not benefit the contributors."

This first part is just flat out false. If you read even the first sentence of each line of the CLA, you'll see it benefits more than just Google (and it only benefits Google at all because they are the ones taking on the liability), and benefits Google exactly as much as the end-users. The second part is just silly, the goal of properly written CLA's is not to "benefit the contributors". What benefits do you think they need?

The goal of the CLA is to benefit the project, and make sure the project itself, and more importantly, the people who use the software don't get screwed, particularly if a contributor screws up.

So now i just think you read a blog post and think you agree with it, but don't actually understand the situation enough to articulate coherent arguments as to why.

"http://ebb.org/bkuhn/blog/2014/06/09/do-not-need-cla.html"

I respect Bradley a lot, but on this subject, he is simply wrong in this situation. This is not entirely surprising. Bradley does not deal with a lot of communities that involve attempting to maintain patent peace among various corporations. He mostly deals with communities whose issues are copyright related, and where most situations can be dealt peaceably with by removing code.

In those communities, i think it would be reasonable to not give a crap about CLA's and to think they were a waste of time.

However, all the world is not that simple.

jordigh · on May 7, 2017

I was not surprised to see that you're a lawyer at Google, so of course you think Google is doing the right thing.

As you know, what the CLA is doing is making sure Google can't get sued for patents or copyright claims on the software. That's what I meant by shifting the blame from Google to the contributors. Google can just say, look, we have this CLA here, so it means we didn't do it, go talk to the one who signed the CLA. Not our problem, says Google. So now it's up to the individual who signed the CLA to deal with whatever legal fallout happens, a person who probably has a lot less lawyers than Google to handle the problem.

If you really wanted reassurance that the code is being contributed properly, there are other documents that could be signed which would be much more two-way than Google's CLA. For example, GNU's copyright assignment includes a promise from GNU that they will always keep the software free, with GNU absorbing the legal burden to do so. KDE's CLA has a similar wording, talking about free software principles, and it's also optional for contributors.

Now I am going to be more adversarial and unpleasant. I don't like the Google hivemind. Google employees have a very uniform mode of thought on certain topics. This includes the beliefs that the GPL is dangerous and must never be used by Google unless forced to (i.e. only on software that came from outside of Google), that the AGPL is the ultimate evil, and that under all circumstances Google must be protected above everything. Google only legally cares about Google, no more. You can pretend like the CLA also benefits the other contributors, but that's not really the case. The external contributors could take Tensorflow, fork it away from Google, keep working without a CLA, and everything would be right. All that's necessary is the implicit agreement to the free license (for example, the clause in the GPL that says using the software grants you the rights provided by the GPL). Like you say, should a problem arise, all they have to do is remove the offending code.

DannyBee · on May 7, 2017

"I was not surprised to see that you're a lawyer at Google, so of course you think Google is doing the right thing."

Actually, it's more than that if you really want to get into it! I came up with Google's policies and CLA. But it's always great to attack people for what they do instead of actually engage on the merits!

"As you know, what the CLA is doing is making sure Google can't get sued for patents or copyright claims on the software."

This is false, so i don't know it. It's making sure the innocent end user can't get sued. Google can and has gotten sued for what contributors do. Surprise! In fact, in that case, the contributors were not even notified. We took care of it. Double surprise, apparently.

" So now it's up to the individual who signed the CLA to deal with whatever legal fallout happens, a person who probably has a lot less lawyers than Google to handle the problem."

So just to be clear if an individual deliberately screwed up, you think Google should handle the problem because they have more lawyers? That's an interesting notion. Hey, i fucked up my neighbors house, but he should worry about it, because he makes more money than me. Past that, you think the end users should pay the price? Because again, the issue you are complaining about is there to protect the end users, not Google.

As I already mentioned, Google already tends to take care of these issues on its own.

"If you really wanted reassurance that the code is being contributed properly, there are other documents that could be signed which would be much more two-way than Google's CLA."

Not really.

" For example, GNU's copyright assignment includes a promise from GNU that they will always keep the software free, with GNU absorbing the legal burden to do so. KDE's CLA has a similar wording, talking about free software principles, and it's also optional for contributors."

Both of these, which i'm intimately familiar with, have precisely the same issue you complain about in the first paragraph. So again, your argument seems non-coherent.

Neither of these CLA's indemnify contributors or otherwise prevent them from being sued. In fact, they both do exactly the same as the Google CLA in this regard.

" This includes the beliefs that the GPL is dangerous and must never be used by Google unless forced to (i.e. only on software that came from outside of Google),"

This is also just false. You literally have no idea what you are on about. In fact, if you bothered to look farther, you'd see that i was a GCC maintainer and have contributed to and used GPL software (both FSF and non-FSF) at Google for many years.

Google happily uses and contributes to tons of GPL software, and has no such policies or thoughts. Again, i'm the person who made the policies, so i would know. When I first joined one of the first things I did was get Google to sign a blanket copyright assignment with the FSF.

"that the AGPL is the ultimate evil, "

Whatever this is supposed to mean. We have several practical problems with the AGPL that other companies have as well. We don't avoid it for ideological reasons. Interestingly, if you ask Eben, you'd find that he also has concerns with how it achieves its goals.

"You can pretend like the CLA also benefits the other contributors, but that's not really the case. "

I actually explicitly have said, numerous times, the CLA is mainly for the protection of end users and the project.

"Like you say, should a problem arise, all they have to do is remove the offending code."

This is where, like Bradley, you simply have no idea what you are talking about. That only works in very simple situations.

Overall, your argument comes off as "i have an axe to grind with Google". Nothing you have stated makes a lot of sense. The people you hold up on a pedestal have precisely the same legal issue you refer to, but you randomly change arguments about what exactly your issue is.

As you surely also know, Google's CLA is a copy of the Apache CLA with one of the obligations removed because we felt it was too onerous (there are no other changes). So your real problem is with the thousands of projects that use Apache CLAs.

jordigh · on May 8, 2017

So, if Google doesn't have a problem with the GPL or doesn't think it's dangerous, what's the software that originally came from Google that was GPL-licensed? Your gcc example doesn't count, because like I said, Google was coerced to use it for gcc. When did Google license something under the GPL because they wanted to defend their copyleft?

And yeah, the practical problem with the AGPL that you have is that you think your company will be destroyed if the world could see your source code. That's why you tell all of your employees that they can't even consider touching software that's AGPLed. So, what are you hiding there?

The only reason Google will use GPLed software is because they managed to de-fang it. Once the AGPL put the fangs back in, oh no, can't have that.

For Google, like for Apple, Microsoft and Facebook; free software is only intended for relatively unimportant scraps. The real meat, the real code, must remain secret, must remain safe. That's how you can control the users, how you can better target ads at them. Imagine the havoc if we could see the source code for AdSense or Gmail!

On the other hand, thank you for clarifying how the CLA works. No sarcasm here, I'm honestly thankful about that. Sorry, I was wrong about that.

tptacek · on May 8, 2017

I think most people in the ecosystem are happy that Google doesn't tend to use the GPL, since doing so would make it more difficult to use that code commercially.

In commerce, GPL'ing at the origin is mostly a tactic for protecting the commercial value of code; for instance, Sourcefire kept Snort under the GPL so that nobody could effectively compete with them using the Snort codebase, since they had the asymmetric ability as copyright owners to make private enhancements while competitors needed to publish theirs; Sleepycat GPL'd their database so that commercial software projects that wanted to use BerkeleyDB had to pay for an alternate license, &c.

Taking from a GPL project and refusing to contribute back to it on GPL terms is antisocial, but then, Google doesn't do that (they do the opposite, contributing more than any normal software firm).

But declining to originate software under the GPL isn't antisocial. I appreciate what the GPL does and have used it for projects myself, but virtually anyone who works professionally knows to think twice before adopting GPL'd libraries and thus constraining their future options.

I don't think this criticism of yours really makes any sense.

Also, as a bystander to this little debate on HN, the sincere thanks you just gave was a good start, but you also cast aspersions on his motives for discussing this here, and I think you owe him an apology as well. It's an HN rule not to say those kinds of things about other commenters.

jordigh · on May 8, 2017

I refuse to accept the conclusion that the GPL is anti-commercial or that it makes it more difficult to use code commercially. I think the companies, indeed the vast majority, who think that the only way to commercialise software is to hide the source code and keep things secret are overall wrong.

When none of Google, Microsoft, Apple, or Facebook actually originate code under the GPL, all this does is further this conclusion that seems so unquestionable to you that the GPL is anti-commercial or restrictive or just something to hesitate about. What the big guys do the little guys believe. We need more free software, and we need to defend the proliferation of free software, and we need to protest against those who work against free software, who control our search results and the ads shown to us and the information collected about us behind the veil of secret source code.

I refuse to apologise for questioning Google's motives. Their GPL refusal, except when forced to, is not a good thing. If they wanted free software they would be doing things like pursuing the rampant GPL violations in Android devices or defending their copyleft against VMWare's attempts to circumvent Linux, not leave it to charities like SFC.

tptacek · on May 8, 2017

Nobody's asking you to apologize for questioning Google's motives.

jordigh · on May 8, 2017

In this case, DannyBee has said he designed some of Google's motives, so the distinction is difficult to make. I guess I am not understanding exactly what offense I committed. This is a failing on my part, and if you can clearly pinpoint what I said that requires an apology, I would appreciate it.

MichaelBurge · on May 7, 2017

I wonder if random contributors are in a stronger position to resist patent trolls than Google: If you don't have any money, nobody will want to sue you.

Does the CLA actually help them with patents, though? If you use something patented without paying the license fee, it shouldn't matter who gave it to you.

mks40 · on May 6, 2017

Big thanks to Derek for his stackoverflow responses; they have saved me so much time, especially considering how uninteresting that support work might be in general compared to designing and implementing new systems.

sherjilozair · on May 7, 2017

Tensorflow isn't really regarded as a very welcoming open source project in the deep learning community. The deep learning academic community is already moving to PyTorch, not just because of the imperative style programming, but also to avoid Google lock-in and the pseudo-open ethos. PyTorch is used and developed by the community, and backed by multiple companies, not just one.

laingc · on May 7, 2017

I'd be interested to know how you formed this impression. My own impression is formed by:

* The work I and my colleagues are doing
* Recently published literature and arxiv pre-prints
* Conference talks
* Industry meet ups
* Blog posts

Based on my anecdotal experience from these sources, I see absolutely no evidence of Torch experiencing any kind of resurgence. If I had to make a call about the direction of the community as a whole, I would say that it is very clearly heading towards TensorFlow, with some holdouts using Theano and some using mxnet. Torch is used by some groups, certainly, but I have the impression that its use is decreasing, rather than increasing.

p1esk · on May 7, 2017

Just to give you another anecdotal experience:

I'm a researcher who used Theano for 3 years to train convnets. A couple of months ago I realized that Theano was becoming too painful to work with (main reasons being the lack of implementations for latest models, and difficulty of using multiple GPUs), so I decided to switch to a more popular framework. I looked at TF, and almost started porting my code to it, then someone suggested I look at PyTorch. After 30 minutes playing with it, I was sold. Much more intuitive. Major architectures from the last 12 months have been implemented. Dynamic graphs are probably something I will need in the future. Community is very active and helpful. The downside is that the software is not as mature as TF, and the community is smaller, but that's changing fast.

p.s. What you wrote about Torch is correct though, but we are talking about PyTorch, not Torch.

make3 · on May 7, 2017

as a deep learning professional who enjoys tensorflow, who is surrounded by other professionals who also like tf quite a bit.. I think you are maybe exaggerating a bit here, and would like to know your sources

hoschicz · on May 7, 2017

What do you mean with 'imperative style programming'?

jimfleming · on May 7, 2017

Here imperative style means you can use language constructs in e.g. Python, such as if-statements and while-loops, to directly construct models. TensorFlow, Theano, and other graph-based frameworks typically require creating branches and loops as nodes in the graph.

Graphs have advantages but they can be unfamiliar and sometimes difficult:

Some advantages: easier to serialize the whole graph and distribute computation; optimization can also be performed across the graph nodes (e.g. see XLA in TensorFlow).

Some disadvantages: it can be more difficult to write and reason about, particularly for recurrent neural networks which can utilize loops a lot; also interop with reinforcement learning environments where much of the computation is performed in an environment outside of the graph.
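
To make the contrast above concrete, here is a minimal sketch in plain Python. It is a toy only and does not use TensorFlow, Theano, or PyTorch; `WhileNode` and `build_power_graph` are hypothetical names invented for the illustration. The first function runs a normal Python loop right away (imperative style); the second describes the loop as a node first and executes it later (graph style).

```python
# Toy illustration only: not TensorFlow, Theano, or PyTorch code.
# All names below are hypothetical.


def imperative_power(x, n):
    """Imperative style: an ordinary Python loop that runs immediately."""
    result = 1
    i = 0
    while i < n:
        result = result * x
        i += 1
    return result


class WhileNode:
    """Graph style: the loop is described up front and executed later."""

    def __init__(self, cond, body):
        self.cond = cond  # function: state -> bool
        self.body = body  # function: state -> new state

    def run(self, state):
        # Execution happens here, separately from construction.
        while self.cond(state):
            state = self.body(state)
        return state


def build_power_graph(n):
    """Builds (but does not execute) a loop node that computes x ** n."""
    return WhileNode(
        cond=lambda s: s["i"] < n,
        body=lambda s: {
            "x": s["x"],
            "i": s["i"] + 1,
            "result": s["result"] * s["x"],
        },
    )


# Construction and execution are two separate steps in the graph style.
graph = build_power_graph(3)
graph_result = graph.run({"x": 2, "i": 0, "result": 1})["result"]
imperative_result = imperative_power(2, 3)
```

Both paths compute the same value. The difference is that `graph` exists as an object before anything runs, which is what makes serialization and whole-graph optimization possible, at the cost of expressing the loop less directly.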

EternalData · on May 6, 2017

It's really cool to peer behind the hood of really massive open source projects. I have to give kudos for the TensorFlow team for staying on top of what must be a massive amount of work!

EvgeniyZh · on May 7, 2017

The sad part is that open-sourced TensorFlow isn't the same as the TensorFlow that Google uses.

I've also heard Google is not really fond of cooperation with other corporations (say, NVidia) on TensorFlow

vrv · on May 7, 2017

The codebase in GitHub is pretty much exactly the same as the internal, the main exceptions being things like having to rewrite include paths for files, filesystem plugins for internal cluster filesystems, etc; and those things are modularized so that we can have equivalent implementations in the OSS build to support things like HDFS and GCS filesystems, RDMA network layer communication, etc.

We daily sync the code between the two repositories using a suite of tools we've built. I'm on sync rotation this week and you can see all of my commits and activity on GitHub as proof; I merged something like 60 commits from the community just this week. It wouldn't make any sense for anybody (Google or the community) to maintain two different versions for something that is actively being developed by so many contributors. I've also directly worked with NVidia engineers on improvements they've made (and merged) to the system; the ones I've dealt with are great, so that statement is also false.

I'll be giving a talk about all the work we do to make this possible at OSCON next week, and if you are there feel free to catch me to ask me questions.

EvgeniyZh · on May 7, 2017

Obviously I don't have any proofs that I can provide. Just small talk with people here and there. Some NVidia engineer told me that Google is (was?) very uncooperative to work on some DL stuff together, as opposed to, say Facebook.

About internal version, this is pure speculation, based on the idea that TPUs are programmed with some framework, thus it should be TF, thus there is a closed part and there should (could) be others.

vrv · on May 8, 2017

I'm sorry to hear of that experience, and it's certainly not intentional. We should (and I will) always try to be better (I know some will forever view Google as an antagonistic behemoth but usually the engineers on both sides are trying to do the right thing; it just takes time to come to consensus and understanding sometimes).

Regarding internal version: we built TensorFlow to support devices as modular plugins: the CPU and GPU devices are built this way (you can read the source code to see how device registration works), and the same registration mechanism is used for the TPU code, which can't be opensourced due to internal dependencies. Internal customers just link in an additional library to get TPU support, but it still uses the same core codebase that is available in the opensource world. I know this because I wrote a lot of the device modularity and TPU binding code :)
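
As a rough illustration of the plugin idea described above, a name-based registry might look like the following Python sketch. This is not TensorFlow's actual registration code (which lives in the project's source); `DEVICE_REGISTRY`, `register_device`, `create_device`, and the device classes are all hypothetical names made up for the example.

```python
# Toy sketch of a plugin-style device registry. Every name here is
# hypothetical; this is not TensorFlow's actual registration code.

DEVICE_REGISTRY = {}


def register_device(name):
    """Class decorator that records a device class under `name`."""
    def decorator(cls):
        DEVICE_REGISTRY[name] = cls
        return cls
    return decorator


@register_device("CPU")
class CpuDevice:
    def run(self, op, *args):
        return op(*args)


@register_device("GPU")
class GpuDevice:
    def run(self, op, *args):
        # A real GPU device would dispatch to a kernel; the toy just calls op.
        return op(*args)


def create_device(name):
    """Looks a device up by name; the core never imports plugins directly."""
    if name not in DEVICE_REGISTRY:
        raise ValueError(f"no device registered under {name!r}")
    return DEVICE_REGISTRY[name]()


# Stand-in for a separately linked library: defining the class is enough to
# make the device available, with no change to the core code above.
@register_device("TPU")
class TpuDevice:
    def run(self, op, *args):
        return op(*args)


available = sorted(DEVICE_REGISTRY)
total = create_device("TPU").run(lambda a, b: a + b, 1, 2)
```

The point of the design is that the core only knows about the registry, so an extra device can live in a separate library and still share the same core codebase.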

EvgeniyZh · on May 8, 2017

I see, maybe I was a bit exaggerating.

I'd really like to talk more about both Tensorflow and TPU, but unfortunately I won't come to OSCON. Maybe some other time :)

linkmotif · on May 6, 2017

This is a great post!! Thank you.