More

programnature · on Aug 13, 2017

If you want to run in production.

Even FB doesn't use PyTorch in production, and instead uses Caffe2.

Q6T46nT668w6i3m · on Aug 13, 2017

PyTorch was certainly used in production at Facebook (and elsewhere).

It might’ve changed in the past few months as Caffe 2 matured, but there were a few internal applications that used PyTorch.

programnature · on Aug 13, 2017

Fwiw this is what Soumith has said: "Internally at Facebook, we have a unified strategy. We say PyTorch is used for all of research and Caffe 2 is used for all of production."

https://www.oreilly.com/ideas/why-ai-and-machine-learning-re...

Its not exactly a secret that PyTorch's tradeoffs favor research and not production.

Q6T46nT668w6i3m · on Aug 13, 2017

Totally. And I think it’s a solid strategy. However, there’s certainly an interest (internally and externally) in providing a better inter-operability story between the two. I imagine something like using Keras for model creation (and possibly training) and running (either on mobile or the cloud) on some Caffe 2 deployment.

programnature · on Aug 13, 2017

Its a good strategy. But its no silver bullet either. If you're exporting to a "static graph" platform, your losing a major benefit of PyTorch. If you mostly just care about shipping to production, a case can be made to just use tf/caffe2/mxnet etc from the start.

While PyTorch is extremely cool, the fanboyism is out of hand, thinking that what's good for their corner of the universe must be awesome for every use case and therefore TF is a overcomplex turd. Its not like the people designing these systems are stupid.

Q6T46nT668w6i3m · on Aug 13, 2017

I agree. PyTorch’s dynamism is fantastic. However, I have no idea how you’d manage to recompile PyTorch code to Caffe 2 in a satisfying way. If something is released, I suspect it’d be limited to a subset of PyTorch features (I’d also bet that subset doesn’t include the features that make PyTorch compelling).

programnature · on Feb 17, 2017

Wolfram was there in 2002. Pretty cool examples though.

chriswarbo · on Feb 17, 2017

If you mean that Wolfram advocates programmatic generation of structures, then that's true; the approach is very different though. These appear to come from a continuous optimisation process, i.e. starting with a "bad" design and iteratively tweaking it. In contrast, Wolfram tends to focus on discrete systems (e.g. cellular automata) and perform the search interactively, like a form of superoptimisation rather than numerical optimisation.

The examples I'd cite are from the 1990s, e.g. evolved antennas ( https://en.wikipedia.org/wiki/Evolved_antenna ) and integrated circuits ( https://en.wikipedia.org/wiki/Evolvable_hardware )

programnature · on Jan 18, 2017

Actually not clear if there is an official affiliation with Facebook, other than some of the primary devs.

throwawayish · on Jan 18, 2017

Notably absent is the otherwise Facebook-typical PATENTS license thing. Which I see as a good sign.

Also, it doesn't look like this has happened just now? PRs in the repo go back a couple months and the repo has 100+ contributors.

smhx · on Jan 18, 2017

it's the same license file as https://github.com/torch/torch7 and http://torch.ch

The C libraries are shared among the Lua and Python variants

programnature · on Dec 7, 2016

utopcell · on Dec 7, 2016

We formulate SGNS word2vec as a distributed graph problem, where nodes are all unique tokens (the dictionary) in the corpus and edges are defined by skipgrams. For skipgram (w_in, w_center), there will be an edge from w_in to w_center.

Tokens are randomly distributed over a set of workers. Each worker iterates over its edges in parallel with all other workers and performs the appropriate computation.

Drawing negative samples is done in two steps. We first draw a worker W from a suitable distribution over the workers and then draw a word from W. The overall word sampling is the same as for the reference implementation (ie, unigram distribution raised to 3/4.)

This work will soon be made public [1].

[1] Stergios Stergiou, Zygimantas Straznickas, Rolina Wu and Kostas Tsioutsiouliklis, ``Distributed Negative Sampling for Word Embeddings''. AAAI 2017.

programnature · on Dec 2, 2016

This is a very thick talk, one Rich's best ever IMHO.

The first point is how we talk about 'change' in software, to center around what things 'provide' and 'require'.

Breaking changes are changes that cause code to require more or provide less. Never do that, never need to do that. Good changes are in the realm of providing more or requiring less.

There is a detailed discussion about the different 'levels' - from functions, to packages, artifacts, and runtimes, which he views as multiple instances of the same problem. Even though we now have spec, theres a lot of work to leverage it across all those different layers to make specific statements about what is provided and required.

mshenfield · on Dec 3, 2016

I found value in dissecting the different levels of change. For the sake of sanity though, we should do breaking changes. Breaking changes exist because we have limited capacity as individuals and an industry to maintain software. This is especially true for infrastructure that is supported by (limited) corporate sponsorship and volunteers. Breaking changes limit our window of focus to two or three snapshots of code, instead of having our window of focus grow without bound. Our limited capacity can still be effective as a library changes over time.

The most important point of this talk is here: "You cannot ignore [compatibility] and have something that is going to endure, and people are going to value" [0]. Breaking changes provide a benefit for library developers, but it is usually damage done to end users. As consumers we should weigh the cost of keeping up with breaking changes against the quality of a tool, and the extra capacity its developers are likely to have.

[0] https://youtu.be/oyLBGkS5ICk?t=4177

michaelfeathers · on Dec 3, 2016

Agreed. Breaking changes can lead to alienation of user base, but I think there's a danger in lulling people into expecting that kind of constancy in software. It creates dependency of another kind. Maybe the trick is to vary features at some rate, getting users used to change and bringing them along.

In retail it used to be the case that you could go to the same store a month later and see the same shirt to buy. The Sears catalog [1] presented that sort of constancy for consumers. Today there's a lot of flux. Some of it actually engineered to prevent people from delaying purchasing decisions. In software we can and do introduce breaking changes for ease of maintenance, and that can be ok as long as people are used to it. It's making the choice to have a living ecosystem.

[1] http://www.searsarchives.com/catalogs/history.htm

solussd · on Dec 12, 2016

Additionally there are safer, usually reasonable, ways to deal with what would otherwise be breaking changes. Give the changed functionality a different name, create a new namespace/module without the removed functionality, or create a new library if you have introduced something fundamentally different (e.g., w.r.t. how you interact with it). That way your users can choose to refactor their code to use the change, rather than discover their expectations no longer match reality when they upgrade.

blueprint · on Dec 3, 2016

Who says you have to maintain old code? We're talking about simply not deleting it and establishing a discrete semantic for the new version as truthfully, a new version is new content which demands a new name to accurately and precisely describe it. If it didn't it would be like saying different content doesn't produce a different hash.

mshenfield · on Dec 4, 2016

You're right, there is no obligation to maintain it. I think that misses the point though. The value in keeping the code is to allow the end user to continue to enjoy improvements in parts of the library that don't have breaking changes without upgrading those that do. You could continue to have security patches installed, for example. That value is much less when you don't do basic maintenance implement bug fixes and security patches.

blueprint · on Dec 4, 2016

Unless I'm missing something… the answer to that problem is to (a) factor the code sufficiently to then (b) create an abstraction (interface) that backs out the concrete implementation to the specifically desired version/functionality.

wtetzner · on Dec 4, 2016

Except that naming things is one of the hard problems. I don't see why a major version bump can't be considered a different library.

I guess you can use version numbers in the name instead, since this talk is specifically targeting maven artifacts.

blueprint · on Dec 4, 2016

Hickey's other talk says hard is relative, and I happen to agree, especially when it comes to naming. The question is to what degree of exactness you can confirm what exists (in problems). That is a function of your degree of truthfulness. So it's "hard" only in the sense it's hard to approach 100% truthfulness. However, I have observed that one doesn't need 100%, one needs to be beyond a certain threshold of effective sufficiency. And according to human history, special, rare individuals are born who do exceed that threshold.

michaelfeathers · on Dec 3, 2016

That's Postel's Law.

programnature · on Sept 11, 2016

Its an interesting question. One reason is Wolfram is extremely talented at language design, which is necessary to build an artifact of this size without self-immolating. Another is that it is a commercial company following a plan. A third is that few people have learned the lessons of Mathematica enough to apply them.

throwaway729 · on Sept 12, 2016

> One reason is Wolfram is extremely talented at language design

It's always a matter of taste when it comes to language design, but I'd have to disagree with this assessment ;-)

> which is necessary to build an artifact of this size without self-immolating

Well, that's certainly not the case. Plenty of huge software artifacts of very impressive quality have been built by non-language-designers.

> Another is that it is a commercial company following a plan

This is certainly true. Or rather, several plans, all of which intersect at common mathematical sub-questions. So then the entire company can leverage effort that's been poured into those components.

> A third is that few people have learned the lessons of Mathematica enough to apply them

Nah. I think the third reason is that Wolfram hires excellent hackers who are also excellent mathematicians. He hires a lot of them. And he puts them to work on the intersectional capabilities I mentioned above.

(Disclaimer: pure conjecture. I've never worked at Wolfram)

programnature · on Aug 30, 2016

While its useful to have this kind of info, IMHO its still far from 'infrastructure for deep learning'. What about model versioning? What about deployment environments? We need to address the whole lifecycle, not just the 'training' bit. This is a huge and underserved part of the problem bc people tend to be satisfied with having 1 model thats good enough to publish.

tlb · on Aug 30, 2016

Indeed, deployment is a whole set of interesting issues. We haven't deployed any learned models in production yet at OpenAI, so it's not at the top of our list.

If the data and models were small and training was quick (on the order of compilation time), I'd just keep the training data in git and train the model from scratch every time I run make. But the data is huge, training requires clusters of machines and can take days, so you need a pipeline.

An industrial strength system looks like this: https://code.facebook.com/posts/1072626246134461/introducing...

platypii · on Aug 31, 2016

CTO of Algorithmia here. We've spent a lot of time thinking about the issues of deploying deep learning models. There are a whole set of challenges that crop up when trying to scale these kinds of deployments (not least of which is trying to manage GPU memory).

It would be interesting to compare notes since we have deployed a number of models in production, and seem to focus on a related but different set of challenges. kenny at company dot com.

programnature · on Aug 30, 2016

Yes, understandable. I encourage viewing this as part of the 'open' mandate.

agibsonccc · on Aug 30, 2016

When you're thinking of "deployment" here - wouldn't it make sense to use the google compute engine for this?

I'd be curious to see if there's a legit speed up there with the "real tensorflow".

For "on prem" stuff I think "deployment" is going to depend on the actual end use case.

Eg:no one in industry will keep their "training data" in git. They'd have an actual database with other systems surrounding it.

If it's just "run the model locally to view a web page running in a docker container I wouldn't see the problem here though.

The infra will also be different for training vs inference. For training you'll want gpus, but it's not realistic to run gpus with inference yet.

I'd love someone to comment on: https://developer.nvidia.com/gpu-inference-engine

though.

There's going to be a lot of non deep learning "stuff" involved here.

Much of it will be connected to the use case. Eg: deep learning for log analytics in production will be different than a computer vision pipeline.

Warning: highly biased player in the space.

ymt123 · on Aug 30, 2016

Have you tried Sacred[1]? It definitely doesn't answer the "infrastructure for deep learning" challenge but it is helpful for understanding what experiments have been run/where did this model come from (including what version of the code/parameters produced it)

[1] https://github.com/IDSIA/sacred

asimuvPR · on Aug 30, 2016

So true. I've been doodling some tools to somehow manage all of it. So far I only have git-like approaches to models and chef-like approaches to infrastructure. I hope to somehow bring all together into a docker-like package that can be deployed without much hassle.

daveguy · on Aug 30, 2016

You might want to check out Pachyderm -- that is essentially what they are trying to do (Analytics infrastructure support. It isn't specific to machine learning):

http://www.pachyderm.io/

asimuvPR · on Aug 30, 2016

I had forgotten about them. Thanks for posting the link.

vonnik · on Aug 30, 2016

Fwiw, we're testing a Dockerized the distro of DL4J. Runs on DCOS.

https://imgur.com/a/CDTAc

https://imgur.com/a/6jlxi

We'll release in coming weeks.

kyloon · on Aug 30, 2016

In terms of deploying trained models, you can probably get away with using TensorFlow Serving and let Kubernetes handle the orchestration and scaling part of the job. I do agree that there is certainly a need to have a tool that glues all these different bits and pieces together for improving the process of taking a model from development to production.

turinturambar · on Aug 30, 2016

Agreed. A very interesting and thoughtful post, but I think that you are right that OpenAI's primary use cases seem to be (unsurprisingly) academic research and rapid prototyping of new ideas. These emphasize very different set of problems than, say, deploying something in production or as a service.

Thus, this post seems immensely useful to someone like me (a PhD student, also primarily concerned with exploring new ideas and getting my next conference paper), but I can see how others doing machine learning in-the-wild or in production might see a lot of questions left unanswered. I, for one, work primarily with health care data from hospital EHRs, and I spent a lot more time with data prep pipelines than folks working with, say, MNIST.

programnature · on June 12, 2016

Did you also appreciate slide 28? "I used to think Wolfram was wrong. Now I am not so sure."

ddumas · on June 13, 2016

Yes, though of course here Stein is referring there to the Wolfram quote that's on slide 28 (roughly: certain kinds of development can't be done in academia) and not the condescending rejection of inquiry about mathematica's internals from earlier in the presentation.

programnature · on April 2, 2016

This is extremely similar to DevCards for Clojurescript, https://github.com/bhauman/devcards, https://www.youtube.com/watch?v=G7Z_g2fnEDg

glenjamin · on April 2, 2016

DevCards is great! Bruce has put a lot of work into making it a really smooth experience, and advocating the benefits of building your components outside of the application first.

I've been working on a more-or-less direct port of devcards into standard React / JS - it's available here: https://github.com/glenjamin/devboard

iheartmemcache · on April 2, 2016

Dan Abramov's DevTools[0] with Hot Reload and "Time Travel" (historical debugging) is basically the same thing too, though its tied to Redux pretty heavily IIRC. So yeah, nothing new. ("TimeWarp OS"[1] was a project developed in the late 80s that did the same thing at the OS level, primarily for physics simulations. (Something would break, you'd go back to state foo, change parameters mu,delta,sigma to yield foo' and continue the run.))

[0]https://www.youtube.com/watch?v=xsSnOQynTHs is the canonical talk on it, certainly worth a watch.

[1] http://www.cs.nyu.edu/srg/talks/timewarp.pdf Brian Beckman formerly of Microsoft as a second author. All those RX features I'd imagine were done largely in part by him and De Smet.

arunoda · on April 2, 2016

Time time debugging is something different here. It's on working with different states of app and try to find issues.

This is a way to list components and show their different use case. We can also develop individual components by just looking at this.

iheartmemcache · on April 2, 2016

I'm missing something, I think. Walk with me through a hypothetical example. I load component todo-list. Dan used integers to mark each modification of virtual DOM, so lets define this as an revision of 42, saved-state-0, labelled with a reference "Populated todo-list", after {"quuz" "bar" "baz"} have been added as elements all bool, all incomplete.

OK, now you can save a ref to this state state, i.e., Component 'todo-list' is now rev 51 (on the vDOM) saved-state-1 (with a reference to rev 53) with a rendering label of : "Bar complete". Check off the 2 remaining elements for component todo-list, we now have saved-state-2 (a reference to rev 53) with label : "Tasks completed".

I'm not saying that what you built isn't useful (I'm 100% certain it is!) but I don't see how it's any different from taking an append-only journal and adding bookmarks to save state, though I really could be missing something since I don't work front-end.

CognitiveLens · on April 3, 2016

It looks like React Storybook uses a similar set of principles to what you're describing but organizes them for a different use case than Redux DevTools. You definitely could build something like React Storybook using Redux DevTools, but from what I understand React Storybook provides a pre-made standalone app wrapper + server that consumes components and applies state specs ("stories") in a 'standardized' way for browsing - you would have to write your own app to leverage Redux DevTools (or even just plain React since you don't have to keep undo/redo info) for the same purpose.

glenjamin · on April 3, 2016

The fancy part is assembling stories for showing all the different states. This is even more powerful if you show them on the screen at once.

Imagine having all the examples below shown on screen, and then editing your component definition and having the hot reloading update them all at once so you can see the effects.

    Todo item normal:
    [ ] A thing to do

    Todo item checked:
    [x] -A-thing-to-do-

    Todo item editing:
    [  A thing to do  ]

    Todo item hovering:
    [ ] A thing to do  [del]

    Todo list show all:
    [ ] ABC
    [x] DEF
    [x] GHI
    3/3 items

    Todo list show incomplete:
    [ ] ABC
    1/3 items

    Todo list show complete
    [x] DEF
    [x] GHI
    2/3 items

Wintamute · on April 3, 2016

I think the selling point for devcards/React storybook et al. is that of a live/visual styleguide of UI components. Imo its raison d'etre is that it promotes a UI component centric methodology for developing web apps, whereby designers and developers can develop UI components in isolation away from the cognitive noise of how those components come together in a single monolithic app. That's the innovation here, if you can call it that, ... the state travelling stuff is an implementation detail.

d0m · on April 3, 2016

I can see how useful it is for the less technical part of the team as a way to see all components and their important states.

lygaret · on April 3, 2016

To me it's not even non-technical users, or juniors: this is a way of documenting components and their important substates, the same way a CSS styleguide does. It can help keep consistency of use, document features and states that may be overlooked by a consumer, or act as a framework for design QA.

glenjamin · on April 3, 2016

It's not even just documentation - you can use it like visual unit tests, so you can see what the effects of a change are to all the different states of a component.

peruvian · on April 3, 2016

DevCards and Figwheel are amazing. As much as I love React and its hot reloading plugins, `lein new figwheel` has always worked better than cloning a boilerplate or setting stuff up.

programnature · on Jan 21, 2016

Theres a 160 people in the Om slack channel. There's more ways to contribute than sending patches, and theres quite a lot of community effort behind Om Next these days.