Setting Up a Deep Learning Machine from Scratch

minimaxir · on May 14, 2016

The dependency hell required to run a good deep learning machine is one of the reasons why using Docker/VM is not a bad idea. Even if you follow the instructions in the OP to the letter, you can still run into issues where a) an unexpected interaction with permissions/other package versions causes the build to fail and b) building all the packages can take an hour+ to do even on a good computer.

The Neural Doodle tool (https://github.com/alexjc/neural-doodle), which appeared on HN a couple months ago (https://news.ycombinator.com/item?id=11257566), is very difficult to set up without errors. Meanwhile, the included Docker container (for the CPU implementation) can get things running immediately after a 311MB download, even on Windows which otherwise gets fussy with machine learning libraries. (I haven't played with the GPU container yet, though)

Nvidia also has an interesting implementation of Docker which allows containers to use the GPU on the host: https://github.com/NVIDIA/nvidia-docker

jre · on May 14, 2016

I can't speak for caffe, but I got a new machine up and running with keras in about half an hour last week without having to compile anything :

- install the nvidia drivers, cuda and cudnn which all come as ubuntu packages.

- use anaconda for python

- conda install theano

- pip install tensorflow as per the website

- pip install keras

And voilà, I got my keras models running on my shinny new GTX 980.

lqdc13 · on May 14, 2016

Cuda is hard to set up because half the time that messes with your screen settings. I.E. you boot into black screen.

Using Cuda packages results in Theano not working. https://github.com/Theano/Theano/issues/4430

zenlikethat · on May 15, 2016

Keep in mind that you also knew how to do this ahead of time (installing the CUDA-related libraries is overwhelmingly difficult if you haven't done it before), and didn't run in to any issues with numpy / scipy version compatibility (I've had quite a bit of "fun" having to install numpy etc. from source in the past), and were presumably lucky enough to have a well supported GPU.

Is there a motivation for anaconda other than "it includes the stuff we usually need"? It strikes me as somewhat strange that anaconda is so often recommended but it forces a divergence from using the mainline packages with vanilla Python.

pwang · on May 16, 2016

The reason and motivation for Anaconda is precisely to help people avoid the "fun" you alluded to of building numpy/scipy/etc. from source. Even on systems which provide the right compiler and headers and runtime libs for a given version of Python downloaded from Python.org, it's nontrivial to ensure that you're going to get an optimal build. And furthermore, once you have that build working, you have to maintain your build toolchain to ensure that you can build future packages that have native source.

Anaconda does not force a "divergence" from using anything. You can still build and pip install to your heart's delight. It just provides a baseline of the more complex-to-build packages, all built in such a way as to be compatible with each other at the C level.

lqdc13 · on May 15, 2016

numpy/scipy super easy to install (apt-get install python3-scipy) if you are not trying to link it to a manually compiled ATLAS/MKL. Otherwise you have to download and modify config files to point to the lib in case of MKL and update alternatives for libblas/liblapack for both MKL/ATLAS.

Cuda on the other hand is annoying because you have to both make sure the driver works with multiscreen setups and that cuda links against that driver correctly and uses a specific gcc version.

zenlikethat · on May 15, 2016

apt-get and/or pip have frequently given me versions of numpy and/or scipy behind what TensorFlow, Theano, Keras etc. want resulting in cryptic errors that don't show up until attempting to run a script.

pwang · on May 16, 2016

Yes, that's why people don't use apt-get or pip, and rather install Anaconda.

Pip and wheels are still not really suitable for scientific Python work, because the metadata facilities are not sufficiently rich to capture all the information needed for proper linking of native libraries. By contrast, in Anaconda, things like MKL linkage and Fortran compiler information can be used in the package metadata for dependency solving, to minimize these kinds of compatibility issues.

zenlikethat · on May 16, 2016

Interesting and thanks for the summary, seems the motivations are a bit clearer to me now. Is there intention in moving Anaconda's unique features upstream?

pwang · on May 17, 2016

Well, kind of. We've tried to work with the python core packaging folks to improve the built-in metadata facilities. (There has been a checkered history there in terms of reception to our ideas...)

In terms of making these packages easier to build, that's really not actually where the problem is. The fact that numpy, scipy, cython, etc. need to have a shared compiler and build toolchain is really a result of operating systems and the C language ABI works.

zenlikethat · on May 15, 2016

Nvidia has been making some pretty cool strides with https://github.com/NVIDIA/nvidia-docker, but I wish they would expose some images / docker run commands directly instead of wrapping it with their own (bespoke) Go tool. In its current form its utility strikes me as really limited since it can't be used with other tooling.

flx42_ · on May 15, 2016

The CLI wrapper is provided for convenience since it should be enough for most users. We recently added advanced documentation on our wiki, we explain how you can avoid relying on the nvidia-docker wrapper: https://github.com/NVIDIA/nvidia-docker/wiki/Internals

This section should also have plenty of information for you if your goal is to integrate GPU support in your own container tool.

fapjacks · on May 15, 2016

Ah you are fucking awesome! I have been using this to Dockerize all of my ML stuff. I'm hugely appreciative for this project!

sandGorgon · on May 15, 2016

Do you have your Dockerfile up somewhere?

zenlikethat · on May 15, 2016

Nice, good call out, thanks.

akhilcacharya · on May 15, 2016

I do wish bringing Ubuntu to W10 is going to make this easier - I'd rather not have to dual boot to use ML personally.

JackFr · on May 14, 2016

Disappointed. Misread it -- I thought he was going to do deep learning with https://scratch.mit.edu/, not from scratch.

harryf · on May 14, 2016

Likewise. Might be fascinating for kids to have some "ready to roll" machine learning routines e.g. Watch the cat explore and learn how to get round a maze

a1k0n · on May 15, 2016

That would be pretty impressive, considering the only data structure included in scratch is a list, and it's pretty cumbersome to address.

pilooch · on May 14, 2016

Commoditizing deep learning is mandatory. After repetitive in production installs at various corps while connecting to existing pipelines, I ve convinced some of them to sponsor a commoditized open source deep learning server.

Code is here: https://github.com/beniz/deepdetect

There are differenciated CPU and GPU docker versions, and as mentioned elsewhere in this thread, they are the easiest way to setup even production system without critical impact on performances, thanks to nvidia-docker. It seems they are more popular than AMI within our little community.

mastazi · on May 15, 2016

I'm sorry if this is only tangentially on topic:

I was reading the article and got to the part related to installing CUDA drivers.

I am currently on the market for a laptop which will be used for self-learning purposes and I am interested in trying GPU-based ML solutions.

In my search for the most cost-effective machine, some of the laptops that I came across are equipped with AMD GPUs and it seems that support for them is not as good as for their Nvidia counterparts: so far I know of Theano and Caffe supporting OpenCL and I know support might come in the future from TensorFlow [1], in addition I saw that there are solutions for Torch [2] although they seem to be developed by single individuals.

I was wondering if someone with experience in ML could give me some advice: is the AMD route viable?

[1] https://github.com/tensorflow/tensorflow/issues/22

[2] https://github.com/torch/torch7/wiki/Cheatsheet#opencl

jfsantos · on May 15, 2016

At the current state of things, AMD is definitely not a viable route, but that might change in the future with the "Boltzmann Initiative" [1]. Performance with OpenCL is not comparable with CUDA on NVIDIA GPUs at the moment, and support is lacking in most deep learning frameworks.

[1] http://www.amd.com/en-us/press-releases/Pages/boltzmann-init...

mastazi · on May 15, 2016

Thanks, your statement about performance is very helpful. Basically even if my tech stack supported OpenCL I would still be better of with a CUDA-compatible card.

SixSigma · on May 15, 2016

We used to have a saying "friends don't make friends run closed software".

I guess those days have gone.

iaml · on May 16, 2016

Nvidia is doing a pretty damn good job at staying the only viable choice for machine learning. Or amd's lack of action is, anyway.

mastazi · on May 15, 2016

I hear you, and I really wish an open alternative became prevalent over CUDA.

mastazi · on May 15, 2016

Thank you @jfsantos and @p1esk. I'm going for a model equipped with an Nvidia GTX card I guess...

p1esk · on May 15, 2016

zacharyfmarion · on May 14, 2016

I posted something similar on my blog (http://zacharyfmarion.io/machine-learning-with-amazon-ec2/) not too long ago. Would be nice if there was a tool that set all of this up for you!

vonnik · on May 14, 2016

I work on Deeplearning4j, and I'm told that the install process is not too hellish. Feedback welcome there:

http://deeplearning4j.org/quickstart

http://deeplearning4j.org/gettingstarted

Someone in the community also Dockerized Spark + Hadoop + OpenBlas:

https://github.com/crockpotveggies/docker-spark

The GPU release is coming out Monday.

agibsonccc · on May 15, 2016

For scala folks,we're working on the corresponding docker containers for spark + cuda setup as well.

We'll also make it so you can experiment with models etc from a notebook: https://github.com/andypetrella/spark-notebook

Not exactly for the python crowd (we mainly try to be an alternative for the jvm stack)

profen · on May 14, 2016

The steps are pretty neat. Also agree on the driver and tools installation. Just painful and long.

looks like there are seperate torch and caffe amis as well for amazon. Going to try later.

https://aws.amazon.com/marketplace/pp/B01B52CMSO

https://aws.amazon.com/marketplace/pp/B01B4ZSX5S

zenlikethat · on May 15, 2016

The issue with using posted AMIs for this is the same as usual: they include god knows what else in addition to the installed and configured software (which is likely to also lag behind master / latest release quite a bit). Last few AMIs I tried for this included some random public keys as authorized users in a sudoer account! While they're likely benign (belonging to researchers that created these images), that'd be a nasty surprise to find in your data pipeline down the line.

mbajkowski · on May 15, 2016

This is a valid concern, which is one of the reasons we publish these AMIs through the AWS marketplace. Each of these AMIs had to go through the AWS security checker script as well as a manual review by the AWS marketplace team, please see the "Securing an AMI" section here.

https://aws.amazon.com/marketplace/help/201231340

Going through the AWS audit does take a few days to say the least and can be a hassle at times, but usually we are pretty close to the latest master / release.

zenlikethat · on May 15, 2016

Cool, didn't know that about the Marketplace. Thanks for sharing.

visarga · on May 14, 2016

Is there a host offering GPU systems preconfigured with ML frameworks and models, for playing around? Something simple to use like Digital Ocean.

mbajkowski · on May 14, 2016

I'm one of the devs for some of the AWS AMIs mentioned a few comments below which have the frameworks and examples installed, and run on CPU as well as GPU instances. We have several AMIs including one for TensorFlow:

https://aws.amazon.com/marketplace/pp/B01EYKBEQ0/ref=_ptnr_h...

Would love to get some feedback from anyone who gives them a spin about what we could do better - or which AMIs we should add that people may find useful.

If you are not familiar with AWS, we have quick-start blog here as well:

http://www.bitfusion.io/2016/05/09/easy-tensorflow-model-tra...

dbcurtis · on May 14, 2016

Pardon the n00b question, but I'm near the bottom of the learning curve on this.

It looks like this runs on GPU-less instances, as well as Gx-instances. So, how do you envision this being used? Would someone do prototyping on the cheap instances and then move up to the Gx instances for production? Is that move transparent?

mbajkowski · on May 15, 2016

You could do precisely that. Get started on a small instance, play around with one of the frameworks (one of the reasons why we also integrated Jupyter as part of the AMIs so the you can quickly write some python code from the browser without having to ssh into the instance). And then when all checks out, migrate the image (by creating a snapshot) and booting it on a more powerful instance.

For TensorFlow if an operation has both CPU and GPU implementations, the GPU devices will be given priority (if present on the instance) when the operation is assigned to a device. For Caffe we have both the GPU and CPU version installed.

nl · on May 15, 2016

Or train on the GPU and run inference on the CPU instances.

roseway4 · on May 15, 2016

Full disclosure: I work in marketing at Domino.

Domino offers prebuilt deep learning environments on its data science platform. Tensorflow, PyCUDA, Theano, H2O, a variety of R deep learning packages, and many other tools are available.

The environments run on Amazon's Nvidia GRID and Tesla instances.

This post is a little old but will give you an idea as to what you can accomplish: http://blog.dominodatalab.com/gpu-computing-and-deep-learnin...

We offer select prospects access to GPU compute when trialing Domino. Let me know if your team is interested.

profen · on May 14, 2016

Have used this digits ami on aws in the past for caffe and torch.

https://aws.amazon.com/marketplace/pp/B01DJ93C7Q/ref=srh_res...

amelius · on May 14, 2016

Step 1: make sure that your machine has sufficient free PSI slots for the GPU cards, and that you have sufficient physical space inside the machine.

Seriously... why can't there be a better way of adding coprocessors to a machine? Like stacking some boxes, interconnected by parallel ribbon cable, or something like that?

MikeTLive · on May 14, 2016

Google for "external PCI card cage" these have been around for over 15 years.

e12e · on May 14, 2016

Thunderbolt promises this. It doesn't quite deliver yet, but it might.

tzz · on May 14, 2016

If someone creates a Juju Charm https://jujucharms.com for this, then you can use the pre-configured service on any of the major public clouds.

agibsonccc · on May 15, 2016

You mean this? https://github.com/SaMnCo/layer-skymind-dl4j

We work with the ubuntu team on juju.

nl · on May 15, 2016

That doesn't do the Cuda/cuDNN stuff does it? (Ie, the hard, error prone part).

agibsonccc · on May 15, 2016

Right. So we'll be adding cuda and all that as well.

We are working very closely with canonical/IBM on the whole DL stack[1]. You will also see some stuff from us and nvidia here within the next month or so on making cuda a bit easier to setup in a normal data science "stack" eg: jvm/python hybrid product stacks. Cudnn has tricky licensing but it shouldn't be that bad to automate setting up the cuda part.

[1]: https://insights.ubuntu.com/2016/04/25/making-deep-learning-...

nl · on May 15, 2016

Pretty sure you can't do that because of the stupid NVidia license/download thing.

agibsonccc · on May 15, 2016

Our enterprise distro(cdh for deep learning) will include mkl and cudnn.

I will work out what we are allowed to do licemse wise. Reach out to adam@skymind.io if that is interesting.

tacos · on May 14, 2016

I don't understand the fascination with these "make a list" style setup instructions, as they're almost immediately outdated, and seldom updated.

We have AMI, we have docker, we have (gasp) shell scripts. It's 2016, Why am I cutting and pasting between a web page and a console?

To my knowledge the only thing that does something like this well is oh-my-zsh. And look at the success they've had! So either do it right, or don't do it at all.

danso · on May 14, 2016

> I don't understand the fascination with these "make a list" style setup instructions...

> ...as they're almost immediately outdated, and seldom updated.

Your second sentiment is the reason why I appreciate the "make a list"-style tutorials. When something inevitably goes out of date, I can at least see some of the narrative and reasoning for each step, instead of trying to debug someone's shell script that they've left to obsolescence.

Even better is when I have 3 such tutorial-lists to compare, making it easier to see which steps are integral and which steps were simply author-specific conventions.

raverbashing · on May 14, 2016

No, you don't need to restart your machine after you install CUDA.

Also you might not need to restart after you install the drivers, this is not Windows. (But there might be some rmmod/modprobe needed)

> If your deep learning machine is not your primary work desktop, it helps to be able to access it remotely

Yes, use ssh.

modeless · on May 14, 2016

Your snark is outdated. Windows can upgrade graphics drivers without a reboot these days and even supports hotplugging GPUs to a limited extent. Meanwhile, like many others I regularly have to perform console surgery on my Linux machines when they fail to boot to X after fiddling with the graphics drivers. (My latest discovery is that Ubuntu's auto-updates will auto-destroy your NVIDIA drivers if you have an unexpected version of GCC set as default). Graphics drivers are emphatically not something Linux people should be crowing about to Windows users.

raverbashing · on May 14, 2016

> My latest discovery is that Ubuntu's auto-updates will auto-destroy your NVIDIA drivers if you have an unexpected version of GCC set as default). Graphics drivers are emphatically not something Linux people should be crowing about to Windows users.

I have to agree. Especially when the Ubuntu package drivers ruin your system.

hondaz54 · on May 14, 2016

Yeah but do not blame Ubuntu or other distributors or Linux, but NVidia, who ship shitty proprietary drivers that do not integrate well in the *nix system.

PascLeRasc · on May 14, 2016

I've had massive problems with my Nvidia gpu using anything but the open-source driver. I'm not sure how much performance I'm losing but it's worth not spending hours debugging.

jsheard · on May 14, 2016

> Also you might not need to restart after you install the drivers, this is not Windows.

Windows has been able to hotswap graphics drivers (and recover from driver crashes) for years now.

PascLeRasc · on May 14, 2016

When I last used Windows a year ago, I had an "eject Nvidia GTX 760" in the right-click context menu next to usb drives. Never tried it though.

jsheard · on May 14, 2016

That's not supposed to happen with a normal graphics card, but ejecting GPUs is actually a thing now that Thunderbolt 3 supports external GPU docks.

im3w1l · on May 14, 2016

When I tried to setup CUDA last, I had to reinstall Linux several times because I broke it so badly. Not Windows indeed.

raverbashing · on May 14, 2016

You, or the CUDA packaging are doing something wrong then.

Installing a broken driver may cause the machine to not boot, fair enough, but CUDA should be straightforward.

orbifold · on May 14, 2016

Trying to install the Vulkan Beta driver on Linux broke the system, on windows it is happily crashing and restarting.