
The dependency hell involved in setting up a good deep learning machine is one of the reasons why using Docker or a VM is not a bad idea. Even if you follow the instructions in the OP to the letter, you can still run into issues where a) an unexpected interaction with permissions or other package versions causes the build to fail, and b) building all the packages can take an hour or more even on a good computer.

The Neural Doodle tool (https://github.com/alexjc/neural-doodle), which appeared on HN a couple of months ago (https://news.ycombinator.com/item?id=11257566), is very difficult to set up without errors. Meanwhile, the included Docker container (for the CPU implementation) gets things running immediately after a 311MB download, even on Windows, which otherwise gets fussy with machine learning libraries. (I haven't played with the GPU container yet, though.)
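
For anyone who hasn't tried it, the Docker route is roughly the following; the image name comes from the project's README, but treat the exact mount path and entrypoint behaviour as assumptions on my part:

  # pull the ~311MB CPU image (image name from the project's README)
  docker pull alexjc/neural-doodle
  # run it with a host directory mounted so inputs/outputs are shared with the host
  docker run -it --rm -v "$(pwd)/samples:/nd/samples" alexjc/neural-doodle --help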

Nvidia also has an interesting Docker plugin/wrapper which allows containers to use the GPU on the host: https://github.com/NVIDIA/nvidia-docker



I can't speak for Caffe, but I got a new machine up and running with Keras in about half an hour last week without having to compile anything:

- install the NVIDIA drivers, CUDA and cuDNN, which all come as Ubuntu packages.

- use Anaconda for Python

- conda install theano

- pip install tensorflow as per the website

- pip install keras

And voilà, I got my Keras models running on my shiny new GTX 980.
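
For reference, the whole thing boils down to something like this sketch; the apt package names are placeholders, since they depend on your Ubuntu release and which NVIDIA repo you've added:

  # driver + CUDA toolkit; exact package names vary with Ubuntu release and NVIDIA repo
  sudo apt-get install nvidia-current cuda
  # cuDNN is a separate download/package from NVIDIA (a libcudnn* package)
  conda install theano               # with Anaconda already on PATH
  pip install tensorflow             # or the GPU wheel URL from tensorflow.org
  pip install keras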


CUDA is hard to set up because half the time it messes with your screen settings, i.e. you boot into a black screen.

And using the CUDA Ubuntu packages can result in Theano not working: https://github.com/Theano/Theano/issues/4430
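
For what it's worth, a few quick sanity checks after an install catch most of this early (the Theano flag below assumes the old 'gpu' backend; newer backends use cuda0 etc.):

  nvidia-smi          # is the driver loaded and the GPU visible?
  nvcc --version      # is the CUDA toolkit on the PATH?
  # does Theano actually pick the GPU up?
  THEANO_FLAGS=device=gpu,floatX=float32 python -c "import theano; print(theano.config.device)"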


Keep in mind that you also knew how to do this ahead of time (installing the CUDA-related libraries is overwhelmingly difficult if you haven't done it before), didn't run into any issues with numpy/scipy version compatibility (I've had quite a bit of "fun" having to install numpy etc. from source in the past), and were presumably lucky enough to have a well-supported GPU.

Is there a motivation for Anaconda other than "it includes the stuff we usually need"? It strikes me as somewhat strange that Anaconda is so often recommended given that it forces a divergence from using the mainline packages with vanilla Python.


The motivation for Anaconda is precisely to help people avoid the "fun" you alluded to of building numpy/scipy/etc. from source. Even on systems which provide the right compiler, headers, and runtime libs for a given version of Python downloaded from Python.org, it's nontrivial to ensure that you're going to get an optimal build. And once you have that build working, you have to maintain your build toolchain to ensure that you can build future packages that have native source.

Anaconda does not force a "divergence" from using anything. You can still build and pip install to your heart's delight. It just provides a baseline of the more complex-to-build packages, all built in such a way as to be compatible with each other at the C level.
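
Concretely, a typical workflow is something like this sketch (the environment name is arbitrary):

  conda create -n dl python=3.5 numpy scipy   # prebuilt binaries, compatible at the C level
  source activate dl
  pip install keras                           # pip still works on top for pure-Python packages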


numpy/scipy are super easy to install (apt-get install python3-scipy) if you are not trying to link them to a manually compiled ATLAS/MKL. Otherwise you have to download and modify config files to point to the lib in the case of MKL, and use update-alternatives for libblas/liblapack for both MKL and ATLAS.

CUDA, on the other hand, is annoying because you have to make sure both that the driver works with multi-screen setups and that CUDA links against that driver correctly and uses a specific gcc version.
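
Roughly, the two paths look like this; the exact alternative names (libblas.so.3 vs. libblas.so.3gf) differ between Ubuntu releases, so take them as an example:

  # the easy path
  sudo apt-get install python3-scipy
  # the annoying path: pointing the system BLAS/LAPACK at ATLAS or MKL afterwards
  sudo update-alternatives --config libblas.so.3
  sudo update-alternatives --config liblapack.so.3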


apt-get and/or pip have frequently given me versions of numpy and/or scipy behind what TensorFlow, Theano, Keras, etc. want, resulting in cryptic errors that don't show up until you try to run a script.
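
The mismatch usually only surfaces at import time, so something like this at least makes it visible up front:

  python -c "import numpy, scipy; print(numpy.__version__, scipy.__version__)"
  pip install --upgrade numpy scipy   # if they're behind what the framework expects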


Yes, that's why people don't use apt-get or pip, and instead install Anaconda.

Pip and wheels are still not really suitable for scientific Python work, because the metadata facilities are not sufficiently rich to capture all the information needed for proper linking of native libraries. By contrast, in Anaconda, things like MKL linkage and Fortran compiler information can be used in the package metadata for dependency solving, to minimize these kinds of compatibility issues.
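
As a rough example of what that buys you in practice (nomkl is Anaconda's opt-out package for MKL-linked builds):

  conda list | grep -i mkl          # shows which packages are MKL-linked builds
  conda install nomkl numpy scipy   # or explicitly switch to the non-MKL builds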


Interesting, and thanks for the summary; the motivations are a bit clearer to me now. Is there any intention of moving Anaconda's unique features upstream?


Well, kind of. We've tried to work with the Python core packaging folks to improve the built-in metadata facilities. (There has been a checkered history there in terms of the reception of our ideas...)

In terms of making these packages easier to build, that's not really where the problem is. The fact that numpy, scipy, cython, etc. need to have a shared compiler and build toolchain is really a result of how operating systems and the C language ABI work.


Nvidia has been making some pretty cool strides with https://github.com/NVIDIA/nvidia-docker, but I wish they would expose some images / docker run commands directly instead of wrapping it with their own (bespoke) Go tool. In its current form, its utility strikes me as really limited, since it can't be used with other tooling.


The CLI wrapper is provided for convenience, since it should be enough for most users. We recently added advanced documentation on our wiki explaining how you can avoid relying on the nvidia-docker wrapper: https://github.com/NVIDIA/nvidia-docker/wiki/Internals

This section should also have plenty of information for you if your goal is to integrate GPU support in your own container tool.
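
Roughly, the two forms look like this; the plain-docker invocation follows the Internals wiki, and the device nodes and driver-version volume name depend on your machine, so treat them as an example:

  # convenience wrapper
  nvidia-docker run --rm nvidia/cuda nvidia-smi

  # approximately what the wrapper does with plain docker; the volume is
  # created by nvidia-docker-plugin and is named after the driver version
  DRIVER=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
  docker run --rm \
      --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 \
      --volume-driver=nvidia-docker \
      --volume=nvidia_driver_${DRIVER}:/usr/local/nvidia:ro \
      nvidia/cuda nvidia-smi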


Ah, you are fucking awesome! I have been using this to Dockerize all of my ML stuff. I'm hugely appreciative of this project!


Do you have your Dockerfile up somewhere?


Nice, good call out, thanks.


I do hope that bringing Ubuntu to W10 is going to make this easier - I'd rather not have to dual-boot to use ML, personally.




