Installing 10-Series GPUs, CUDA 8.0RC, and Tensorflow on Ubuntu 14.04

As of mid-September, Tensorflow is easy to install if all you want to use is a CPU.  As someone with no experience using CUDA, installing GPU drivers on Ubuntu, or using Bazel, I ran into a panoply of issues installing Tensorflow with 10-series GPU support:

  • 10-series GPUs require CUDA 8.0RC for the libraries to work properly
  • the instructions on the tensorflow website only work with CUDA < 8
  • CUDA 8.0 won’t work with the Tensorflow packages on the tensorflow website, so you’ll have to build from source
  • the current source code on github has bugs that need manual repairs.

After a weekend, I finally got Tensorflow to work with an EVGA GTX 1060 GPU.  In this post, I’ll go through my steps starting with preparing your machine before installing the GPU and ending with installing Tensorflow into a conda environment. Although this post assumes Ubuntu 14.04, many of the troubleshooting techniques originated from people using 16.04.

0. Prep your machine

Prepping your machine will take a surprisingly large amount of time and will come back to bite you if it’s not done correctly.  If you are still having issues getting your GPU to work with Tensorflow, it’s probably because something went wrong here.  I’d suggest doing this part carefully.

Removing previous drivers and modules

To remove previous nvidia drivers use

 $ sudo apt-get purge 'nvidia-*' 

Check if you have any modules in your kernel by running

 $ dkms status 

If you get any output, you’ll have to remove the modules.

nvidia modules

If you have nvidia modules installed, the output would be in the form

 <module name>,<module version>,<kernel version> 

Remove the nvidia module with

 $ sudo dkms remove <module name>/<module version> -k <kernel version> 
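
If several nvidia modules are listed, typing each removal by hand gets tedious. The sketch below is one way to generate the commands; it only prints them, so you can review them before running anything. It assumes `dkms status` lines look like `nvidia-352, 352.63, 3.13.0-95-generic, x86_64: installed` (those version numbers are made up), and `dkms_remove_cmds` is a hypothetical helper name, not part of dkms.

```shell
# Sketch: turn `dkms status` lines (read on stdin) into "dkms remove" commands.
# Assumed input format: "<module>, <version>, <kernel>, <arch>: installed".
# Only lines starting with "nvidia" are considered; nothing is executed.
dkms_remove_cmds() {
  awk -F'[,:] *' '/^nvidia/ { printf "sudo dkms remove %s/%s -k %s\n", $1, $2, $3 }'
}
```

You would use it as `dkms status | dkms_remove_cmds`, then copy the printed lines once they look right.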

bbswitch modules

Uninstall bbswitch with

 $ sudo apt-get remove bbswitch-dkms 

Uninstalling CUDA Toolkit

If you’ve installed a previous version of the CUDA toolkit or want to start clean, remove the old directory with

$ sudo rm -rf /usr/local/cuda-7.5

but substitute the appropriate path and version of CUDA (the above is the default). Then remove and purge using apt

$ sudo apt-get remove nvidia-cuda-toolkit
$ sudo apt-get remove --auto-remove nvidia-cuda-toolkit
$ sudo apt-get purge nvidia-cuda-toolkit
$ sudo apt-get purge --auto-remove nvidia-cuda-toolkit
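
To double check that nothing was left behind, you can list whatever CUDA directories or symlinks still exist. This is only a sketch: `list_cuda_dirs` is a hypothetical helper, and it assumes the toolkit lived under /usr/local as in the default install.

```shell
# Sketch: list directories or symlinks named cuda* directly under a prefix.
# $1 is the prefix to search; it defaults to /usr/local (the assumed default).
list_cuda_dirs() {
  find "${1:-/usr/local}" -maxdepth 1 -name 'cuda*' 2>/dev/null | sort
}
```

If `list_cuda_dirs` prints nothing, the old toolkit directories are gone.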

nomodeset

You’ll probably have to edit /etc/default/grub so that it contains the line


GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"

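This disables kernel mode setting so that Ubuntu can still bring up a basic display on the new card before the nvidia driver is installed (step 4); the open source nouveau drivers (these are reverse engineered drivers not created by nvidia) may not handle a 10-series card on their own. After saving the file, run sudo update-grub so the change takes effect on the next boot.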
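
A typo in this file is easy to miss until the next boot, so a quick check may save a reboot cycle. The sketch below assumes the setting sits on a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line, as in the example above; `has_nomodeset` is a hypothetical helper name.

```shell
# Sketch: succeed if GRUB_CMDLINE_LINUX_DEFAULT in a grub defaults file
# contains the word "nomodeset". $1 defaults to /etc/default/grub.
has_nomodeset() {
  grep -Eq '^GRUB_CMDLINE_LINUX_DEFAULT="([^"]* )?nomodeset( [^"]*)?"' "${1:-/etc/default/grub}"
}

# Example use:
#   has_nomodeset || echo "nomodeset missing from /etc/default/grub"
#   grep nomodeset /proc/cmdline   # after rebooting: shows the active kernel options
```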

1. Physically install the GPU

This will hopefully be the easiest part of the install.  If you’ve never installed a graphics card, there are plenty of youtube videos to help.  Make sure that you install the card into a PCIe 3.0 slot to get the best performance.

2. Install CUDA 8.0RC

Download the CUDA Toolkit 8.0RC from nvidia (you’ll need to sign up by giving them your email).  For the installer type, I used the deb (local) method, but people have had success using the runfile.


Just add the package and install.


$ cd ~/Downloads
$ sudo dpkg -i cuda-repo-ubuntu1404-8-0-rc_8.0.27-1_amd64.deb
$ sudo apt-get update
$ sudo apt-get install cuda
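
The install does not put the toolkit on your PATH, so nvcc and the CUDA libraries may not be found by later steps. Here is a minimal sketch of the usual fix, assuming the toolkit is reachable at /usr/local/cuda (the same path step 3 copies cuDNN into); `cuda_env` is a hypothetical helper name.

```shell
# Sketch: prepend a CUDA install to PATH and LD_LIBRARY_PATH for this shell.
# $1 is the CUDA directory; /usr/local/cuda is the assumed default.
cuda_env() {
  cuda_home="${1:-/usr/local/cuda}"
  export PATH="$cuda_home/bin:$PATH"
  # Append the old value only if it was non-empty, to avoid a trailing colon.
  export LD_LIBRARY_PATH="$cuda_home/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

After `cuda_env`, running `nvcc --version` should report release 8.0. To make the change permanent, you could add the two export lines to your ~/.bashrc.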

3. cuDNN v5.1

You’ll need to download cuDNN v5.1, nvidia’s library of deep learning primitives (you’ll need to respond to a survey and accept terms before downloading).


Next, unpack cuDNN, copy the files into the cuda folders, and set permissions.


$ cd ~/Downloads
$ tar xvzf cudnn-8.0-linux-x64-v5.1-ga.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
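
To confirm the copy worked, you can read the version back out of the installed header. This sketch assumes cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL on separate #define lines; `cudnn_version` is a hypothetical helper name.

```shell
# Sketch: print major.minor.patch from the #define lines in a cudnn.h header.
# $1 is the header path; it defaults to the location used in the copy above.
cudnn_version() {
  awk '$1 == "#define" && $2 ~ /^CUDNN_(MAJOR|MINOR|PATCHLEVEL)$/ { v[$2] = $3 }
       END { print v["CUDNN_MAJOR"] "." v["CUDNN_MINOR"] "." v["CUDNN_PATCHLEVEL"] }' \
    "${1:-/usr/local/cuda/include/cudnn.h}"
}
```

Running `cudnn_version` with no arguments should print a version beginning with 5.1 if the right header is in place.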

4. Install the 367.44 driver

First download the driver from nvidia.


Make sure to select the correct series, product, and OS.  Then you should be directed to the 367.44 driver, which you should download.


The easiest way to install the driver is to use the GUI so you won’t have to mess with stopping Xorg.  Ubuntu should be able to find the downloaded driver if it’s in your Downloads folder. Open up Additional Drivers by using the GUI,


or open a terminal and enter

$ software-properties-gtk --open-tab=4 

You should see the 367.44 driver; select it and apply.


5. Reboot

Now that you think the driver is installed, it’s good to reboot the machine

$ sudo reboot 

and make sure everything works.

If your drivers are properly installed, everything will boot like normal.  You’ll know your drivers aren’t working if, when Ubuntu tries to start, your monitor turns purple or black and stays that color. You can try to go back and repeat single steps, but you might have to restart this process by going back to step 0 and working your way back here.
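
A normal boot doesn’t prove that the 367.44 driver is the one actually loaded, so it may be worth asking the driver directly. The sketch below compares the versions reported by nvidia-smi against the one you expect; `driver_matches` is a hypothetical helper name, and it assumes nvidia-smi prints one bare version string per GPU.

```shell
# Sketch: read driver versions on stdin (one line per GPU) and succeed only if
# there is at least one line and every line equals the expected version in $1.
driver_matches() {
  seen=0
  while IFS= read -r line; do
    seen=1
    [ "$line" = "$1" ] || return 1
  done
  [ "$seen" -eq 1 ]
}

# Example use:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader | driver_matches 367.44 \
#     && echo "driver OK"
```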

6. Install Bazel

To install Bazel, you can follow the instructions on their site, but they are repeated here.

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install bazel
$ sudo apt-get upgrade bazel

7. Clone Tensorflow

You’ll be building Tensorflow from source, so the first step is to clone from github.


$ git clone https://github.com/tensorflow/tensorflow.git 

It doesn’t matter where you clone it into (I chose the home directory), because you’ll use Bazel to build a pip wheel that you can then install with pip wherever you’d like.

8. Fix errors in current Tensorflow

It turns out that there are some errors (as of September 11th) in the source code (see this discussion), so you’ll need to manually edit some of the BUILD files. For each edit, just search for the key term in the specified location and comment out per the instructions.

Edit 1

key term: android_tensorflow_kernels_no_rtti_lite_runtime

location:  ~/tensorflow/tensorflow/core/kernels/BUILD

comment instructions: There should only be one occurrence of the term; comment out the target where it occurs.


Edit 2

key term: ios_tensorflow_test_lib

location: ~/tensorflow/tensorflow/core/BUILD

comment instructions: There should only be one occurrence of the term; comment out the target where it occurs.


Edit 3

key terms: if_mobile, if_android

location: ~/tensorflow/tensorflow/contrib/session_bundle/BUILD

comment instructions: Search for if_mobile or if_android and you should find 5 occurrences.  For the first occurrence, inside load, you don’t need to comment anything out.


For the other 4 occurrences, you’ll need to comment out the if_mobile, if_android, and if_ios conditions.
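
If you’d rather not hunt through the files in an editor, grep can show where each key term from the three edits lives. This is just a sketch: `show_build_terms` is a hypothetical helper name, and it assumes the clone is at ~/tensorflow unless you pass another location.

```shell
# Sketch: print the line numbers where the key terms from Edits 1-3 occur.
# $1 is the root of the clone; it defaults to ~/tensorflow.
show_build_terms() {
  tf_root="${1:-$HOME/tensorflow}"
  grep -n 'android_tensorflow_kernels_no_rtti_lite_runtime' "$tf_root/tensorflow/core/kernels/BUILD"
  grep -n 'ios_tensorflow_test_lib' "$tf_root/tensorflow/core/BUILD"
  grep -nE 'if_(mobile|android|ios)' "$tf_root/tensorflow/contrib/session_bundle/BUILD"
}
```

Running it again after editing is a cheap sanity check: the terms will still show up, but the matching lines should now start with the # comment character (apart from the load line in Edit 3).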

9. Install Anaconda

If you haven’t already, install anaconda by following these directions. It helps to do this before configuring the Bazel build because anaconda has all the packages that Tensorflow needs to compile.

10. Configure the build

To configure the Bazel build, go to the root of your tensorflow directory (the code below assumes it is in the home directory) and run the configure file.

$ cd ~/tensorflow
$ ./configure
 

When prompted, make sure to answer yes to building with GPU support; you can get away with accepting the defaults for the rest.

Compile Tensorflow with Bazel

To build Tensorflow from source, use these Tensorflow instructions, which are repeated here.

$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
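
The exact name of the wheel depends on the Tensorflow version and the python you built against, so it can differ from the one shown later in this post. A small sketch for finding whatever was actually produced; `latest_wheel` is a hypothetical helper name and /tmp/tensorflow_pkg is the output directory used above.

```shell
# Sketch: print the most recently modified tensorflow wheel in a directory.
# $1 is the directory; it defaults to /tmp/tensorflow_pkg. Prints nothing if
# no wheel is found.
latest_wheel() {
  ls -t "${1:-/tmp/tensorflow_pkg}"/tensorflow-*.whl 2>/dev/null | head -n 1
}
```

You can then pass the result straight to pip, for example `pip install "$(latest_wheel)"`.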

11. Set up conda environment to install tensorflow

The name of the conda environment will be “tensorflow.” Install all anaconda packages (so you can use stuff like jupyter notebook right away).

 $ conda create --name tensorflow anaconda 

UPDATE: If attempting to install in a conda environment, make sure not to specify the python version (e.g. python=2.7). If you do, then for some reason installing the Tensorflow pip package breaks all your numpy packages (even outside the environment!), and you’ll have to reinstall anaconda.
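
Creating the full anaconda environment takes a while, so it can be handy to confirm it exists before moving on. The sketch below assumes `conda env list` prints one environment per line with the name in the first column; `has_conda_env` is a hypothetical helper name.

```shell
# Sketch: read `conda env list` output on stdin and succeed if an environment
# with the name given in $1 appears in the first column.
has_conda_env() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Example use:
#   conda env list | has_conda_env tensorflow && echo "environment exists"
```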

12. Install Tensorflow

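First, activate the environment that Tensorflow will be installed in, then install Tensorflow. Run pip without sudo: sudo typically resets your PATH, so the system pip would be used instead of the one inside the environment.


$ source activate tensorflow
$ pip install /tmp/tensorflow_pkg/tensorflow-0.10.0rc0-py2-none-any.whl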
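
Before celebrating, it’s worth checking that Tensorflow actually sees the GPU. The sketch below is one way to do that: it starts a session with device placement logging turned on and looks for a GPU device in the log. It assumes the log names the device as /gpu:0, and `mentions_gpu` is a hypothetical helper name.

```shell
# Sketch: read Tensorflow's start-up log on stdin and succeed if it names /gpu:0.
# Assumption: with log_device_placement=True, the device mapping that gets
# printed includes "/gpu:0" when a GPU is usable.
mentions_gpu() {
  grep -q '/gpu:0'
}

# Example use, run from a directory other than the tensorflow clone (importing
# from inside the clone can pick up the source tree instead of the installed package):
#   python -c 'import tensorflow as tf; tf.Session(config=tf.ConfigProto(log_device_placement=True))' 2>&1 \
#     | mentions_gpu && echo "GPU visible"
```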

You’re all done! Hopefully everything worked and you can get back to training models. Thanks for reading!
