Step 1. Install NVIDIA CUDA:
To use TensorFlow with NVIDIA GPUs, the first step is to install the CUDA Toolkit as shown:
wget -c -v -nc https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.2.88-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_9.2.88-1_amd64.deb
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
Keep an eye on the NVIDIA CUDA downloads page for new releases; the package versions shown here were current as of the time of writing.
Ensure that you have the latest driver:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update && sudo apt-get -y upgrade
On Ubuntu 18.04 LTS, the following should be enough for the device driver:
sudo apt-get install nvidia-kernel-source-396 nvidia-driver-396
Skipping this step will result in a broken driver installation.
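Before continuing, it's worth confirming that both the driver and the toolkit are usable. A minimal sanity check, assuming the default install locations used above:
# Should list your GPU(s) and the installed driver version
nvidia-smi
# Should report the CUDA release installed by the cuda package
/usr/local/cuda/bin/nvcc --version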
When done, create a library configuration file for CUPTI:
/etc/ld.so.conf.d/cupti.conf
With the content:
/usr/local/cuda/extras/CUPTI/lib64
Confirm that the library configuration file for CUDA libraries also exists with the correct settings:
/etc/ld.so.conf.d/cuda.conf
The content should be:
/usr/local/cuda/lib64
When done, load the new configuration:
sudo ldconfig -vvvv
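To confirm that the linker cache picked up both directories, you can grep its output; a quick check using the paths configured above:
# Both libcudart (CUDA) and libcupti (CUPTI) should be listed
ldconfig -p | grep -E 'libcudart|libcupti'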
Useful environment variables for CUDA:
Edit the /etc/environment file and append the following:
CUDA_HOME=/usr/local/cuda
Now, append the following to the PATH variable in the same file:
/usr/local/cuda/bin:$HOME/bin
When done, remember to source the file:
source /etc/environment
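If the variables took effect, both should resolve in a fresh shell; a quick check:
# Should print /usr/local/cuda
echo $CUDA_HOME
# Should print /usr/local/cuda/bin/nvcc once PATH is updated
which nvcc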
You can also install CUDA manually. However, take care not to install its bundled driver.
Step 2. Install NVIDIA cuDNN:
Once the CUDA Toolkit is installed, download the latest cuDNN library for Linux that matches the CUDA version you're using; in this case we're on CUDA 9.2, so we will refer to that version below (note that you will need to register for the Accelerated Computing Developer Program).
Once downloaded, uncompress the files and copy them into the CUDA Toolkit directory (assumed here to be /usr/local/cuda/):
$ sudo tar -xvf cudnn-9.2-* -C /usr/local
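You can confirm the cuDNN version that was unpacked by reading the version macros from its header; a quick check, assuming the header landed in /usr/local/cuda/include:
# Prints the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines
$ grep -A 2 'define CUDNN_MAJOR' /usr/local/cuda/include/cudnn.h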
Step 3. Install and upgrade PIP:
TensorFlow itself can be installed using the pip package manager. First, make sure that your system has pip installed and updated:
$ sudo apt-get install python-pip python-dev
$ pip install --upgrade pip
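Confirm the interpreter and pip versions you'll be building against before proceeding:
$ python --version
$ pip --version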
Step 4. Install Bazel:
To build TensorFlow from source, the Bazel build system (and a Java 8 JDK) must first be installed, as follows.
$ sudo apt-get install software-properties-common swig
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install bazel
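Once installed, verify that Bazel runs and note its version; each TensorFlow release supports a specific range of Bazel versions:
$ bazel version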
Step 5. Install TensorFlow:
To obtain the best performance with TensorFlow, we recommend building it from source.
First, clone the TensorFlow source code repository:
$ git clone https://github.com/tensorflow/tensorflow
Then run the configure script as follows:
$ ./configure
Output:
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
INFO: Invocation ID: 4cbfc3de-8110-45e3-855d-3b4c45444627
You have bazel 0.22.0 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Found possible Python library paths:
/usr/lib/python3/dist-packages
/usr/local/lib/python3.6/dist-packages
Please input the desired Python library path to use. Default is [/usr/lib/python3/dist-packages]
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y
XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]: 10.0
Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]:
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Do you wish to build TensorFlow with TensorRT support? [y/N]: N
No TensorRT support will be enabled for TensorFlow.
Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]: /usr/local/cuda
Please specify the location where NCCL /usr/local/cuda library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Assuming NCCL header path is /usr/local/cuda-10.0/../include/nccl.h
Invalid path to NCCL /usr/local/cuda toolkit, /usr/local/cuda-10.0/ or /usr/local/cuda-10.0/../include/nccl.h not found. Please use the O/S agnostic package of NCCL 2
Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,7.0]: 3.7,5.2,7.0
Do you want to use clang as CUDA compiler? [y/N]: N
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]: N
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
Not configuring the WORKSPACE for Android builds.
Then call bazel to build the TensorFlow pip package:
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=mkl //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
This will build the package with optimizations for FMA, AVX and SSE.
A stock build would be as follows:
bazel build //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
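In either case, build_pip_package writes the finished wheel to the directory you pass it (here /tmp/tensorflow_pkg); confirm it exists before installing:
# The file name encodes the TensorFlow version and the Python ABI it was built for
$ ls /tmp/tensorflow_pkg/tensorflow-*.whl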
You can use the stock build shown above if you passed the optimization flags directly to the configure script earlier, at the optimization-flags prompt. Use this string:
--copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2
This replaces -march=native (the default). If you're on a Skylake through Coffee Lake CPU, these are the flags you need.
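If you're unsure whether your CPU supports all of these instruction sets, check the kernel's flag list before committing to a long build; a quick check:
# Should print avx, avx2, fma and sse4_2 on Skylake through Coffee Lake parts
$ grep -o -w -m 1 -E 'avx|avx2|fma|sse4_2' /proc/cpuinfo | sort -u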
Finally, install the TensorFlow pip package.
For Python 2.7:
$ sudo pip install --upgrade /tmp/tensorflow_pkg/tensorflow-*.whl
For Python 3.4:
$ sudo pip3 install --upgrade /tmp/tensorflow_pkg/tensorflow-*.whl
Step 6. Upgrade protobuf:
Upgrade to the latest version of the protobuf package:
For Python 2.7:
$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.0.0b2.post2-cp27-none-linux_x86_64.whl
For Python 3.4:
$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.0.0b2.post2-cp34-none-linux_x86_64.whl
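To confirm the upgrade took effect, print the installed protobuf version; a one-liner using whichever interpreter you installed for:
$ python -c "import google.protobuf; print(google.protobuf.__version__)"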
Step 7. Test your installation:
To test the installation, open an interactive Python shell and import the TensorFlow module:
$ cd
$ python
>>> import tensorflow as tf
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
With the TensorFlow module imported, the next step to test the installation is to create a TensorFlow Session, which will initialize the available computing devices and provide a means of executing computation graphs:
>>> sess = tf.Session()
This command will print out some information on the detected hardware configuration. For example, the output on a system containing a Tesla M40 GPU is:
>>> sess = tf.Session()
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: Tesla M40
major: 5 minor: 2 memoryClockRate (GHz) 1.112
pciBusID 0000:04:00.0
Total memory: 11.25GiB
Free memory: 11.09GiB
To manually control which devices are visible to TensorFlow, set the CUDA_VISIBLE_DEVICES
environment variable when launching Python. For example, to force the use of only GPU 0:
$ CUDA_VISIBLE_DEVICES=0 python
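The same variable can expose multiple GPUs or hide them entirely. For example:
# Expose only GPUs 0 and 2 (TensorFlow sees them renumbered as 0 and 1)
$ CUDA_VISIBLE_DEVICES=0,2 python
# Hide all GPUs and force CPU-only execution
$ CUDA_VISIBLE_DEVICES= python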
You should now be able to run a Hello World application:
>>> hello_world = tf.constant("Hello, TensorFlow!")
>>> print(sess.run(hello_world))
Hello, TensorFlow!
>>> print(sess.run(tf.constant(123)*tf.constant(456)))
56088
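To see which device each operation actually ran on, enable device-placement logging when creating the session; a small variation on the test above, runnable as a one-liner:
$ python -c "import tensorflow as tf; sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)); print(sess.run(tf.constant(123) * tf.constant(456)))"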
Tips:
To achieve similar results without building the packages yourself, you can deploy nvidia-docker and install TensorFlow from NVIDIA's NGC registry.
Use this to deploy nvidia-docker on Ubuntu: https://gist.github.com/Brainiarc7/a8ab5f89494d053003454efc3be2d2ef
Use NGC to deploy the preconfigured containers. Optimized builds for TensorFlow, Caffe, Torch, etc. are also available: https://www.nvidia.com/en-us/gpu-cloud/deep-learning-containers/
Also see the NGC panel: https://ngc.nvidia.com/registry
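For example, once nvidia-docker is deployed, pulling and running an NGC TensorFlow image is a one-liner; the tag below is illustrative, so substitute a current one from the registry:
# 19.02-py3 is an example tag; check the NGC registry for current releases
$ docker run --runtime=nvidia --rm -it nvcr.io/nvidia/tensorflow:19.02-py3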
Build summary: TensorFlow 1.12.0 (GPU), CUDA 10 (compute capabilities 3.7, 5.2, 7.0), cuDNN 7, GCC 6, SSE4.x, AVX, MKL, Ubuntu 18.04, amd64.