After spending hours trying to identify the correct driver version and the right way to install the Nvidia drivers, CUDA and cuDNN, I have collected the steps here.
You can find the required CUDA and cuDNN version for tensorflow-gpu here: https://www.tensorflow.org/install/source
If you'd like a clean install, first uninstall the previous nvidia drivers and cuda version.
If the driver and cuda were installed with apt-get:
sudo apt-get purge "*nvidia*" "*cublas*" "cuda*" "nsight*" "libcudnn*" "libnccl*"
sudo apt-get autoremove
sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*
Check for any remaining nvidia driver/cuda packages using:
dpkg -l | grep -i nvidia
dpkg -l | grep -i cuda
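If you'd rather have that check in script form, here is a minimal sketch. The helper name leftover_gpu_packages is my own invention, and it matches on the package-name column only (the greps above match anywhere in the line), so treat it as an illustration rather than a drop-in replacement.

```shell
# Hypothetical helper: reads `dpkg -l` style output on stdin and prints the
# names of packages whose name contains "nvidia" or "cuda" (case-insensitive).
leftover_gpu_packages() {
    awk 'tolower($2) ~ /nvidia|cuda/ { print $2 }'
}

# Usage on a Debian/Ubuntu system (prints nothing when the system is clean):
#   dpkg -l | leftover_gpu_packages
```

An empty result suggests the purge removed everything; any names printed are packages you may still want to remove.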
If you installed the nvidia driver using the .run file:
sudo /usr/bin/nvidia-uninstall
After the above steps your system should be clean and ready for installation.
If you are planning to use tensorflow-gpu, make sure you know the exact cuda and cuDNN versions it requires, along with the gcc version.
In my case I wanted cuda-10.0 and cuDNN-7.4 for tensorflow-gpu-1.13. I have multiple gcc versions installed on my system; please refer to this post for details on switching between them: https://linuxconfig.org/how-to-switch-between-multiple-gcc-and-g-compiler-versions-on-ubuntu-20-04-lts-focal-fossa
I downloaded the nvidia driver file from their website. Note that you need to install the nvidia driver that matches the cuda version you are installing. (It took me a lot of time to figure this out!)
If you go to their website and select your nvidia product, you'd get the latest driver: https://www.nvidia.com/download/index.aspx?lang=en-us
However, for the appropriate driver version based on cuda, please download from https://www.nvidia.com/Download/driverResults.aspx/148589/en-us
sudo sh NVIDIA-Linux-x86_64-430.34.run
nvidia-smi should return
PATH$ nvidia-smi
Thu Sep 9 21:38:26 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34 Driver Version: 430.34 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 207... Off | 00000000:01:00.0 Off | N/A |
| 21% 41C P0 1W / 215W | 0MiB / 7982MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Finally, reboot after the nvidia driver installation:
sudo reboot
Download the .run file from https://developer.nvidia.com/cuda-10.0-download-archive
Install using the below command:
sudo sh cuda_10.0.130_410.48_linux.run
It will appear like this:
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: n
Install the CUDA 10.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-10.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 10.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is PATH ]: y
Samples location must be an absolute path
Enter CUDA Samples Location
[ default is PATH ]:
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
Installing the CUDA Samples in PATH ...
Copying samples to PATH/NVIDIA_CUDA-10.0_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-10.0
Samples: Installed in PATH
Please make sure that
- PATH includes /usr/local/cuda-10.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 10.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_10720.log
Follow the command-line prompts and say 'no' for driver install and 'yes' for all the other cases. During the license agreement, hit space and that should move through the page faster.
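The warning in the log above says CUDA 10.0 needs a driver of at least version 384.00. Here is a small sketch for comparing a driver version against such a minimum. The name version_ge is made up for illustration, and it assumes GNU sort (for the -V option), which Ubuntu ships.

```shell
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Assumes GNU sort, whose -V option orders version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example with the versions from this post: driver 430.34 vs. minimum 384.00.
if version_ge "430.34" "384.00"; then
    echo "driver is new enough"
else
    echo "driver is too old"
fi
```

On a live system the installed version can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed in as the first argument.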
Export the paths by adding these lines to your .bashrc, then reload it:
vim ~/.bashrc
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-10.0/bin:$PATH
source ~/.bashrc
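To confirm the exports took effect, something like the following sketch can help. The helper path_contains is a hypothetical name; it does an exact match on each colon-separated entry and does not normalize trailing slashes.

```shell
# Hypothetical helper: succeeds when directory $2 is an entry of the
# colon-separated list $1 (exact match, no trailing-slash normalization).
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Report any of the two CUDA 10.0 directories that is still missing.
path_contains "$PATH" /usr/local/cuda-10.0/bin \
    || echo "PATH is missing /usr/local/cuda-10.0/bin"
path_contains "${LD_LIBRARY_PATH:-}" /usr/local/cuda-10.0/lib64 \
    || echo "LD_LIBRARY_PATH is missing /usr/local/cuda-10.0/lib64"
```

If neither message is printed, both directories are present in the current shell.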
To test the CUDA installation, run the following:
cd /usr/local/cuda/samples
sudo make -k
./bin/x86_64/linux/release/deviceQuery
That should give you something like
./bin/x86_64/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce RTX 2070 SUPER"
CUDA Driver Version / Runtime Version 10.1 / 10.0
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 7982 MBytes (8370061312 bytes)
(40) Multiprocessors, ( 64) CUDA Cores/MP: 2560 CUDA Cores
GPU Max Clock rate: 1770 MHz (1.77 GHz)
Memory Clock rate: 7001 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
You have successfully installed CUDA!
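If you want to script that check instead of reading the output by eye, a minimal sketch is below. device_query_passed is a hypothetical helper name; it only looks for the "Result = PASS" line shown above and ignores everything else.

```shell
# Hypothetical helper: reads deviceQuery output on stdin and succeeds only if
# it contains the "Result = PASS" line.
device_query_passed() {
    grep -q 'Result = PASS'
}

# Usage from the samples directory:
#   ./bin/x86_64/linux/release/deviceQuery | device_query_passed && echo "CUDA OK"
```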
Download the .deb files for the appropriate cuDNN version from https://developer.nvidia.com/cudnn. You'll need an account, so go ahead and create one. Download the runtime library, the developer library, and the code samples and user guide (deb) for your ubuntu version, then install them:
sudo dpkg -i libcudnn7_7.4.2.24-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.4.2.24-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-doc_7.4.2.24-1+cuda10.0_amd64.deb
To test the cuDNN installation, copy /usr/src/cudnn_samples_v7/ to any folder and cd into that folder:
cd cudnn_samples_v7/mnistCUDNN/
make clean && make
./mnistCUDNN
This should return
PATH:~/Desktop/Cadence/bin_quant/driver-files/cudnn_samples_v7/mnistCUDNN$ ./mnistCUDNN
cudnnGetVersion() : 7402 , CUDNN_VERSION from cudnn.h : 7402 (7.4.2)
Host compiler version : GCC 4.8.5
There are 1 CUDA capable devices on your machine :
device 0 : sms 40 Capabilities 7.5, SmClock 1770.0 Mhz, MemSize (Mb) 7982, MemClock 7001.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 0
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.015264 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.026432 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.034240 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.059200 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.059680 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 0
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.012288 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.020544 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.024768 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.044832 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.049696 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
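The second line of that output prints the cuDNN version as the integer 7402 next to 7.4.2. Here is a minimal sketch of that decoding, assuming the encoding is major*1000 + minor*100 + patch. That matches the 7402 / 7.4.2 pair above, but I have not checked it against other cuDNN releases. decode_cudnn_version is a hypothetical name.

```shell
# Hypothetical helper: turns an integer such as 7402 into "7.4.2",
# assuming major*1000 + minor*100 + patch (an assumption, see above).
decode_cudnn_version() {
    v=$1
    echo "$((v / 1000)).$(( (v % 1000) / 100 )).$((v % 100))"
}

decode_cudnn_version 7402   # prints 7.4.2
```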
Phew! We installed all three of them!
Note: The one detail that took me hours was that the nvidia driver version needs to match the version that cuda needs. Although it should have worked, installing the nvidia driver directly from the cuda installer never worked for me. Also, the latest nvidia driver for my GPU didn't work with cuda-10.0 either.
Edit 1: To install the 470.94 driver, download it from https://www.nvidia.com/en-us/geforce/drivers/
Tue May 17 18:28:57 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.94 Driver Version: 470.94 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 21% 44C P0 24W / 215W | 0MiB / 7982MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
UPDATE:
If nvidia-smi fails after reboot, re-install the driver.
sudo sh NVIDIA-Linux-x86_64-470.94.run
Thu Aug 4 15:38:30 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.94 Driver Version: 470.94 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 21% 46C P0 29W / 215W | 0MiB / 7982MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
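The post-reboot check from the update can be scripted roughly like this. It is only a sketch: driver_responds is a hypothetical helper that treats a non-zero exit status from nvidia-smi (or from whatever command is passed in) as "the driver is not working".

```shell
# Hypothetical helper: runs the given check command (nvidia-smi by default),
# discards its output, and reports whether it succeeded.
driver_responds() {
    "${1:-nvidia-smi}" >/dev/null 2>&1
}

if driver_responds; then
    echo "driver OK"
else
    echo "nvidia-smi failed: re-install the driver"
fi
```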