Skip to content

Instantly share code, notes, and snippets.

@legraphista
Last active April 3, 2025 03:14
Show Gist options
  • Save legraphista/c7f11c29dcc415a309406ae6da941e6e to your computer and use it in GitHub Desktop.
Save legraphista/c7f11c29dcc415a309406ae6da941e6e to your computer and use it in GitHub Desktop.
nvidia driver with p2p support for rtx 4090
# step 0 - cleanup your existing drivers
sudo apt-get --purge remove "*nvidia*"
sudo apt-get --purge remove "*cuda*" "*cudnn*" "*cublas*" "*cufft*" "*cufile*" "*curand*" "*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*" "*libnccl*"

# step 0.1 - disable iommu
ll /sys/class/iommu/
# if this folder is empty, continue
# if the folder is not empty, see https://docs.dolphinics.com/latest/guides/iommu.html

sudo reboot

# step 1 - install drivers 
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update

# Search for available Nvidia drivers and install the latest version.
apt search --names-only nvidia-driver
sudo apt install nvidia-driver-560-open # or latest

sudo reboot

# step 1.1 - verify driver
nvidia-smi

# --- 

# step 2 - install cuda & cudnn
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

apt search --names-only cuda-toolkit
sudo apt install cuda-toolkit-12-6 # or latest

apt search --names-only cudnn
sudo apt install cudnn9-cuda-12-6 # or latest

sudo reboot

# step 2.1 - verify driver
nvidia-smi

# --- 

# step 3 - make cuda samples
sudo apt install git cmake

# Clone the Nvidia CUDA samples repository to test CUDA installation.
cd ~
git clone https://github.com/nvidia/cuda-samples
cd cuda-samples

make -j `nproc`

# --- 

# step 4 - test p2p 
./bin/x86_64/linux/release/deviceQuery
# look for "> Peer access from NVIDIA GeForce RTX 4090 (GPU0) -> NVIDIA GeForce RTX 4090 (GPU1) : {Yes/No}" 
# if you see Yes - stop, you already have p2p
# if you see No - continue

# --- 

# step 5 - uninstall official driver (just the driver)
sudo apt remove nvidia-driver-550-open
sudo apt remove nvidia-dkms-550-open
sudo apt remove nvidia-driver-550-server-open
sudo apt remove nvidia-dkms-550-server-open

# step 5.1 - double check
sudo dpkg -l | grep nvidia
# verify that all nvidia drivers / dkms are uninstalled 

# --- 

# step 6 - install patched driver
cd ~
git clone [email protected]:tinygrad/open-gpu-kernel-modules.git
cd open-gpu-kernel-modules

# step 6.1 - Building your driver
nvidia-smi # note down driver version - you've uninstalled the driver, but it should still be loaded in memory, if not, use `dpkg -l | grep nvidia` to grab the version `560.35.03-0ubuntu1` => `560.35.03`

git remote add upstream [email protected]:NVIDIA/open-gpu-kernel-modules.git
git fetch --all

# rebase patch with git or your preferred GUI
git rebase -Xignore-space-change -i upstream/560.35.03 # your driver version - make sure tag exists - you might have to deal with a conflicting README


# step 6.2 - make driver
./install.sh
# make modules -j$(nproc)
# sudo checkinstall make modules_install -j$(nproc)
# name = nvidia-driver-550-open-patch-tinygrad
# version = {driver-version}-p2p

sudo depmod

sudo reboot

# step 6.3 verify driver
nvidia-smi

# --- 

# step 7 - test p2p
cd ~/cuda-samples
./bin/x86_64/linux/release/deviceQuery
# look for "> Peer access from NVIDIA GeForce RTX 4090 (GPU0) -> NVIDIA GeForce RTX 4090 (GPU1) : {Yes/No}" 
# if you see Yes - stop, you're done
# if you see No - go to troubleshooting

# --- 

# troubleshooting
see https://morgangiraud.medium.com/multi-gpu-nvidia-p2p-capabilities-and-debugging-tips-fb7597b4e2b5
see https://morgangiraud.medium.com/multi-gpu-tinygrad-patch-4904a75f8e16
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@legraphista
Copy link
Author

I've never had this issue, try posting in https://github.com/tinygrad/open-gpu-kernel-modules and ask for help there. They most likely know better

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment