# step 0 - clean up your existing drivers
sudo apt-get --purge remove "*nvidia*"
sudo apt-get --purge remove "*cuda*" "*cudnn*" "*cublas*" "*cufft*" "*cufile*" "*curand*" "*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*" "*libnccl*"
# step 0.1 - disable IOMMU
ls -la /sys/class/iommu/
# if this folder is empty, the IOMMU is already off: continue
# if the folder is not empty, see https://docs.dolphinics.com/latest/guides/iommu.html
sudo reboot
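The empty-folder check in step 0.1 can also be scripted. This is a minimal sketch, not part of the original steps: `iommu_status` is a hypothetical helper, and its optional directory argument exists only so the check can be tried against a scratch folder.

```shell
#!/usr/bin/env bash
# Sketch: print "enabled" if the IOMMU sysfs folder has entries, else "disabled".
# iommu_status and its directory argument are hypothetical, for illustration only.
iommu_status() {
    local dir="${1:-/sys/class/iommu}"
    # A missing or empty folder is treated as "no IOMMU registered".
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

if [ "$(iommu_status)" = "enabled" ]; then
    echo "IOMMU looks enabled: disable it before continuing"
else
    echo "no IOMMU entries found: continue"
fi
```

Run it before the reboot; if it reports an enabled IOMMU, follow the linked guide first.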
# step 1 - install drivers
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
# Search for available Nvidia drivers and install the latest version.
apt search --names-only nvidia-driver
sudo apt install nvidia-driver-560-open # or latest
sudo reboot
# step 1.1 - verify driver
nvidia-smi
# ---
# step 2 - install cuda & cudnn
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
apt search --names-only cuda-toolkit
sudo apt install cuda-toolkit-12-6 # or latest
apt search --names-only cudnn
sudo apt install cudnn9-cuda-12-6 # or latest
sudo reboot
# step 2.1 - verify driver
nvidia-smi
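`nvidia-smi` only confirms the driver. To confirm the toolkit as well, a sketch like the one below may help. It assumes the toolkit is reachable through the `/usr/local/cuda` symlink when `nvcc` is not on your PATH; `find_tool` is a hypothetical helper, not something the CUDA packages provide.

```shell
#!/usr/bin/env bash
# Sketch: locate a tool on PATH, falling back to a directory that may not be on PATH.
# find_tool is a hypothetical helper, not part of the CUDA packages.
find_tool() {
    local name="$1" fallback_dir="${2:-}"
    if command -v "$name" >/dev/null 2>&1; then
        command -v "$name"
    elif [ -n "$fallback_dir" ] && [ -x "$fallback_dir/$name" ]; then
        printf '%s\n' "$fallback_dir/$name"
    else
        return 1
    fi
}

# Assumption: the toolkit is reachable through the /usr/local/cuda symlink.
if nvcc_path="$(find_tool nvcc /usr/local/cuda/bin)"; then
    "$nvcc_path" --version
else
    echo "nvcc not found: check that cuda-toolkit installed correctly"
fi
```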
# ---
# step 3 - make cuda samples
sudo apt install git cmake
# Clone the Nvidia CUDA samples repository to test CUDA installation.
cd ~
git clone https://github.com/nvidia/cuda-samples
cd cuda-samples
make -j "$(nproc)"
# ---
# step 4 - test p2p
./bin/x86_64/linux/release/deviceQuery
# look for "> Peer access from NVIDIA GeForce RTX 4090 (GPU0) -> NVIDIA GeForce RTX 4090 (GPU1) : {Yes/No}"
# if you see Yes - stop, you already have p2p
# if you see No - continue
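With more than two GPUs, `deviceQuery` prints one "Peer access" line per pair, so a small filter can summarize them. This is a sketch only: `p2p_status` is a hypothetical helper, and it assumes each line ends in `Yes` or `No` as in the example above.

```shell
#!/usr/bin/env bash
# Sketch: summarize the "Peer access" lines of deviceQuery output read from stdin.
# Prints "yes" if every pair reports Yes, "no" if any pair reports No,
# and "unknown" if no "Peer access" line was found (e.g. a single GPU).
p2p_status() {
    awk '
        /Peer access from/ {
            seen = 1
            if ($NF != "Yes") denied = 1
        }
        END {
            if (!seen)       print "unknown"
            else if (denied) print "no"
            else             print "yes"
        }'
}

# Run from ~/cuda-samples; skipped if the sample has not been built.
if [ -x ./bin/x86_64/linux/release/deviceQuery ]; then
    ./bin/x86_64/linux/release/deviceQuery | p2p_status
fi
```

A result of `yes` means you can stop here; `no` means continue with step 5.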
# ---
# step 5 - uninstall official driver (just the driver)
# replace 550 with the driver branch you installed in step 1 (e.g. 560)
sudo apt remove nvidia-driver-550-open
sudo apt remove nvidia-dkms-550-open
sudo apt remove nvidia-driver-550-server-open
sudo apt remove nvidia-dkms-550-server-open
# step 5.1 - double check
sudo dpkg -l | grep nvidia
# verify that all nvidia drivers / dkms are uninstalled
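The `grep` above also lists libraries and packages that were removed but left config files behind. A narrower filter might look like this. It is a sketch: `leftover_nvidia_pkgs` is a hypothetical helper, and the `nvidia-driver-` / `nvidia-dkms-` prefixes are assumed from the package names used in step 5.

```shell
#!/usr/bin/env bash
# Sketch: read `dpkg -l` output on stdin and print the NVIDIA driver or DKMS
# packages that are still fully installed (status column "ii").
# leftover_nvidia_pkgs is a hypothetical helper, for illustration only.
leftover_nvidia_pkgs() {
    awk '$1 == "ii" && $2 ~ /^nvidia-(driver|dkms)-/ { print $2 }'
}

if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | leftover_nvidia_pkgs
fi
```

No output means no driver or DKMS package is still installed.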
# ---
# step 6 - install patched driver
cd ~
git clone git@github.com:tinygrad/open-gpu-kernel-modules.git
cd open-gpu-kernel-modules
# step 6.1 - Building your driver
nvidia-smi
# note down the driver version. you've uninstalled the driver, but it should still be loaded in memory
# if not, use `dpkg -l | grep nvidia` to grab the version: `560.35.03-0ubuntu1` => `560.35.03`
git remote add upstream git@github.com:NVIDIA/open-gpu-kernel-modules.git
git fetch --all
# rebase patch with git or your preferred GUI
git rebase -Xignore-space-change -i upstream/560.35.03 # your driver version - make sure tag exists - you might have to deal with a conflicting README
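The `560.35.03-0ubuntu1` => `560.35.03` conversion from step 6.1 can be done with shell parameter expansion. This is a minimal sketch: `upstream_version` is a hypothetical helper, and it assumes the packaging suffix always starts at the first `-`.

```shell
#!/usr/bin/env bash
# Sketch: strip the Ubuntu packaging suffix from a driver package version.
# upstream_version is a hypothetical helper; it assumes the suffix starts at the first "-".
upstream_version() {
    printf '%s\n' "${1%%-*}"
}

upstream_version "560.35.03-0ubuntu1"

# If the driver is still loaded, nvidia-smi can report the version directly.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
fi
```

The result is the version string to use in the `git rebase` line above. It does not check that the matching ref exists upstream.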
# step 6.2 - make driver
./install.sh
# make modules -j$(nproc)
# sudo checkinstall make modules_install -j$(nproc)
# name = nvidia-driver-550-open-patch-tinygrad
# version = {driver-version}-p2p
sudo depmod
sudo reboot
# step 6.3 - verify driver
nvidia-smi
# ---
# step 7 - test p2p
cd ~/cuda-samples
./bin/x86_64/linux/release/deviceQuery
# look for "> Peer access from NVIDIA GeForce RTX 4090 (GPU0) -> NVIDIA GeForce RTX 4090 (GPU1) : {Yes/No}"
# if you see Yes - stop, you're done
# if you see No - go to troubleshooting
# ---
# troubleshooting
see https://morgangiraud.medium.com/multi-gpu-nvidia-p2p-capabilities-and-debugging-tips-fb7597b4e2b5
see https://morgangiraud.medium.com/multi-gpu-tinygrad-patch-4904a75f8e16
Last active April 3, 2025 03:14
nvidia driver with p2p support for rtx 4090
With driver 565.57.1 and Resizable BAR enabled, I get: CUDA error at simpleP2P.cu:129 code=205(cudaErrorMapBufferObjectFailed) "cudaDeviceEnablePeerAccess(gpuid[1], 0)". Why does this happen?
You might have to enable Resizable BAR in your BIOS. It should be under the PCI-E settings.
Hi,
Resizable BAR is enabled in the BIOS, ACS for PCI-E is disabled, and the IOMMU is turned off.
Steps:
1. ./NVIDIA-Linux-x86_64-550.90.07.run --no-kernel-modules
2. download 550.90.07-p2p
3. ./install.sh
4. deviceQuery runs normally
5. simpleP2P reports the error mentioned above
Can you help me figure out what to look for?
yingzilnn01
I've never had this issue. Try posting in https://github.com/tinygrad/open-gpu-kernel-modules and asking for help there. They most likely know better.