This guide provides a comprehensive walkthrough for compiling the Deep Graph Library (DGL) version 2.4.0 from source on modern hardware, specifically targeting systems with NVIDIA Blackwell (e.g., GB10) or Hopper GPUs and a recent CUDA toolkit (13.0+).
The standard build process for this older DGL version fails due to outdated build scripts that are incompatible with new GPU architectures and compilers. The following steps include the necessary patches to overcome these issues and create a fully optimized build.
This process highlights several common challenges when compiling older source code on new hardware:
- Multi-Part Build Systems: A fix in one part of the code (the main DGL library) doesn't automatically apply to its sub-components (like `graphbolt`), which may have their own isolated and buggy build scripts.
- Outdated Architecture Lists: Older build scripts often contain hardcoded lists of GPU architectures that cause modern compilers to fail. These must be manually updated.
- Initial Compiler Checks: Some build systems fail right at the start because even the initial test that checks whether the CUDA compiler works uses an outdated architecture flag.
- Dependency Management: When installing a custom-built package, pip's automatic dependency resolution can be problematic, sometimes overwriting carefully installed packages (like PyTorch). Using the `--no-deps` flag is the key to preventing this; a quick way to detect such an overwrite is sketched after this list.
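As a concrete illustration of that last pitfall, a check like the one below (a minimal sketch; the `clobber_check.py` name is ours, and the exact `+cu130` suffix depends on which wheel you installed) can confirm whether a CUDA-enabled PyTorch survived a package install:

```python
# clobber_check.py -- run after any pip install to confirm that pip has not
# silently replaced a CUDA-enabled PyTorch wheel with a CPU-only one.
import torch

print(f"PyTorch version: {torch.__version__}")      # e.g. "2.x.x+cu130" for a CUDA wheel
print(f"Built against CUDA: {torch.version.cuda}")  # None indicates a CPU-only build

assert torch.version.cuda is not None, "CUDA-enabled PyTorch was overwritten!"
```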
First, create a clean, isolated Conda environment to ensure a stable foundation without conflicting dependencies.
- Create a Conda Environment:

```bash
conda create -n dgl_build python=3.11
conda activate dgl_build
```
- Verify System CUDA: Ensure your system's CUDA toolkit is correctly installed and accessible.

```bash
nvcc --version  # Expected output should show version 12.x or 13.x
```

- Install the Correct PyTorch: Install the official PyTorch wheel that is pre-built for your system's CUDA version (a quick sanity check for this install follows the list).

```bash
# This command is for CUDA 13.0; adjust if your version differs
pip install torch --index-url https://download.pytorch.org/whl/cu130
```

- Install Build Tools and Dependencies: Add the necessary compilers and all of DGL's required Python packages to your environment.

```bash
conda install -c conda-forge c-compiler cxx-compiler
pip install networkx pandas psutil pydantic pyyaml requests scipy tqdm
```
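Before moving on, it's worth confirming that this PyTorch wheel actually sees your GPU. A minimal sanity check, using only standard `torch.cuda` calls (the `gpu_check.py` filename is illustrative):

```python
# gpu_check.py -- confirm the freshly installed PyTorch can reach the GPU
import torch

assert torch.cuda.is_available(), "PyTorch cannot see the GPU; check the driver and wheel"
print(f"PyTorch {torch.__version__} built for CUDA {torch.version.cuda}")
print(f"Detected device: {torch.cuda.get_device_name(0)}")
```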
Next, clone the DGL repository and check out the specific v2.4.0 tag.
- Clone the DGL Repository:

```bash
git clone https://github.com/dmlc/dgl.git
cd dgl
```

- Check Out the v2.4.0 Release Tag:

```bash
git fetch --all --tags
git checkout tags/v2.4.0
```

- Initialize Submodules:

```bash
git submodule update --init --recursive
```
This is the most critical section. We will manually patch three bugs across two different build files.
- Open the file:

```bash
nano cmake/modules/CUDA.cmake
```
- Find and delete the entire faulty architecture logic block (lines 12 through 24):
```cmake
# --- DELETE THIS ENTIRE BLOCK ---
set(dgl_known_gpu_archs "35" "50" "60" "70" "75")
set(dgl_cuda_arch_ptx "70")
if (CUDA_VERSION_MAJOR GREATER_EQUAL "11")
  list(APPEND dgl_known_gpu_archs "80" "86")
  set(dgl_cuda_arch_ptx "80" "86")
endif()
if (CUDA_VERSION VERSION_GREATER_EQUAL "11.8")
  list(APPEND dgl_known_gpu_archs "89" "90")
  set(dgl_cuda_arch_ptx "90")
endif()
if (CUDA_VERSION VERSION_GREATER_EQUAL "12.0")
  list(REMOVE_ITEM dgl_known_gpu_archs "35")
endif()
```
- Replace it with a single, correct line targeting your GPU architecture.
```cmake
# --- ADD THIS LINE IN ITS PLACE ---
# Use "121" for Blackwell (GB10) or "90" for Hopper (H100)
set(dgl_known_gpu_archs "121")
```
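If you're not sure which architecture string your card needs, PyTorch can report the compute capability directly. A quick sketch (the specific capability values in the comments reflect our understanding of the two GPU families this guide targets):

```python
# arch_lookup.py -- print the value to drop into dgl_known_gpu_archs
import torch

# get_device_capability() returns (major, minor): e.g. (9, 0) on Hopper
# and (12, 1) on a Blackwell GB10, which concatenate to "90" and "121".
major, minor = torch.cuda.get_device_capability(0)
print(f'set(dgl_known_gpu_archs "{major}{minor}")')
```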
The second build file, `graphbolt/CMakeLists.txt`, requires two separate fixes.
- Open the file:

```bash
nano graphbolt/CMakeLists.txt
```
- Fix #2A (Initial Compiler Check): Find the `if(USE_CUDA)` block near the top (around line 5). Insert a new line to set the architecture before CUDA is enabled; CMake runs its CUDA compiler test when `enable_language(CUDA)` is called, so `CMAKE_CUDA_ARCHITECTURES` must already be set by then.

Before:

```cmake
if(USE_CUDA)
  message(STATUS "Build graphbolt with CUDA support")
  enable_language(CUDA)
  add_definitions(-DGRAPHBOLT_USE_CUDA)
endif()
```
After:

```cmake
if(USE_CUDA)
  set(CMAKE_CUDA_ARCHITECTURES "121") # Use "121" for Blackwell or "90" for Hopper
  message(STATUS "Build graphbolt with CUDA support")
  enable_language(CUDA)
  add_definitions(-DGRAPHBOLT_USE_CUDA)
endif()
```
- Fix #2B (Faulty Filter Logic): Find and delete the broken filter block (around lines 70 through 76). Its unanchored regex also matches the "21" inside "121", so the Blackwell architecture gets filtered out and the build falls back to sm_70, which CUDA 13 toolkits no longer accept:

```cmake
# --- DELETE THIS ENTIRE BLOCK ---
set(CMAKE_CUDA_ARCHITECTURES_FILTERED ${CMAKE_CUDA_ARCHITECTURES})
# CUDA extension supports only sm_70 and up (Volta+).
list(FILTER CMAKE_CUDA_ARCHITECTURES_FILTERED EXCLUDE REGEX "[2-6][0-9]")
list(LENGTH CMAKE_CUDA_ARCHITECTURES_FILTERED CMAKE_CUDA_ARCHITECTURES_FILTERED_LEN)
if(CMAKE_CUDA_ARCHITECTURES_FILTERED_LEN EQUAL 0)
  # Build the CUDA extension at least build for Volta.
  set(CMAKE_CUDA_ARCHITECTURES_FILTERED "70")
endif()
```
- Replace it with a single line that correctly passes the architecture through:

```cmake
# --- ADD THIS LINE IN ITS PLACE ---
set(CMAKE_CUDA_ARCHITECTURES_FILTERED ${CMAKE_CUDA_ARCHITECTURES})
```
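Before kicking off a long compile, you can optionally confirm all three patches took effect. The helper below is our own sketch (the filename and the exact substrings it checks for are assumptions based on the blocks above), to be run from the `dgl` repository root:

```python
# verify_patches.py -- rough check that the stale architecture logic is gone
from pathlib import Path

cuda_cmake = Path("cmake/modules/CUDA.cmake").read_text()
graphbolt = Path("graphbolt/CMakeLists.txt").read_text()

# Patch 1: the old hardcoded list should be gone from the main build file.
assert '"35" "50" "60"' not in cuda_cmake, "old architecture list still present"
# Fix #2A: graphbolt should now pin CMAKE_CUDA_ARCHITECTURES itself.
assert 'set(CMAKE_CUDA_ARCHITECTURES "' in graphbolt, "Fix #2A not applied"
# Fix #2B: the unanchored EXCLUDE REGEX filter should be gone.
assert "EXCLUDE REGEX" not in graphbolt, "faulty filter logic still present"

print("All three patches look applied.")
```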
With the patches applied, compile the code and package it into a distributable Python wheel.
- Run the Build Script: From the `dgl` root directory, execute the build script.

```bash
rm -rf build
bash script/build_dgl.sh -g
```

- Create the Python Wheel: After the build succeeds, package everything into a `.whl` file. This creates a `dist/` directory.

```bash
python setup.py bdist_wheel
```
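To see firsthand why the next step disables dependency resolution, you can peek at the requirement list baked into the wheel. A wheel is just a zip archive containing a `*.dist-info/METADATA` file, so a standard-library sketch suffices (the `wheel_deps.py` name is ours):

```python
# wheel_deps.py -- list the dependencies the freshly built DGL wheel declares
import glob
import zipfile

wheel_path = glob.glob("dist/dgl-*.whl")[0]
with zipfile.ZipFile(wheel_path) as wheel:
    metadata = next(n for n in wheel.namelist() if n.endswith("METADATA"))
    for line in wheel.read(metadata).decode().splitlines():
        if line.startswith("Requires-Dist:"):
            # Installing without --no-deps would ask pip to satisfy each of these.
            print(line)
```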
Finally, install the wheel you just created. The crucial step is to tell pip to ignore DGL's outdated dependency list to avoid overwriting your modern PyTorch version.
- Install the DGL Wheel Without Dependencies:
```bash
pip install dist/dgl-*.whl --force-reinstall --no-deps
```

- Verify the Installation: Create a file named `check.py` with the following content to run a comprehensive test.

```python
# check.py
import torch
import dgl

print(f"DGL version: {dgl.__version__}")
print(f"PyTorch version: {torch.__version__}")

cuda_ok = torch.cuda.is_available()
print(f"CUDA available (PyTorch): {cuda_ok}")

if not cuda_ok:
    print("Installation failed: PyTorch cannot see the GPU.")
else:
    device = torch.device("cuda:0")
    print(f"Using device: {torch.cuda.get_device_name(device)}")

    # Test creating a DGL graph on the GPU
    u = torch.tensor([0, 1, 2], device=device)
    v = torch.tensor([1, 2, 0], device=device)
    g = dgl.graph((u, v))
    print(f"Graph successfully created on device: {g.device}")

    # Test a DGL CUDA kernel (SAGEConv)
    from dgl.nn import SAGEConv

    model = SAGEConv(16, 32, "mean").to(device)
    features = torch.randn(g.num_nodes(), 16, device=device)
    output = model(g, features)
    output.sum().backward()
    print("DGL CUDA kernels (SAGEConv) are working correctly.")

    print("\n✅ DGL has been successfully built and installed with CUDA support!")
```
- Run the script:
```bash
python check.py
```
If all checks pass, you have officially conquered the build! You now have a DGL library fully optimized for your hardware. 🏆