@ramithuh
Last active October 22, 2025 21:57
How to install DGL 2.4.0 with CUDA 13.0 and PyTorch 2.9

Compiling DGL v2.4.0 on Modern GPUs (Blackwell/Hopper)

This guide provides a comprehensive walkthrough for compiling the Deep Graph Library (DGL) version 2.4.0 from source on modern hardware, specifically targeting systems with NVIDIA Blackwell (e.g., GB10) or Hopper GPUs and a recent CUDA toolkit (13.0+).

The standard build process for this older DGL version fails due to outdated build scripts that are incompatible with new GPU architectures and compilers. The following steps include the necessary patches to overcome these issues and create a fully optimized build.

Key Lessons from this Build

This process highlights several common challenges when compiling older source code on new hardware:

  • Multi-Part Build Systems: A fix in one part of the code (the main DGL library) doesn't automatically apply to its sub-components (like graphbolt), which may have their own isolated and buggy build scripts.
  • Outdated Architecture Lists: Older build scripts often contain hardcoded lists of GPU architectures that cause modern compilers to fail. These must be manually updated.
  • Initial Compiler Checks: Some build systems fail at the very start because even the initial test that verifies the CUDA compiler works uses an outdated architecture flag.
  • Dependency Management: When installing a custom-built package, pip's automatic dependency resolution can be problematic, sometimes overwriting carefully installed packages (like PyTorch). Using the --no-deps flag is the key to preventing this, as sketched below.
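
In practice, the difference comes down to a single flag. A generic sketch of the two behaviors (the exact command for this build appears in Section 5):

    # Plain install: pip resolves DGL's pinned dependencies and may replace
    # the PyTorch wheel you installed for your specific CUDA version
    pip install dist/dgl-*.whl

    # Safe install: skip dependency resolution entirely
    pip install dist/dgl-*.whl --no-deps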

1. Environment Setup

First, create a clean, isolated Conda environment to ensure a stable foundation without conflicting dependencies.

  1. Create a Conda Environment:
    conda create -n dgl_build python=3.11
    conda activate dgl_build
  2. Verify System CUDA: Ensure your system's CUDA toolkit is correctly installed and accessible.
    nvcc --version
    # Expected output should show version 12.x or 13.x
  3. Install the Correct PyTorch: Install the official PyTorch wheel that is pre-built for your system's CUDA version.
    # This command is for CUDA 13.0; adjust if your version differs
    pip install torch --index-url https://download.pytorch.org/whl/cu130
  4. Install Build Tools and Dependencies: Add the necessary compilers and all of DGL's required Python packages to your environment.
    conda install -c conda-forge c-compiler cxx-compiler
    pip install networkx pandas psutil pydantic pyyaml requests scipy tqdm
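
With the environment set up, a quick sanity check confirms that the PyTorch you installed actually targets your CUDA version and can see the GPU before you invest time in the build:

    python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
    # Should print the torch version, its CUDA version (e.g., 13.0), and True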

2. Fetch and Prepare DGL Source Code

Next, clone the DGL repository and check out the specific v2.4.0 tag.

  1. Clone the DGL Repository:
    git clone https://github.com/dmlc/dgl.git
    cd dgl
  2. Check Out the v2.4.0 Release Tag:
    git fetch --all --tags
    git checkout tags/v2.4.0
  3. Initialize Submodules:
    git submodule update --init --recursive
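
Before patching anything, it is worth confirming that the checkout landed on the right tag and the submodules are actually populated:

    git describe --tags     # should print v2.4.0
    git submodule status    # no line should start with '-' (uninitialized)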

3. Patching the Build System for Modern Architectures

This is the most critical section. We will manually patch three bugs across two different build files.

Patch #1: The Main DGL Build (cmake/modules/CUDA.cmake)

  1. Open the file:
    nano cmake/modules/CUDA.cmake
  2. Find and delete the entire faulty architecture logic block (lines 12 through 24):
    # --- DELETE THIS ENTIRE BLOCK ---
    set(dgl_known_gpu_archs "35" "50" "60" "70" "75")
    set(dgl_cuda_arch_ptx "70")
    if (CUDA_VERSION_MAJOR GREATER_EQUAL "11")
      list(APPEND dgl_known_gpu_archs "80" "86")
      set(dgl_cuda_arch_ptx "80" "86")
    endif()
    if (CUDA_VERSION VERSION_GREATER_EQUAL "11.8")
      list(APPEND dgl_known_gpu_archs "89" "90")
      set(dgl_cuda_arch_ptx "90")
    endif()
    if (CUDA_VERSION VERSION_GREATER_EQUAL "12.0")
      list(REMOVE_ITEM dgl_known_gpu_archs "35")
    endif()
  3. Replace it with a single, correct line targeting your GPU architecture.
    # --- ADD THIS LINE IN ITS PLACE ---
    # Use "121" for Blackwell (GB10) or "90" for Hopper (H100)
    set(dgl_known_gpu_archs "121")
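
If you are unsure which value your GPU needs, recent NVIDIA drivers can report the compute capability directly; drop the dot to get the architecture number (9.0 becomes "90", 12.1 becomes "121"):

    nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
    # Example output on an H100: NVIDIA H100 80GB HBM3, 9.0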

Patch #2: The graphbolt Sub-Build (graphbolt/CMakeLists.txt)

This file requires two separate fixes.

  1. Open the file:

    nano graphbolt/CMakeLists.txt
  2. Fix #2A (Initial Compiler Check): Find the if(USE_CUDA) block near the top (around line 5). Insert a new line to set the architecture before CUDA is enabled.

    Before:

    if(USE_CUDA)
      message(STATUS "Build graphbolt with CUDA support")
      enable_language(CUDA)
      add_definitions(-DGRAPHBOLT_USE_CUDA)
    endif()

    After:

    if(USE_CUDA)
      set(CMAKE_CUDA_ARCHITECTURES "121") # Use "121" for Blackwell or "90" for Hopper
      message(STATUS "Build graphbolt with CUDA support")
      enable_language(CUDA)
      add_definitions(-DGRAPHBOLT_USE_CUDA)
    endif()
  3. Fix #2B (Faulty Filter Logic): Find and delete the broken filter block (around lines 70 through 76):

    # --- DELETE THIS ENTIRE BLOCK ---
    set(CMAKE_CUDA_ARCHITECTURES_FILTERED ${CMAKE_CUDA_ARCHITECTURES})
    # CUDA extension supports only sm_70 and up (Volta+).
    list(FILTER CMAKE_CUDA_ARCHITECTURES_FILTERED EXCLUDE REGEX "[2-6][0-9]")
    list(LENGTH CMAKE_CUDA_ARCHITECTURES_FILTERED CMAKE_CUDA_ARCHITECTURES_FILTERED_LEN)
    if(CMAKE_CUDA_ARCHITECTURES_FILTERED_LEN EQUAL 0)
      # Build the CUDA extension at least build for Volta.
      set(CMAKE_CUDA_ARCHITECTURES_FILTERED "70")
    endif()
  4. Replace it with a single line that passes the architecture list through unfiltered.

    # --- ADD THIS LINE IN ITS PLACE ---
    set(CMAKE_CUDA_ARCHITECTURES_FILTERED ${CMAKE_CUDA_ARCHITECTURES})
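
Before kicking off the long compile, an optional grep confirms both graphbolt fixes are in place:

    grep -n "CMAKE_CUDA_ARCHITECTURES" graphbolt/CMakeLists.txt
    # Expect your set(CMAKE_CUDA_ARCHITECTURES "...") line near the top and the
    # simplified CMAKE_CUDA_ARCHITECTURES_FILTERED assignment further down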

4. Compile and Package the Wheel

With the patches applied, compile the code and package it into a distributable Python wheel.

  1. Run the Build Script: From the dgl root directory, execute the build script.
    rm -rf build
    bash script/build_dgl.sh -g
  2. Create the Python Wheel: After the build succeeds, package everything into a .whl file. This creates a dist/ directory.
    python setup.py bdist_wheel
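
If packaging succeeded, the wheel will be in dist/, with a filename encoding the DGL version and your Python version:

    ls -lh dist/
    # Expect something like dgl-2.4.0-cp311-cp311-linux_x86_64.whl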

5. Final Installation and Verification

Finally, install the wheel you just created. The crucial step is to tell pip to ignore DGL's outdated dependency list to avoid overwriting your modern PyTorch version.

  1. Install the DGL Wheel Without Dependencies:
    pip install dist/dgl-*.whl --force-reinstall --no-deps
  2. Verify the Installation: Create a file named check.py with the following content to run a comprehensive test.
    # check.py
    import torch
    import dgl
    
    print(f"DGL version: {dgl.__version__}")
    print(f"PyTorch version: {torch.__version__}")
    
    cuda_ok = torch.cuda.is_available()
    print(f"CUDA available (PyTorch): {cuda_ok}")
    
    if not cuda_ok:
        print("Installation failed: PyTorch cannot see the GPU.")
    else:
        device = torch.device("cuda:0")
        print(f"Using device: {torch.cuda.get_device_name(device)}")
    
        # Test creating a DGL graph on the GPU
        u = torch.tensor([0, 1, 2], device=device)
        v = torch.tensor([1, 2, 0], device=device)
        g = dgl.graph((u, v))
        print(f"Graph successfully created on device: {g.device}")
    
        # Test a DGL CUDA kernel (SAGEConv)
        from dgl.nn import SAGEConv
        model = SAGEConv(16, 32, "mean").to(device)
        features = torch.randn(g.num_nodes(), 16, device=device)
        output = model(g, features)
        output.sum().backward()
        print("DGL CUDA kernels (SAGEConv) are working correctly.")
        print("\n✅ DGL has been successfully built and installed with CUDA support!")
  3. Run the script:
    python check.py

If all checks pass, you have officially conquered the build! You now have a DGL library fully optimized for your hardware. 🏆
