ZLUDA Notes

No guarantee that any of this works. Use at your own risk.

Upstream

https://github.com/vosen/ZLUDA

Fork (the one used by the scripts)

https://github.com/lshqqytiger/ZLUDA

Usage example (the fork's maintainer is in charge of ZLUDA support)

https://note.com/7shi/n/n718f81f50bba

Preparation

Prepare a GPU supported by AMD HIP, then install HIP SDK 6.2 and uv.

AMD HIP SDK for Windows

winget install -e --id=astral-sh.uv

Set an environment variable to the HIP SDK location and add its bin directory to PATH.

set HIP_PATH=C:\Program Files\AMD\ROCm\6.2\
set PATH=%HIP_PATH%bin;%PATH%
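
As a quick sanity check of the two variables set above, here is a minimal Python sketch. The helper name `check_hip_env` is hypothetical. It only checks that HIP_PATH is set, that its bin directory exists, and that the directory is on PATH; it does not validate the SDK itself.

```python
import os
from pathlib import Path


def check_hip_env(env=None):
    """Return a list of problems found in the HIP-related environment variables."""
    env = os.environ if env is None else env
    hip_path = env.get("HIP_PATH")
    if not hip_path:
        return ["HIP_PATH is not set"]
    problems = []
    bin_dir = Path(hip_path) / "bin"
    if not bin_dir.is_dir():
        problems.append(f"{bin_dir} does not exist")
    # Compare normalized paths so a trailing separator or case difference does not matter.
    wanted = os.path.normcase(os.path.normpath(str(bin_dir)))
    entries = [os.path.normcase(os.path.normpath(p))
               for p in env.get("PATH", "").split(os.pathsep) if p]
    if wanted not in entries:
        problems.append(f"{bin_dir} is not on PATH")
    return problems


# An empty list means both variables look consistent.
print(check_hip_env() or "HIP environment looks fine")
```

Running it in the same command prompt where the variables were set should report nothing missing if the SDK is installed at the path given above.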

Create a working directory and build a Python virtual environment in it.

uv venv --python 3.10 zluda
zluda\Scripts\activate.bat

Install torch with a pinned version.

uv pip install torch==2.3.0 torchvision --index-url https://download.pytorch.org/whl/cu118

Note: a newer torch version also works, but image generation with diffusers becomes slower.

Download the following file from SD.Next and put it directly under the working directory.

https://github.com/vladmandic/sdnext/blob/8330052d19df91416405ed2878b909ebc3f9ad6e/installer.py

Create the modules directory.

md modules

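Download the following files from SD.Next and put them in the modules directory. (zluda.py below imports modules.zluda_installer, so zluda_installer.py must be among them.)

Create zluda.py directly under the working directory with the following content. (The comments indicate where each part is quoted from.)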

# https://github.com/vladmandic/sdnext/blob/master/cli/zluda-python.py

from modules import zluda_installer
zluda_installer.install()
zluda_installer.load()

# https://github.com/lshqqytiger/ZLUDA

import os

os.environ["DISABLE_ADDMM_CUDA_LT"] = "1"

import torch

torch.backends.cudnn.enabled = False
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_math_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_cudnn_sdp(False)

_topk = torch.topk
def topk(tensor: torch.Tensor, *args, **kwargs):
    device = tensor.device
    values, indices = _topk(tensor.cpu(), *args, **kwargs)
    return torch.return_types.topk((values.to(device), indices.to(device),))
torch.topk = topk

With this in place, torch is ready to run on ZLUDA.

Usage

Import zluda before torch.
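
The import-order rule can be checked mechanically. The following is a minimal sketch, assuming the order matters because zluda.py has to run its setup before torch is loaded for the first time. The helper name `imported_too_early` is hypothetical.

```python
import sys


def imported_too_early(names=("torch",), modules=None):
    """Return the given module names that are already imported."""
    modules = sys.modules if modules is None else modules
    return [name for name in names if name in modules]


# Run this just before `import zluda`. A non-empty list means torch was
# imported first, so the setup in zluda.py would run after torch is loaded.
early = imported_too_early()
if early:
    print("already imported:", ", ".join(early))
```

In a script, placing this check at the very top makes an accidental early `import torch` (for example via another library) easier to spot.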

An example in the Python REPL:

>>> import zluda, torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'AMD Radeon RX 7600 XT [ZLUDA]'

diffusers

Measure image generation speed under ZLUDA with a commonly used benchmark.

https://note.com/mayu_hiraizumi/n/ne25cc6e963b3

Install the required libraries.

uv pip install diffusers "transformers<4.52" accelerate "numpy<2"

Note: transformers is pinned below 4.52 to avoid the restriction on pickle-format files that applies from 4.52 onward.
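
To confirm that an environment satisfies the two upper bounds in the install command, a small check like the following may help. This is a sketch only: the helper names `release_tuple` and `satisfies_pins` are hypothetical, and the parser looks only at the leading numeric part of a version string, ignoring pre-release and local suffixes.

```python
import re


def release_tuple(version):
    """Parse the leading numeric part of a version string, e.g. '4.51.3' -> (4, 51, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def satisfies_pins(transformers_version, numpy_version):
    """Check the two upper bounds from the install command: transformers<4.52, numpy<2."""
    return (release_tuple(transformers_version) < (4, 52)
            and release_tuple(numpy_version) < (2,))


# Usage inside the virtual environment (requires both packages to be installed):
#   from importlib.metadata import version
#   print(satisfies_pins(version("transformers"), version("numpy")))
```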

Use the following model file.

https://huggingface.co/terracottahaniwa/nai-anime-v1-full/blob/main/nai-anime-v1-full.safetensors

# zluda must come before torch in the import list.
import zluda, torch, diffusers

# Load the single-file checkpoint in half precision and switch to the Euler scheduler.
pipe = diffusers.StableDiffusionPipeline.from_single_file(
    "nai-anime-v1-full.safetensors", torch_dtype=torch.float16)
pipe.scheduler = diffusers.EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# Time 10 generations with a fixed seed; model loading is not included.
from datetime import datetime
start = datetime.now()
for i in range(10):
    torch.manual_seed(2870305590)
    result = pipe(
        prompt = "masterpiece, best quality, masterpiece, asuka langley sitting cross legged on a chair",
        negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts,signature, watermark, username, blurry, artist name",
        width = 512, height = 512, num_inference_steps = 28, guidance_scale = 12, clip_skip = 2)
# Only the image from the last iteration is saved.
result.images[0].save("hello-asuka.png")
print(datetime.now() - start)

The first run takes a long time because compilation is performed.

Compilation is in progress. Please wait...

Note: the compilation results are cached in zluda.db. The following command locates it.

dir /s %LOCALAPPDATA%\zluda.db
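
The same search can be done from Python. This is a minimal sketch, assuming, like the `dir /s` command above, that the cache file sits somewhere below %LOCALAPPDATA%. The helper name `find_zluda_db` is hypothetical.

```python
import os
from pathlib import Path


def find_zluda_db(root=None):
    """Return every file named zluda.db found below root (default: %LOCALAPPDATA%)."""
    root = Path(root if root is not None else os.environ["LOCALAPPDATA"])
    return sorted(p for p in root.rglob("zluda.db") if p.is_file())


# Only search automatically when the Windows variable is available.
if "LOCALAPPDATA" in os.environ:
    for path in find_zluda_db():
        print(path, path.stat().st_size, "bytes")
```

Printing the file size alongside the path gives a rough indication of how much has been cached so far.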

After a few runs the compilation stops and the timing stabilizes. Example timings:

GPU: Radeon RX 7600 XT
PyTorch: 2.3.0+cu118
ZLUDA: 3.9.5

Run	Elapsed time	it/s (fastest)
1	0:15:43.734557	30.91
2	0:00:56.038026	31.60
3	0:00:51.534825	31.93
4	0:00:50.098944	31.37
5	0:00:49.529691	31.64

Note: the first run takes far longer than the rest because of compilation.

In the same environment, the ROCm build of PyTorch is about 1.5 times faster in elapsed time (roughly 32 s versus 50 s for the 10 images).

Run	Elapsed time	it/s (fastest)
1	0:00:34.662402	40.08
2	0:00:31.590820	39.83
3	0:00:31.709761	40.32
4	0:00:31.732605	40.29
5	0:00:31.814152	39.76
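
The ratio between the two tables can be computed directly from the elapsed times. The sketch below parses the fifth run from each table; the helper name `parse_elapsed` is hypothetical.

```python
from datetime import timedelta


def parse_elapsed(text):
    """Parse a string such as '0:00:49.529691' (the str() form of a timedelta)."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


zluda = parse_elapsed("0:00:49.529691")  # 5th run with ZLUDA
rocm = parse_elapsed("0:00:31.814152")   # 5th run with ROCm PyTorch
print(f"{zluda / rocm:.2f}")  # prints 1.56
```

Dividing one timedelta by another yields a plain float, so no manual conversion to seconds is needed.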

For how these results compare with other GPUs, see the following.

https://chimolog.co/bto-gpu-stable-diffusion-specs/

References

For the ROCm build of PyTorch, see the following.

https://qiita.com/7shi/items/6a68a49629c463bc90f7

7shi/ZLUDA.md
