Use the x64 Native Tools Command Prompt for VS 2022, so the MSVC compiler and Windows SDK are on your PATH.
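If you prefer to stay in an already-open command prompt, you can load the same build environment by calling vcvars64.bat instead. The path below assumes the Community edition of VS 2022 and will differ for Professional or Enterprise:
call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"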
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
Find the CUDA compute capability version of your GPU:
Go to https://developer.nvidia.com/cuda-gpus
Note the compute capability version, for example 7.5 for RTX 20xx cards, then set it:
set TORCH_CUDA_ARCH_LIST=7.5
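If your GPU is not listed on that page, you can also query the compute capability directly. Either of the following should print it (the Python one assumes a CUDA-enabled PyTorch is already installed; the nvidia-smi query needs a reasonably recent driver):
python -c "import torch; print(torch.cuda.get_device_capability())"
nvidia-smi --query-gpu=compute_cap --format=csv,noheader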
pip install -r requirements.txt
pip install -e .
or
python setup.py build
python setup.py bdist_wheel
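bdist_wheel writes a wheel into the dist\ folder, which you can then install with pip; the exact filename depends on your xformers, Python, and platform versions:
pip install dist\xformers-<version>-<python-tag>-win_amd64.whl
Afterwards you can sanity-check the build with xformers' built-in info module:
python -m xformers.info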
FAQ
lazy_init_num_threads linker error
Go to the file under \Lib\site-packages\torch\include\ATen\Parallel.h in your Python environment and edit it by removing the inline implementation of at::internal::lazy_init_num_threads(), leaving only the declaration. This forces your build to use the non-inline import. The issue is most probably in the way VS C++ handles inline exports that contain static variables, in this case a thread_local one.
The perfect solution, of course, is rebuilding PyTorch together with your extension (as would be the case with any C++ DLL "exporting" classes without really taking care to do it in a safe, ABI-compatible way), but patching the include file is also fine here. It merely stops the compiler from inlining the API and creating a reference to the function-local static thread_local variable that would otherwise fail at the linking step. The compiler will then use the non-inline variant from the PyTorch DLL and the linker error goes away.
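As a rough sketch of what that edit looks like (the exact contents of Parallel.h vary between PyTorch versions, so treat this as illustrative rather than a verbatim diff):
// Before: inline definition with a function-local thread_local,
// which is what trips up the MSVC linker.
inline void lazy_init_num_threads() {
  thread_local bool init = false;
  if (C10_UNLIKELY(!init)) {
    at::init_num_threads();
    init = true;
  }
}
// After: keep only a declaration, so the symbol is resolved
// from the PyTorch DLL instead of being inlined.
TORCH_API void lazy_init_num_threads();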
Can we have more details on how to set TORCH_CUDA_ARCH_LIST=7.5?
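TORCH_CUDA_ARCH_LIST is an ordinary environment variable read by PyTorch's build tooling (torch.utils.cpp_extension) when the CUDA kernels are compiled, so it has to be set in the same command prompt session before you run pip install or setup.py. A few usage examples:
rem Current session only (RTX 20xx):
set TORCH_CUDA_ARCH_LIST=7.5
rem Several architectures, separated by semicolons (spaces also work):
set TORCH_CUDA_ARCH_LIST=7.5;8.6
rem Persist for future sessions (takes effect in newly opened prompts):
setx TORCH_CUDA_ARCH_LIST 7.5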