I've been playing with Home Assistant Voice, and it's too slow on the hardware I've been using (mostly Raspberry Pis and similar SBCs). I saw that Jeff Geerling had gotten external GPUs working on a Pi, so I decided to have a go.

Hardware

Raspberry Pi 5 8GB
PCIe to NVMe board
NVMe to Oculink board
External PCIe x16 board with Oculink
Asus AMD RX480
Corsair 550W PSU

Software

Flash latest Raspberry Pi OS Lite
Edit /boot/firmware/config.txt to add dtparam=pciex1_gen=3 at the bottom
Install dependencies: sudo apt install -y vim git bc bison flex libssl-dev make libncurses-dev
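The config.txt edit above can also be scripted rather than done by hand. This is a minimal sketch, assuming a POSIX shell; `append_once` and the `SUDO` variable are hypothetical names introduced here for illustration, not part of the original setup.

```shell
# Hypothetical helper: append a line to a file only if it is not already there,
# so re-running the setup does not duplicate the entry.
# SUDO defaults to "sudo"; set SUDO= (empty) to write files you already own.
append_once() {
  line="$1"
  file="$2"
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" | ${SUDO-sudo} tee -a "$file" >/dev/null
  fi
}

# Usage on the Pi:
#   append_once 'dtparam=pciex1_gen=3' /boot/firmware/config.txt
```

The check matches the whole line exactly, so a commented-out copy of the setting does not count as already present.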

Compile patched memcpy

 wget https://gist.githubusercontent.com/Coreforge/91da3d410ec7eb0ef5bc8dee24b91359/raw/b4848d1da9fff0cfcf7b601713efac1909e408e8/memcpy_unaligned.c

 gcc -shared -fPIC -o memcpy.so memcpy_unaligned.c
 sudo mv memcpy.so /usr/local/lib/memcpy.so
 sudo vim /etc/ld.so.preload

 # Put the following line inside ld.so.preload:
 /usr/local/lib/memcpy.so
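To check that the preload entry took effect, one option is to ask the dynamic loader which libraries it resolves for an ordinary binary. This is a rough sketch, assuming `ldd` is available and that `/bin/sh` is dynamically linked; `preload_active` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the dynamic loader reports the given library
# when resolving a dynamically linked binary (default: /bin/sh).
preload_active() {
  lib="$1"
  bin="${2:-/bin/sh}"
  ldd "$bin" 2>/dev/null | grep -qF -- "$lib"
}

# Usage on the Pi, after editing /etc/ld.so.preload:
#   preload_active /usr/local/lib/memcpy.so && echo "memcpy.so is preloaded"
```

Libraries listed in /etc/ld.so.preload are loaded into every dynamically linked process, so they should show up in `ldd` output for any such binary.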

Clone the Raspberry Pi Linux repo (the rpi-6.6.y branch, which the patch below targets): git clone --depth=1 --branch rpi-6.6.y https://github.com/raspberrypi/linux && cd linux

Apply patch

  wget -O amdgpu-pi5.patch https://github.com/raspberrypi/linux/compare/rpi-6.6.y...Coreforge:linux:rpi-6.6.y-gpu.patch
  git apply -v amdgpu-pi5.patch

Set up config for raspberry pi 5:

  KERNEL=kernel_2712
  make bcm2712_defconfig

Configure kernel changes: make menuconfig
- Kernel Features > Page Size > 4 KB (for Box86 compatibility)
- Kernel Features > Kernel support for 32-bit EL0 > Fix up misaligned multi-word loads and stores in user space
- Kernel Features > Fix up misaligned loads and stores from userspace for 64bit code
- Device Drivers > Graphics support > AMD GPU (optionally SI/CIK support too)
- Device Drivers > Graphics support > Direct Rendering Manager (XFree86 4.1.0 and higher DRI support) > Force Architecture can write-combine memory
Modify .config and set CONFIG_LOCALVERSION to your version (I appended -pi_gpu): vim .config
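The CONFIG_LOCALVERSION edit can be done without opening an editor. This is a minimal sketch, assuming GNU sed (as shipped with Raspberry Pi OS) and that .config already contains a CONFIG_LOCALVERSION line; `append_localversion` is a hypothetical helper name.

```shell
# Hypothetical helper: append a suffix to the existing CONFIG_LOCALVERSION value
# in a kernel .config file (default: ./.config).
append_localversion() {
  suffix="$1"
  config="${2:-.config}"
  # Skip if the value already ends with the suffix, so re-runs are harmless.
  if grep -q "^CONFIG_LOCALVERSION=\".*${suffix}\"\$" "$config"; then
    return 0
  fi
  # Insert the suffix just before the closing quote of the current value.
  sed -i "s|^\(CONFIG_LOCALVERSION=\"[^\"]*\)\"|\1${suffix}\"|" "$config"
}

# Usage, from the root of the kernel source tree:
#   append_localversion -pi_gpu
```

The kernel tree also ships `scripts/config`, whose `--set-str` option can set the whole value in one go if you would rather replace it than append to it.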
Compile the kernel: make -j6 Image.gz modules dtbs

Install the kernel:

 sudo make -j6 modules_install
 sudo cp /boot/firmware/$KERNEL.img /boot/firmware/$KERNEL-backup.img
 sudo cp arch/arm64/boot/Image.gz /boot/firmware/$KERNEL.img
 sudo cp arch/arm64/boot/dts/broadcom/*.dtb /boot/firmware/
 sudo cp arch/arm64/boot/dts/overlays/*.dtb* /boot/firmware/overlays/
 sudo cp arch/arm64/boot/dts/overlays/README /boot/firmware/overlays/

Reboot: sudo reboot
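After the reboot it is worth confirming that the new kernel is the one actually running before installing drivers. This is a small sketch; `kernel_has_suffix` is a hypothetical helper, and `-pi_gpu` is the example suffix from the step above.

```shell
# Hypothetical helper: succeed if a kernel release string ends with the custom
# suffix, optionally followed by a "+" that kernel builds may append.
# With no second argument it checks the running kernel via uname -r.
kernel_has_suffix() {
  suffix="$1"
  release="${2:-$(uname -r)}"
  case "$release" in
    *"$suffix" | *"$suffix"+) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on the Pi after rebooting:
#   kernel_has_suffix -pi_gpu && echo "custom kernel is running"
#   getconf PAGESIZE              # 4096 if the 4 KB page size took effect
#   lspci | grep -i amd           # the card should be listed on the PCIe bus
#   sudo dmesg | grep -i amdgpu   # driver messages, once firmware is installed
```

If the suffix is missing, the backup copied to /boot/firmware/$KERNEL-backup.img earlier can be restored.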

Install graphics drivers and other tools:

  sudo apt install -y firmware-amd-graphics mesa-utils mesa-va-drivers vainfo nvtop 
  curl -LO https://github.com/Umio-Yasuno/amdgpu_top/releases/download/v0.10.3/amdgpu-top_0.10.3-1_arm64.deb
  sudo dpkg -i amdgpu-top_0.10.3-1_arm64.deb
  cd /usr/lib/firmware/amdgpu
  # Fetch the extra firmware blobs from linux-firmware, one at a time
  for fw in psp_13_0_10_sos smu_13_0_10 gc_11_0_3_pfp gc_11_0_3_mes_2 \
            gc_11_0_3_mes1 psp_13_0_10_ta gc_11_0_3_me gc_11_0_3_rlc \
            gc_11_0_3_mec gc_11_0_3_imu sdma_6_0_3; do
    sudo wget "https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/$fw.bin"
  done

Set up llama.cpp to test that the GPU works:

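  # The Vulkan build needs cmake and the Vulkan toolchain
  sudo apt install -y glslang-tools libvulkan-dev glslc cmake
  cd ~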
  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp
  cmake -B build -DGGML_VULKAN=1
  cmake --build build --config Release
  cd models && wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
  cd ..
  # Run llama
  ./build/bin/llama-cli -m "models/Llama-3.2-3B-Instruct-Q4_K_M.gguf" -p "Why is the blue sky blue?" -e -ngl 100 -t 4

Home Assistant Speech To Text (Whisper)

Set up whisper.cpp

sudo apt install glslang-tools libvulkan-dev glslc cmake
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp/
sh ./models/download-ggml-model.sh large-v2
cmake -B build -DGGML_VULKAN=1
cmake --build build -j --config Release
# Confirm it works
./build/bin/whisper-cli -m ./models/ggml-large-v2.bin -f samples/jfk.wav
# Run a server to connect to wyoming
./build/bin/whisper-server -m ./models/ggml-large-v2.bin --host 0.0.0.0 --port 8910 --print-realtime --print-progress
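Before wiring up Wyoming, the server can be checked directly over HTTP from a second terminal. This is a minimal sketch, assuming `curl` is installed and the server was started with `--port 8910` as above; `transcribe` is a hypothetical helper name.

```shell
# Hypothetical helper: send a WAV file to a running whisper-server and print
# the JSON response. The default URL matches the --port 8910 used above.
transcribe() {
  file="$1"
  url="${2:-http://127.0.0.1:8910/inference}"
  # Fail early with a clear message instead of letting curl report it.
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  curl -sS "$url" \
    -F file="@$file" \
    -F response_format="json"
}

# Usage, from the whisper.cpp directory while whisper-server is running:
#   transcribe samples/jfk.wav
```

This is the same /inference endpoint that the Wyoming client is pointed at in the next step, so a sensible transcript here means that half of the chain works.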

Set up wyoming-whisper-api-client

git clone https://github.com/ser/wyoming-whisper-api-client
cd wyoming-whisper-api-client
script/setup
./script/run --uri tcp://0.0.0.0:7891 --debug --api http://127.0.0.1:8910/inference

Home Assistant can now use http://raspberry-pi-ip:7891/ for speech to text
Not included here: setting up systemd to run these processes, hardening, stability changes, etc.

References

Use an External GPU on Raspberry Pi 5 for 4K Gaming - Jeff Geerling (Blog)
LLMs accelerated with eGPU on a Raspberry Pi 5 - Jeff Geerling (Blog)
Test AMD Radeon Pro W7700 & RX 7700 XT GPUs - Kernel build instructions - Jeff Geerling (GitHub)
Test GPU (XFX AMD Radeon RX 460 4GB GDDR5) - Similar card testing - Jeff Geerling (GitHub)
linux - Linux Kernel patches - Coreforge (GitHub)
memcpy_unaligned.c - Coreforge (GitHub)
The Linux kernel - Compile instructions - Raspberry Pi (Docs)
A GPU-powered Pi for more efficient AI? - Jeff Geerling (YouTube)
4K Gaming... on Raspberry Pi! - Jeff Geerling (YouTube)
