I've been playing with Home Assistant Voice and it's too slow on the hardware I've been using (mostly Raspberry Pis and similar SBCs). I saw Jeff Geerling had gotten external GPUs working on a Pi, so decided to have a go...
- Raspberry Pi 5 8GB
- PCIe to NVMe board
- NVMe to Oculink board
- External PCIe x16 board with Oculink
- Asus AMD RX480
- Corsair 550W PSU
- Flash latest Raspberry Pi OS Lite
- Edit /boot/firmware/config.txt and add this line at the bottom (it runs the Pi 5's external PCIe link at Gen 3 speed):
dtparam=pciex1_gen=3
- Dependencies:
sudo apt install -y vim git bc bison flex libssl-dev make libncurses-dev
- Compile the patched memcpy library and preload it system-wide:
wget https://gist.githubusercontent.com/Coreforge/91da3d410ec7eb0ef5bc8dee24b91359/raw/b4848d1da9fff0cfcf7b601713efac1909e408e8/memcpy_unaligned.c
gcc -shared -fPIC -o memcpy.so memcpy_unaligned.c
sudo mv memcpy.so /usr/local/lib/memcpy.so
sudo vim /etc/ld.so.preload
# Put the following line inside ld.so.preload:
/usr/local/lib/memcpy.so
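Both file edits so far (config.txt and ld.so.preload) come down to adding one line to a root-owned file. Here is a minimal sketch of doing that idempotently, so re-running the setup does not duplicate the line. The helper name `ensure_line` is made up for this example, the demo writes to a scratch file rather than the real paths, and it assumes the target file ends with a newline.

```shell
# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
# (Hypothetical helper for illustration, not part of Raspberry Pi OS.)
ensure_line() {
    file=$1
    line=$2
    touch "$file"
    # -x matches whole lines only, -F treats LINE as a fixed string.
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo against a scratch file; calling it twice still leaves a single line.
demo_file=$(mktemp)
ensure_line "$demo_file" "dtparam=pciex1_gen=3"
ensure_line "$demo_file" "dtparam=pciex1_gen=3"
cat "$demo_file"
```

On the Pi the real targets would be /boot/firmware/config.txt and /etc/ld.so.preload. Both need root, so the function would have to be defined and called inside a root shell.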
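- Clone the Raspberry Pi Linux repo:
# The patch in the next step targets rpi-6.6.y; if the default branch has moved on, add --branch rpi-6.6.y
git clone --depth=1 https://github.com/raspberrypi/linux && cd linux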
- Apply the patch:
wget -O amdgpu-pi5.patch https://github.com/raspberrypi/linux/compare/rpi-6.6.y...Coreforge:linux:rpi-6.6.y-gpu.patch
git apply -v amdgpu-pi5.patch
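- Set up the config for the Raspberry Pi 5 (KERNEL is set on its own line so that $KERNEL is still defined for the install step later):
KERNEL=kernel_2712
make bcm2712_defconfig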
- Configure the kernel and enable the following options:
make menuconfig
  - Kernel Features > Page Size > 4 KB (for Box86 compatibility)
  - Kernel Features > Kernel support for 32-bit EL0 > Fix up misaligned multi-word loads and stores in user space
  - Kernel Features > Fix up misaligned loads and stores from userspace for 64bit code
  - Device Drivers > Graphics support > AMD GPU (optionally SI/CIK support too)
  - Device Drivers > Graphics support > Direct Rendering Manager (XFree86 4.1.0 and higher DRI support) > Force Architecture can write-combine memory
- Modify .config and set CONFIG_LOCALVERSION to your version (I appended -pi_gpu)
vim .config
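The CONFIG_LOCALVERSION edit can also be made without opening an editor. The following is a sketch using sed, assuming a quoted CONFIG_LOCALVERSION line already exists in .config. The function name `append_localversion` is invented here, and the demo runs on a scratch file with a made-up starting value.

```shell
# append_localversion CONFIG SUFFIX: append SUFFIX to the existing
# CONFIG_LOCALVERSION="..." value in CONFIG. (Hypothetical helper.)
append_localversion() {
    config=$1
    suffix=$2
    # Rewrite via a temporary file so this works with any POSIX sed.
    sed "s|^CONFIG_LOCALVERSION=\"\(.*\)\"\$|CONFIG_LOCALVERSION=\"\1${suffix}\"|" \
        "$config" > "$config.new" && mv "$config.new" "$config"
}

# Demo on a scratch file; the starting value "-v8" is only an example.
demo_config=$(mktemp)
printf '%s\n' 'CONFIG_LOCALVERSION="-v8"' 'CONFIG_LOCALVERSION_AUTO=y' > "$demo_config"
append_localversion "$demo_config" "-pi_gpu"
grep '^CONFIG_LOCALVERSION=' "$demo_config"
```

In the kernel tree the call would be `append_localversion .config "-pi_gpu"`. The tree also ships a `scripts/config` helper that can set string options, which may be preferable to hand-rolled sed.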
- Compile the kernel:
make -j6 Image.gz modules dtbs
- Install the kernel (the existing kernel image is backed up first):
sudo make -j6 modules_install
sudo cp /boot/firmware/$KERNEL.img /boot/firmware/$KERNEL-backup.img
sudo cp arch/arm64/boot/Image.gz /boot/firmware/$KERNEL.img
sudo cp arch/arm64/boot/dts/broadcom/*.dtb /boot/firmware/
sudo cp arch/arm64/boot/dts/overlays/*.dtb* /boot/firmware/overlays/
sudo cp arch/arm64/boot/dts/overlays/README /boot/firmware/overlays/
- Reboot:
sudo reboot
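After the reboot it is worth confirming that the new kernel is the one actually running before moving on. A small sketch of that check follows. It assumes the `-pi_gpu` suffix mentioned earlier was used, and `kernel_has_suffix` is a name made up for this example.

```shell
# kernel_has_suffix RELEASE SUFFIX: succeed if RELEASE contains SUFFIX.
# (Hypothetical helper for illustration.)
kernel_has_suffix() {
    case $1 in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

suffix="-pi_gpu"   # assumption: the CONFIG_LOCALVERSION suffix chosen earlier
if kernel_has_suffix "$(uname -r)" "$suffix"; then
    echo "Running the custom kernel: $(uname -r)"
else
    echo "Kernel release $(uname -r) does not contain $suffix"
fi

# If pciutils is installed, look for the GPU on the PCIe bus.
if command -v lspci >/dev/null 2>&1; then
    lspci | grep -i -e vga -e display || echo "No GPU visible on the PCIe bus"
fi
```

If the suffix is missing, the backup image copied during the install step is still in /boot/firmware and can be restored.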
- Install graphics drivers and other tools:
sudo apt install -y firmware-amd-graphics mesa-utils mesa-va-drivers vainfo nvtop
curl -LO https://github.com/Umio-Yasuno/amdgpu_top/releases/download/v0.10.3/amdgpu-top_0.10.3-1_arm64.deb
sudo dpkg -i amdgpu-top_0.10.3-1_arm64.deb
cd /usr/lib/firmware/amdgpu
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/psp_13_0_10_sos.bin & \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/smu_13_0_10.bin & \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_pfp.bin & \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_mes_2.bin & \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_mes1.bin & \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/psp_13_0_10_ta.bin & \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_me.bin & \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_rlc.bin & \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_mec.bin & \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_imu.bin & \
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/sdma_6_0_3.bin
- Set up llama.cpp to test that the GPU works:
cd ~
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=1
cmake --build build --config Release
cd models && wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
cd ..
# Run llama
./build/bin/llama-cli -m "models/Llama-3.2-3B-Instruct-Q4_K_M.gguf" -p "Why is the blue sky blue?" -e -ngl 100 -t 4
- Set up whisper.cpp:
sudo apt install glslang-tools libvulkan-dev glslc cmake
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp/
sh ./models/download-ggml-model.sh large-v2
cmake -B build -DGGML_VULKAN=1
cmake --build build -j --config Release
# Confirm it works
./build/bin/whisper-cli -m ./models/ggml-large-v2.bin -f samples/jfk.wav
# Run a server to connect to wyoming
./build/bin/whisper-server -m ./models/ggml-large-v2.bin --host 0.0.0.0 --port 8910 --print-realtime --print-progress
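Before wiring in Wyoming, the server's HTTP endpoint can be exercised directly with curl. The sketch below assumes the server is running on the host and port from the command above. The function names `inference_url` and `transcribe` and the `WHISPER_HOST`/`WHISPER_PORT` variables are made up for this example.

```shell
# Assumption: these match the --host/--port given to whisper-server.
WHISPER_HOST=${WHISPER_HOST:-127.0.0.1}
WHISPER_PORT=${WHISPER_PORT:-8910}

# inference_url: print the URL of the server's transcription endpoint.
inference_url() {
    printf 'http://%s:%s/inference\n' "$WHISPER_HOST" "$WHISPER_PORT"
}

# transcribe FILE: upload an audio file as multipart form data and
# print the server's JSON response.
transcribe() {
    curl -sS "$(inference_url)" \
        -F file="@$1" \
        -F response_format="json"
}

# Example, run from the whisper.cpp directory while the server is up:
#   transcribe samples/jfk.wav
```

This is the same /inference endpoint that the Wyoming client is pointed at in the next step, so a sensible transcript here suggests any later problem is on the Wyoming side.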
- Set up wyoming-whisper-api-client:
git clone https://github.com/ser/wyoming-whisper-api-client
cd wyoming-whisper-api-client
script/setup
./script/run --uri tcp://0.0.0.0:7891 --debug --api http://127.0.0.1:8910/inference
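- Home Assistant can now use the Pi for speech-to-text: add the Wyoming Protocol integration with the Pi's IP address as the host and 7891 as the port (Wyoming speaks plain TCP, so this is a host and port rather than an http:// URL)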
- Not included here: setting up systemd to run these processes, hardening, stability changes, etc.
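As a starting point for the systemd part, here is a rough sketch that prints a unit file for whisper-server. It is not what I run in production. The function name `print_whisper_unit`, the user `pi`, and the install path are all placeholders, and the flags simply mirror the whisper-server command used earlier.

```shell
# print_whisper_unit USER DIR: write a systemd unit for whisper-server to stdout.
# USER and DIR are placeholders for the account and whisper.cpp checkout.
print_whisper_unit() {
    cat <<EOF
[Unit]
Description=whisper.cpp server (Vulkan)
After=network-online.target
Wants=network-online.target

[Service]
User=$1
WorkingDirectory=$2
ExecStart=$2/build/bin/whisper-server -m $2/models/ggml-large-v2.bin --host 0.0.0.0 --port 8910 --print-realtime --print-progress
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Preview the unit for a hypothetical user "pi" with whisper.cpp in the home directory.
print_whisper_unit pi /home/pi/whisper.cpp
```

The output could be saved as /etc/systemd/system/whisper-server.service (the file name is arbitrary), followed by `sudo systemctl daemon-reload` and `sudo systemctl enable --now whisper-server`. A second unit along the same lines would cover the Wyoming client.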
- Use an External GPU on Raspberry Pi 5 for 4K Gaming - Jeff Geerling (Blog)
- LLMs accelerated with eGPU on a Raspberry Pi 5 - Jeff Geerling (Blog)
- Test AMD Radeon Pro W7700 & RX 7700 XT GPUs - Kernel build instructions - Jeff Geerling (GitHub)
- Test GPU (XFX AMD Radeon RX 460 4GB GDDR5) - Similar card testing - Jeff Geerling (GitHub)
- linux - Linux Kernel patches - Coreforge (GitHub)
- memcpy_unaligned.c - Coreforge (GitHub)
- The Linux kernel - Compile instructions - Raspberry Pi (Docs)
- A GPU-powered Pi for more efficient AI? - Jeff Geerling (YouTube)
- 4K Gaming... on Raspberry Pi! - Jeff Geerling (YouTube)