Tip: Skip to the bottom of this document for a TL;DR.
For more info, see llama.cpp #12511, "Handle user-defined quantization levels for additional tensors" by @EAddario.
Testing was done by @ddh0 on that branch as of commit 5a304b8, using libllama built for Linux with CUDA.