- Used @EAddario's PR ggml-org/llama.cpp#12718 to generate imatrix statistics (a rough command sketch follows after this list).
- These were the imatrix data files used; they appear in each mosaic top to bottom in this order (bartowski, ubergarm, unsloth):
- https://huggingface.co/bartowski/Qwen_Qwen3-30B-A3B-GGUF/blob/main/Qwen_Qwen3-30B-A3B.imatrix
- https://huggingface.co/ubergarm/Qwen3-30B-A3B-GGUF/blob/main/Qwen3-30B-A3B-mix-IQ4_K.gguf
- https://huggingface.co/unsloth/Qwen3-30B-A3B-GGUF/blob/main/imatrix_unsloth.dat
- Similar to the set used for https://huggingface.co/ikawrakow/Qwen3-30B-A3B, but I didn't use the 128k unsloth one and I didn't see ik's imatrix available to run.
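For reference, a minimal sketch of how the statistics step looks with that PR branch applied (the flag name is my reading of PR #12718, and the `--in-file` path is just the bartowski file above as an example; adjust to whichever local imatrix you downloaded):

```bash
# Rough sketch of the statistics run with the PR #12718 branch of llama.cpp.
# --show-statistics is the flag as I understand it from the PR; the input path
# is a placeholder for whichever imatrix file you want to inspect.
./llama-imatrix \
    --in-file Qwen_Qwen3-30B-A3B.imatrix \
    --show-statistics
```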
See the attached images below, generated with some Python/matplotlib/ImageMagick scripts vibe-coded using ubergarm/Qwen3-30B-A3B-mix-IQ3_K. You can click them to load larger versions; they are not too big at 100 dpi. You may need to shift-reload before clicking, as I possibly attached them while this gist was still being edited in private mode before making it public.
(Only the ubergarm imatrix had the non-repeating output layer, probably because I used ik's fork to make the imatrix. I arbitrarily mapped it to layer "99", which is why the graph x-axis shows decimals; ignore that.)
I updated the gist to list the imatrix sources in order top to bottom: bartowski, ubergarm, unsloth. If you click to enlarge an image, they are labeled in the individual subtitles.
Though interestingly, Dan (unsloth) seems to still be making more/new imatrix files for the Qwen3-235B/30B MoE models using longer context lengths than the default `-c 512` that I'm using. Going by this recent note here on the updated unsloth imatrix methodology, and given they have access to a 640GB VRAM machine, which is enough to calculate an imatrix on the 438GB `Qwen3-235B-A22B-BF16`, presumably the command JohannesGaessler is asking for here might be something like:
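(This is purely my guess at the shape of such a run, not unsloth's actual command; the calibration file name, `-c` value, and `-ngl` value are placeholders.)

```bash
# Illustrative sketch only: a longer-context imatrix run on the BF16 model.
# File names, -c, and -ngl are assumptions, not unsloth's real settings.
./llama-imatrix \
    -m Qwen3-235B-A22B-BF16.gguf \
    -f calibration_dataset.txt \
    -o Qwen3-235B-A22B.imatrix \
    -c 8192 \
    -ngl 99
```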
tbh I'm not sure how changing the context from the default of 512 to, say, 8k or 12k will affect PPL, KLD, benchmarks, and actual daily use for folks.
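If I wanted to check, one rough way (a sketch using llama.cpp's `llama-perplexity`; model and file names are placeholders) would be to save the full-precision logits once and then measure PPL/KLD for quants made with each imatrix:

```bash
# 1) Save the base (full precision) logits once
./llama-perplexity -m Qwen3-30B-A3B-BF16.gguf -f wiki.test.raw --kl-divergence-base qwen3-30b.kld

# 2) Measure PPL and KL-divergence of a quantized model against those saved logits
#    (repeat for quants made from imatrix files with different context lengths)
./llama-perplexity -m Qwen3-30B-A3B-IQ4_XS.gguf --kl-divergence-base qwen3-30b.kld --kl-divergence
```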