Visualize importance score statistics for three Qwen3-30B-A3B llama-imatrix files.
  1. Used @EAddario's PR ggml-org/llama.cpp#12718 to generate the imatrix statistics.
  2. These were the imatrix data files used; they appear in each mosaic top to bottom in this order: barto, uber, unsloth.
  3. Similar to https://huggingface.co/ikawrakow/Qwen3-30B-A3B, but I didn't use the 128k unsloth one and I didn't see ik's available to run.

See the attached images below, generated with some python/matplotlib/ImageMagick scripts vibe coded using ubergarm/Qwen3-30B-A3B-mix-IQ3_K. You can click them to load larger versions; they are not too big at 100dpi. You may need to shift-reload the page before clicking on them, as I possibly attached them while this gist was still being edited in private mode before making it public.
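For reference, here is a minimal sketch of the kind of per-layer plot behind each mosaic row. It assumes the statistics have first been dumped to a CSV with columns `tensor,layer,score`; that format and the file names (`barto_stats.csv`, `attn_q_barto.png`) are hypothetical, since the PR prints a table you would parse or export yourself:

```python
# Minimal sketch: plot importance score vs. layer for one tensor type
# from one imatrix statistics dump. The CSV schema is an assumption.
import csv
from collections import defaultdict

import matplotlib.pyplot as plt

def load_scores(path):
    """Group importance scores by tensor name -> {layer: score}."""
    scores = defaultdict(dict)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            scores[row["tensor"]][int(row["layer"])] = float(row["score"])
    return scores

def plot_tensor(scores, tensor, out_path):
    """Save one per-layer line plot for a single tensor type."""
    per_layer = scores[tensor]
    layers = sorted(per_layer)
    fig, ax = plt.subplots(figsize=(10, 3))
    ax.plot(layers, [per_layer[l] for l in layers], marker="o")
    ax.set_xlabel("layer")
    ax.set_ylabel("importance score")
    ax.set_title(tensor)
    fig.savefig(out_path, dpi=100)  # matches the 100dpi mentioned above

scores = load_scores("barto_stats.csv")  # hypothetical per-source dump
plot_tensor(scores, "attn_q", "attn_q_barto.png")
```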

[Per-tensor mosaic images attached for: attn_q, attn_k, attn_v, attn_output, ffn_gate_inp, ffn_down_exps, ffn_gate_exps, ffn_up_exps, and output.]

Note on the output mosaic: only ubergarm's imatrix had the non-repeating output tensor, probably because I used ik's fork to make the imatrix? I arbitrarily mapped it to layer "99", which made the graph's x-axis show decimals, but ignore that.
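For completeness, a minimal sketch of how each mosaic could be assembled from the three per-source plots, stacked top to bottom in the barto, uber, unsloth order. This uses PIL rather than the ImageMagick step mentioned above, and the file names are assumptions:

```python
# Minimal sketch: paste three per-source PNGs top to bottom into one mosaic.
from PIL import Image

def stack_vertical(paths, out_path):
    """Stack images vertically, left-aligned, into a single PNG."""
    images = [Image.open(p) for p in paths]
    width = max(im.width for im in images)
    height = sum(im.height for im in images)
    mosaic = Image.new("RGB", (width, height), "white")
    y = 0
    for im in images:
        mosaic.paste(im, (0, y))
        y += im.height
    mosaic.save(out_path)

stack_vertical(
    ["attn_q_barto.png", "attn_q_uber.png", "attn_q_unsloth.png"],
    "attn_q_mosaic.png",
)
```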


ubergarm commented May 4, 2025

Cross-referencing my comment that might shed some more light on potential effects: ggml-org/llama.cpp#13199 (comment)

FWIW, my methodology:

  1. The llama-sweep-bench stuff is all in logs here.
  2. My imatrix command is as follows, running on the ik_llama.cpp fork. Note I currently don't have enough VRAM+RAM to use bf16 as the base for the 235B, but I do use the bf16 for the 30B.
```bash
./build/bin/llama-imatrix \
    --verbosity 1 \
    --layer-similarity \
    -m /mnt/raid/models/ubergarm/Qwen3-235B-A22B-GGUF/Qwen3-235B-A22B-Q8_0.gguf \
    -f calibration_data_v5_rc.txt \
    -o /mnt/raid/models/ubergarm/Qwen3-235B-A22B-GGUF/imatrix-Qwen3-235B-A22B.dat \
    --ctx-size 512 \
    -ngl 34 \
    --threads 24
```
