@mmgaggle
Last active June 9, 2026 22:44
kimi-linear-48b-a3b-q6-k.md
---
config:
    xyChart:
        width: 900
        height: 500
    themeVariables:
        xyChart:
            plotColorPalette: "#3366cc, #cc3366"
---
xychart-beta
    title "kimi-linear 48B.A3B Q6_K @ gfx1151 - throughput vs context depth (blue: pp512, red: tg128)"
    x-axis "context depth (tokens)" [0, 16384, 32768, 49152, 65536, 81920, 98304, 114688]
    y-axis "t/s" 0 --> 650
    line [602.85, 392.18, 288.20, 228.47, 189.02, 161.29, 140.44, 124.52]
    line [54.96, 51.17, 47.88, 44.94, 42.31, 39.90, 37.95, 36.12]
(.venv) kyle@dev:~/src/llama.cpp$ ./build/bin/llama-bench -m kimi-Q6_K.gguf -ngl 99 --mmap 0 \
  -p 512 -n 128 -d 0-262144+16384
ggml_cuda_init: found 1 ROCm devices (Total VRAM: 122880 MiB):
  Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, VRAM: 122880 MiB
| model                          |       size |     params | backend    | ngl | mmap |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---: | --------------: | -------------------: |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 |           pp512 |        602.85 ± 4.89 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 |           tg128 |         54.96 ± 0.11 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 |  pp512 @ d16384 |        392.18 ± 2.18 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 |  tg128 @ d16384 |         51.17 ± 0.05 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 |  pp512 @ d32768 |        288.20 ± 0.94 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 |  tg128 @ d32768 |         47.88 ± 0.04 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 |  pp512 @ d49152 |        228.47 ± 0.82 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 |  tg128 @ d49152 |         44.94 ± 0.03 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 |  pp512 @ d65536 |        189.02 ± 0.38 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 |  tg128 @ d65536 |         42.31 ± 0.04 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 |  pp512 @ d81920 |        161.29 ± 0.56 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 |  tg128 @ d81920 |         39.90 ± 0.02 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 |  pp512 @ d98304 |        140.44 ± 0.21 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 |  tg128 @ d98304 |         37.95 ± 0.02 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 | pp512 @ d114688 |        124.52 ± 0.14 |
| kimi-linear 48B.A3B Q6_K       |  37.59 GiB |    49.12 B | ROCm       |  99 |    0 | tg128 @ d114688 |         36.12 ± 0.03 |
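
To make the table easier to read, here is a minimal Python sketch that expresses each row as a fraction of the depth-0 throughput. The numbers are copied from the table above; the helper name `retained` and the variable names are illustrative and are not part of llama-bench.

```python
# Throughput figures copied from the llama-bench table above (tokens per second).
depths = [0, 16384, 32768, 49152, 65536, 81920, 98304, 114688]
pp512 = [602.85, 392.18, 288.20, 228.47, 189.02, 161.29, 140.44, 124.52]
tg128 = [54.96, 51.17, 47.88, 44.94, 42.31, 39.90, 37.95, 36.12]


def retained(series):
    """Return each value as a fraction of the depth-0 baseline (hypothetical helper)."""
    baseline = series[0]
    return [value / baseline for value in series]


pp_retained = retained(pp512)
tg_retained = retained(tg128)

# Print one row per context depth with the share of baseline throughput kept.
for depth, pp, tg in zip(depths, pp_retained, tg_retained):
    print(f"d{depth:<7} pp512 {pp:6.1%}  tg128 {tg:6.1%}")
```

On these figures, prompt processing at d114688 keeps roughly a fifth of its depth-0 rate, while token generation keeps roughly two thirds.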