@jesuino
Last active February 6, 2026 01:51
My llama.cpp commands to run my models locally. I will update this as I find better parameters.

Hardware:

64 GB RAM - Linux (Fedora)

Qwen3VL-4B-Instruct-Q4_K_M

./llama-server -m /opt/models/Qwen3VL-4B-Instruct-Q4_K_M.gguf \
               --mmproj /opt/models/mmproj-Qwen3VL-4B-Instruct-Q8_0.gguf \
               --jinja  -b 1024 -t 8 -ngl 99 --temp 0.7  --top-k 20  \
               --top-p 0.8 --min-p 0.01 --repeat-penalty 1.05 --ctx-size 65536
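Once a server like the one above is running, it exposes llama.cpp's OpenAI-compatible API (on port 8080 unless you pass --port). A minimal smoke test might look like the sketch below; the prompt text and max_tokens value are just placeholders:

```shell
# Request body for llama-server's OpenAI-compatible chat endpoint.
# Assumes the default --port 8080; adjust the URL if you changed it.
PAYLOAD='{"messages":[{"role":"user","content":"Say hello in one sentence."}],"max_tokens":64}'

# Against a running server:
# curl -s http://localhost:8080/v1/chat/completions \
#      -H "Content-Type: application/json" -d "$PAYLOAD"

echo "$PAYLOAD"
```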

Qwen3-Coder-Next-UD-Q2_K_XL.gguf

./llama-server -m /opt/models/Qwen3-Coder-Next-UD-Q2_K_XL.gguf \
    --jinja --ctx-size 32768 \
    --temp 1.0 --top-p 0.95 --min-p 0.01 --top-k 40 --fit on

Qwen3-Coder-Next-Q2_K.gguf

./llama-server -m /opt/models/Qwen3-Coder-Next-Q2_K.gguf \
    --jinja --ctx-size 16384 \
    --temp 1.0 --top-p 0.95 --min-p 0.01 --top-k 40 --fit on

GLM-4.7-Flash-Q4_K_M.gguf

./llama-server -m /opt/models/GLM-4.7-Flash-Q4_K_M.gguf \
    --jinja --ctx-size 16384 \
    --temp 1.0 --top-p 0.95 --min-p 0.01 --fit on
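Since each model family above reuses the same sampling parameters, a small helper keeps them in one place. This is a sketch under my current settings, not part of llama.cpp itself; the function name and the /opt/models layout are my own conventions:

```shell
# run_model_flags: print the sampling flags used above for a given model file.
# Patterns match the GGUF filenames under /opt/models.
run_model_flags() {
  model="$1"
  case "$model" in
    Qwen3VL-*)     echo "--temp 0.7 --top-k 20 --top-p 0.8 --min-p 0.01 --repeat-penalty 1.05" ;;
    Qwen3-Coder-*) echo "--temp 1.0 --top-p 0.95 --min-p 0.01 --top-k 40" ;;
    GLM-*)         echo "--temp 1.0 --top-p 0.95 --min-p 0.01" ;;
    *)             echo "" ;;
  esac
}

# Example: build a full command line from the flags.
run_model_flags Qwen3VL-4B-Instruct-Q4_K_M
```

Usage would then be something like `./llama-server -m /opt/models/$M.gguf --jinja $(run_model_flags "$M")`, so a parameter tweak only has to be made once.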