@vitorcalvi
Last active February 6, 2025 17:07
## Max Tokens
python -m llama_cpp.server --model DeepSeek-R1-Distill-Qwen-1.5B-Q4_1.gguf --host 0.0.0.0 --n_threads 8 --n_batch 512 --n_gpu_layers 0 --n_ctx 2048 --mul_mat_q 1
## Balanced
python -m llama_cpp.server --model DeepSeek-R1-Distill-Qwen-1.5B-Q4_1.gguf --host 0.0.0.0 --n_threads 8 --n_batch 32 --n_gpu_layers 0 --n_ctx 512 --mul_mat_q 1 --offload_kqv 1
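With either server command above running, llama_cpp.server exposes an OpenAI-compatible HTTP API (port 8000 by default). A minimal client sketch, assuming the default address; the prompt and generation parameters are illustrative, not part of the original gist:

```python
import json
import urllib.request

# Build a payload for the OpenAI-compatible /v1/completions endpoint
# served by llama_cpp.server (assumed default: localhost:8000).
def build_completion_request(prompt, max_tokens=128, temperature=0.7):
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,    # cap on generated tokens
        "temperature": temperature,  # sampling temperature
    }

if __name__ == "__main__":
    payload = build_completion_request("Explain KV-cache offloading in one sentence.")
    req = urllib.request.Request(
        "http://localhost:8000/v1/completions",  # assumed default address
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Uncomment once the server is up:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["choices"][0]["text"])
```

The same endpoint works for both the Max Tokens and Balanced configurations; only the server-side context and batch limits differ.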
## VULKAN
./llama-cli -m ../../../models/DeepSeek-R1-Distill-Llama-8B-Q8_0.gguf --gpu-layers 24 --ctx-size 2048 --batch-size 512 --threads 4