@Cdaprod
Created June 15, 2025 19:38
Running dockerized `ghcr.io/ggml-org/llama.cpp` CUDA Server
# PowerShell: start the llama.cpp CUDA server container in the background,
# mounting the local model folder B:\Models into the container at /models.
# -n 512 caps tokens generated per request; --n-gpu-layers 35 offloads
# 35 transformer layers to the GPU.
docker run --gpus all --restart unless-stopped -d `
  -v "B:\Models:/models" `
  -p 8000:8000 `
  ghcr.io/ggml-org/llama.cpp:server-cuda `
  -m /models/llama-2-7b-chat.Q4_K_M.gguf `
  --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 35
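Once the container is up, the llama.cpp server exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the published port. A minimal stdlib-only sketch of a client, assuming the host/port from the command above (`build_chat_request` and `chat` are hypothetical helper names, not part of llama.cpp):

```python
import json
import urllib.request

def build_chat_request(prompt, n_predict=512):
    # Assemble an OpenAI-style chat request body; max_tokens mirrors the
    # server's -n 512 cap from the docker command above.
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": n_predict,
    }

def chat(prompt, host="http://localhost:8000"):
    # POST to the server's OpenAI-compatible chat endpoint and return the
    # assistant's reply text. Requires the container above to be running.
    body = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage: `chat("Hello")` blocks until the server finishes generating, so for long generations consider the server's streaming mode instead.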