Grab latest llama.cpp sources and build it:
git clone https://github.com/ggml-org/llama.cpp/
cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
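If the build went through, the binaries land in llama.cpp/build/bin/. A quick sanity check (assuming the default CMake output layout from the commands above):
./llama.cpp/build/bin/llama-server --version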
Get OpenCode from https://opencode.ai/ - they have install instructions
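If you'd rather skip the docs, the npm route usually does it - this assumes the package is still published as opencode-ai, so check the site if it fails:
npm install -g opencode-ai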
Put this in ~/.config/opencode/opencode.json:
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama.cpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama-server (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8888/v1"
      },
      "models": {
        "GLM-4.7-flash": {
          "name": "unsloth/GLM-4.7-Flash-GGUF:UD-Q4_K_XL",
          "modalities": { "input": ["text"], "output": ["text"] },
          "limit": {
            "context": 64000,
            "output": 65536
          }
        }
      }
    }
  }
}
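If you hand-edit that file, a quick syntax check saves a confusing startup later. Assuming you have jq around:
jq . ~/.config/opencode/opencode.json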
Fire up llama.cpp:
./llama.cpp/build/bin/llama-server -hf unsloth/GLM-4.7-Flash-GGUF:UD-Q4_K_XL --jinja --threads -1 --ctx-size 65000 --temp 0.7 --top-p 1.0 --min-p 0.01 --dry-multiplier 0.0 --fit off -fa auto --port 8888 --host 127.0.0.1 --no-op-offload --no-mmap
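Before wiring OpenCode in, you can poke the server directly to confirm the OpenAI-compatible API is answering on the port from the config above:
curl http://127.0.0.1:8888/v1/models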
Start OpenCode, press "Ctrl-X, m", scroll to the bottom and select GLM 4.7 Flash.
Be the hero you want to be.