LlamaCpp server.ini for AMD UMA Laptop
```
version = 1
[*]
host = 0.0.0.0
port = 8080
parallel = 1
no-webui = false
# Threads
threads = 8
threads-batch = 16
# Context & Cache
ctx-size = 128256
cache-type-k = f16
cache-type-v = f16
kv-unified = true
# Features
fit = true
reasoning = false
jinja = true
ngl = all
fa = on
split-mode = none
main-gpu = 0
mmap = on
mlock = on
gpu-layers = all
# ==========================
# QWEN 3.6
# ==========================
[Qwen3.6-35B-A3B-APEX-Heretic-I]
model = /LLMs/Models/Qwen3.6/Qwen3.6-35B-A3B-uncensored-heretic-APEX-I-Quality.gguf
mmproj = /LLMs/Models/mmproj/Qwen3.6_35B_A3B-mmproj-BF16.gguf
temperature = 1.0
top-p = 0.95
top-k = 20
min-p = 0.0
presence-penalty = 0.0
chat-template-kwargs = {"preserve_thinking": true}
spec-type = ngram-mod
spec-ngram-size-n = 24
spec-ngram-size-m = 8
spec-ngram-min-hits = 2
draft-min = 48
draft-max = 64
reasoning-budget = 200
reasoning-budget-message = "\n I have all the details. Let me answer now."
# ==========================
# Gemma4
# ==========================
[Mythos-Gemma4-26B-A4B]
model = /LLMs/Models/mythos-26b-a4b-prism-pro-dq.gguf
mmproj = /LLMs/Models/mmproj/mmproj-mythos-26b-a4b-prism-pro.gguf
image-min-tokens = 560
image-max-tokens = 2240
```
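
With this preset loaded via `--models-preset`, llama-server exposes the models through its OpenAI-compatible HTTP API on port 8080. A minimal sketch of a request, assuming the server is reachable on localhost and that the `model` field selects a preset section by its name:

```
# Minimal sketch: query llama-server's OpenAI-compatible chat endpoint.
# Assumes the server from server.ini is reachable on localhost:8080 and
# that "model" selects a preset section by its [name].
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "Qwen3.6-35B-A3B-APEX-Heretic-I",  # section name from the preset
        "messages": [{"role": "user", "content": "Summarize UMA memory in one line."}],
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```
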
ROCm

```
services:
  llama-server:
    image: ghcr.io/ggml-org/llama.cpp:server-rocm
    container_name: llama-rocm
    stdin_open: true
    tty: true

    ports:
      - "8080:8080"

    # Expose the AMD compute (kfd) and display (dri) devices to the container
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri

    # Membership in these groups is required to access the GPU device nodes
    group_add:
      - video
      - render

    security_opt:
      - seccomp=unconfined

    environment:
      # Report the iGPU architecture as gfx1151 to the ROCm runtime
      HSA_OVERRIDE_GFX_VERSION: "11.5.1"

    volumes:
      - /home/aja@phenospex.local/LLMs/:/LLMs

    command: >
      --models-preset /LLMs/server.ini

    restart: "no"
```

Vulkan

```
services:
  llama-server:
    image: ghcr.io/ggml-org/llama.cpp:server-vulkan
    container_name: llama-full
    stdin_open: true
    tty: true

    ports:
      - "8080:8080"

    # Pass through only the iGPU's render node and card
    devices:
      - /dev/dri/renderD128:/dev/dri/renderD128
      - /dev/dri/card1:/dev/dri/card1

    # Mount the host's Vulkan ICD/layer configuration read-only
    volumes:
      - /usr/share/vulkan:/usr/share/vulkan:ro
      - /etc/vulkan:/etc/vulkan:ro
      - /home/aja@phenospex.local/LLMs/:/LLMs

    command: >
      --models-preset /LLMs/server.ini

    restart: "no"
```
