@daniel-farina
Last active April 11, 2026 15:55
Running OpenCode with Gemma 4 26B on macOS (via llama.cpp)

As of April 2026, Gemma 4 tool calling is broken in Ollama v0.20.0 (ollama/ollama#15241) - the tool call parser fails and streaming drops tool calls entirely. OpenCode also has issues with local OpenAI-compatible providers (anomalyco/opencode#20669, #20719).

This guide documents a working setup using:

  • llama.cpp (built from source with PR #21326 template fix + PR #21343 tokenizer fix) instead of Ollama
  • OpenCode built from source with PR #16531 tool-call compatibility layer

Tested on macOS Apple Silicon (M1 Max, 32GB) on April 2, 2026.


Why not Ollama?

Ollama has three problems with Gemma 4 right now:

  1. Tool call parser crashes - "gemma4 tool call parsing failed: invalid character" (#15241)
  2. Streaming drops tool calls - tool call data goes into the reasoning field with empty content, so the actual tool call never reaches the client
  3. <unused25> token spam - tokenizer bug causes garbage output

llama.cpp has fixes for all three (PRs #21326 and #21343).

Why not stock OpenCode?

Stock OpenCode can't recover when a local model:

  • Returns tool calls as plain JSON text instead of proper function calls
  • Sends finish_reason: "stop" instead of "tool_calls"
  • Uses legacy function_call format instead of modern tool_calls

PR #16531 adds a toolParser compatibility layer that handles all of these.
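To make the failure modes concrete, here is a minimal sketch of the kind of recovery the json parser performs: the model returns its tool call as plain JSON text in the message content (with finish_reason: "stop"), and the client pulls it back out. Function and field names are illustrative; this is not OpenCode's actual implementation.

```python
import json
from typing import Optional

def recover_tool_call(content) -> Optional[dict]:
    """Try to recover a tool call the model emitted as plain JSON text.

    Illustrative sketch only -- not OpenCode's parser. It handles the
    common failure mode where the assistant message content is a JSON
    object like {"name": "bash", "arguments": {...}} instead of a real
    tool_calls entry.
    """
    try:
        obj = json.loads(content.strip())
    except (json.JSONDecodeError, AttributeError):
        return None  # content was ordinary prose, not a stray tool call
    if not isinstance(obj, dict) or "name" not in obj:
        return None
    args = obj.get("arguments", {})
    if isinstance(args, str):  # some models double-encode the arguments
        try:
            args = json.loads(args)
        except json.JSONDecodeError:
            return None
    return {"type": "function",
            "function": {"name": obj["name"], "arguments": json.dumps(args)}}

# A response where finish_reason was "stop" and the call arrived as text:
text = '{"name": "write", "arguments": {"filePath": "/tmp/hello.txt", "content": "hi"}}'
call = recover_tool_call(text)
```

When recovery succeeds, the client can treat the result exactly like a normal tool_calls entry; when it fails, the text is passed through as a regular assistant message.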


Prerequisites

  • macOS Apple Silicon with 24GB+ RAM (32GB recommended for the 26B model)
  • Homebrew
  • Close heavy apps (Chrome tabs, etc.) - the 26B model uses ~17GB

Install the required tools:

brew install git gh cmake bun

Step 1: Build llama.cpp from source

The Homebrew version doesn't have the Gemma 4 fixes yet. Build from source with both PRs:

# Clone latest llama.cpp (includes merged PR #21326)
git clone --depth 50 https://github.com/ggml-org/llama.cpp.git /tmp/llama-cpp-build
cd /tmp/llama-cpp-build

# Cherry-pick the tokenizer fix (PR #21343, not yet merged)
git fetch origin pull/21343/head:pr-21343
git cherry-pick pr-21343 --no-commit

# Build with Metal (Apple GPU) support
cmake -B build -DGGML_METAL=ON -DLLAMA_CURL=ON
cmake --build build --config Release -j$(sysctl -n hw.ncpu) -- llama-server

# Verify
./build/bin/llama-server --version

Step 2: Download the model and start llama-server

Pick a context size based on your available RAM. Gemma 4 26B supports up to 128k context, but larger context uses more memory:

Context (-c)    Total RAM needed   Best for
32768 (32k)     ~17GB              Basic tasks, short sessions
65536 (64k)     ~20GB              Longer sessions, bigger files
131072 (128k)   ~25GB+             Maximum context; may cause memory pressure on 32GB machines

32k context (recommended for 24-32GB Macs)

/tmp/llama-cpp-build/build/bin/llama-server \
  -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M \
  --port 8089 -ngl 99 -c 32768 --jinja

64k context (recommended for 32GB+ Macs)

/tmp/llama-cpp-build/build/bin/llama-server \
  -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M \
  --port 8089 -ngl 99 -c 65536 --jinja

128k context (maximum, needs 32GB+ with apps closed)

/tmp/llama-cpp-build/build/bin/llama-server \
  -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M \
  --port 8089 -ngl 99 -c 131072 --jinja

If -hf doesn't work with the source build, download manually:

# Install huggingface CLI if needed
pip install -U "huggingface_hub[cli]"

# Download the model
huggingface-cli download ggml-org/gemma-4-26B-A4B-it-GGUF gemma-4-26B-A4B-it-Q4_K_M.gguf

# Start with local file (use any -c value from above)
/tmp/llama-cpp-build/build/bin/llama-server \
  -m ~/.cache/huggingface/hub/models--ggml-org--gemma-4-26B-A4B-it-GGUF/snapshots/*/gemma-4-26B-A4B-it-Q4_K_M.gguf \
  --port 8089 -ngl 99 -c 32768 --jinja

Wait for "listening on http://127.0.0.1:8089" then verify:

curl http://127.0.0.1:8089/health
# Should return: {"status":"ok"}
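Beyond /health, you can check that tool calling survives the chat template by POSTing an OpenAI-style request with a tools array to /v1/chat/completions. A sketch that builds such a request with the standard library (the get_weather tool is a toy definition used only to exercise the template; sending it requires the server from above to be running, so the actual call is commented out):

```python
import json
import urllib.request

# OpenAI-compatible chat completion request with a single toy tool.
payload = {
    "model": "gemma4-26b",  # llama-server serves whatever model it loaded
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # toy tool for this smoke test
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

req = urllib.request.Request(
    "http://127.0.0.1:8089/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once llama-server is up. A healthy setup should answer with a
# message containing a tool_calls entry (not plain JSON text in content).
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0])
```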

Flags explained

Flag          Purpose
-hf           Download the model from Hugging Face
--port 8089   API port (avoids conflicts with Ollama's 11434)
-ngl 99       Offload all layers to the GPU (Metal)
-c N          Context window size (32k minimum for OpenCode's tool definitions)
--jinja       Enable Jinja2 chat templates (required for Gemma 4 tool calling)

For smaller Macs (16GB RAM)

Use the E4B model instead (~9.6GB):

/tmp/llama-cpp-build/build/bin/llama-server \
  -hf ggml-org/gemma-4-E4B-it-GGUF:Q4_K_M \
  --port 8089 -ngl 99 -c 32768 --jinja

Note: E4B is much weaker at tool calling than 26B.

Step 3: Install OpenCode

curl -fsSL https://opencode.ai/install -o /tmp/opencode-install.sh
bash /tmp/opencode-install.sh
source ~/.zshrc

# IMPORTANT: Remove old Homebrew version if it exists (v0.0.55 shadows the new one)
brew uninstall opencode 2>/dev/null

# Verify correct version
which opencode    # Should be ~/.opencode/bin/opencode
opencode --version

Step 4: Build OpenCode from source with PR #16531

git clone https://github.com/anomalyco/opencode.git /tmp/opencode-build
cd /tmp/opencode-build

# Checkout the tool-call compat PR
gh pr checkout 16531

# Install deps and build
bun install
cd packages/opencode
bun run build -- --single --skip-install

# Back up original and install
cp ~/.opencode/bin/opencode ~/.opencode/bin/opencode.backup
cp dist/opencode-darwin-arm64/bin/opencode ~/.opencode/bin/opencode
chmod +x ~/.opencode/bin/opencode

# Verify
opencode --version
# Should show: 0.0.0-feat/custom-provider-compat-...

Step 5: Configure OpenCode

Create opencode.json in your project directory (or globally at ~/.config/opencode/opencode.json):

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama.cpp (local)",
      "options": {
        "baseURL": "http://127.0.0.1:8089/v1",
        "toolParser": [
          { "type": "raw-function-call" },
          { "type": "json" }
        ]
      },
      "models": {
        "gemma4-26b": {
          "name": "Gemma 4 26B",
          "tool_call": true,
          "limit": {
            "context": 32768,
            "output": 8192
          }
        }
      }
    }
  },
  "model": "llama/gemma4-26b",
  "agent": {
    "build": {
      "prompt": "{file:./.opencode/prompts/build.txt}",
      "permission": {
        "edit": "allow",
        "bash": "allow",
        "webfetch": "allow"
      }
    }
  }
}

Key config fields

Field                          Purpose
baseURL                        Points to llama-server on port 8089
toolParser: raw-function-call  Rewrites tools to the legacy functions/function_call format
toolParser: json               Recovers tool calls from JSON in text responses
tool_call: true                Tells OpenCode this model supports tool calling
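For intuition, the raw-function-call rewrite can be sketched as a request transform from the modern tools/tool_choice shape to the legacy functions/function_call shape. This is a conceptual illustration of what that parser option does, not OpenCode's actual code:

```python
def to_legacy_functions(request: dict) -> dict:
    """Rewrite a modern tools/tool_choice request into the legacy
    functions/function_call shape -- conceptually what the
    raw-function-call option does (illustrative sketch only)."""
    out = dict(request)  # shallow copy; original request is untouched
    tools = out.pop("tools", None)
    if tools:
        out["functions"] = [t["function"] for t in tools
                            if t.get("type") == "function"]
    if out.pop("tool_choice", None) == "auto":
        out["function_call"] = "auto"
    return out

modern = {
    "messages": [{"role": "user", "content": "list files"}],
    "tools": [{"type": "function",
               "function": {"name": "bash", "parameters": {"type": "object"}}}],
    "tool_choice": "auto",
}
legacy = to_legacy_functions(modern)
```

Models fine-tuned on the older format often produce cleaner calls when the request arrives in this shape, which is why the compatibility layer exists.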

Step 6: Create AGENTS.md

Gemma 4 needs explicit instructions about exact tool parameter names. Create AGENTS.md in your project root:

The full contents of AGENTS.md and .opencode/prompts/build.txt are provided below. These teach Gemma 4 the exact parameter names for each tool, which is critical for reducing tool call errors.

AGENTS.md
# OpenCode Agent Instructions

You are a coding assistant with FULL access to the user's file system and terminal through tools.

CRITICAL: You MUST use tools to complete tasks. NEVER say "I don't have access". NEVER suggest the user run commands. NEVER output code snippets instead of using tools. Always take action immediately.

## Tool Schemas (EXACT parameter names - you MUST use these exactly)

### bash
Execute shell commands.
Parameters (ALL required unless noted):
- `command` (string, REQUIRED): The shell command to run
- `description` (string, REQUIRED): Short description of what the command does (5-10 words)
- `timeout` (number, optional): Timeout in milliseconds
- `workdir` (string, optional): Working directory

Example: `{"command": "ls -la", "description": "List files in current directory"}`

### write
Create or overwrite a file.
Parameters (ALL required):
- `filePath` (string, REQUIRED): Absolute path to the file
- `content` (string, REQUIRED): The content to write

Example: `{"filePath": "/Users/web/project/hello.txt", "content": "Hello world"}`

### read
Read a file.
Parameters:
- `filePath` (string, REQUIRED): Absolute path to the file
- `offset` (number, optional): Line number to start from
- `limit` (number, optional): Max lines to read

Example: `{"filePath": "/Users/web/project/hello.txt"}`

### edit
Modify an existing file by replacing text.
Parameters:
- `filePath` (string, REQUIRED): Absolute path to the file
- `oldString` (string, REQUIRED): The exact text to find and replace
- `newString` (string, REQUIRED): The replacement text
- `replaceAll` (boolean, optional): Replace all occurrences

Example: `{"filePath": "/path/to/file.ts", "oldString": "foo", "newString": "bar"}`

### glob
Find files by pattern.
Parameters:
- `pattern` (string, REQUIRED): Glob pattern like `**/*.ts`
- `path` (string, optional): Directory to search in

### grep
Search file contents.
Parameters:
- `pattern` (string, REQUIRED): Regex pattern to search for
- `path` (string, optional): Directory to search in
- `include` (string, optional): File pattern filter like `*.js`

### todowrite
Track tasks and progress. The `todos` parameter MUST be a JSON array of objects, NOT a string.
Parameters:
- `todos` (array of objects, REQUIRED): Each object has:
  - `content` (string, REQUIRED): Brief description of the task
  - `status` (string, REQUIRED): One of: `pending`, `in_progress`, `completed`, `cancelled`
  - `priority` (string, REQUIRED): One of: `high`, `medium`, `low`

Example: `{"todos": [{"content": "Add game over screen", "status": "in_progress", "priority": "high"}, {"content": "Add sound effects", "status": "pending", "priority": "low"}]}`

IMPORTANT: `todos` MUST be an array `[...]`, NOT a string `"[...]"`. Never stringify the array.

## IMPORTANT REMINDERS

- The `bash` tool REQUIRES both `command` AND `description` fields. Always include both.
- The `write` tool parameter is `filePath` (camelCase), NOT `file_path`.
- The `edit` tool uses `oldString`/`newString` (camelCase), NOT `old_string`/`new_string`.
- Do NOT call tools that don't exist. Available tools: bash, read, write, edit, glob, grep, task, webfetch, todowrite, question, skill.
- There is NO `list` tool. To list files use `bash` with `ls`.
.opencode/prompts/build.txt

First create the prompts directory:

mkdir -p .opencode/prompts

Then save the following as .opencode/prompts/build.txt:
You are a coding assistant with FULL access to the user's file system and terminal through tools.

CRITICAL RULES:
1. You MUST use tools to complete tasks. NEVER say "I don't have access".
2. NEVER suggest the user run commands - YOU run them using your tools.
3. NEVER output code snippets as your answer - USE the tools to create/edit files.
4. Call the appropriate tool IMMEDIATELY when action is needed.

TOOL PARAMETER REFERENCE (use these exact names):

bash: {"command": "ls -la", "description": "List files in directory"}
  - command (REQUIRED string): the shell command
  - description (REQUIRED string): 5-10 word description of what the command does

write: {"filePath": "/absolute/path/file.txt", "content": "file content here"}
  - filePath (REQUIRED string, camelCase): absolute path to the file
  - content (REQUIRED string): the content to write

read: {"filePath": "/absolute/path/file.txt"}
  - filePath (REQUIRED string, camelCase): absolute path

edit: {"filePath": "/path/file.txt", "oldString": "old text", "newString": "new text"}
  - filePath (REQUIRED string, camelCase)
  - oldString (REQUIRED string, camelCase): exact text to find
  - newString (REQUIRED string, camelCase): replacement text

glob: {"pattern": "**/*.ts"}
  - pattern (REQUIRED string): glob pattern

grep: {"pattern": "searchRegex"}
  - pattern (REQUIRED string): regex pattern

todowrite: {"todos": [{"content": "task description", "status": "pending", "priority": "high"}]}
  - todos (REQUIRED array of objects, NOT a string): each object has content, status, priority
  - status: one of "pending", "in_progress", "completed", "cancelled"
  - priority: one of "high", "medium", "low"
  - CRITICAL: todos MUST be an array [...], NEVER a string "[...]"

IMPORTANT:
- bash REQUIRES both "command" AND "description" parameters. Always include both.
- Use camelCase for all parameter names: filePath, oldString, newString, replaceAll
- Do NOT call tools that don't exist. There is NO "list" tool. Use bash with ls instead.
- Always take action. Never just describe what could be done.
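The stringified-todos pitfall that the prompt warns about can also be guarded against in client code. A hypothetical normalizer (not part of OpenCode) that accepts either a proper array or the broken stringified form:

```python
import json

def normalize_todos(todos):
    """Accept todos as a real list or as a stringified JSON array
    (the failure mode warned about above) and return a real list.
    Hypothetical helper for illustration -- not part of OpenCode."""
    if isinstance(todos, str):
        todos = json.loads(todos)  # repair '"[...]"' back into [...]
    if not isinstance(todos, list):
        raise TypeError("todos must be a JSON array of objects")
    for item in todos:
        missing = {"content", "status", "priority"} - set(item)
        if missing:
            raise ValueError(f"todo missing fields: {sorted(missing)}")
    return todos

# Both forms normalize to the same list:
good = normalize_todos([{"content": "x", "status": "pending", "priority": "low"}])
repaired = normalize_todos('[{"content": "x", "status": "pending", "priority": "low"}]')
```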

Step 7: Run OpenCode

# Make sure llama-server is running on port 8089 (Step 2)

# Launch OpenCode TUI
opencode

# Or run a one-off command
opencode run "create a hello.txt file with a smiley face"

Quick Start (TL;DR)

# 1. Build llama.cpp with Gemma 4 fixes
git clone --depth 50 https://github.com/ggml-org/llama.cpp.git /tmp/llama-cpp-build
cd /tmp/llama-cpp-build
git fetch origin pull/21343/head:pr-21343
git cherry-pick pr-21343 --no-commit
cmake -B build -DGGML_METAL=ON -DLLAMA_CURL=ON
cmake --build build --config Release -j$(sysctl -n hw.ncpu) -- llama-server

# 2. Start the model server
./build/bin/llama-server -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M --port 8089 -ngl 99 -c 32768 --jinja

# 3. Install and build OpenCode with tool-call compat
curl -fsSL https://opencode.ai/install -o /tmp/oc.sh && bash /tmp/oc.sh
brew uninstall opencode 2>/dev/null
git clone https://github.com/anomalyco/opencode.git /tmp/opencode-build
cd /tmp/opencode-build && gh pr checkout 16531
bun install && cd packages/opencode && bun run build -- --single --skip-install
cp dist/opencode-darwin-arm64/bin/opencode ~/.opencode/bin/opencode

# 4. Configure (create opencode.json in your project - see Step 5 above)
# 5. Create AGENTS.md in your project (see Step 6 above)
# 6. Run it
opencode

Troubleshooting

Memory pressure / screen flickering

The 26B model uses ~17GB. Close Chrome and other heavy apps. If it's too much, use E4B instead (see Step 2).

"Context size has been exceeded"

Increase -c when starting llama-server. 32768 is the minimum for OpenCode's tool definitions.

Model doesn't call tools

The model may take a few retries to get parameter names right. The toolParser compat layer and AGENTS.md with exact schemas help reduce this.

Rollback OpenCode

cp ~/.opencode/bin/opencode.backup ~/.opencode/bin/opencode

Related Issues

erikji commented Apr 4, 2026

Gemma4 works on llama.cpp out of the box now, if you install from the latest tagged commit (brew install llama.cpp --HEAD)

@rodgerbenham

Thanks heaps @daniel-farina, I was able to get OpenCode writing a basic hello.txt on Linux. I needed to use the gh command to point at the PR; the commands above still left master pointing at HEAD:

git clone --depth 50 https://github.com/ggml-org/llama.cpp.git /tmp/llama-cpp-build
cd /tmp/llama-cpp-build
gh pr checkout 21343
cmake -B build -DLLAMA_CURL=ON # without apple specific
cmake --build build --config Release -j8 -- llama-server # 8 cores

The current llama.cpp doesn't call tools out of the box yet, I can see there's active progress on ollama/ollama#15315

@AceCodePt

Thanks for this gist. I'll give it a go.

@AceCodePt

Also kinda silly but step 7 is missing

@AceCodePt

Sorry for the spamming... Last but not least, both llama-server PRs were merged.
