Running Claude Code with a local LLM

1. Download and install oMLX (macOS-native MLX server with smart caching)

https://github.com/jundot/omlx/releases

2. Download the model

Go to the model downloader

Pick one of several options, depending on your RAM:

35B parameters with 3B active:

unsloth/Qwen3.6-35B-A3B-UD-MLX-3bit - 17.4 GB (36GB+ RAM ideal)
unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit - 21.6 GB (48GB+ RAM ideal)
unsloth/Qwen3.6-35B-A3B-MLX-8bit - 37.7 GB (64GB+ RAM ideal)

27B parameters:

unsloth/Qwen3.6-27B-UD-MLX-4bit - 26.2 GB (48GB+ RAM ideal)
unsloth/Qwen3.6-27B-UD-MLX-6bit - 30.5 GB (64GB+ RAM ideal)
unsloth/Qwen3.6-27B-UD-MLX-8bit - 34.7 GB (64GB+ RAM ideal)
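The RAM guidance above amounts to a simple lookup. The Python sketch below encodes the listed download sizes and "ideal" RAM figures and returns the largest download that fits your machine. The helper `pick_model` and its "largest download that fits" rule are hypothetical illustrations, not something oMLX does for you.

```python
# Hypothetical helper: pick the largest listed build whose "ideal RAM" fits.
# Sizes and RAM thresholds are copied from the model list; the selection
# rule (largest download that fits) is an assumption, not oMLX behavior.
from typing import NamedTuple, Optional


class ModelOption(NamedTuple):
    repo: str
    size_gb: float
    ideal_ram_gb: int


OPTIONS = [
    ModelOption("unsloth/Qwen3.6-35B-A3B-UD-MLX-3bit", 17.4, 36),
    ModelOption("unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit", 21.6, 48),
    ModelOption("unsloth/Qwen3.6-35B-A3B-MLX-8bit", 37.7, 64),
    ModelOption("unsloth/Qwen3.6-27B-UD-MLX-4bit", 26.2, 48),
    ModelOption("unsloth/Qwen3.6-27B-UD-MLX-6bit", 30.5, 64),
    ModelOption("unsloth/Qwen3.6-27B-UD-MLX-8bit", 34.7, 64),
]


def pick_model(ram_gb: int, family: str = "35B-A3B") -> Optional[ModelOption]:
    """Return the largest option in `family` whose ideal RAM is <= ram_gb."""
    fitting = [
        m for m in OPTIONS
        if family in m.repo and m.ideal_ram_gb <= ram_gb
    ]
    # None when even the smallest build in the family needs more RAM.
    return max(fitting, key=lambda m: m.size_gb, default=None)
```

For example, `pick_model(48)` returns the 35B-A3B 4-bit build, and `pick_model(64, "27B")` returns the 27B 8-bit build.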

3. Configure oMLX settings

Go to model settings
Pin the downloaded model and set it as the default
Open the model's settings
Enable TurboQuant KV Cache at 3.5-bit
Go to global settings
Turn on Fallback to Default Model
Set Hot Cache Limit (In-Memory Cache) to 10%
Set Cold Cache Limit (SSD Cache) to 10%
Increase Max Context Window to 256000
Increase Max Tokens to 64000
Save
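To sanity-check these numbers for your machine, here is a small Python sketch of the arithmetic. It assumes the Hot Cache Limit percentage is taken of total unified memory, and that prompt and generated tokens share the same context window (as they normally do for decoder-only LLMs). Neither assumption comes from oMLX's documentation, and the function names are hypothetical.

```python
# Sketch of the arithmetic behind the oMLX settings (not oMLX code).
# ASSUMPTION: "Hot Cache Limit" is a percentage of total unified memory.
MAX_CONTEXT_WINDOW = 256_000  # "Max Context Window" setting
MAX_TOKENS = 64_000           # "Max Tokens" setting (generation cap)


def hot_cache_gb(total_ram_gb: float, limit_pct: float = 10.0) -> float:
    """Memory available to the in-memory KV cache, in GB."""
    return total_ram_gb * limit_pct / 100


def prompt_budget(context_window: int = MAX_CONTEXT_WINDOW,
                  max_tokens: int = MAX_TOKENS) -> int:
    """Tokens left for the prompt if a reply may use the full max_tokens.

    ASSUMPTION: prompt and generated tokens share one context window.
    """
    if max_tokens >= context_window:
        raise ValueError("max_tokens must be smaller than the context window")
    return context_window - max_tokens


if __name__ == "__main__":
    for ram in (36, 48, 64, 128):
        print(f"{ram} GB RAM -> {hot_cache_gb(ram):.1f} GB hot cache at 10%")
    print(f"prompt budget: {prompt_budget():,} tokens")
```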

4. Configure Claude Code

Add "CLAUDE_CODE_ATTRIBUTION_HEADER": "0" under the env key in ~/.claude/settings.json (Ref)

Example:
```json
  {
    "env": {
      "CLAUDE_CODE_ATTRIBUTION_HEADER": "0"
    }
  }
```
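If your settings.json already contains other keys, editing it by hand risks dropping them. A minimal Python sketch that merges the variable into the existing `env` object instead; only the file path and the key/value come from the step above, and the helper name `set_env_var` is hypothetical.

```python
# Minimal sketch: add a variable to the "env" object of a Claude Code
# settings.json while keeping every other setting intact.
import json
from pathlib import Path

SETTINGS_PATH = Path.home() / ".claude" / "settings.json"


def set_env_var(path: Path, key: str, value: str) -> dict:
    """Merge key=value into settings["env"] and write the file back."""
    text = path.read_text() if path.exists() else ""
    # Treat a missing or empty file as an empty settings object.
    settings = json.loads(text) if text.strip() else {}
    env = settings.setdefault("env", {})
    env[key] = value  # values are strings, as in the example above
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

Call `set_env_var(SETTINGS_PATH, "CLAUDE_CODE_ATTRIBUTION_HEADER", "0")` once, then start a new Claude Code session so the setting is read.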

5. Configure oMLX's Claude Code settings

Go to Dashboard
Scroll down to Claude Code with oMLX
Set Qwen3.6-35B-A3B-UD-MLX-4bit for all three tiers
Enable Context Scaling for Claude Code and set Target Context Size to 64000
Run the displayed command
Optional, for lighter work: add --bare to the command

6. Share your findings and optimizations!

Hi
Using Qwen3.6-35B-A3B-bf16, I get 1539.3 tok/s on input and more than 50 tok/s on output.
I now use it as my main Claude Code LLM.
For bigger tasks, I do the planning in Claude Code online and have it write the implementation plan in Markdown.
Then I give the plan to my local Claude to implement.
It's fast enough for me.
I still have plenty of memory to spare for cache (Hot Cache Limit (In-Memory Cache) at 20%, i.e. 25GB).
I complement Claude with rtk to make it more efficient, and I also have it use Context7 to get the latest docs/features/releases for my code.
My personal code is too small to tell whether this would work on big enterprise monoliths, though.
So for now I'm keeping the small $20 Claude Code subscription.
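A back-of-envelope Python sketch of what those speeds mean per request: prefill time is prompt tokens divided by input speed, and generation time is output tokens divided by output speed. The token counts in the example are hypothetical, and because output was reported as "more than 50 tok/s", the result is an upper bound. The sketch also ignores prefix caching, which can presumably skip most of the prefill when a prompt prefix is already in oMLX's cache.

```python
# Back-of-envelope latency from the speeds reported above:
# about 1539.3 tok/s input (prefill) and at least 50 tok/s output (decode).
PREFILL_TOK_S = 1539.3
DECODE_TOK_S = 50.0


def turn_seconds(prompt_tokens: int, output_tokens: int,
                 prefill_tok_s: float = PREFILL_TOK_S,
                 decode_tok_s: float = DECODE_TOK_S) -> float:
    """Rough time for one request: prefill the prompt, then decode the reply.

    Ignores KV-cache hits, which skip prefill for a cached prompt prefix.
    """
    return prompt_tokens / prefill_tok_s + output_tokens / decode_tok_s


# Hypothetical request: 30k-token prompt, 1k-token reply.
print(f"{turn_seconds(30_000, 1_000):.1f} s")  # 39.5 s
```

Roughly half of that example is prefill, which is why a large in-memory cache helps on long agentic sessions that resend the same context.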

DiegoRBaquero/readme.md

christ-off commented May 2, 2026

kynrai commented May 2, 2026

christ-off commented May 18, 2026

christ-off commented May 18, 2026
