https://github.com/jundot/omlx/releases
Go to model downloader
Multiple options, depending on your RAM
35B parameters with 3 billion active:
- unsloth/Qwen3.6-35B-A3B-UD-MLX-3bit: 17.4 GB (36 GB+ RAM ideal)
- unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit: 21.6 GB (48 GB+ RAM ideal)
- unsloth/Qwen3.6-35B-A3B-MLX-8bit: 37.7 GB (64 GB+ RAM ideal)
27B parameters:
- unsloth/Qwen3.6-27B-UD-MLX-4bit: 26.2 GB (48 GB+ RAM ideal)
- unsloth/Qwen3.6-27B-UD-MLX-6bit: 30.5 GB (64 GB+ RAM ideal)
- unsloth/Qwen3.6-27B-UD-MLX-8bit: 34.7 GB (64 GB+ RAM ideal)
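As a rough aid for picking from the options above, here is a minimal Python sketch that filters the listed models by available RAM. The names, sizes, and RAM suggestions are copied from the lists above; the `candidates` helper and the "largest download first" ordering are illustrative assumptions, not something oMLX provides.

```python
# Model table copied from the lists above:
# (name, download size in GB, suggested minimum RAM in GB).
MODELS = [
    ("unsloth/Qwen3.6-35B-A3B-UD-MLX-3bit", 17.4, 36),
    ("unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit", 21.6, 48),
    ("unsloth/Qwen3.6-35B-A3B-MLX-8bit", 37.7, 64),
    ("unsloth/Qwen3.6-27B-UD-MLX-4bit", 26.2, 48),
    ("unsloth/Qwen3.6-27B-UD-MLX-6bit", 30.5, 64),
    ("unsloth/Qwen3.6-27B-UD-MLX-8bit", 34.7, 64),
]


def candidates(ram_gb):
    """Hypothetical helper: models whose suggested RAM fits, largest download first."""
    fits = [m for m in MODELS if m[2] <= ram_gb]
    return sorted(fits, key=lambda m: m[1], reverse=True)


if __name__ == "__main__":
    # Example: list what the table suggests for a 48 GB machine.
    for name, size_gb, min_ram in candidates(48):
        print(f"{name}: {size_gb} GB (suggested {min_ram} GB+ RAM)")
```

Sorting by download size is only a proxy for quality; whether the dense 27B or the 35B-A3B model works better for a given task is something to test yourself.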
- Go to model settings
- Pin the downloaded model and set it as the default
- Open the model's settings
- Enable TurboQuant KV Cache in 3.5-bit
- Go to global settings
- Turn on Fallback to Default Model
- Set Hot Cache Limit (In-Memory Cache) to 10%
- Set Cold Cache Limit (SSD Cache) to 10%
- Increase Max Context Window to 256000
- Increase Max Tokens to 64000
- Save
- Add "CLAUDE_CODE_ATTRIBUTION_HEADER": "0" in the env key inside ~/.claude/settings.json (Ref). Example:
{ "env": { "CLAUDE_CODE_ATTRIBUTION_HEADER": "0" } }
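If ~/.claude/settings.json already holds other settings, editing it by hand risks overwriting them. The following is a minimal Python sketch that merges the key into the existing file instead; the `set_env_flag` helper is a hypothetical name, and it assumes the file, if present, is a JSON object whose `env` value is also an object.

```python
import json
from pathlib import Path


def set_env_flag(settings_path, key="CLAUDE_CODE_ATTRIBUTION_HEADER", value="0"):
    """Hypothetical helper: merge key=value into the "env" object of a JSON
    settings file, keeping every other key that is already there."""
    path = Path(settings_path).expanduser()
    text = path.read_text() if path.exists() else ""
    # Treat a missing or empty file as an empty settings object.
    settings = json.loads(text) if text.strip() else {}
    settings.setdefault("env", {})[key] = value
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

Calling `set_env_flag("~/.claude/settings.json")` would apply the change to the real file; back it up first if it contains settings you care about.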
Thanks a LOT for this page. It's simply the most useful page I have found for running LLMs locally on my MacBook Pro.
As I have 128 GB, I currently use "Qwen3.6-35B-A3B-bf16".
Let me try that with Claude Code on a few projects for a few more days.
I will give you my feedback on the model and settings.
Thanks a lot again