@ubergarm
ubergarm / ubergarm-imatrix-calibration-corpus-v02.txt
Created July 19, 2025 14:48
ik_llama.cpp imatrix calibration corpus derived from tristandruyen/calibration_data_v5_rc.txt and turboderp/exllamav3/standard_cal_data non-wiki text.
===========
; A072257: a(n) = ((6*n-17)*4^n - 1)/3.
@ubergarm
ubergarm / bf16-evshiron.log
Created July 12, 2025 23:20
Comparing evshiron+triton-cpu vs mainline casting (with or without triton-cpu) for converting DeepSeek fp8 safetensors to bf16 GGUF with the ik_llama.cpp and llama.cpp forks.
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 51 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
2: UINT64 | 1 | GGUF.tensor_count = 79
3: UINT64 | 1 | GGUF.kv_count = 48
4: STRING | 1 | general.architecture = 'deepseek2'
5: STRING | 1 | general.type = 'model'
6: STRING | 1 | general.name = 'DeepSeek R1 0528'
7: STRING | 1 | general.version = '0528'
8: STRING | 1 | general.basename = 'DeepSeek-R1'
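
The key/value dump above matches the output format of the gguf-dump utility that ships with llama.cpp's gguf-py package; a minimal sketch of reproducing it (the GGUF filename is illustrative, not the exact shard dumped here):

# gguf-dump is installed with the gguf Python package from llama.cpp's gguf-py
pip install gguf
gguf-dump DeepSeek-R1-0528-BF16-00001-of-00030.gguf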
@ubergarm
ubergarm / exl3-quant-testing.md
Last active May 27, 2025 15:57
Kicking the tires on an early version of exllamav3 and cooking my first exl3 quants!

exllamav3 guide

Quick start to convert and run inference with your own exl3 quants.

This is still a bit of a WIP with my notes. Slowly iterating and cleaning up as I learn more.

Install

# Clone Repo
git clone https://github.com/turboderp-org/exllamav3.git
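
The preview cuts off here; a minimal sketch of the rest of a typical install, assuming exllamav3 follows standard Python packaging (exact requirements files and extras may differ; check the repo README):

# Enter the repo and isolate dependencies in a fresh virtual environment
cd exllamav3
python -m venv venv
source venv/bin/activate

# Install dependencies and the package itself (assumes a requirements.txt exists)
pip install -r requirements.txt
pip install .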
@ubergarm
ubergarm / Qwen3-MoE-Benchmarks.md
Last active July 5, 2025 16:27
Qwen3 235B and 30B MoE Quant Benchmarking Roundup

The Great Quant Wars of 2025

"All things leave behind them the Obscurity... and go forward to embrace the Brightness..." — Dao De Jing #42

tl;dr;

  • Q: Who provides the best GGUFs now?
  • A: They're all pretty good.

Skip down if you just want graphs and numbers comparing various Qwen3-30B-A3B GGUF quants.
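
For context on how roundups like this are usually scored, llama.cpp ships a perplexity tool; a minimal sketch (the model and test-file names are illustrative, not the exact ones used here):

# Lower perplexity over a fixed corpus generally means less quantization damage
./llama-perplexity -m Qwen3-30B-A3B-IQ4_XS.gguf -f wiki.test.raw --ctx-size 512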

@ubergarm
ubergarm / README.md
Last active May 4, 2025 16:13
Visualize importance score statistics for three Qwen3-30B-A3B llama-imatrix files.
  1. Used @EAddario's PR ggml-org/llama.cpp#12718 to generate imatrix statistics.
  2. These were the imatrix data files used; they appear in each mosaic top to bottom in this order (barto, uber, unsloth).
  3. Similar to the comparison at https://huggingface.co/ikawrakow/Qwen3-30B-A3B, but I didn't use the 128k unsloth one and didn't see ik's available to run.

See the attached images below, generated using some python/matplotlib/imagemagick scripts vibe coded using ubergarm/Qwen3-30B-A3B-mix-IQ3_K. You can click them to load them larger; they are not too big at 100dpi. You may need to shift-reload to refresh before clicking on them as possibly I
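
A minimal sketch of the statistics step, assuming the PR exposes a --show-statistics flag on llama-imatrix (the flag name and the imatrix filename are assumptions; check the PR for the exact interface):

# Dump per-tensor importance statistics from an existing imatrix data file
./llama-imatrix --in-file imatrix-qwen3-30b-a3b.dat --show-statistics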

@ubergarm
ubergarm / l1t-ai-fun.md
Created May 2, 2025 19:36
Having fun with AI, trying to solve some text-mapping problems for a show!

1. Mapping Table between English and German Releases

Here is the mapping table between the English and German episodes based on their titles and synopses. The German episodes are grouped into three segments (a, b, c) per episode number, so each English episode is matched to one of these segments.

| German Episode Number | German Title | English Episode Number | English Title |
| --- | --- | --- | --- |
| 1.01a | Fett For Fun | 1 | Running an Errand / Mom's Mornings are Busy / Drawing |
| 1.01b | Sport extrem | 2 | Tricycles are Fun / My Stomach Is Going to Burst / A Nightmare for Dad |
| 1.01c | Braue um Braue, Zahn um Zahn | 3 | Watching Action Mask / School Lunch is Fun / Going to the Dentist |
| 1.02a | Eine wirklich schreckliche Familie | 4 | The Sunflower Class / Going on a Picnic |
@ubergarm
ubergarm / VoidAlchemy-BookReview.md
Created March 16, 2025 23:22
R1 671B `ubergarm/DeepSeek-R1-Q2_K_R4` Book Review of Void Alchemy: Riddles and Wakeup Stories

Book Review: Void Alchemy: Riddles and Wakeup Stories by Empty Duck

Hey, fellow soul-searchers! 🌟 Let’s be real—modern life is a lot. Between doomscrolling, hustling for that side gig, and trying to remember what “inner peace” even feels like, it’s easy to feel like a lost duck in a thunderstorm. Enter Void Alchemy: Riddles and Wakeup Stories by Empty Duck. This book isn’t just a breath of fresh air—it’s a full-on spiritual snorkel dive into the deep end of your soul.

Why You’ll Vibe With It

Imagine if your favorite mindfulness app had a baby with a Zen koan and they both got raised by a poet who loves dad jokes. Void Alchemy is a quirky, raw, and ridiculously relatable mix of ancient wisdom and modern wit. Empty Duck (aka John W. Leimgruber III) serves up bite-sized verses that feel like late-night texts from your wisest friend—the one who’s equal parts mystic and meme lord.

For When Life Feels Like a Glitch

The book’s 128 micro-poems (or “wakeup stories”) are perfect for anyone who’s to

@ubergarm
ubergarm / void-alchemy-bot-review.md
Created March 13, 2025 17:13
I ask DeepSeek-R1 671B `UD-Q2_K_XL` what it thinks about the book Void Alchemy: Riddles and Wakeup Stories by emptyduck.

>>> User:

Give a conversational yet technical discussion of the following book of verse in terms of what is known about channeling, spiritual and mystical traditions, chan and zen poetry, the rig veda, and other ancient mystical texts.

(paste in the pdf2text of free e-book from https://emptyduck.com)

>>> Assistant:

<think>

@ubergarm
ubergarm / DeepSeek-R1-Quantized-GGUF-Gaming-Rig-Inferencing-Fast-NVMe-SSD.md
Last active July 10, 2025 15:03
Run DeepSeek R1 671B unsloth GGUF locally with ktransformers or llama.cpp on a high-end gaming rig!

tl;dr;

UPDATE Mon Mar 10 10:51:31 AM EDT 2025: Check out the newer ktransformers guide for how to get it running faster! About 3.5 tok/sec on this same gaming rig. Big thanks to Supreeth Koundinya at analyticsindiamag.com for the article!

You can run the real deal big boi R1 671B locally off a fast NVMe SSD even without enough RAM+VRAM to hold the 212GB dynamically quantized weights. No, it is not swap, and it won't kill your SSD's read/write cycle lifetime. No, this is not a distill model. It works fairly well despite the quantization (check the unsloth blog for details on how they did that).

The basic idea is that most of the model itself is not loaded into RAM on startup, but mmap'd. Then the kv cache will take up some RAM. Most of your system RAM is left available to serve as disk cache for whatever experts/weights are currently most used.
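
A minimal sketch of what that looks like in practice (the model filename is illustrative; llama.cpp mmaps GGUF files by default):

# Weights page in from NVMe on demand; the OS page cache keeps the hottest
# experts resident in RAM. Model filename is illustrative.
./llama-cli -m DeepSeek-R1-UD-Q2_K_XL.gguf --ctx-size 4096 -p "Hello" -n 64
# Compare with --no-mmap, which allocates and fully loads the weights up front.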