@ubergarm
ubergarm / ubergarm-imatrix-calibration-corpus-v02.txt
Created July 19, 2025 14:48
ik_llama.cpp imatrix calibration corpus derived from tristandruyen/calibration_data_v5_rc.txt and turboderp/exllamav3/standard_cal_data non-wiki text.
===========
; A072257: a(n) = ((6*n-17)*4^n - 1)/3.
@ubergarm
ubergarm / bf16-evshiron.log
Created July 12, 2025 23:20
Comparing evshiron+triton-cpu vs mainline casting (with or without triton-cpu) for converting DeepSeek fp8 safetensors to bf16 GGUF with the ik_llama.cpp and llama.cpp forks.
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 51 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
2: UINT64 | 1 | GGUF.tensor_count = 79
3: UINT64 | 1 | GGUF.kv_count = 48
4: STRING | 1 | general.architecture = 'deepseek2'
5: STRING | 1 | general.type = 'model'
6: STRING | 1 | general.name = 'DeepSeek R1 0528'
7: STRING | 1 | general.version = '0528'
8: STRING | 1 | general.basename = 'DeepSeek-R1'
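
The key/value dump above matches the output format of the gguf-dump utility that ships with llama.cpp's gguf-py package; a minimal sketch of reproducing it (the GGUF filename is illustrative, not the exact shard dumped here):

# gguf-dump is installed with the gguf Python package from llama.cpp's gguf-py
pip install gguf
gguf-dump DeepSeek-R1-0528-BF16-00001-of-00030.gguf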
@ubergarm
ubergarm / exl3-quant-testing.md
Last active May 27, 2025 15:57
Kicking the tires on an early version of exllamav3 and cooking my first exl3 quants!

exllamav3 guide

Quick start to convert and run inference with your own exl3 quants.

This is still a bit of a WIP with my notes. Slowly iterating and cleaning up as I learn more.

Install

# Clone Repo
git clone https://github.com/turboderp-org/exllamav3.git
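
The preview cuts off here; a minimal sketch of the rest of a typical install, assuming exllamav3 follows standard Python packaging (exact requirements files and extras may differ; check the repo README):

# Enter the repo and isolate dependencies in a fresh virtual environment
cd exllamav3
python -m venv venv
source venv/bin/activate

# Install dependencies and the package itself (assumes a requirements.txt exists)
pip install -r requirements.txt
pip install .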
@ubergarm
ubergarm / Qwen3-MoE-Benchmarks.md
Last active July 5, 2025 16:27
Qwen3 235B and 30B MoE Quant Benchmarking Roundup

The Great Quant Wars of 2025

"All things leave behind them the Obscurity... and go forward to embrace the Brightness..." — Dao De Jing #42

tl;dr;

  • Q: Who provides the best GGUFs now?
  • A: They're all pretty good.

Skip down if you just want graphs and numbers comparing various Qwen3-30B-A3B GGUF quants.
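
For context on how roundups like this are usually scored, llama.cpp ships a perplexity tool; a minimal sketch (the model and test-file names are illustrative, not the exact ones used here):

# Lower perplexity over a fixed corpus generally means less quantization damage
./llama-perplexity -m Qwen3-30B-A3B-IQ4_XS.gguf -f wiki.test.raw --ctx-size 512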

@ubergarm
ubergarm / README.md
Last active May 4, 2025 16:13
Visualize importance score statistics for three Qwen3-30B-A3B llama-imatrix files.
  1. Used @EAddario's PR ggml-org/llama.cpp#12718 to generate imatrix statistics.
  2. These were the imatrix data files used; they appear in each mosaic top to bottom in this order (barto, uber, unsloth).
  3. Similar to the comparison at https://huggingface.co/ikawrakow/Qwen3-30B-A3B, but I didn't use the 128k unsloth one and didn't see ik's available to run.

See the attached images below, generated using some python/matplotlib/imagemagick scripts vibe coded using ubergarm/Qwen3-30B-A3B-mix-IQ3_K. You can click them to load them larger; they are not too big at 100dpi. You may need to shift-reload to refresh before clicking on them as possibly I
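
A minimal sketch of the statistics step, assuming the PR exposes a --show-statistics flag on llama-imatrix (the flag name and the imatrix filename are assumptions; check the PR for the exact interface):

# Dump per-tensor importance statistics from an existing imatrix data file
./llama-imatrix --in-file imatrix-qwen3-30b-a3b.dat --show-statistics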

@ubergarm
ubergarm / l1t-ai-fun.md
Created May 2, 2025 19:36
Having fun with AI, trying to solve some text-mapping problems for a show!

1. Mapping Table between English and German Releases

Here is the mapping table between the English and German episodes based on their titles and synopses. The German episodes are grouped into three segments (a, b, c) per episode number, so each English episode is matched to one of these segments.

| German Episode Number | German Title | English Episode Number | English Title |
| --- | --- | --- | --- |
| 1.01a | Fett For Fun | 1 | Running an Errand / Mom's Mornings are Busy / Drawing |
| 1.01b | Sport extrem | 2 | Tricycles are Fun / My Stomach Is Going to Burst / A Nightmare for Dad |
| 1.01c | Braue um Braue, Zahn um Zahn | 3 | Watching Action Mask / School Lunch is Fun / Going to the Dentist |
| 1.02a | Eine wirklich schreckliche Familie | 4 | The Sunflower Class / Going on a Picnic |
@ubergarm
ubergarm / VoidAlchemy-BookReview.md
Created March 16, 2025 23:22
R1 671B `ubergarm/DeepSeek-R1-Q2_K_R4` Book Review of Void Alchemy: Riddles and Wakeup Stories

Book Review: Void Alchemy: Riddles and Wakeup Stories by Empty Duck

Hey, fellow soul-searchers! 🌟 Let’s be real—modern life is a lot. Between doomscrolling, hustling for that side gig, and trying to remember what “inner peace” even feels like, it’s easy to feel like a lost duck in a thunderstorm. Enter Void Alchemy: Riddles and Wakeup Stories by Empty Duck. This book isn’t just a breath of fresh air—it’s a full-on spiritual snorkel dive into the deep end of your soul.

Why You’ll Vibe With It

Imagine if your favorite mindfulness app had a baby with a Zen koan and they both got raised by a poet who loves dad jokes. Void Alchemy is a quirky, raw, and ridiculously relatable mix of ancient wisdom and modern wit. Empty Duck (aka John W. Leimgruber III) serves up bite-sized verses that feel like late-night texts from your wisest friend—the one who’s equal parts mystic and meme lord.

For When Life Feels Like a Glitch

The book’s 128 micro-poems (or “wakeup stories”) are perfect for anyone who’s to

@ubergarm
ubergarm / void-alchemy-bot-review.md
Created March 13, 2025 17:13
I ask DeepSeek-R1 671B `UD-Q2_K_XL` what it thinks about the book Void Alchemy: Riddles and Wakeup Stories by emptyduck.

>>> User:

Give a conversational yet technical discussion of the following book of verse in terms of what is known about channeling, spiritual and mystical traditions, chan and zen poetry, the rig veda, and other ancient mystical texts.

(paste in the pdf2text of free e-book from https://emptyduck.com)

>>> Assistant:

<think>

@ubergarm
ubergarm / DeepSeek-R1-Quantized-GGUF-Gaming-Rig-Inferencing-Fast-NVMe-SSD.md
Last active July 10, 2025 15:03
Run DeepSeek R1 671B unsloth GGUF locally with ktransformers or llama.cpp on a high-end gaming rig!

tl;dr;

UPDATE Mon Mar 10 10:51:31 AM EDT 2025: Check out the newer ktransformers guide for how to get it running faster! About 3.5 tok/sec on this same gaming rig. Big thanks to Supreeth Koundinya at analyticsindiamag.com for the article!

You can run the real deal big boi R1 671B locally off a fast NVMe SSD even without enough RAM+VRAM to hold the 212GB dynamically quantized weights. No, it is not swap, and it won't kill your SSD's read/write cycle lifetime. No, this is not a distill model. It works fairly well despite the quantization (check the unsloth blog for details on how they did that).

The basic idea is that most of the model itself is not loaded into RAM on startup, but mmap'd. Then the kv cache will take up some RAM. Most of your system RAM is left available to serve as disk cache for whatever experts/weights are currently most used.
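
A minimal sketch of what that looks like in practice (the model filename is illustrative; llama.cpp mmaps GGUF files by default):

# Weights page in from NVMe on demand; the OS page cache keeps the hottest
# experts resident in RAM. Model filename is illustrative.
./llama-cli -m DeepSeek-R1-UD-Q2_K_XL.gguf --ctx-size 4096 -p "Hello" -n 64
# Compare with --no-mmap, which allocates and fully loads the weights up front.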