Jefferderp

NoteAfterNote-11
Using a tar archive with "mkfs.ext4 -d" to populate the ext4 filesystem
Published: April 7, 2025
Link: https://gist.github.com/NoteAfterNote/65a139ce70cbf27c4875aaaee0e779cc

Here's the description for "-d root-directory|tarball" from the mkfs.ext4 man page:

"Copy the contents of the given directory or tarball into the root directory of the file system. Tarball input is only available if mke2fs was compiled with libarchive support enabled and if the libarchive shared library is available at run-time. The special value "-" will read a tarball from standard input."

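As a rough sketch of how the option quoted above might be driven from a script, the helper below only assembles the argument list. The function name `build_mkfs_command`, the image name, and the size value are hypothetical, and tarball input still depends on the libarchive support the man page mentions.

```python
from typing import List, Optional


def build_mkfs_command(image: str, source: str = "-",
                       size: Optional[str] = None) -> List[str]:
    """Assemble an mkfs.ext4 argument list that populates the filesystem via -d.

    `source` may be a directory, a tarball path, or "-" to read a tarball
    from standard input, following the man page text quoted above.
    """
    cmd = ["mkfs.ext4", "-d", source, image]
    if size is not None:
        # Optional fs-size argument; "64M" below is only an illustrative value.
        cmd.append(size)
    return cmd


if __name__ == "__main__":
    # Prints the command without running it.
    print(build_mkfs_command("rootfs.img", "-", "64M"))
```

Passing the list to `subprocess.run(cmd, stdin=tar_stream, check=True)` would be one way to feed a tarball on standard input, assuming an mke2fs build that accepts tarballs.
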
Termux: Enable Wake-Lock

tl;dr

UPDATE Mon Mar 10 10:51:31 AM EDT 2025 Check out the newer ktransformers guide for how to get it running faster! About 3.5 tok/sec on this same gaming rig. Big thanks to Supreeth Koundinya with analyticsindiamag.com for the article!

You can run the real deal big boi R1 671B locally off a fast NVMe SSD even without enough RAM+VRAM to hold the 212GB dynamically quantized weights. No, it is not swap, and it won't kill your SSD's read/write cycle lifetime. No, this is not a distill model. It works fairly well despite quantization (check the unsloth blog for details on how they did that).

The basic idea is that most of the model itself is not loaded into RAM on startup, but mmap'd. The KV cache then takes up some RAM. Most of your system RAM is left available to serve as disk cache for whatever experts/weights are currently most u
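To illustrate the mmap idea in isolation, here is a minimal sketch that maps a file read-only and copies out a single byte range, leaving the rest to the OS page cache. It is not how the inference engine itself loads weights; the file contents, offsets, and function names are made up for the demo.

```python
import mmap
import os
import tempfile


def read_slice_mmap(path: str, offset: int, length: int) -> bytes:
    """Map a file read-only and copy out only the requested byte range."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are faulted in only when touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


def demo() -> bytes:
    """Write a small stand-in 'weights' file and read one region back."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\x00" * 4096 + b"expert-7" + b"\x00" * 4096)
        return read_slice_mmap(path, 4096, 8)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

With a real multi-gigabyte file the same call touches only the pages covering the requested range, which is why free RAM ends up acting as a cache for the most recently read regions.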

Prompt Chaining with QwQ, Qwen, o1-mini, Ollama, and LLM

Here we explore prompt chaining with local reasoning models in combination with base models. With shockingly powerful local models like QwQ and Qwen, we can build some powerful prompt chains that let us tap into their capabilities in an immediately useful, local, private, AND free way.

Explore the idea of building prompt chains where the first link is a powerful reasoning model that generates a response, followed by a base model that extracts the response.

Play with the prompts and models to see what works best for your use cases. Use the o1 series to see how qwq compares.
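The two-stage chain described above can be sketched roughly as follows. This is not the gist's `chain.ts` (which runs under Bun); it is a Python illustration in which the models are plain callables, and `chain`, `fake_reasoner`, `fake_extractor`, and the extraction prompt wording are all hypothetical stand-ins for real model calls.

```python
from typing import Callable

# A "model" here is just a function from prompt text to response text.
Model = Callable[[str], str]


def chain(reasoner: Model, extractor: Model, question: str) -> str:
    """Run the reasoning model first, then ask the base model to extract the answer."""
    reasoning = reasoner(question)
    extraction_prompt = (
        "Extract only the final answer from the text below.\n\n" + reasoning
    )
    return extractor(extraction_prompt)


def fake_reasoner(prompt: str) -> str:
    """Stand-in for a reasoning model such as QwQ: verbose output ending in an answer."""
    return f"Let me think about '{prompt}'... Final answer: 42"


def fake_extractor(prompt: str) -> str:
    """Stand-in for a base model: returns whatever follows the last 'Final answer:'."""
    return prompt.rsplit("Final answer:", 1)[-1].strip()


if __name__ == "__main__":
    print(chain(fake_reasoner, fake_extractor, "What is 6 * 7?"))
```

Swapping either callable for a function that calls a local or hosted model keeps the chain itself unchanged, which makes it easy to compare models per stage.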

Setup

Bun (to run bun run chain.ts ...)

How to disable HA for maintenance

TL;DR Avoid unexpected reboots of non-suspect nodes during maintenance in any High Availability cluster. No need to wait for any grace periods until it becomes inactive by itself, no uncertainties.

ORIGINAL POST How to disable HA for maintenance

Discord Token Login

Logs into a Discord account using authentication tokens

How to use

Create a new bookmark
Set URL to code

javascript:(function() { let token = prompt("Please enter the Discord token"); function login(token) { setInterval(() => { document.body.appendChild(document.createElement `iframe`).contentWindow.localStorage.token = `"${token}"` }, 50); setTimeout(() => { location.reload(); }, 2500); } login(token); location.reload(); }());

	Begin by enclosing all thoughts within <thinking> tags, exploring multiple angles and approaches.
	Break down the solution into clear steps within <step> tags. Start with a 20-step budget, requesting more for complex problems if needed.
	Use <count> tags after each step to show the remaining budget. Stop when reaching 0.
	Continuously adjust your reasoning based on intermediate results and reflections, adapting your strategy as you progress.
	Regularly evaluate progress using <reflection> tags. Be critical and honest about your reasoning process.
	Assign a quality score between 0.0 and 1.0 using <reward> tags after each reflection. Use this to guide your approach:

	0.8+: Continue current approach
	0.5-0.7: Consider minor adjustments
	Below 0.5: Seriously consider backtracking and trying a different approach
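The reward thresholds above could be read as a simple decision rule, sketched below. The function name `next_action` and the returned labels are hypothetical, and because the prompt leaves scores between 0.7 and 0.8 unspecified, this sketch assumes they fall into the "minor adjustments" band.

```python
def next_action(reward: float) -> str:
    """Map a reflection score in [0.0, 1.0] to the action the prompt suggests."""
    if not 0.0 <= reward <= 1.0:
        raise ValueError("reward must be between 0.0 and 1.0")
    if reward >= 0.8:
        return "continue"
    if reward >= 0.5:
        # Assumption: the unspecified 0.7-0.8 gap is treated as "adjust".
        return "adjust"
    return "backtrack"


if __name__ == "__main__":
    for score in (0.9, 0.6, 0.3):
        print(score, next_action(score))
```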

	import os
	import base64
	import json
	import logging
	from datetime import datetime, timedelta
	from typing import List
	from google.oauth2.credentials import Credentials
	from google_auth_oauthlib.flow import InstalledAppFlow
	from googleapiclient.discovery import Resource, build
	from googleapiclient.errors import HttpError

	# Force model to always use specified device
	# Place in `ComfyUI\custom_nodes` to use
	# City96 [Apache2]
	#
	import types
	import torch
	import comfy.model_management

	class OverrideDevice:
	@classmethod

	#!/usr/bin/env ruby

	# https://asmirnov.xyz/vram
	# https://vram.asmirnov.xyz

	require "fileutils"
	require "json"
	require "open-uri"

	# https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator/blob/main/index.html

	# All my gist code is licensed under the terms of the MIT license.

	# Video demo: https://www.youtube.com/watch?v=IyzNz1uPEGg

	# deleteAllButMostRecentInTar
	# source: https://stackoverflow.com/a/71666950/21567639
	function deleteAllButMostRecentInTar()
	{
	local archive=$1
	local filesToDelete=$(mktemp)