Carlo Alberto Ferraris CAFxX

@CAFxX
CAFxX / prompt.txt
Last active April 17, 2025 06:26
gemini 2.5 prompt?
As part of a reasoning step, Gemini 2.5 Pro Preview 03-25 randomly blurted out:
---
If multiple possible answers are available in the sources, present all possible answers.
If the question has multiple parts or covers various aspects, ensure that you answer them all to the best of your ability.
When answering questions, aim to give a thorough and informative answer, even if doing so requires expanding beyond the specific inquiry from the user.
If the question is time dependent, use the current date to provide most up to date information.
If you are asked a question in a language other than English, try to answer the question in that language.
Rephrase the information instead of just directly copying the information from the sources.
@CAFxX
CAFxX / sort.cpp
Last active March 6, 2025 08:58
Sorting network for short arrays, branchless (CMOVxx, VMINSS/VMAXSS) - https://godbolt.org/z/v4G4xPofc
template <typename T>
static void sort(T* a, int l) {
#define S(i, j) { \
T t1 = a[i], t2 = a[j]; \
if (t1 > t2) { T t = t1; t1 = t2; t2 = t; } \
a[i] = t1, a[j] = t2;\
}
// Sorting networks from https://bertdobbelaere.github.io/sorting_networks.html
// Using the ones with lower CEs because every S(...) requires two CMOVxx or a
// VMINSS+VMAXSS pair, and it seems that's the limit per cycle on current
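The preview above is cut off before any network is listed, so here is a minimal sketch in plain C of what a branchless compare-exchange and a small network might look like. The names `cmpswap_int` and `sort4_int` are hypothetical, the element type is fixed to `int` for brevity, and the 4-input network used is a standard 5-comparator, 3-layer one. Whether the compiler actually emits CMOVxx or MIN/MAX instructions for the ternaries depends on the compiler and flags, so treat that as an assumption to check on godbolt.

```c
#include <stddef.h>

/* Compare-exchange: afterwards a[i] <= a[j].
   Written with ternaries instead of an if/swap so that compilers can
   usually lower it to conditional moves or min/max (not guaranteed). */
static inline void cmpswap_int(int *a, size_t i, size_t j) {
    int x = a[i], y = a[j];
    a[i] = x < y ? x : y;
    a[j] = x < y ? y : x;
}

/* Sorts exactly 4 ints in place with 5 compare-exchanges in 3 layers:
   [(0,2),(1,3)], [(0,1),(2,3)], [(1,2)].
   Exchanges within a layer touch disjoint elements, so they are
   independent of each other. */
void sort4_int(int *a) {
    cmpswap_int(a, 0, 2); cmpswap_int(a, 1, 3);
    cmpswap_int(a, 0, 1); cmpswap_int(a, 2, 3);
    cmpswap_int(a, 1, 2);
}
```

The sequence of exchanges is fixed and does not depend on the data, which is what lets the whole routine run without branches.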
@CAFxX
CAFxX / sorter.md
Last active March 6, 2025 09:02
Minimal latency N-way sorter

Minimal Latency N-way Sorter (MLNS)

An MLNS is an $N$-sorter circuit that returns its $N$ input values in sorted order. It is functionally equivalent to a [sorting network][1] with $N$ inputs.

For small values of $N$ (3, 4, and possibly 5), it can make sense to trade slightly more die area and gate count for lower latency: instead of using a traditional [sorting network][1], perform all comparisons in parallel in a single stage, then select the correct ordering, combinatorially, from the outputs of those comparisons.

Alternatively, an MLNS can serve as the building block of an $M$-input sorting network (with $M>N$), reducing the number of stages, and therefore the latency, of that network.
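
As a rough software analogue of the single-stage idea, the sketch below computes all six pairwise comparisons for $N=4$ up front, derives each input's output position from them, and then places each value. The function name `mlns4_int` is hypothetical, and a hardware MLNS would select outputs with multiplexers rather than scatter by index, so this only illustrates the logic and is not a circuit description.

```c
/* Single-stage style sort of 4 values (hypothetical name, software analogy). */
void mlns4_int(const int in[4], int out[4]) {
    /* Stage 1: all N*(N-1)/2 = 6 comparisons. None depends on another,
       so in hardware they could all be evaluated in parallel. */
    int c01 = in[0] > in[1], c02 = in[0] > in[2], c03 = in[0] > in[3];
    int c12 = in[1] > in[2], c13 = in[1] > in[3];
    int c23 = in[2] > in[3];

    /* Stage 2: the rank of input i is the number of inputs that must
       precede it. For j > i, input j precedes i when in[i] > in[j];
       for j < i, input j precedes i when !(in[j] > in[i]).
       Ties therefore keep their original order, and the four ranks
       always form a permutation of 0..3. */
    int r0 = c01 + c02 + c03;
    int r1 = !c01 + c12 + c13;
    int r2 = !c02 + !c12 + c23;
    int r3 = !c03 + !c13 + !c23;

    /* Selection: route each input to its output position. */
    out[r0] = in[0];
    out[r1] = in[1];
    out[r2] = in[2];
    out[r3] = in[3];
}
```

The critical path here is one comparison plus the selection logic, compared with three dependent compare-exchange layers in a 4-input sorting network; the price is six comparators instead of five and the extra selection logic.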

@CAFxX
CAFxX / count_digits.c
Last active January 10, 2025 07:26
Fast count decimal digits (branchless)
/*
Fast, branchless count of decimal digits in a uint64
(C) 2025 Carlo Alberto Ferraris (CAFxX)
This compiles down on x86-64 to something like
lzcnt rcx, rdi
lea rax, [rip + countDigits.lut1]
movzx eax, byte ptr [rcx + rax]
lea rdx, [rip + countDigits.lut2]
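The preview is truncated before the function body, so the following is only a sketch of the general technique, not the gist's code. It assumes GCC or Clang (for `__builtin_clzll`), uses the hypothetical names `count_digits_u64` and `pow10_tab`, and replaces the first lookup table with a multiply-and-shift approximation of $\log_{10} 2$; the second table holds the powers of ten used for the final correction.

```c
#include <stdint.h>

/* Powers of ten 10^0 .. 10^19; 10^19 still fits in a uint64_t. */
static const uint64_t pow10_tab[20] = {
    1ULL, 10ULL, 100ULL, 1000ULL, 10000ULL, 100000ULL, 1000000ULL,
    10000000ULL, 100000000ULL, 1000000000ULL, 10000000000ULL,
    100000000000ULL, 1000000000000ULL, 10000000000000ULL,
    100000000000000ULL, 1000000000000000ULL, 10000000000000000ULL,
    100000000000000000ULL, 1000000000000000000ULL,
    10000000000000000000ULL
};

/* Number of decimal digits of x, with 0 counted as one digit. */
int count_digits_u64(uint64_t x) {
    /* Bit length in 1..64; "| 1" avoids the undefined clz(0). */
    int bits = 64 - __builtin_clzll(x | 1);
    /* 1233/4096 approximates log10(2); for bits in 1..64 this equals
       floor(bits * log10(2)), so guess is in 0..19. */
    int guess = (bits * 1233) >> 12;
    /* x has either guess or guess + 1 digits; one comparison decides.
       The comparisons yield 0 or 1, so no branch is required. */
    return guess + (x >= pow10_tab[guess]) + (x == 0);
}
```

For example, `count_digits_u64(999)` computes a bit length of 10, a guess of 3, and no correction because 999 is below 1000.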
@CAFxX
CAFxX / gomaxprocs.go
Last active October 29, 2024 03:15
Lock-free, fast GOMAXPROCS(0)
package xruntime
import (
"runtime"
"sync"
"sync/atomic"
"time"
)
var gmp atomic.Int32
@CAFxX
CAFxX / memchrs.c
Last active October 24, 2024 08:25
memchrs
// https://godbolt.org/z/63Ebd37vz
#include <immintrin.h>
#include <stdint.h>
#include <string.h>
void* memchrs(const void* haystack, int len, const char* needles, int n) {
if (len <= 0 || n <= 0) {
return NULL;
}
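The rest of `memchrs` is not shown in this preview. Judging only from its name and signature, it presumably returns a pointer to the first byte of `haystack` that matches any of the `n` bytes in `needles`; that reading is an assumption. Under it, a portable scalar reference (hypothetical name `memchrs_scalar`, no SIMD, not the gist's implementation) could be sketched as follows, which may be useful for testing a vectorized version against.

```c
#include <stddef.h>

/* Scalar reference: first byte in haystack[0..len) equal to any of
   needles[0..n), or NULL if there is none. Assumed semantics. */
const void *memchrs_scalar(const void *haystack, int len,
                           const char *needles, int n) {
    if (len <= 0 || n <= 0) {
        return NULL;
    }
    /* 256-entry membership table, indexed by byte value. */
    unsigned char member[256] = {0};
    for (int i = 0; i < n; i++) {
        member[(unsigned char)needles[i]] = 1;
    }
    const unsigned char *p = (const unsigned char *)haystack;
    for (int i = 0; i < len; i++) {
        if (member[p[i]]) {
            return p + i;
        }
    }
    return NULL;
}
```

The table makes each haystack byte cost a single lookup regardless of `n`, at the price of initializing 256 bytes per call.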
@CAFxX
CAFxX / write.go
Created August 26, 2024 06:49
Generic `io.Write`
package io
import (
"iter"
"math"
)
type Writable interface {
~string | ~[]byte | ~byte | ~rune | ~[]rune |
iter.Seq[byte] | iter.Seq[rune] | iter.Seq2[byte, error] | iter.Seq2[rune, error] |
@CAFxX
CAFxX / with_value_func.go
Created August 6, 2024 04:51
context.WithValueFunc(...)
package context
import (
"context"
"sync"
)
func WithValueFunc(ctx context.Context, key any, valFn func() any) context.Context {
return &valFunc{Context: ctx, key: key, valFn: valFn}
}
@CAFxX
CAFxX / textproto.go
Last active September 26, 2023 01:49
textproto.CanonincalMIMEHeaderKey with memoization and GC
package textproto
import (
"net/textproto"
"runtime"
"sync"
)
// CanonincalMIMEHeaderKey is like textproto.CanonicalMIMEHeaderKey but it
// memoizes results to avoid repeated allocations of the same string.
package maps
type ReadMostlyMap[K comparable, V any] struct {
mu sync.Mutex
m atomic.Pointer // map[K]V
}
func map2ptr[K comparable, V any](m map[K]V) unsafe.Pointer {
im := any(m)
return *(*unsafe.Pointer)(unsafe.Pointer(&im))