Nathan Moinvaziri nmoinvaz

  • Phoenix, United States
@nmoinvaz
nmoinvaz / zlibng-vs-zlibrs-benchmark.md
Last active February 26, 2026 20:28
zlib-ng vs zlib-rs benchmark comparison on Apple M3 (ARM64)

zlib-ng vs zlib-rs Benchmark Comparison (ARM64, Apple M3)

Machine Specs

  • CPU: Apple M3 (8 cores)
  • RAM: 24 GB
  • OS: Darwin 24.6.0 arm64 (macOS Sequoia)
  • Compiler: Apple clang 17.0.0 (clang-1700.6.3.2)
  • Rust: rustc 1.93.1 (01f6ddf75 2026-02-11)
nmoinvaz / top_senders_100.py
Created February 25, 2026 22:42
IMAP top senders analyzer
import argparse
import imaplib
import email.utils
import sys
from collections import Counter
from rich.console import Console
from rich.table import Table
from rich.progress import Progress, BarColumn, TextColumn, TimeRemainingColumn, MofNCompleteColumn
nmoinvaz / zlib-ng-crc32-arm-copy-benchmarks.md
Last active February 24, 2026 18:02
zlib-ng: CRC32 ARM interleaved copy benchmark results (Apple M3)

zlib-ng: CRC32 ARM Interleaved Copy Benchmark Results

Comparison

  • Baseline: develop @ 54352daf (Make extra length/distance bits computation branchless)
  • Contender: improvements/crc32-arm-copy @ b4043c6f (Implement crc32 interleaved copy for ARM PMULL+EOR3)
  • Repetitions: 5 per benchmark, aggregates only

Machine

nmoinvaz / zlib-ng-CLAUDE.md
Last active February 28, 2026 01:17
zlib-ng CLAUDE.md

Project Basics

  • Use CMake build system.
  • Always check the commits for HEAD and BASE or other branch names as they can change often.
  • To build for an architecture other than the host, use llvm-clang unless gcc is specified.

Key Directories

  • arch/ - Architecture specific optimizations
  • test/ - Unit tests written using Google Test Framework (gtest_zlib project)
nmoinvaz / crc32-arm-copy-benchmarks.md
Last active February 24, 2026 04:49
Zlib-ng benchmark: crc32_armv8_pmull_eor3 — improvements/crc32-arm-copy vs develop

Benchmark: improvements/crc32-arm-copy vs develop

  • Date: 2026-02-23
  • Platform: Apple Silicon (ARM64), 8 cores, L1D 64 KiB, L2 4096 KiB
  • Build: CMake Release, static libs
  • Repetitions: 5 (median CPU time reported)

crc32/armv8_pmull_eor3 (CRC32 only)

| Size | develop (ns) | feature (ns) | Change |

nmoinvaz / benchmark_compress_results.md
Created February 21, 2026 00:19
zlib-ng compress benchmark: improvements/tally-v2 vs develop

Compress Benchmark: HEAD (improvements/tally-v2) vs develop

Environment

  • Platform: macOS Darwin 24.6.0, Apple Silicon (ARM64)
  • CPU: 8 cores, L1D 64 KiB, L1I 128 KiB, L2 4096 KiB
  • Build: CMake Release, static libs

Commits

  • HEAD (improvements/tally-v2): c51ce99e — Combine extra_lbits/base_length and extra_dbits/base_dist lookup tables
  • develop: 1b880ba9 — Make extra length/distance bits computation branchless using bit masking
nmoinvaz / compress_block_bi_buf_register_optimization.md
Last active February 19, 2026 03:25
Zlib-ng PR 2167 analysis

Assembly Analysis: Keep bi_buf/bi_valid in Registers Across compress_block

Change

Hoist s->bi_buf and s->bi_valid into local variables in compress_block() and pass them by pointer to the emit functions. This eliminates redundant load/store pairs between zng_emit_lit and zng_emit_dist calls within the main compression loop.
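
The hoisting pattern can be sketched in a few lines of C. This is only an illustration under stated assumptions: `toy_state`, `toy_emit_bits`, and `toy_compress_block` are hypothetical names, the struct is not zlib-ng's `deflate_state` layout, and every symbol is emitted as a fixed 9-bit code rather than a Huffman code. The point is the shape of the change: load the bit buffer once, pass it by pointer to the emit helper, and store it back once after the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for deflate_state (not zlib-ng's layout). */
typedef struct {
    uint8_t *pending;     /* output buffer, assumed large enough */
    size_t   pending_len; /* bytes written so far */
    uint64_t bi_buf;      /* bit accumulator */
    uint32_t bi_valid;    /* number of valid bits in bi_buf */
} toy_state;

/* Append the low `len` bits of `val` to a caller-owned copy of the bit
 * buffer. Only whole bytes are written to memory; the accumulator itself
 * never touches the state struct here. */
static inline void toy_emit_bits(toy_state *s, uint64_t *bi_buf,
                                 uint32_t *bi_valid, uint32_t val, uint32_t len) {
    assert(len >= 1 && len <= 32);
    assert(*bi_valid < 8); /* invariant: whole bytes are always flushed */
    *bi_buf |= ((uint64_t)val & ((1ull << len) - 1)) << *bi_valid;
    *bi_valid += len;
    while (*bi_valid >= 8) { /* flush whole bytes, least significant first */
        s->pending[s->pending_len++] = (uint8_t)(*bi_buf & 0xff);
        *bi_buf >>= 8;
        *bi_valid -= 8;
    }
}

/* Hoisted version: one load before the loop, one store after it. */
static void toy_compress_block(toy_state *s, const uint8_t *syms, size_t n) {
    uint64_t bi_buf = s->bi_buf;
    uint32_t bi_valid = s->bi_valid;
    for (size_t i = 0; i < n; i++)
        toy_emit_bits(s, &bi_buf, &bi_valid, syms[i], 9); /* toy fixed-width code */
    s->bi_buf = bi_buf;
    s->bi_valid = bi_valid;
}
```

Once the helper is inlined, the compiler is free to keep `bi_buf` and `bi_valid` in registers for the whole loop, because the locals cannot alias the byte stores into `pending`. Whether it actually does so depends on the compiler and optimization level.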

Results

bi_buf/bi_valid Memory Operations (offsets 168/176 from deflate_state*)

nmoinvaz / conditional-preload-pr-2088.md
Created February 19, 2026 02:34
Zlib-ng PR 2088 conditional preload AI analysis

Conditional Preload Optimization Analysis

Comparison of develop (08fa4859) vs HEAD (conditional preload with MIN_HAVE=15).

The patch decodes the next iteration's Huffman symbol before performing the chunk copy, allowing the table lookup latency to overlap with copy operations. A can_preload flag skips the preload when the bit accumulator is low (the UNLIKELY 2+ literal path), keeping INFLATE_FAST_MIN_HAVE at 15 instead of 22.
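
The control flow described above can be sketched with a toy decoder. Everything here is an assumption made for illustration: `toy_code`, `toy_bits`, `toy_refill`, and `toy_decode` are hypothetical names, codes are a fixed 4 bits wide, a "copy" repeats the previous output byte, and the `INFLATE_FAST_MIN_HAVE` accounting is not modelled. It is not zlib-ng's `inflate_fast`; it only shows the next table lookup being issued before the copy, and skipped when the accumulator is low.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CODE_BITS 4u /* toy fixed code width, not a real Huffman code */

/* Hypothetical table entry: a literal byte, or a run length to copy. */
typedef struct { uint8_t is_copy; uint8_t val; } toy_code;

typedef struct {
    const uint8_t *in; size_t in_len, pos;
    uint64_t hold; unsigned bits; /* bit accumulator and its fill level */
} toy_bits;

static void toy_refill(toy_bits *b) {
    while (b->bits <= 56 && b->pos < b->in_len) {
        b->hold |= (uint64_t)b->in[b->pos++] << b->bits;
        b->bits += 8;
    }
}

/* Decode until input or output space runs out; returns bytes written. */
static size_t toy_decode(const toy_code table[16], const uint8_t *in,
                         size_t in_len, uint8_t *out, size_t out_cap) {
    toy_bits b = { in, in_len, 0, 0, 0 };
    size_t n = 0;
    toy_refill(&b);
    if (b.bits < TOY_CODE_BITS) return 0;
    toy_code here = table[b.hold & 15];
    for (;;) {
        b.hold >>= TOY_CODE_BITS; /* consume the current code */
        b.bits -= TOY_CODE_BITS;
        /* Preload only when the accumulator already holds the next code. */
        int can_preload = b.bits >= TOY_CODE_BITS;
        toy_code next = here;
        if (can_preload) next = table[b.hold & 15]; /* lookup before the copy */

        if (here.is_copy) {
            if (n == 0 || here.val > out_cap - n) break;
            for (uint8_t i = 0; i < here.val; i++, n++) out[n] = out[n - 1];
        } else {
            if (n == out_cap) break;
            out[n++] = here.val;
        }

        if (!can_preload) { /* low accumulator: refill, then look up */
            toy_refill(&b);
            if (b.bits < TOY_CODE_BITS) break;
            next = table[b.hold & 15];
        }
        here = next;
    }
    return n;
}
```

With a table that maps code 0 to the literal `'a'`, code 1 to `'b'`, and code 2 to a copy of length 3, the input bytes `0x20, 0x01` decode to `aaaaba`. In this toy the preload is skipped only when the accumulator is empty; the real patch uses a different threshold tied to `INFLATE_FAST_MIN_HAVE`.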

Benchmark Results

nmoinvaz / variant-matrix-pr-2139.md
Created February 18, 2026 05:42
Zlib-ng variant matrix for PR 2139

Functable Dispatch Matrix — x86 -march Variants

Extracted by inspecting undefined symbols in functable.c.o for each build — these are the function pointers the functable actually assigns at runtime. Builds use clang -target x86_64-apple-macos with runtime CPU detection enabled (the default).

-march native features

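| -march | SSE2 | SSSE3 | SSE4.1 | SSE4.2 | PCLMUL | AVX2 | AVX-512 | AVX512VNNI | VPCLMUL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| x86-64 | - | - | - | - | - | - | - | - | - |
| nehalem | native | native | native | native | - | - | - | - | - |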
nmoinvaz / deflate_sym_macros.h
Created February 11, 2026 21:51
Zlib-ng deflate symbol macros
/* ===========================================================================
* Symbol buffer write/read macros.
*
* The symbol buffer stores literal and distance/length pairs. The storage
* format differs based on LIT_MEM (separate buffers) vs sym_buf (interleaved),
* and on whether the platform supports fast unaligned 32-bit access
* (OPTIMAL_CMP >= 32), which allows packing a 3-byte symbol into a single
* 32-bit write/read.
*
* SYM_WRITE_LIT and SYM_WRITE_DIST write a symbol and advance sym_next.