AmineDiro
slimbuck / webgpu_metal_capture.txt
Last active February 19, 2026 11:32
Capturing a WebGPU Metal trace on macOS
1) Clone and build WebKit
git clone https://github.com/WebKit/WebKit.git WebKit
cd WebKit
Tools/Scripts/build-webkit -cmakeargs="-DENABLE_WEBGPU_BY_DEFAULT=1" --debug
2) Run your app
__XPC_METAL_CAPTURE_ENABLED=1 Tools/Scripts/run-minibrowser --debug --url http://localhost:5000/index.html#/loaders/gsplat
mcarilli / nsight.sh
Last active April 25, 2026 01:15
Favorite Nsight Systems profiling commands for PyTorch scripts
# This isn't meant to run as a bash script; I named it with ".sh" for syntax highlighting.
# https://developer.nvidia.com/nsight-systems
# https://docs.nvidia.com/nsight-systems/profiling/index.html
# My preferred nsys (command line executable used to create profiles) commands
#
# In your script, write
# torch.cuda.nvtx.range_push("region name")
# ...
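
The preview above cuts off after the range_push call. As a minimal sketch of how such a named region might be opened and reliably closed, the helper below wraps a push/pop pair in a context manager. The name nvtx_range and the stand-in recorder functions are hypothetical and only there so the sketch runs without a GPU; they are not part of the gist.

```python
from contextlib import contextmanager


@contextmanager
def nvtx_range(name, push, pop):
    """Open a named range on entry and close it on exit, even if the body raises."""
    push(name)
    try:
        yield
    finally:
        pop()


# Hypothetical stand-in recorders so the sketch runs without PyTorch or a GPU.
events = []


def fake_push(name):
    events.append(("push", name))


def fake_pop():
    events.append(("pop",))


with nvtx_range("region name", fake_push, fake_pop):
    events.append(("work",))  # the code being profiled would go here
```

In a real PyTorch script, one would presumably pass torch.cuda.nvtx.range_push and torch.cuda.nvtx.range_pop as the push and pop arguments, so the region shows up under that name in the nsys timeline.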
jboner / latency.txt
Last active June 1, 2026 19:38
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
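
The unit conversions and the "x L1 cache" multiples in the rows above are simple arithmetic. The following is a small sketch that reproduces them from the nanosecond column; the names LATENCY_NS, ns_to_us, and ratio are illustrative and do not come from the gist.

```python
# Latencies in nanoseconds, copied from the rows shown above (~2012 figures).
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
}


def ns_to_us(ns):
    """Convert nanoseconds to microseconds."""
    return ns / 1_000


def ratio(slower, faster):
    """Return how many times longer `slower` takes than `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


if __name__ == "__main__":
    # Print each row in both units, roughly mirroring the table layout.
    for name, ns in LATENCY_NS.items():
        print(f"{name:<36}{ns:>12,.1f} ns{ns_to_us(ns):>12,.3f} us")
```

For example, ratio("L2 cache reference", "L1 cache reference") gives the 14x multiple listed in the table, and ns_to_us(150_000) gives the 150 us shown for the SSD read.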