@adamo1139
adamo1139 / llama-benchy-8x-3090ti-config1.md
Created May 2, 2026 15:34
llama-benchy-8x-3090ti-config1
| model | test | t/s (total) | t/s (req) | peak t/s | peak t/s (req) | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|---|---|---|---|---|---|---|---|---|
| cyankiwi/MiniMax-M2.7-AWQ-4bit | pp2048 (c1) | 1710.77 ± 23.16 | 1710.77 ± 23.16 | | | 1412.71 ± 16.35 | 1197.34 ± 16.35 | 1412.71 ± 16.35 |
| cyankiwi/MiniMax-M2.7-AWQ-4bit | tg512 (c1) | 17.62 ± 0.20 | 17.62 ± 0.20 | 19.67 ± 0.47 | 19.67 ± 0.47 | | | |
| cyankiwi/MiniMax-M2.7-AWQ-4bit | pp2048 (c2) | 1610.24 ± 54.20 | 904.96 ± 43.48 | | | 2483.65 ± 108.10 | 2268.28 ± 108.10 | 2483.65 ± 108.10 |
| cyankiwi/MiniMax-M2.7-AWQ-4bit | | | | | | | | |
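
A quick sanity check on the prompt-processing rows, as a rough sketch rather than a description of how llama-benchy computes its numbers. It assumes that `pp2048` means a 2048-token prompt and that `est_ppt` is the estimated prompt-processing time per request in milliseconds; both readings are assumptions, and `prompt_tps` is a hypothetical helper. Under those assumptions, dividing the prompt length by `est_ppt` should land close to the reported per-request t/s.

```python
# Assumption: "pp2048" is a 2048-token prompt, and est_ppt is the estimated
# prompt-processing time per request in milliseconds.
PROMPT_TOKENS = 2048


def prompt_tps(est_ppt_ms: float, prompt_tokens: int = PROMPT_TOKENS) -> float:
    """Tokens per second implied by a prompt-processing time given in ms."""
    return prompt_tokens / (est_ppt_ms / 1000.0)


# test name -> (est_ppt mean in ms, reported t/s (req) mean, reported std)
rows = {
    "pp2048 (c1)": (1197.34, 1710.77, 23.16),
    "pp2048 (c2)": (2268.28, 904.96, 43.48),
}

for name, (est_ppt_ms, reported, std) in rows.items():
    implied = prompt_tps(est_ppt_ms)
    print(f"{name}: implied {implied:.2f} t/s, reported {reported:.2f} ± {std:.2f}")
```

Both implied values fall within one reported standard deviation of the table's t/s (req) figures. A small gap is expected because a mean of per-run rates is not the same as a rate computed from the mean time.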
adamo1139 / gist:6e5eed4112397b88082e06066937f971
Created April 29, 2026 20:47
8x 3090 ti p2p driver on x399 taichi
# p2p test from cuda samples repo
./p2pBandwidthLatencyTest
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA GeForce RTX 3090 Ti, pciBusID: 8, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA GeForce RTX 3090 Ti, pciBusID: 9, pciDeviceID: 0, pciDomainID:0
Device: 2, NVIDIA GeForce RTX 3090 Ti, pciBusID: a, pciDeviceID: 0, pciDomainID:0
Device: 3, NVIDIA GeForce RTX 3090 Ti, pciBusID: b, pciDeviceID: 0, pciDomainID:0
Device: 4, NVIDIA GeForce RTX 3090 Ti, pciBusID: 42, pciDeviceID: 0, pciDomainID:0
adamo1139 / .txt
Created April 20, 2026 15:38
GLM 4.7 EXL3 KLD testing
#ulimit -n 100000
#exllamav3==0.0.28
#flash_attn==2.8.3
#torch==2.8.0+cu128
#ran on rented 4090 48GB modded gpu from Vast.AI
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python model_diff.py \
-ma /home/ubuntu/workspace/models/glm47-bf16 \
-mb /home/ubuntu/workspace/models/glm47-2bpw_H6
adamo1139 / gist:2065ada54233dcce0cb88cbd2d68191b
Created April 9, 2026 19:55
Poziomka pretraining env setup
# update apt and install basic tools
apt update
apt -y install zip tmux nano
apt -y install build-essential
apt -y install python-is-python3
# install CUDA 12.9
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
adamo1139 / Inference.py
Created March 31, 2024 13:04
fix for running DeepSeek-VL-7B in fp16 on 24GB GPU fast.
# Copyright (c) 2023-2024 DeepSeek.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of
# this software and associated documentation files (the "Software"), to deal in
# the Software without restriction, including without limitation the rights to
# use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
# the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all