adamo1139’s gists

adamo1139 / llama-benchy-8x-3090ti-config1.md

Created May 2, 2026 15:34

llama-benchy-8x-3090ti-config1

model	test	t/s (total)	t/s (req)	peak t/s	peak t/s (req)	ttfr (ms)	est_ppt (ms)	e2e_ttft (ms)
cyankiwi/MiniMax-M2.7-AWQ-4bit	pp2048 (c1)	1710.77 ± 23.16	1710.77 ± 23.16			1412.71 ± 16.35	1197.34 ± 16.35	1412.71 ± 16.35
cyankiwi/MiniMax-M2.7-AWQ-4bit	tg512 (c1)	17.62 ± 0.20	17.62 ± 0.20	19.67 ± 0.47	19.67 ± 0.47
cyankiwi/MiniMax-M2.7-AWQ-4bit	pp2048 (c2)	1610.24 ± 54.20	904.96 ± 43.48			2483.65 ± 108.10	2268.28 ± 108.10	2483.65 ± 108.10
cyankiwi/MiniMax-M2.7-AWQ-4bit
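The rows above are tab-separated. Each cell is written as `mean ± std`, and a cell is blank where a metric does not apply (the tg512 row has no latency columns). Below is a minimal Python sketch for loading such rows, assuming exactly that layout. `parse_cell` and `parse_row` are hypothetical helpers, not part of llama-benchy. In both pp2048 rows shown, `ttfr` minus `est_ppt` is the same 215.37 ms. That suggests, but does not confirm, that `est_ppt` is `ttfr` with a fixed latency estimate subtracted.

```python
from typing import Any, Dict, Optional, Tuple

# Column names copied from the llama-benchy table header above.
HEADER = ["model", "test", "t/s (total)", "t/s (req)", "peak t/s",
          "peak t/s (req)", "ttfr (ms)", "est_ppt (ms)", "e2e_ttft (ms)"]


def parse_cell(cell: str) -> Optional[Tuple[float, float]]:
    """Turn a 'mean ± std' cell into (mean, std); blank cells become None."""
    cell = cell.strip()
    if not cell:
        return None
    mean, std = cell.split("±")
    return float(mean), float(std)


def parse_row(line: str) -> Dict[str, Any]:
    """Map one tab-separated result row onto the header columns."""
    fields = line.rstrip("\n").split("\t")
    # tg rows stop after the peak columns; pad the missing latency cells.
    fields += [""] * (len(HEADER) - len(fields))
    row: Dict[str, Any] = {"model": fields[0], "test": fields[1]}
    for name, cell in zip(HEADER[2:], fields[2:]):
        row[name] = parse_cell(cell)
    return row


if __name__ == "__main__":
    # The pp2048 (c1) row from the table, rebuilt with explicit tabs.
    pp_c1 = "\t".join([
        "cyankiwi/MiniMax-M2.7-AWQ-4bit", "pp2048 (c1)",
        "1710.77 ± 23.16", "1710.77 ± 23.16", "", "",
        "1412.71 ± 16.35", "1197.34 ± 16.35", "1412.71 ± 16.35",
    ])
    row = parse_row(pp_c1)
    gap = row["ttfr (ms)"][0] - row["est_ppt (ms)"][0]
    print(f"ttfr - est_ppt = {gap:.2f} ms")  # prints "ttfr - est_ppt = 215.37 ms"
```

The same `parse_row` call handles the tg512 row. Its missing latency columns come back as `None` rather than raising an error.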

adamo1139 / gist:6e5eed4112397b88082e06066937f971

Created April 29, 2026 20:47

8x 3090 Ti P2P driver on X399 Taichi

	# P2P test from the CUDA samples repo

	./p2pBandwidthLatencyTest
	[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
	Device: 0, NVIDIA GeForce RTX 3090 Ti, pciBusID: 8, pciDeviceID: 0, pciDomainID:0
	Device: 1, NVIDIA GeForce RTX 3090 Ti, pciBusID: 9, pciDeviceID: 0, pciDomainID:0
	Device: 2, NVIDIA GeForce RTX 3090 Ti, pciBusID: a, pciDeviceID: 0, pciDomainID:0
	Device: 3, NVIDIA GeForce RTX 3090 Ti, pciBusID: b, pciDeviceID: 0, pciDomainID:0
	Device: 4, NVIDIA GeForce RTX 3090 Ti, pciBusID: 42, pciDeviceID: 0, pciDomainID:0
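The test prints `pciBusID` in hexadecimal, which is why `a` and `b` follow `9`. Device 4 at `42` is therefore on bus 66 in decimal. The small Python sketch below parses these lines and formats each GPU as a `domain:bus:device` address that can be matched against `lspci` output. The helper names are hypothetical. The regex assumes exactly the line format shown and treats all three IDs as hex.

```python
import re
from typing import List, NamedTuple


class Device(NamedTuple):
    index: int
    name: str
    bus_id: int     # pciBusID, printed in hex by the test (e.g. "a", "42")
    device_id: int  # pciDeviceID, the PCI device/slot number
    domain_id: int  # pciDomainID


# Matches lines such as:
# Device: 4, NVIDIA GeForce RTX 3090 Ti, pciBusID: 42, pciDeviceID: 0, pciDomainID:0
DEVICE_RE = re.compile(
    r"Device:\s*(\d+),\s*(.+?),\s*pciBusID:\s*([0-9a-fA-F]+),"
    r"\s*pciDeviceID:\s*([0-9a-fA-F]+),\s*pciDomainID:\s*([0-9a-fA-F]+)"
)


def parse_devices(text: str) -> List[Device]:
    """Extract every device line from p2pBandwidthLatencyTest output."""
    return [
        Device(int(i), name, int(bus, 16), int(dev, 16), int(dom, 16))
        for i, name, bus, dev, dom in DEVICE_RE.findall(text)
    ]


def pci_address(d: Device) -> str:
    """Format as domain:bus:device, the prefix lspci uses (function omitted)."""
    return f"{d.domain_id:04x}:{d.bus_id:02x}:{d.device_id:02x}"


if __name__ == "__main__":
    sample = (
        "Device: 3, NVIDIA GeForce RTX 3090 Ti, pciBusID: b, pciDeviceID: 0, pciDomainID:0\n"
        "Device: 4, NVIDIA GeForce RTX 3090 Ti, pciBusID: 42, pciDeviceID: 0, pciDomainID:0\n"
    )
    for d in parse_devices(sample):
        print(d.index, d.bus_id, pci_address(d))
```

For the sample, this prints bus 11 as `0000:0b:00` and bus 66 as `0000:42:00`.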

adamo1139 / .txt

Created April 20, 2026 15:38

GLM 4.7 EXL3 KLD testing

	#ulimit -n 100000
	#exllamav3==0.0.28
	#flash_attn==2.8.3
	#torch==2.8.0+cu128
	#ran on a rented, modded 48GB RTX 4090 from Vast.ai

	PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python model_diff.py \
	-ma /home/ubuntu/workspace/models/glm47-bf16 \
	-mb /home/ubuntu/workspace/models/glm47-2bpw_H6
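`model_diff.py` compares the bf16 reference (`-ma`) against the 2bpw quant (`-mb`), but its internals are not shown here. The generic sketch below computes the metric named in the title: per-token KL divergence between the two models' next-token distributions, taken from raw logits. Two details are assumptions, not exllamav3's documented behavior: the direction KL(P_ref || Q_quant) and averaging over token positions. The function names are illustrative only.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(ref_logits: Sequence[float], test_logits: Sequence[float]) -> float:
    """KL(P || Q) in nats, with P = softmax(ref_logits) and Q = softmax(test_logits)."""
    if len(ref_logits) != len(test_logits):
        raise ValueError("both models must score the same vocabulary")
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kl(ref_rows: Sequence[Sequence[float]],
            test_rows: Sequence[Sequence[float]]) -> float:
    """Average the per-position KL over a sequence (one logits row per token)."""
    if len(ref_rows) != len(test_rows) or not ref_rows:
        raise ValueError("need the same, non-zero number of positions")
    return sum(kl_divergence(r, t) for r, t in zip(ref_rows, test_rows)) / len(ref_rows)


if __name__ == "__main__":
    # Toy 2-token vocabulary: the quant differs from the reference only at position 0.
    ref = [[0.0, 0.0], [1.0, 0.0]]
    quant = [[0.0, math.log(3.0)], [1.0, 0.0]]
    print(f"mean KL = {mean_kl(ref, quant):.6f} nats")
```

Working in log space keeps the comparison stable even when a heavily quantized model pushes some probabilities close to zero.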

adamo1139 / gist:2065ada54233dcce0cb88cbd2d68191b

Created April 9, 2026 19:55

Poziomka pretraining env setup

	# update apt and install basic tools
	apt update
	apt -y install zip tmux nano
	apt -y install build-essential
	apt -y install python-is-python3


	# install CUDA 12.9
	wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
	sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
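After the apt packages and CUDA 12.9 are installed, a quick sanity check can confirm that the tools resolve on `PATH` and that `nvcc` reports the expected release. This is a minimal Python sketch. It assumes `nvcc --version` prints its usual `release X.Y` line, and it falls back to `/usr/local/cuda/bin` when the toolkit is not yet on `PATH`. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Iterable, List, Optional, Tuple

# Commands provided by the apt lines above (gcc/make come from build-essential,
# `python` from python-is-python3).
TOOLS = ("zip", "tmux", "nano", "gcc", "make", "python")


def missing_tools(tools: Iterable[str] = TOOLS) -> List[str]:
    """Return the commands that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Pull (major, minor) out of nvcc's 'release X.Y' line."""
    m = re.search(r"release (\d+)\.(\d+)", output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def installed_cuda_version() -> Optional[Tuple[int, int]]:
    # nvcc usually lives in /usr/local/cuda/bin, which may not be on PATH yet.
    nvcc = shutil.which("nvcc") or "/usr/local/cuda/bin/nvcc"
    try:
        result = subprocess.run([nvcc, "--version"], capture_output=True,
                                text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    print("CUDA release:", installed_cuda_version())
```

A result of `(12, 9)` matches the version the setup targets. `None` means `nvcc` was not found or could not be run.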

adamo1139 / Inference.py

Created March 31, 2024 13:04

Fix for running DeepSeek-VL-7B fast in fp16 on a 24GB GPU.
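Here is the arithmetic behind "fp16 on a 24GB GPU". fp16 stores each weight in 2 bytes, so a model with roughly 7 billion parameters needs about 13 GiB for its weights alone. fp32 uses 4 bytes per weight and would need about 26 GiB, which does not fit. The sketch takes 7e9 parameters from the model name as an assumption and ignores activations and KV cache. It illustrates the memory budget only and does not reproduce the fix in the gist.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_bytes(n_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone; activations and KV cache are ignored."""
    return n_params * bytes_per_param


if __name__ == "__main__":
    n_params = 7e9  # assumption: taken from the "7B" in the model name
    # Prints "fp32: 26.1 GiB" and then "fp16: 13.0 GiB".
    for dtype, size in (("fp32", 4), ("fp16", 2)):
        print(f"{dtype}: {weight_bytes(n_params, size) / GIB:.1f} GiB")
```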

	# Copyright (c) 2023-2024 DeepSeek.
	#
	# Permission is hereby granted, free of charge, to any person obtaining a copy of
	# this software and associated documentation files (the "Software"), to deal in
	# the Software without restriction, including without limitation the rights to
	# use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
	# the Software, and to permit persons to whom the Software is furnished to do so,
	# subject to the following conditions:
	#
	# The above copyright notice and this permission notice shall be included in all