Luna LunNova

no-screen boot: USB-C power is connected and there is no display output
screen boot: USB-C power is unplugged and the display works
--- dmesg-no-screen.log 2025-03-27 08:54:36.483875142 -0700
+++ dmesg-screen.log 2025-03-27 08:54:54.849459200 -0700
@@ -158,8 +158,8 @@
ITS@0x0000000017040000: Devices Table too large, reduce ids 32->19
ITS@0x0000000017040000: Devices too large, reduce ITS pages 1024->256
ITS@0x0000000017040000: allocated 131072 Devices @880300000 (indirect, esz 8, psz 4K, shr 1)
-ITS@0x0000000017040000: allocated 4096 Interrupt Collections @88026a000 (flat, esz 1, psz 4K, shr 1)
-GICv3: using LPI property table @0x0000000880290000
ck::BlockwiseGemmXdlops_pipeline_v3<ck::BlockGemmPipelineScheduler::Intrawave, 256, unsigned short, unsigned short, unsigned short, float, ck::TensorDescriptor<ck::Tuple<ck::UnMerge<ck::Tuple<ck::integral_constant<int, 2>, ck::integral_constant<int, 1>, ck::integral_constant<int, 28>, ck::integral_constant<int, 32>, ck::integral_constant<int, 1>, ck::integral_constant<int, 8>>, false>, ck::PassThrough<ck::integral_constant<int, 2>>, ck::PassThrough<ck::integral_constant<int, 1>>, ck::Xor<ck::Tuple<ck::integral_constant<int, 28>, ck::integral_constant<int, 32>>, true>, ck::PassThrough<ck::integral_constant<int, 1>>, ck::PassThrough<ck::integral_constant<int, 8>>, ck::PassThrough<ck::integral_constant<int, 2>>, ck::PassThrough<ck::integral_constant<int, 1>>, ck::UnMerge<ck::Tuple<ck::integral_constant<int, 4>, ck::integral_constant<int, 7>>, false>, ck::UnMerge<ck::Tuple<ck::integral_constant<int, 1>, ck::integral_constant<int, 32>>, false>, ck::PassThrough<ck::integral_constant<int, 1>>, ck::PassThrough<ck::in
LunNova / llvm.diff
Last active January 21, 2025 04:16
git diff ab3a7d91d01d4f4b07b5ea449794e106864fd043..e553730a8ea7cce6fc9df01fa78bfe8a62c28cbc pkgs/development/compilers/llvm/common
/llvm
Diff of llvm dir from working to broken commit when rebasing rocm PR
diff --git a/pkgs/development/compilers/llvm/common/bolt/default.nix b/pkgs/development/compilers/llvm/common/bolt/default.nix
index 1662aa52dde2..9bc078df0e44 100644
--- a/pkgs/development/compilers/llvm/common/bolt/default.nix
+++ b/pkgs/development/compilers/llvm/common/bolt/default.nix
@@ -8,6 +8,7 @@
cmake,
libxml2,
diff --git a/src/include/bootstrap.h b/src/include/bootstrap.h
index 8c5f081..9922b79 100644
--- a/src/include/bootstrap.h
+++ b/src/include/bootstrap.h
@@ -10,11 +10,13 @@
#include "nccl.h"
#include "comm.h"
+// this is accessed through unaligned ptrs because ncclUniqueId is a typedef of char[128]
struct ncclBootstrapHandle {
I1215 08:32:37.869000 4070434 torch/_inductor/config.py:635] compile_threads set to 12 via env
using device: cuda:2
using device: cuda:1
using device: cuda:3
using device: cuda:5
using device: cuda:4
using device: cuda:0
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO Bootstrap : Using eno1np0:10.5.5.236<0>
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO NET/Plugin: No plugin found (librccl-net.so)
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : librccl-net.so: cannot open shared object file: No such file or directory : when loading librccl-net.so
cmake flags: -DCMAKE_FIND_USE_SYSTEM_PACKAGE_REGISTRY=OFF -DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DCMAKE_INSTALL_LOCALEDIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable-20241122/share/locale -DCMAKE_INSTALL_LIBEXECDIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable-20241122/libexec -DCMAKE_INSTALL_LIBDIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable-20241122/lib -DCMAKE_INSTALL_DOCDIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable-20241122/share/doc/hipblaslt -DCMAKE_INSTALL_INFODIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable-20241122/share/info -DCMAKE_INSTALL_MANDIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable-20241122/share/man -DCMAKE_INSTALL_OLDINCLUDEDIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable-20241122/include -DCMAKE_INSTALL_INCLUDEDIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable
LunNova / bert-tiny-amd.md
Created October 10, 2024 16:47 — forked from fxkamd/bert-tiny-amd.md
Solutions to problems with BERT training with tinygrad on AMD GPUs

Thank you to tiny corp for pointing out some problems running BERT training with tinygrad on AMD GPUs in this Tweet. We had a few engineers at AMD take a look at the problem, and they were quickly able to reproduce it.

What they found was an issue related to CWSR (compute wave save restore), which is a mechanism that allows our driver and firmware to preempt and reschedule long-running compute waves on our GPUs. The GFXv11 GPU line requires a workaround to set COMPUTE_PGM_RSRC1.PRIV=1 when dispatching a compute kernel. Normally this is handled by the AQL DISPATCH packet. However, since the tinygrad implementation uses a custom runtime, it needs to apply this workaround itself in its PM4-based dispatch. This patch is specific to GFXv11 GPUs. Other GPUs do not require it and should not use this workaround. The following KFDTest patch can be used as a reference: https://github.com/ROCm/ROCT-Thunk-Interface/commit/507637ed5b82197eecbf483cdc1234939766549a
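
As a rough illustration of the workaround described above, the sketch below sets the PRIV bit in a COMPUTE_PGM_RSRC1 value only for GFXv11 targets. The function name, the `is_gfx11` flag, and the constant name are hypothetical, and the bit position (bit 20) is an assumption based on the amdgpu register headers; check it against the referenced KFDTest patch before relying on it.

```rust
/// Assumed bit position of the PRIV field in COMPUTE_PGM_RSRC1.
/// This value is an assumption, not taken from the text above.
const COMPUTE_PGM_RSRC1_PRIV_SHIFT: u32 = 20;

/// Hypothetical helper: returns the RSRC1 value a PM4-based dispatch
/// would program. PRIV is set only for GFXv11; other GPUs keep the
/// value unchanged, since they should not use the workaround.
pub fn rsrc1_for_pm4_dispatch(rsrc1: u32, is_gfx11: bool) -> u32 {
    if is_gfx11 {
        rsrc1 | (1 << COMPUTE_PGM_RSRC1_PRIV_SHIFT)
    } else {
        rsrc1
    }
}
```

An AQL-based runtime would not need this helper, because the AQL DISPATCH packet normally handles the bit.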

While inv

/// Guards a scope against unwinding, calling a handler if unwinding occurs.
///
/// - Handler gets no panic info; limited to `Fn()`
/// - Not reentrant; handler panics may cause program abort
/// - Intended for single scope; don't store or share
/// - Ineffective with panic=abort
pub struct UnwindDetector<T: Fn()> {
handler: T,
}
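
The preview above stops at the struct definition. One common way to build such a guard, shown here as a minimal sketch and not necessarily the gist's actual implementation, is a `Drop` impl that checks `std::thread::panicking()`. The `new` constructor and the `handler_fired` helper are hypothetical names added for illustration.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Guards a scope against unwinding, calling a handler if unwinding occurs.
pub struct UnwindDetector<T: Fn()> {
    handler: T,
}

impl<T: Fn()> UnwindDetector<T> {
    /// Hypothetical constructor; the original preview does not show one.
    pub fn new(handler: T) -> Self {
        Self { handler }
    }
}

impl<T: Fn()> Drop for UnwindDetector<T> {
    fn drop(&mut self) {
        // `thread::panicking()` is true while the current thread unwinds,
        // so the handler runs only when the guard is dropped by a panic.
        if thread::panicking() {
            (self.handler)();
        }
    }
}

/// Hypothetical helper: runs `body` under a guard and reports whether
/// the handler fired. Requires panic=unwind, as `catch_unwind` does.
pub fn handler_fired(body: impl FnOnce()) -> bool {
    let fired = AtomicBool::new(false);
    let _ = panic::catch_unwind(AssertUnwindSafe(|| {
        let _guard = UnwindDetector::new(|| fired.store(true, Ordering::SeqCst));
        body();
    }));
    fired.load(Ordering::SeqCst)
}
```

With this sketch, `handler_fired(|| panic!("boom"))` reports that the handler ran, while a body that returns normally leaves it untouched. A guard created while the thread is already unwinding would also fire on drop, which is one reason to keep it to a single scope.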
function fp_to_bytes(fp, bytes, is_double)
local val = tonumber(fp)
-- it's a NaN or inf
if val ~= val or val == math.huge or val == -math.huge then
bytes[1] = (val ~= val or val == math.huge) and 0x7f or 0xff
bytes[2] = (val ~= val and 0xf9 or 0xf8)
local max = is_double and 8 or 4
for i = 3, max do
bytes[i] = 0
Run in the MappingTest subdirectory of https://github.com/MinimallyCorrect/Mapping/tree/transform-only-runs-for-default-attr
You must run publishToMavenLocal in the parent directory first.
Note that the transform only runs in test case -PTEST=3, where the default value of the attr can be transformed to the requested value.
Even case 4 fails: there the default value is set as in case 3, but a value identical to that default is also set in the Gradle module metadata.
$ for i in 0 1 2 3 4; do echo; echo "Testing with -PTEST=$i"; echo; ./gradlew.bat -PTEST=$i --no-build-cache build; done
Testing with -PTEST=0