@andrewnc
Created March 8, 2026 22:06
Nano GPT BQN
# nanoGPT-style single-file character model in pure BQN
# trained on Tiny Shakespeare from Karpathy's nanoGPT/char-rnn data path
# result shape:
# ⟨ initial_loss, final_loss, prompt, target, direct_prediction, autoregressive_sample ⟩
t←32          # context length (tokens per training window)
c←16          # embedding width
h←32          # MLP hidden width
steps←4000
lr←0.03
sampleLen←64
corpusPath←"CBQN/data/shakespeare_char/input.txt"
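MatMul←+˝∘×⎉1‿∞
ColSum←+˝
RowSum←+´⎉1
RowMax←⌈´⎉1
# rows index positions, columns index vocabulary entries
OneHot ← {n←𝕨⋄ids←𝕩⋄⍉((↕n)=⌜ids)}
# deterministic pseudo-random values in [0,1); equal shapes get equal values
Hash ← {1 | 43758.5453 × •math.Sin (12.9898 × 𝕩) + 78.233}
Rand ← {shape←𝕩⋄n←×´shape⋄shape⥊Hash 1+↕n}
Init ← {scale←𝕨⋄shape←𝕩⋄scale × (2×Rand shape)-1}
SoftmaxRows ← {z←𝕩⋄e←⋆z -˘ RowMax z⋄e ÷˘ RowSum e}
Clip1 ← {(-1.0)⌈1.0⌊𝕩}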
corpus←•FChars corpusPath
vocab ← (0=⊒corpus)/corpus
tokens←vocab⊐corpus
v←≠vocab
trainStarts ← (≠tokens)-t
te←0.08 Init v‿c
pe←0.08 Init t‿c
wq←0.08 Init c‿c
wk←0.08 Init c‿c
wv←0.08 Init c‿c
wo←0.08 Init c‿c
w1←0.08 Init c‿h
b1←h⥊0.0
w2←0.08 Init h‿c
b2←c⥊0.0
wout←0.08 Init c‿v
bout←v⥊0.0
# causal mask: position i may attend to positions j ≤ i
mask ← (↕t)≥⌜↕t
attScale←√c
Forward ← {
  idx←𝕩
  x0 ← (idx⊏te) + pe
  q←x0 MatMul wq
  k←x0 MatMul wk
  val←x0 MatMul wv
  s ← ((q MatMul ⍉k) ÷ attScale) + (1-mask) × -1e9
  a←SoftmaxRows s
  ctx←a MatMul val
  o←ctx MatMul wo
  x1←x0 + o
  h1 ← (x1 MatMul w1) (+⎉1) b1
  g←•math.Tanh h1
  m ← (g MatMul w2) (+⎉1) b2
  y←x1 + m
  logits ← (y MatMul wout) (+⎉1) bout
  ⟨x0,q,k,val,s,a,ctx,o,x1,h1,g,m,y,logits⟩
}
TrainStep ← {
  step←𝕩
  start←•rand.Range trainStarts
  idx ← (start+↕t)⊏tokens
  tgt ← (start+1+↕t)⊏tokens
  idxOh←v OneHot idx
  tgtOh←v OneHot tgt
  f←Forward idx
  x0←0⊑f
  q←1⊑f
  k←2⊑f
  val←3⊑f
  a←5⊑f
  ctx←6⊑f
  x1←8⊑f
  g←10⊑f
  y←12⊑f
  logits←13⊑f
  probs←SoftmaxRows logits
  picked←RowSum probs × tgtOh
  # mean cross-entropy over the t positions (⋆⁼ is the natural log)
  loss ← (-+´⋆⁼picked) ÷ t
  # backward pass: output head, MLP, then attention
  dlogits ← (probs - tgtOh) ÷ t
  dwout ← (⍉y) MatMul dlogits
  dbout←ColSum dlogits
  dy←dlogits MatMul ⍉wout
  dx1←dy
  dm←dy
  dw2 ← (⍉g) MatMul dm
  db2←ColSum dm
  dg←dm MatMul ⍉w2
  dh1←dg × (1 - g×g)
  dw1 ← (⍉x1) MatMul dh1
  db1←ColSum dh1
  dx1 +↩ dh1 MatMul ⍉w1
  do←dx1
  dx0←dx1
  dwo ← (⍉ctx) MatMul do
  dctx←do MatMul ⍉wo
  da←dctx MatMul ⍉val
  dval ← (⍉a) MatMul dctx
  rowDot←RowSum da × a
  ds←mask × (a × (da -˘ rowDot))
  dq ← (ds MatMul k) ÷ attScale
  dk ← ((⍉ds) MatMul q) ÷ attScale
  dwq ← (⍉x0) MatMul dq
  dwk ← (⍉x0) MatMul dk
  dwv ← (⍉x0) MatMul dval
  dx0 +↩ (dq MatMul ⍉wq) + (dk MatMul ⍉wk) + (dval MatMul ⍉wv)
  dte ← (⍉idxOh) MatMul dx0
  dpe←dx0
  # SGD update with each gradient element clipped to [-1,1]
  te -↩ lr × (Clip1 dte)
  pe -↩ lr × (Clip1 dpe)
  wq -↩ lr × (Clip1 dwq)
  wk -↩ lr × (Clip1 dwk)
  wv -↩ lr × (Clip1 dwv)
  wo -↩ lr × (Clip1 dwo)
  w1 -↩ lr × (Clip1 dw1)
  b1 -↩ lr × (Clip1 db1)
  w2 -↩ lr × (Clip1 dw2)
  b2 -↩ lr × (Clip1 db2)
  wout -↩ lr × (Clip1 dwout)
  bout -↩ lr × (Clip1 dbout)
  loss
}
losses←TrainStep¨ ↕steps
seedStart←•rand.Range trainStarts
seed ← (seedStart+↕t)⊏tokens
tgt ← (seedStart+1+↕t)⊏tokens
ff←Forward seed
logits←13⊑ff
# greedy per-position prediction: argmax of each logits row
pred ← (⊑∘⍒⎉1) logits
out←seed
ctx←seed
# greedy autoregressive step: append the argmax of the last position
GenStep ← {𝕩
  gg←Forward ctx
  next←⊑⍒ ¯1⊏13⊑gg
  out ↩ out ∾ <next
  ctx ↩ (-t)↑out
  next
}
GenStep¨ ↕sampleLen
⟨0⊑losses, ¯1⊑losses, seed⊏vocab, tgt⊏vocab, pred⊏vocab, out⊏vocab⟩