<!DOCTYPE html>
<html lang="en" data-theme="dark">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Staged-Polymorphic Omega System — Training Report</title>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@picocss/pico@2/css/pico.min.css">
<style>
:root {
--pico-font-size: 16px;
}
body { padding-bottom: 4rem; }
h1 { margin-bottom: 0.25em; }
.subtitle { color: var(--pico-muted-color); margin-bottom: 2rem; font-size: 1.1rem; }
.result-card {
border-left: 4px solid var(--pico-primary);
padding: 1rem 1.5rem;
margin: 1.5rem 0;
background: var(--pico-card-background-color);
border-radius: 0 var(--pico-border-radius) var(--pico-border-radius) 0;
}
.result-card.success { border-left-color: #22c55e; }
.result-card.info { border-left-color: var(--pico-primary); }
.beat { color: #22c55e; font-weight: bold; }
.miss { color: #f59e0b; }
table { font-variant-numeric: tabular-nums; }
.operator-grid { display: grid; grid-template-columns: 1fr; gap: 1.5rem; margin: 1.5rem 0; }
@media (min-width: 768px) { .operator-grid { grid-template-columns: 1fr 1fr; } }
.op-card {
background: var(--pico-card-background-color);
border-radius: var(--pico-border-radius);
padding: 1.25rem;
border: 1px solid var(--pico-muted-border-color);
}
.op-card h3 { margin-top: 0; margin-bottom: 0.5rem; }
.op-num {
display: inline-block;
background: var(--pico-primary);
color: var(--pico-primary-inverse);
width: 1.6em; height: 1.6em;
text-align: center; line-height: 1.6em;
border-radius: 50%; font-weight: bold;
margin-right: 0.4em; font-size: 0.9em;
}
.analogy { font-style: italic; color: var(--pico-muted-color); margin-top: 0.75rem; }
.chain-diagram {
background: var(--pico-card-background-color);
border-radius: var(--pico-border-radius);
padding: 1.5rem;
font-family: monospace;
font-size: 0.9rem;
line-height: 1.8;
overflow-x: auto;
white-space: pre;
border: 1px solid var(--pico-muted-border-color);
}
details summary { cursor: pointer; font-weight: 600; }
details[open] summary { margin-bottom: 0.75rem; }
hr { margin: 2.5rem 0; }
.tag {
display: inline-block;
background: var(--pico-primary);
color: var(--pico-primary-inverse);
padding: 0.15em 0.5em;
border-radius: 4px;
font-size: 0.8rem;
font-weight: 600;
}
.tag.green { background: #22c55e; }
</style>
</head>
<body>
<main class="container">
<h1>Staged-Polymorphic Omega System</h1>
<p class="subtitle">Training Report — April 3, 2026</p>
<!-- ==================== STATUS ==================== -->
<section>
<h2>Status</h2>
<div class="result-card success">
<strong>The learned model now beats the teacher on all 3 seeds with correct runtime semantics.</strong>
</div>
<h3>Key Fix: Init Supervision in Joint Stage</h3>
<p>
The learned model's init predictions were <strong>2× worse</strong> than the teacher's because of a
train/eval mask mismatch. Adding <code>init_z_loss</code> (0.5 weight) to the joint loss broke the
chicken-and-egg cycle where the evaluator starved init of gradient.
</p>
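<p>A minimal sketch of what the joint objective might look like with the added init supervision. The function and argument names here are illustrative, not the actual <code>omega_train.py</code> internals:</p>

```python
import numpy as np

def joint_loss(eval_loss, init_z_pred, init_z_teacher, init_weight=0.5):
    """Joint-stage objective (sketch): downstream evaluator loss plus
    direct MSE supervision on init's predicted z against the teacher's.
    The direct term keeps gradient flowing to init even when the
    evaluator rarely selects the init candidate."""
    init_z_loss = np.mean((init_z_pred - init_z_teacher) ** 2)
    return eval_loss + init_weight * init_z_loss
```
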
<h3>Results <small>(32 eval campaigns each)</small></h3>
<table>
<thead>
<tr><th>Seed</th><th>Joint Epochs</th><th>Teacher</th><th>Learned</th><th>Beats Teacher?</th></tr>
</thead>
<tbody>
<tr><td>13</td><td>8</td><td>0.688</td><td><strong>0.639</strong></td><td><span class="beat">YES (−7.1%)</span></td></tr>
<tr><td>14</td><td>12</td><td>0.686</td><td><strong>0.678</strong></td><td><span class="beat">YES (−1.2%)</span></td></tr>
<tr><td>15</td><td>8</td><td>0.706</td><td><strong>0.630</strong></td><td><span class="beat">YES (−10.8%)</span></td></tr>
</tbody>
</table>
<h3>Git State</h3>
<ul>
<li>On <code>main</code> at <code>f381bc6</code>, pushed to origin</li>
<li>Clean working tree (no uncommitted changes)</li>
<li>Default <code>--joint-epochs</code> is still 4 in <code>omega_train.py</code></li>
</ul>
<h3>What's Left</h3>
<ul>
<li>Bump default joint epochs to 12 (pending confirmation)</li>
<li>Could also add generate/update z supervision to joint stage for further gains</li>
<li>The <code>--teacher-eval-in-joint</code> flag exists but hurts — could remove to clean up</li>
</ul>
</section>
<hr>
<!-- ==================== OVERVIEW ==================== -->
<section>
<h2>What This System Does</h2>
<h3>The Problem</h3>
<p>
Imagine you have a <strong>teacher</strong> — a hand-coded algorithm that solves math problems
(specifically, linear regression tasks). The teacher is pretty good. It looks at a problem, picks a
strategy, solves it, and then learns from the experience to do better on the next problem.
</p>
<p>
We want to build a <strong>student</strong> (a neural network) that watches the teacher work, learns to
imitate it, and eventually does <em>better</em> than the teacher.
</p>
<h3>The Teacher's Job</h3>
<p>
The teacher solves problems in <strong>campaigns</strong> — sequences of 16 related tasks. For each task, it:
</p>
<ol>
<li><strong>Residualize</strong> — Figures out which dimensions of the problem matter</li>
<li><strong>Init</strong> — Makes a quick first guess at the answer</li>
<li><strong>Update</strong> — Iteratively refines the answer (slow but reliable)</li>
<li><strong>Generate</strong> — Tries to jump directly to a good answer (fast but risky)</li>
<li><strong>Evaluate</strong> — Picks the best of init/generate/update</li>
<li><strong>Promote</strong> — Decides whether to update its long-term memory</li>
<li><strong>Reflect</strong> — Adjusts its own strategy knobs for next time</li>
</ol>
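<p>The steps above can be sketched as a single per-task driver loop. The operator internals are stubbed out and supplied by the caller, so everything below is illustrative scaffolding rather than the real teacher:</p>

```python
def run_task(task, state, ops):
    """One pass of the per-task pipeline (hypothetical driver; each
    entry of `ops` is a callable standing in for one operator)."""
    spec = ops["residualize"](task, state)          # which dims matter
    z_init = ops["init"](task, state, spec)         # quick first guess
    z_upd = ops["update"](task, z_init, spec)       # iterative refinement
    z_gen = ops["generate"](task, z_init, spec)     # one-shot jump
    best = ops["evaluate"](task, {"init": z_init, "update": z_upd,
                                  "generate": z_gen})
    state = ops["promote"](state, best)             # long-term memory
    state = ops["reflect"](state)                   # tune strategy knobs
    return best, state
```
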
<p>
Across a campaign, the teacher builds up <strong>family memory</strong> (patterns it recognizes) and can
even <strong>spawn child models</strong> (specialized solvers for problem types it sees repeatedly).
</p>
<h3>The Training Process</h3>
<p>
We generate thousands of campaigns where the teacher solves problems, recording everything: what it saw,
what it decided, and how well it did. This is the <strong>corpus</strong> — about 32,000 solved tasks.
</p>
<p>Then we train the neural network in <strong>stages</strong>:</p>
<ol>
<li><strong>Stages 1–5:</strong> Train each operator separately. The network learns to predict the
teacher's outputs individually.</li>
<li><strong>Stage 6 (Joint):</strong> Wire everything together. The network runs its own full pipeline —
its init feeds into its generate, its evaluator picks from its own candidates. This is where the operators
learn to work <em>together</em>, not just individually.</li>
</ol>
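<p>The staged curriculum can be sketched as follows; <code>fit_operator</code> and <code>fit_joint</code> are hypothetical callbacks standing in for the real per-stage and joint training steps:</p>

```python
def train_staged(stage_epochs, joint_epochs, fit_operator, fit_joint):
    """Staged curriculum (sketch): fit each operator to the teacher's
    recorded outputs in isolation, then fine-tune the wired pipeline."""
    for op in ("residualize", "init", "update", "generate", "evaluate"):
        for _ in range(stage_epochs):
            fit_operator(op)      # supervised on teacher traces
    for _ in range(joint_epochs):
        fit_joint()               # operators consume each other's outputs
```
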
<h3>Why "Beats Teacher" Is Hard</h3>
<p>
At evaluation time, the neural network runs autonomously — no teacher guidance. It makes decisions,
updates its memory, and those decisions affect future tasks in the campaign.
<strong>Errors compound:</strong> a bad promote decision corrupts the memory, which leads to a bad
<code>base_theta</code> for the next task, which leads to worse init/generate/update, and so on.
</p>
<p>
The teacher has perfect rule-based logic. The student has to approximate all of it with learned weights.
Getting the student to <em>exceed</em> the teacher means the student found strategies the hand-coded
rules missed.
</p>
<div class="result-card info">
<strong>The Breakthrough:</strong> The student's init predictions were terrible at runtime — 2×
worse than the teacher's. The evaluator learned "init is bad, avoid it," which starved init of gradient.
Adding direct init supervision broke the cycle — init improved, the evaluator started selecting it,
and the whole pipeline got better. Result: <strong>1–11% improvement over the teacher</strong> across
all seeds.
</div>
</section>
<hr>
<!-- ==================== SIX OPERATORS ==================== -->
<section>
<h2>The Six Operators in Detail</h2>
<p>
The system solves <strong>linear regression tasks</strong>: given input-output pairs (support data), find a
parameter vector <code>theta</code> (8 dimensions) such that <code>y ≈ X @ theta</code>. Held-out
"val" data measures solution quality.
</p>
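<p>A toy version of one such task, using ordinary least squares as the solver. Only the shapes follow the report (8-d <code>theta</code>, support and val splits); the sizes and noise level are made up for illustration:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic task: support pairs to fit on, val pairs held out for scoring.
X_sup, X_val = rng.normal(size=(32, 8)), rng.normal(size=(16, 8))
theta_true = rng.normal(size=8)
y_sup = X_sup @ theta_true + 0.01 * rng.normal(size=32)
y_val = X_val @ theta_true

# Least-squares fit on support data, scored on held-out val data.
theta_hat, *_ = np.linalg.lstsq(X_sup, y_sup, rcond=None)
val_loss = np.mean((X_val @ theta_hat - y_val) ** 2)
```
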
<p>
Each task belongs to a <strong>family</strong> (like "in_basis_easy", "off_basis", "mixed") describing how
the true answer relates to a shared low-dimensional subspace (the "adapter basis" — 6 basis vectors
in 8-d space).
</p>
<div class="operator-grid">
<div class="op-card">
<h3><span class="op-num">1</span> Residualize</h3>
<p>
Figures out <strong>which parts of the subspace matter</strong> for this task. The system has 6 basis
vectors; not all are relevant every time. Outputs a <strong>mask</strong> (which vectors to use) and a
<strong>rank</strong> (how many to activate).
</p>
<p>The output <code>ResidualSpec</code> is used by all downstream operators — everything else works
in this reduced subspace.</p>
<p class="analogy">Like a photographer choosing which lenses to mount before taking a shot.</p>
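<p>An illustrative rule-based stand-in: score each basis vector by its alignment with the task residual and keep the top <code>rank</code>. The learned operator predicts the mask and rank directly; this heuristic exists only to convey the input-output contract:</p>

```python
import numpy as np

def residualize(residual, basis, rank):
    """Activate the `rank` basis vectors most aligned with the residual
    (toy heuristic, not the learned operator)."""
    scores = np.abs(basis.T @ residual)           # (6,) alignment scores
    mask = np.zeros(basis.shape[1], dtype=bool)
    mask[np.argsort(scores)[-rank:]] = True
    return mask
```
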
</div>
<div class="op-card">
<h3><span class="op-num">2</span> Init</h3>
<p>
Produces a <strong>quick first guess</strong> at the solution in a single forward pass. Given the task
embedding, memory state, <code>base_theta</code>, and the mask, it predicts a <code>z</code> vector —
coordinates in the adapter subspace. Final theta = <code>base_theta + basis @ z</code>.
</p>
<p>Cheap but limited — only as good as a learned function of the inputs allows.</p>
<p class="analogy">Like pattern matching: "given what this problem looks like, here's roughly the answer."</p>
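<p>The reconstruction formula, spelled out with the shapes the report implies (an identity-slice basis and these particular values are purely for illustration):</p>

```python
import numpy as np

# theta = base_theta + basis @ z: z gives coordinates in the 6-d adapter
# subspace, basis lifts them back into the full 8-d parameter space.
base_theta = np.zeros(8)
basis = np.eye(8)[:, :6]                  # (8, 6) adapter basis
z = np.array([1.0, 0.0, 2.0, 0.0, 0.0, 0.0])
theta = base_theta + basis @ z
```
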
</div>
<div class="op-card">
<h3><span class="op-num">3</span> Update</h3>
<p>
<strong>Iteratively refines</strong> init's guess using gradient descent on support data. Starting from
init's z, it runs 2–4 learned gradient steps. Each step computes the gradient of support loss
w.r.t. z and applies a <em>learned</em> update rule (not raw gradient descent).
</p>
<p>Slower than init (multiple steps) but more reliable — directly optimizes on the data.</p>
<p class="analogy">Like practice: actually working through the problem step by step.</p>
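<p>A plain gradient-descent sketch of the refinement loop. The real operator applies a <em>learned</em> update rule per step; here each step is just a raw gradient step, and the step count and learning rate are invented:</p>

```python
import numpy as np

def update(z, X, y, base_theta, basis, steps=3, lr=0.05):
    """Refine z by gradient descent on mean squared support loss
    (plain GD stand-in for the learned update rule)."""
    A = X @ basis                    # support inputs in the subspace
    r0 = y - X @ base_theta          # residual target for basis @ z
    for _ in range(steps):
        grad = 2 * A.T @ (A @ z - r0) / len(y)
        z = z - lr * grad
    return z
```
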
</div>
<div class="op-card">
<h3><span class="op-num">4</span> Generate</h3>
<p>
Tries to <strong>jump directly to a good solution</strong> in one shot, taking init's z as input and
predicting a better z. It has learned shortcuts from many (task, init_z, optimal_z) examples during
training.
</p>
<p>High-risk, high-reward: great answers with almost no compute when it works, worse than init when it
fails. That's why the evaluator exists.</p>
<p class="analogy">Like intuition: "when the first guess looks like this, the real answer is usually over there."</p>
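<p>A toy stand-in for the shortcut: fit a linear map from <code>init_z</code> to <code>optimal_z</code> pairs via ridge regression. The real operator is a neural network; this only illustrates the input-output contract:</p>

```python
import numpy as np

def fit_generate(init_zs, optimal_zs, reg=1e-3):
    """Fit a linear 'shortcut' z_init -> z_opt from training pairs
    (ridge-regression toy, not the learned operator)."""
    A = init_zs
    W = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]),
                        A.T @ optimal_zs)
    return lambda z: z @ W
```
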
</div>
<div class="op-card">
<h3><span class="op-num">5</span> Evaluate + Promote</h3>
<p>
<strong>Evaluate</strong> picks the best candidate from init, generate, and update using a neural
network that sees each candidate's losses and parameter norms.
</p>
<p>
<strong>Promote</strong> then decides: how much to blend the winner into the family prototype
(<code>prototype_alpha</code>), whether to update the global prior (<code>slow_gate</code>), and
whether to spawn a specialized child model (<code>spawn_gate</code>).
</p>
<p>This is how the system builds <strong>long-term memory</strong> across a campaign.</p>
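<p>Rule-based stand-ins for the two decisions. The learned evaluator sees richer features than a single loss, and promote also drives <code>slow_gate</code>/<code>spawn_gate</code>; both functions here are illustrative:</p>

```python
def evaluate(candidates):
    """Pick the candidate with the lowest recorded loss (simplest
    stand-in for the learned evaluator)."""
    return min(candidates, key=lambda name: candidates[name]["loss"])

def promote(prototype, winner_z, alpha):
    """Blend the winner into the family prototype, EMA-style, with
    blend weight alpha (stand-in for prototype_alpha)."""
    return [(1 - alpha) * p + alpha * w for p, w in zip(prototype, winner_z)]
```
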
</div>
<div class="op-card">
<h3><span class="op-num">6</span> Reflect</h3>
<p>
<strong>Adjusts the system's own hyperparameters</strong> based on recent performance: learning rate,
number of update steps, promote aggressiveness, generate acceptance threshold, reflection window size,
and spawn thresholds.
</p>
<p>
This is <strong>meta-learning</strong>: the system tunes itself over a campaign. Early on it may be
exploratory; later, as it accumulates knowledge, it becomes more conservative.
</p>
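<p>A toy reflect rule to make the explore-then-settle idea concrete. The two knobs shown and the thresholds are invented, not the system's actual policy:</p>

```python
def reflect(policy, recent_losses):
    """If recent tasks are improving, get more conservative (fewer
    update steps); otherwise explore with a larger learning rate.
    Purely illustrative thresholds."""
    if len(recent_losses) >= 2 and recent_losses[-1] < recent_losses[-2]:
        policy["update_steps"] = max(2, policy["update_steps"] - 1)
    else:
        policy["lr"] = min(0.2, policy["lr"] * 1.5)
    return policy
```
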
</div>
</div>
<h3>How They Chain Together</h3>
<div class="chain-diagram">residualize → init → [update, generate] → evaluate → promote → reflect
                                                          ↓
                                              updates memory &amp; policy
                                                          ↓
                                            next task uses updated state</div>
<p style="margin-top: 1rem;">
<strong>Decisions compound.</strong> A good promote decision improves <code>base_theta</code> for the next task. A good
reflect decision tunes the policy so update takes the right number of steps. A bad decision in any operator
cascades forward through the entire campaign.
</p>
</section>
</main>
</body>
</html>