@belisarius222
Created April 4, 2026 03:55
<!DOCTYPE html>
<html lang="en" data-theme="dark">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Staged-Polymorphic Omega System — Training Report</title>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@picocss/pico@2/css/pico.min.css">
<style>
:root {
--pico-font-size: 16px;
}
body { padding-bottom: 4rem; }
h1 { margin-bottom: 0.25em; }
.subtitle { color: var(--pico-muted-color); margin-bottom: 2rem; font-size: 1.1rem; }
.result-card {
border-left: 4px solid var(--pico-primary);
padding: 1rem 1.5rem;
margin: 1.5rem 0;
background: var(--pico-card-background-color);
border-radius: 0 var(--pico-border-radius) var(--pico-border-radius) 0;
}
.result-card.success { border-left-color: #22c55e; }
.result-card.info { border-left-color: var(--pico-primary); }
.beat { color: #22c55e; font-weight: bold; }
.miss { color: #f59e0b; }
table { font-variant-numeric: tabular-nums; }
.operator-grid { display: grid; grid-template-columns: 1fr; gap: 1.5rem; margin: 1.5rem 0; }
@media (min-width: 768px) { .operator-grid { grid-template-columns: 1fr 1fr; } }
.op-card {
background: var(--pico-card-background-color);
border-radius: var(--pico-border-radius);
padding: 1.25rem;
border: 1px solid var(--pico-muted-border-color);
}
.op-card h3 { margin-top: 0; margin-bottom: 0.5rem; }
.op-num {
display: inline-block;
background: var(--pico-primary);
color: var(--pico-primary-inverse);
width: 1.6em; height: 1.6em;
text-align: center; line-height: 1.6em;
border-radius: 50%; font-weight: bold;
margin-right: 0.4em; font-size: 0.9em;
}
.analogy { font-style: italic; color: var(--pico-muted-color); margin-top: 0.75rem; }
.chain-diagram {
background: var(--pico-card-background-color);
border-radius: var(--pico-border-radius);
padding: 1.5rem;
font-family: monospace;
font-size: 0.9rem;
line-height: 1.8;
overflow-x: auto;
white-space: pre;
border: 1px solid var(--pico-muted-border-color);
}
details summary { cursor: pointer; font-weight: 600; }
details[open] summary { margin-bottom: 0.75rem; }
hr { margin: 2.5rem 0; }
.tag {
display: inline-block;
background: var(--pico-primary);
color: var(--pico-primary-inverse);
padding: 0.15em 0.5em;
border-radius: 4px;
font-size: 0.8rem;
font-weight: 600;
}
.tag.green { background: #22c55e; }
</style>
</head>
<body>
<main class="container">
<h1>Staged-Polymorphic Omega System</h1>
<p class="subtitle">Training Report &mdash; April 3, 2026</p>
<!-- ==================== STATUS ==================== -->
<section>
<h2>Status</h2>
<div class="result-card success">
<strong>The learned model now beats the teacher on all 3 seeds with correct runtime semantics.</strong>
</div>
<h3>Key Fix: Init Supervision in Joint Stage</h3>
<p>
The learned model's init predictions were <strong>2&times; worse</strong> than the teacher's because of a
train/eval mask mismatch. Adding <code>init_z_loss</code> (0.5 weight) to the joint loss broke the
chicken-and-egg cycle where the evaluator starved init of gradient.
</p>
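<p>
A minimal sketch of the fix, assuming hypothetical names (<code>eval_loss</code>, <code>init_z_pred</code>, <code>teacher_init_z</code>) &mdash; the real loss lives in <code>omega_train.py</code> and its exact terms are not shown here:
</p>

```python
import numpy as np

INIT_Z_WEIGHT = 0.5  # the init_z_loss weight reported above

def joint_loss(eval_loss, init_z_pred, teacher_init_z):
    """Joint objective with direct init supervision added.

    eval_loss:      the pre-existing joint-stage loss (a float here)
    init_z_pred:    the student's init z for the task
    teacher_init_z: the teacher's init z, used as a regression target
    """
    init_z_loss = float(np.mean((init_z_pred - teacher_init_z) ** 2))
    return eval_loss + INIT_Z_WEIGHT * init_z_loss
```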
<h3>Results <small>(32 eval campaigns each)</small></h3>
<table>
<thead>
<tr><th>Seed</th><th>Joint Epochs</th><th>Teacher</th><th>Learned</th><th>Beats Teacher?</th></tr>
</thead>
<tbody>
<tr><td>13</td><td>8</td><td>0.688</td><td><strong>0.639</strong></td><td><span class="beat">YES (&minus;7.1%)</span></td></tr>
<tr><td>14</td><td>12</td><td>0.686</td><td><strong>0.678</strong></td><td><span class="beat">YES (&minus;1.2%)</span></td></tr>
<tr><td>15</td><td>8</td><td>0.706</td><td><strong>0.630</strong></td><td><span class="beat">YES (&minus;10.8%)</span></td></tr>
</tbody>
</table>
<h3>Git State</h3>
<ul>
<li>On <code>main</code> at <code>f381bc6</code>, pushed to origin</li>
<li>Clean working tree (no uncommitted changes)</li>
<li>Default <code>--joint-epochs</code> is still 4 in <code>omega_train.py</code></li>
</ul>
<h3>What's Left</h3>
<ul>
<li>Bump default joint epochs to 12 (pending confirmation)</li>
<li>Could also add generate/update z supervision to joint stage for further gains</li>
<li>The <code>--teacher-eval-in-joint</code> flag exists but hurts &mdash; could remove to clean up</li>
</ul>
</section>
<hr>
<!-- ==================== OVERVIEW ==================== -->
<section>
<h2>What This System Does</h2>
<h3>The Problem</h3>
<p>
Imagine you have a <strong>teacher</strong> &mdash; a hand-coded algorithm that solves math problems
(specifically, linear regression tasks). The teacher is pretty good. It looks at a problem, picks a
strategy, solves it, and then learns from the experience to do better on the next problem.
</p>
<p>
We want to build a <strong>student</strong> (a neural network) that watches the teacher work, learns to
imitate it, and eventually does <em>better</em> than the teacher.
</p>
<h3>The Teacher's Job</h3>
<p>
The teacher solves problems in <strong>campaigns</strong> &mdash; sequences of 16 related tasks. For each task, it:
</p>
<ol>
<li><strong>Residualize</strong> &mdash; Figures out which dimensions of the problem matter</li>
<li><strong>Init</strong> &mdash; Makes a quick first guess at the answer</li>
<li><strong>Update</strong> &mdash; Iteratively refines the answer (slow but reliable)</li>
<li><strong>Generate</strong> &mdash; Tries to jump directly to a good answer (fast but risky)</li>
<li><strong>Evaluate</strong> &mdash; Picks the best of init/generate/update</li>
<li><strong>Promote</strong> &mdash; Decides whether to update its long-term memory</li>
<li><strong>Reflect</strong> &mdash; Adjusts its own strategy knobs for next time</li>
</ol>
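<p>
The seven steps above can be sketched as runnable Python under heavy simplification &mdash; every operator body here is an invented stand-in (no learning, no memory, no spawning), purely to show how the pieces feed each other:
</p>

```python
import numpy as np

def solve_task(X, y, base_theta, basis):
    """Toy walk through the teacher's per-task loop; all bodies are stand-ins."""
    # 1. Residualize: pretend every basis vector is relevant.
    mask = np.ones(basis.shape[1], dtype=bool)
    B = basis[:, mask]
    # 2. Init: least-squares guess for z in the reduced subspace.
    z_init, *_ = np.linalg.lstsq(X @ B, y - X @ base_theta, rcond=None)
    # 3. Update: a few plain gradient steps on the support loss.
    z_upd = z_init.copy()
    for _ in range(3):
        grad = (X @ B).T @ (X @ base_theta + X @ B @ z_upd - y) / len(y)
        z_upd = z_upd - 0.1 * grad
    # 4. Generate: one-shot jump (here just a copy of init's answer).
    z_gen = z_init.copy()
    # 5. Evaluate: keep the candidate with the lowest support loss.
    cands = [z_init, z_gen, z_upd]
    losses = [float(np.mean((X @ (base_theta + B @ c) - y) ** 2)) for c in cands]
    best = cands[int(np.argmin(losses))]
    # 6-7. Promote/Reflect would write memory and tune knobs; omitted here.
    return base_theta + B @ best
```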
<p>
Across a campaign, the teacher builds up <strong>family memory</strong> (patterns it recognizes) and can
even <strong>spawn child models</strong> (specialized solvers for problem types it sees repeatedly).
</p>
<h3>The Training Process</h3>
<p>
We generate thousands of campaigns where the teacher solves problems, recording everything: what it saw,
what it decided, and how well it did. This is the <strong>corpus</strong> &mdash; about 32,000 solved tasks.
</p>
<p>Then we train the neural network in <strong>stages</strong>:</p>
<ol>
<li><strong>Stages 1&ndash;5:</strong> Train each operator separately. The network learns to predict the
teacher's outputs individually.</li>
<li><strong>Stage 6 (Joint):</strong> Wire everything together. The network runs its own full pipeline &mdash;
its init feeds into its generate, its evaluator picks from its own candidates. This is where the operators
learn to work <em>together</em>, not just individually.</li>
</ol>
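<p>
As a sketch, the staging can be expressed as a schedule. Stage names follow the report; the per-operator epoch count is an invented placeholder, and <code>joint_epochs=12</code> is the value the report recommends rather than the current default of 4:
</p>

```python
def training_schedule(joint_epochs=12, op_epochs=4):
    """Return the ordered (stage, epochs) list the staged trainer would run."""
    ops = ["residualize", "init", "update", "generate", "evaluate_promote"]
    schedule = [(op, op_epochs) for op in ops]  # stages 1-5: operators in isolation
    schedule.append(("joint", joint_epochs))    # stage 6: full self-fed pipeline
    return schedule
```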
<h3>Why "Beats Teacher" Is Hard</h3>
<p>
At evaluation time, the neural network runs autonomously &mdash; no teacher guidance. It makes decisions,
updates its memory, and those decisions affect future tasks in the campaign.
<strong>Errors compound:</strong> a bad promote decision corrupts the memory, which leads to a bad
base_theta for the next task, which leads to worse init/generate/update, and so on.
</p>
<p>
The teacher has perfect rule-based logic. The student has to approximate all of it with learned weights.
Getting the student to <em>exceed</em> the teacher means the student found strategies the hand-coded
rules missed.
</p>
<div class="result-card info">
<strong>The Breakthrough:</strong> The student's init predictions were terrible at runtime &mdash; 2&times;
worse than the teacher's. The evaluator learned "init is bad, avoid it," which starved init of gradient.
Adding direct init supervision broke the cycle &mdash; init improved, the evaluator started selecting it,
and the whole pipeline got better. Result: <strong>1&ndash;11% improvement over the teacher</strong> across
all seeds.
</div>
</section>
<hr>
<!-- ==================== SIX OPERATORS ==================== -->
<section>
<h2>The Six Operators in Detail</h2>
<p>
The system solves <strong>linear regression tasks</strong>: given input-output pairs (support data), find a
parameter vector <code>theta</code> (8 dimensions) such that <code>y &asymp; X @ theta</code>. Held-out
"val" data measures solution quality.
</p>
<p>
Each task belongs to a <strong>family</strong> (like "in_basis_easy", "off_basis", "mixed") describing how
the true answer relates to a shared low-dimensional subspace (the "adapter basis" &mdash; 6 basis vectors
in 8-d space).
</p>
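<p>
A toy generator for this setup, assuming invented sampling details (the family names follow the report, but how each family is actually drawn is not documented here):
</p>

```python
import numpy as np

def make_task(family, rng, n_support=32, d=8, k=6):
    """Toy task: 8-d theta, shared 6-vector adapter basis, family-dependent theta."""
    basis = np.linalg.qr(rng.standard_normal((d, k)))[0]  # orthonormal 6-d subspace
    z = rng.standard_normal(k)
    if family == "in_basis_easy":
        theta = basis @ z                              # fully inside the subspace
    elif family == "off_basis":
        theta = rng.standard_normal(d)                 # ignores the subspace
    else:  # "mixed"
        theta = basis @ z + 0.3 * rng.standard_normal(d)
    X = rng.standard_normal((n_support, d))
    y = X @ theta                                      # noiseless support pairs
    return X, y, theta, basis
```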
<div class="operator-grid">
<div class="op-card">
<h3><span class="op-num">1</span> Residualize</h3>
<p>
Figures out <strong>which parts of the subspace matter</strong> for this task. The system has 6 basis
vectors; not all are relevant every time. Outputs a <strong>mask</strong> (which vectors to use) and a
<strong>rank</strong> (how many to activate).
</p>
<p>The output <code>ResidualSpec</code> is used by all downstream operators &mdash; everything else works
in this reduced subspace.</p>
<p class="analogy">Like a photographer choosing which lenses to mount before taking a shot.</p>
</div>
<div class="op-card">
<h3><span class="op-num">2</span> Init</h3>
<p>
Produces a <strong>quick first guess</strong> at the solution in a single forward pass. Given the task
embedding, memory state, base_theta, and the mask, it predicts a <code>z</code> vector &mdash;
coordinates in the adapter subspace. Final theta = <code>base_theta + basis @ z</code>.
</p>
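<p>
The decode step in the formula above, with arbitrary toy values (the real basis is shared and learned, not axis-aligned):
</p>

```python
import numpy as np

base_theta = np.zeros(8)
basis = np.eye(8)[:, :6]        # toy basis: first 6 coordinate axes
z = np.arange(6, dtype=float)   # init's predicted subspace coordinates
theta = base_theta + basis @ z  # final 8-d solution
```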
<p>Cheap but limited &mdash; only as good as a learned function of the inputs allows.</p>
<p class="analogy">Like pattern matching: "given what this problem looks like, here's roughly the answer."</p>
</div>
<div class="op-card">
<h3><span class="op-num">3</span> Update</h3>
<p>
<strong>Iteratively refines</strong> init's guess using gradient descent on support data. Starting from
init's z, it runs 2&ndash;4 learned gradient steps. Each step computes the gradient of support loss
w.r.t. z and applies a <em>learned</em> update rule (not raw gradient descent).
</p>
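<p>
A sketch of that loop, assuming plain gradient descent with a fixed step size in place of the learned update rule (the names and step count are illustrative):
</p>

```python
import numpy as np

def update_z(z, X, y, base_theta, basis, steps=3, lr=0.1):
    """Refine z by gradient steps on the support loss; the real rule is learned."""
    B = X @ basis
    for _ in range(steps):
        resid = X @ base_theta + B @ z - y
        grad = B.T @ resid / len(y)  # d(support MSE)/dz, up to a factor of 2
        z = z - lr * grad            # a learned rule would transform grad here
    return z
```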
<p>Slower than init (multiple steps) but more reliable &mdash; directly optimizes on the data.</p>
<p class="analogy">Like practice: actually working through the problem step by step.</p>
</div>
<div class="op-card">
<h3><span class="op-num">4</span> Generate</h3>
<p>
Tries to <strong>jump directly to a good solution</strong> in one shot, taking init's z as input and
predicting a better z. It has learned shortcuts from many (task, init_z, optimal_z) examples during
training.
</p>
<p>High-risk, high-reward: great answers with almost no compute when it works, worse than init when it
fails. That's why the evaluator exists.</p>
<p class="analogy">Like intuition: "when the first guess looks like this, the real answer is usually over there."</p>
</div>
<div class="op-card">
<h3><span class="op-num">5</span> Evaluate + Promote</h3>
<p>
<strong>Evaluate</strong> picks the best candidate from init, generate, and update using a neural
network that sees each candidate's losses and parameter norms.
</p>
<p>
<strong>Promote</strong> then decides: how much to blend the winner into the family prototype
(<code>prototype_alpha</code>), whether to update the global prior (<code>slow_gate</code>), and
whether to spawn a specialized child model (<code>spawn_gate</code>).
</p>
<p>This is how the system builds <strong>long-term memory</strong> across a campaign.</p>
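<p>
A hand-rolled stand-in for the pair &mdash; the real versions are neural networks; this one picks the minimum-loss candidate and blends it into the family prototype with a fixed <code>prototype_alpha</code>, and it ignores <code>slow_gate</code> and <code>spawn_gate</code> entirely:
</p>

```python
import numpy as np

def evaluate_and_promote(candidates, val_losses, prototype, alpha=0.2):
    """Evaluate: choose the winner. Promote: EMA-blend it into the prototype."""
    best = int(np.argmin(val_losses))                     # evaluate
    winner = candidates[best]
    prototype = (1 - alpha) * prototype + alpha * winner  # promote (blend only)
    return best, prototype
```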
</div>
<div class="op-card">
<h3><span class="op-num">6</span> Reflect</h3>
<p>
<strong>Adjusts the system's own hyperparameters</strong> based on recent performance: learning rate,
number of update steps, promote aggressiveness, generate acceptance threshold, reflection window size,
and spawn thresholds.
</p>
<p>
This is <strong>meta-learning</strong>: the system tunes itself over a campaign. Early on it may be
exploratory; later, as it accumulates knowledge, it becomes more conservative.
</p>
</div>
</div>
<h3>How They Chain Together</h3>
<div class="chain-diagram">residualize &rarr; init &rarr; [update, generate] &rarr; evaluate &rarr; promote &rarr; reflect
&darr;
updates memory &amp; policy
&darr;
next task uses updated state</div>
<p style="margin-top: 1rem;">
<strong>Decisions compound.</strong> A good promote decision improves base_theta for the next task. A good
reflect decision tunes the policy so update takes the right number of steps. A bad decision in any operator
cascades forward through the entire campaign.
</p>
</section>
</main>
</body>
</html>