@belisarius222
Last active April 9, 2026 17:45
IMPOSTER Training Results - Adversarial Text Generation (Apr 2-9, 2026)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>IMPOSTER Training Results</title>
<style>
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --red: #f85149; --blue: #58a6ff; --purple: #bc8cff; --yellow: #d29922; }
* { margin: 0; padding: 0; box-sizing: border-box; }
body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif; background: var(--bg); color: var(--text); line-height: 1.6; padding: 2rem; max-width: 900px; margin: 0 auto; }
h1 { font-size: 2rem; margin-bottom: 0.5rem; }
h2 { font-size: 1.4rem; margin-top: 2.5rem; margin-bottom: 1rem; color: var(--blue); border-bottom: 1px solid var(--border); padding-bottom: 0.5rem; }
h3 { font-size: 1.1rem; margin-top: 1.5rem; margin-bottom: 0.5rem; color: var(--purple); }
p, li { color: var(--dim); margin-bottom: 0.5rem; }
.subtitle { color: var(--dim); font-size: 1rem; margin-bottom: 2rem; }
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1.2rem; margin-bottom: 1rem; }
.metric-row { display: flex; gap: 1rem; flex-wrap: wrap; margin-bottom: 1rem; }
.metric { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; flex: 1; min-width: 140px; text-align: center; }
.metric .value { font-size: 1.8rem; font-weight: 700; color: var(--text); }
.metric .label { font-size: 0.75rem; color: var(--dim); text-transform: uppercase; letter-spacing: 0.05em; }
.metric.good .value { color: var(--green); }
.metric.bad .value { color: var(--red); }
.metric.neutral .value { color: var(--yellow); }
table { width: 100%; border-collapse: collapse; margin: 1rem 0; font-size: 0.9rem; }
th, td { padding: 0.5rem 0.75rem; text-align: left; border-bottom: 1px solid var(--border); }
th { color: var(--dim); font-weight: 600; font-size: 0.8rem; text-transform: uppercase; letter-spacing: 0.05em; }
td { color: var(--text); font-family: 'SF Mono', 'Fira Code', monospace; font-size: 0.85rem; }
.bar-container { display: flex; align-items: center; gap: 0.5rem; }
.bar { height: 16px; border-radius: 3px; background: var(--blue); opacity: 0.8; transition: width 0.3s; }
.bar.green { background: var(--green); }
.bar.red { background: var(--red); }
.bar.yellow { background: var(--yellow); }
.tag { display: inline-block; padding: 0.15rem 0.5rem; border-radius: 4px; font-size: 0.75rem; font-weight: 600; }
.tag.success { background: #1a3a2a; color: var(--green); }
.tag.fail { background: #3a1a1a; color: var(--red); }
.tag.partial { background: #3a2a1a; color: var(--yellow); }
code { background: #1a1f29; padding: 0.15rem 0.4rem; border-radius: 4px; font-size: 0.85rem; }
.insight { border-left: 3px solid var(--yellow); padding-left: 1rem; margin: 1rem 0; color: var(--dim); font-style: italic; }
ul { padding-left: 1.5rem; }
</style>
</head>
<body>
<h1>IMPOSTER Training Results</h1>
<p class="subtitle">Adversarial text generation: Qwen-72B + LoRA vs DeBERTa-v3-large discriminator<br>April 2-9, 2026 &mdash; C45/C47 B200 GPU cluster</p>
<div class="metric-row">
<div class="metric good">
<div class="value">-45%</div>
<div class="label">Gap Reduction (vs original disc.)</div>
</div>
<div class="metric good">
<div class="value">46%</div>
<div class="label">Gen. texts scoring "human"</div>
</div>
<div class="metric good">
<div class="value">9x</div>
<div class="label">Disc. Loss Increase</div>
</div>
<div class="metric">
<div class="value">14,400</div>
<div class="label">Training Steps</div>
</div>
</div>
<h2>Final Evaluation: Trained Generator vs Original Discriminator</h2>
<div class="card">
<p>The training loop co-trains the discriminator alongside the generator, making it a moving target. To measure absolute progress, we evaluated the final generator checkpoint against the <strong>original, frozen discriminator</strong> (the stage-1 DeBERTa-v3-large checkpoint, never updated during training).</p>
<p>We sampled 100 documents from the corpus, removed one sentence from each, and had the generator fill the gap. The original discriminator then scored the generated fill against the original human sentence in the same context.</p>
</div>
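<p>As a concrete sketch, the evaluation protocol above amounts to the following (all names are illustrative, not the project's actual API; <code>fill_gap</code> stands in for the trained generator and <code>score</code> for the frozen stage-1 DeBERTa discriminator):</p>

```python
# Sketch of the final-evaluation protocol: sample documents, remove one
# sentence each, regenerate it, and score both versions with the frozen
# discriminator. Names are illustrative placeholders.

import random

def evaluate(documents, fill_gap, score, n=100, seed=0):
    """Return gap statistics over n sampled documents."""
    rng = random.Random(seed)
    gaps = []
    for doc in rng.sample(documents, n):
        sentences = doc.split(". ")
        i = rng.randrange(len(sentences))
        human = sentences[i]
        context = sentences[:i] + ["<GAP>"] + sentences[i + 1:]
        generated = fill_gap(context)
        # gap = human_logit - generated_logit; smaller is better for the generator
        gaps.append(score(human) - score(generated))
    return {
        "mean_gap": sum(gaps) / len(gaps),
        "frac_near_human": sum(g < 10 for g in gaps) / len(gaps),
        "frac_beats_human": sum(g < 0 for g in gaps) / len(gaps),
    }
```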
<div class="metric-row">
<div class="metric">
<div class="value">+13.0</div>
<div class="label">Human logit (avg)</div>
</div>
<div class="metric neutral">
<div class="value">-0.2</div>
<div class="label">Generator logit (avg)</div>
</div>
<div class="metric good">
<div class="value">13.2</div>
<div class="label">Gap (was 24)</div>
</div>
</div>
<table>
<tr><th>Metric</th><th>Before Training</th><th>After Training</th><th>Change</th></tr>
<tr><td>Gap vs original discriminator</td><td><code>~24</code></td><td><code>13.2</code></td><td><span class="tag success">-45%</span></td></tr>
<tr><td>Generator texts with positive logit</td><td><code>~1%</code></td><td><code>46%</code></td><td><span class="tag success">46x more</span></td></tr>
<tr><td>Generator beats human text</td><td><code>0%</code></td><td><code>4%</code></td><td><span class="tag success">from zero</span></td></tr>
<tr><td>Gap &lt; 10 (near-human)</td><td><code>~6%</code></td><td><code>32%</code></td><td><span class="tag success">5x more</span></td></tr>
<tr><td>Gap &lt; 5 (indistinguishable)</td><td><code>~2%</code></td><td><code>15%</code></td><td><span class="tag success">7x more</span></td></tr>
</table>
<h3>How to read these numbers</h3>
<div class="card">
<p>The discriminator outputs a raw logit for each text span. <strong>Positive logit = "this looks human." Negative = "this looks machine-generated."</strong></p>
<p>Before training, nearly all generator outputs scored deeply negative (avg logit -12). The discriminator had no trouble identifying them. After training:</p>
<ul>
<li><strong>46% of generated texts now score positive</strong> &mdash; the discriminator thinks they might be human-written. Before training this was ~1%.</li>
<li><strong>The average generator logit moved from -12 to -0.2</strong> &mdash; essentially at the decision boundary. The discriminator is genuinely uncertain.</li>
<li><strong>15% of generated texts are within gap &lt; 5 of the human original</strong> &mdash; effectively indistinguishable to this discriminator.</li>
<li><strong>4% of generated texts actually score higher than the human text</strong> &mdash; the generator produced a more "human-sounding" fill than the original author.</li>
</ul>
<p>The gap of 13.2 (down from 24) represents a 45% reduction in the discriminator's ability to separate generated from human text. The remaining gap reflects cases where the generator produces text that is fluent but contextually off &mdash; the discriminator still catches those.</p>
</div>
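<p>The raw logit maps to a "looks human" probability through the logistic sigmoid, which makes the headline numbers concrete: the post-training generator average of -0.2 sits at roughly 45% (genuine uncertainty), while the pre-training -12 was effectively zero.</p>

```python
import math

def human_probability(logit):
    """Map a discriminator logit to P(human) via the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-logit))

print(round(human_probability(-0.2), 3))   # -> 0.45  (post-training generator avg)
print(round(human_probability(-12.0), 6))  # ~6e-06   (pre-training generator avg)
print(round(human_probability(13.0), 6))   # ~0.999998 (human text avg)
```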
<h2>What is the Contrastive Gap?</h2>
<div class="card">
<p><strong>gap = human_logit &minus; mean(generated_logits)</strong></p>
<p>The DeBERTa discriminator scores text spans with a raw logit. Higher logit = "looks more human." The gap measures how much better human text scores vs generator output. A gap of 0 means the discriminator can't tell them apart. Negative gap means the generator fools the discriminator.</p>
</div>
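<p>In code, with one human span and the four generated candidates per step described in the architecture section, the definition is simply:</p>

```python
def contrastive_gap(human_logit, generated_logits):
    """gap = human_logit - mean(generated_logits); 0 means indistinguishable."""
    return human_logit - sum(generated_logits) / len(generated_logits)

# Final-eval averages from this report: human +13.0, generator -0.2.
print(contrastive_gap(13.0, [-0.2, -0.2, -0.2, -0.2]))  # -> 13.2
```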
<h2>Gap Trajectory</h2>
<p>Gap averaged over 1000-step windows. First merge sync at step 500; co-training at steps 2000, 4000, 6000.</p>
<table>
<tr><th>Steps</th><th>Avg Gap</th><th></th><th>Phase</th></tr>
<tr><td>1-1000</td><td>23.3</td><td><div class="bar-container"><div class="bar red" style="width:93%"></div></div></td><td>Baseline (pre-sync)</td></tr>
<tr><td>1001-2000</td><td>23.0</td><td><div class="bar-container"><div class="bar red" style="width:92%"></div></div></td><td>LoRA sync begins</td></tr>
<tr><td>2001-3000</td><td>20.2</td><td><div class="bar-container"><div class="bar yellow" style="width:81%"></div></div></td><td>First improvement</td></tr>
<tr><td>3001-4000</td><td>17.8</td><td><div class="bar-container"><div class="bar yellow" style="width:71%"></div></div></td><td>Best early gap</td></tr>
<tr><td>4001-5000</td><td>19.6</td><td><div class="bar-container"><div class="bar yellow" style="width:78%"></div></div></td><td>Post co-training bounce</td></tr>
<tr><td>5001-6000</td><td>19.6</td><td><div class="bar-container"><div class="bar yellow" style="width:78%"></div></div></td><td>Plateau</td></tr>
<tr><td>6001-7000</td><td>18.4</td><td><div class="bar-container"><div class="bar yellow" style="width:74%"></div></div></td><td>Second improvement</td></tr>
<tr><td>7001-8000</td><td>19.4</td><td><div class="bar-container"><div class="bar yellow" style="width:78%"></div></div></td><td>Oscillation</td></tr>
<tr><td>8001-9000</td><td>18.3</td><td><div class="bar-container"><div class="bar yellow" style="width:73%"></div></div></td><td>Improving</td></tr>
<tr><td>9001-10000</td><td>17.0</td><td><div class="bar-container"><div class="bar green" style="width:68%"></div></div></td><td>New low</td></tr>
<tr><td>10001-11000</td><td>17.6</td><td><div class="bar-container"><div class="bar green" style="width:70%"></div></div></td><td>Stable</td></tr>
<tr><td>11001-12000</td><td>17.3</td><td><div class="bar-container"><div class="bar green" style="width:69%"></div></div></td><td>Stable</td></tr>
<tr><td>12001-13000</td><td>19.1</td><td><div class="bar-container"><div class="bar yellow" style="width:76%"></div></div></td><td>Co-training bounce</td></tr>
<tr><td>13001-14000</td><td>18.7</td><td><div class="bar-container"><div class="bar yellow" style="width:75%"></div></div></td><td>Recovering</td></tr>
</table>
<h2>Early vs Late Comparison</h2>
<div class="metric-row">
<div class="metric">
<div class="value">23.1</div>
<div class="label">Avg Gap (first 2K steps)</div>
</div>
<div class="metric good">
<div class="value">18.5</div>
<div class="label">Avg Gap (steps 2K-14K)</div>
</div>
</div>
<div class="metric-row">
<div class="metric">
<div class="value">6%</div>
<div class="label">Steps with gap &lt; 10 (early)</div>
</div>
<div class="metric good">
<div class="value">17%</div>
<div class="label">Steps with gap &lt; 10 (late)</div>
</div>
</div>
<div class="metric-row">
<div class="metric">
<div class="value">0</div>
<div class="label">Negative gaps (early)</div>
</div>
<div class="metric good">
<div class="value">19</div>
<div class="label">Negative gaps (late)</div>
</div>
</div>
<p>There were 19 steps where the generator's output scored <em>higher</em> than the original human text. This never happened in the first 2000 steps.</p>
<h2>Discriminator Co-training</h2>
<p>The discriminator head is retrained every 2000 steps on fresh generator output. Rising loss = harder to classify.</p>
<table>
<tr><th>Step</th><th>Co-training Loss</th><th></th><th>Interpretation</th></tr>
<tr><td>2,000</td><td>0.079</td><td><div class="bar-container"><div class="bar green" style="width:16%"></div></div></td><td>Easy to classify</td></tr>
<tr><td>4,000</td><td>0.126</td><td><div class="bar-container"><div class="bar yellow" style="width:25%"></div></div></td><td>Harder</td></tr>
<tr><td>6,000</td><td>0.261</td><td><div class="bar-container"><div class="bar red" style="width:52%"></div></div></td><td>Struggling to distinguish</td></tr>
</table>
<div class="insight">At co-training loss 0.26, the discriminator head is close to chance (0.5 = random guessing). The generator is producing text that the discriminator finds genuinely difficult to classify.</div>
<h2>Reward Signal</h2>
<table>
<tr><th>Metric</th><th>Early (0-2K)</th><th>Late (2K-14K)</th><th>Change</th></tr>
<tr><td>Mean reward</td><td><code>-12.2</code></td><td><code>-8.2</code></td><td><span class="tag success">+33%</span></td></tr>
<tr><td>Positive rewards</td><td><code>1%</code> of steps</td><td><code>7%</code> of steps</td><td><span class="tag success">7x more</span></td></tr>
<tr><td>BC loss</td><td><code>0.046</code></td><td><code>0.032</code></td><td><span class="tag success">-30%</span></td></tr>
<tr><td>KL divergence</td><td><code>0.02</code></td><td><code>0.19</code></td><td><span class="tag partial">LoRA diverging</span></td></tr>
</table>
<h2>The 5-Day Bug</h2>
<div class="card">
<h3>Runs 1-4: Flat Training (Apr 2-7)</h3>
<p>Four full training runs showed zero improvement. The contrastive gap was perfectly flat at ~25 for 40,000+ cumulative steps across all runs. We tried:</p>
<ul>
<li>Raw logits instead of sigmoid compression</li>
<li>Z-score normalized advantages instead of rank-based</li>
<li>Behavior cloning on human text (bc_weight 0.1 and 1.0)</li>
<li>KL anchoring to reference policy</li>
<li>Slower co-training with replay buffer</li>
</ul>
<p>None of it worked.</p>
<h3>Root Cause</h3>
<p>vLLM's <code>/v1/load_lora_adapter</code> API returned HTTP 400. The response body said:</p>
<p><code>"The lora adapter 'imposter-lora' has already been loaded. If you want to load the adapter in place, set 'load_inplace' to True."</code></p>
<p>The fix was adding one field to the JSON payload. But the training code only logged <code>"400 Bad Request"</code> without reading the response body. The generator's weights were updating via backprop, but <strong>the updated weights never reached the vLLM server for generation</strong>. Generated text was always from the base model. The discriminator always saw the same distribution. The gap could not shrink.</p>
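<p>A hedged sketch of what the fix looks like (server URL and adapter path are placeholders; <code>lora_name</code>/<code>lora_path</code> follow vLLM's load-adapter API, and <code>load_inplace</code> is the field named in the 400 response quoted above):</p>

```python
# Illustrative sketch: send "load_inplace": true, and on failure log the
# response *body* rather than just the status line.

import json
import urllib.error
import urllib.request

def lora_load_payload(name, path):
    """Build the JSON body for vLLM's /v1/load_lora_adapter endpoint."""
    return {
        "lora_name": name,
        "lora_path": path,
        "load_inplace": True,  # the one missing field behind the 5-day bug
    }

def load_lora_adapter(base_url, name, path):
    req = urllib.request.Request(
        f"{base_url}/v1/load_lora_adapter",
        data=json.dumps(lora_load_payload(name, path)).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as e:
        # "400 Bad Request" alone hides the message; the body had the fix.
        raise RuntimeError(f"{e.code} {e.reason}: {e.read().decode()}") from e
```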
</div>
<h2>LoRA Serving: 9 Attempts</h2>
<table>
<tr><th>#</th><th>Approach</th><th>Result</th><th></th></tr>
<tr><td>1</td><td>vLLM LoRA serving</td><td>45x slower</td><td><span class="tag fail">FAIL</span></td></tr>
<tr><td>2</td><td>Merge + save 140GB + cron rsync</td><td>Too slow</td><td><span class="tag fail">FAIL</span></td></tr>
<tr><td>3</td><td>Copy LoRA to C45 + CPU merge</td><td>Tokenizer corruption</td><td><span class="tag fail">FAIL</span></td></tr>
<tr><td>4</td><td>Give up, serve base model</td><td>No training signal</td><td><span class="tag partial">GAVE UP</span></td></tr>
<tr><td>5</td><td>GPU merge script on C45</td><td>0.6s merge, works</td><td><span class="tag success">OK</span></td></tr>
<tr><td>6</td><td>Inline rsync + GPU merge</td><td>Implemented, superseded</td><td><span class="tag success">OK</span></td></tr>
<tr><td>7</td><td>NCCL broadcast (14 hours)</td><td>RDMA version mismatch</td><td><span class="tag fail">FAIL</span></td></tr>
<tr><td>8</td><td>LoRA hot-swap (no merge)</td><td>Silently broken (400)</td><td><span class="tag fail">FAIL</span></td></tr>
<tr><td>9</td><td>Hot-swap + load_inplace</td><td>Works but 70x slow</td><td><span class="tag partial">SLOW</span></td></tr>
</table>
<p><strong>Final solution:</strong> Return to attempt 5. Merge LoRA into base weights on C45's GPUs (0.6s), save merged model (~4 min), restart vLLM without <code>--enable-lora</code>. Full base model speed: 320 tok/s.</p>
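<p>The merge itself is just folding the low-rank update into the base weights, W&prime; = W + (&alpha;/r)&middot;B&middot;A, per targeted module. A pure-Python toy illustration of that arithmetic (the real merge does this with GPU tensors, hence the 0.6s figure):</p>

```python
def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into the base weights: W' = W + (alpha/r) * B @ A.

    Toy version on lists of lists; after merging, the adapter is no longer
    needed at inference time, so vLLM can serve at full base-model speed.
    """
    scale = alpha / r
    merged = [row[:] for row in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            update = sum(B[i][k] * A[k][j] for k in range(r))
            merged[i][j] += scale * update
    return merged

# Rank-1 example on a 2x2 weight matrix:
print(merge_lora([[1.0, 0.0], [0.0, 1.0]],
                 A=[[3.0, 4.0]], B=[[1.0], [2.0]],
                 alpha=2, r=1))  # -> [[7.0, 8.0], [12.0, 17.0]]
```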
<h2>Performance</h2>
<div class="metric-row">
<div class="metric bad">
<div class="value">4</div>
<div class="label">tok/s (LoRA serving)</div>
</div>
<div class="metric good">
<div class="value">320</div>
<div class="label">tok/s (merged model)</div>
</div>
</div>
<div class="metric-row">
<div class="metric">
<div class="value">17s</div>
<div class="label">step time (LoRA)</div>
</div>
<div class="metric good">
<div class="value">4.5s</div>
<div class="label">step time (merged)</div>
</div>
</div>
<p>The LoRA serving slowdown was caused by vLLM 0.18's Punica BGMV kernels: 560 LoRA pairs (7 target modules &times; 80 layers) with TP-8 tensor parallelism. Known issue in vLLM's V1 engine.</p>
<h2>Architecture</h2>
<div class="card">
<p><strong>Training loop (GRPO + BC):</strong></p>
<ul>
<li>Generate 4 candidates via vLLM (merged model, C45)</li>
<li>Score all candidates + human text with DeBERTa discriminator (GPU 7, C47)</li>
<li>Z-score normalized advantages over generated candidates</li>
<li>Policy gradient loss + KL anchor (beta=0.01) on generated candidates</li>
<li>Behavior cloning loss (bc_weight=0.1) on human target text</li>
<li>LoRA merge + vLLM restart every 500 steps</li>
<li>Discriminator co-training every 2000 steps</li>
</ul>
</div>
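<p>The advantage step in the loop above can be sketched as follows (illustrative, not the project's actual code): each step's four candidate logits are z-score-normalized within the group, so the best candidate gets a positive advantage and the worst a negative one regardless of the absolute reward level.</p>

```python
import math

def zscore_advantages(rewards, eps=1e-6):
    """Z-score-normalize per-group rewards into advantages (GRPO-style)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]

# Four candidates' discriminator logits for one hypothetical step:
advs = zscore_advantages([-10.0, -6.0, -14.0, -2.0])
# Advantages sum to ~0; the best-scoring candidate gets the largest one.
# The total loss then combines: policy_gradient + 0.01 * kl + 0.1 * bc_loss
# (beta and bc_weight from the list above).
```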
<h2>Artifacts</h2>
<table>
<tr><th>Artifact</th><th>Location</th></tr>
<tr><td>Final LoRA</td><td><code>gs://volta-489906-artifacts/imposter/checkpoints/generator/lora_final/</code></td></tr>
<tr><td>Step checkpoints</td><td><code>gs://volta-489906-artifacts/imposter/checkpoints/generator/lora_step_{2k,4k,8k,12k,14k}/</code></td></tr>
<tr><td>Discriminator head</td><td><code>gs://volta-489906-artifacts/imposter/checkpoints/discriminator/co_trained_head.pt</code></td></tr>
<tr><td>Training logs</td><td><code>gs://volta-489906-artifacts/imposter/logs/</code></td></tr>
<tr><td>Code</td><td><code>github.com/belisarius222/imposter</code> (if pushed)</td></tr>
</table>
<h2>What's Next</h2>
<ul>
<li><strong>Reduce LoRA target modules</strong> &mdash; Research suggests targeting only o_proj + gate_proj (2-3 modules) performs nearly as well as all 7 while using 3.5x fewer parameters. Faster merge, faster training.</li>
<li><strong>Reward Interpreter</strong> &mdash; GPT-Pro-designed architecture for richer gradient signal from discriminator hidden states. Sentence-level scoring instead of span-level.</li>
<li><strong>Distributed training</strong> &mdash; Currently single-process with <code>device_map=auto</code> (12.5% GPU utilization). QLoRA+DDP is the identified best option.</li>
<li><strong>Contrastive discriminator</strong> &mdash; Replace sigmoid with pairwise margin scoring for better reward signal.</li>
</ul>
</body>
</html>