IMPOSTER Training Results - Adversarial Text Generation (Apr 2-9, 2026)
| <!DOCTYPE html> | |
| <html lang="en"> | |
| <head> | |
| <meta charset="UTF-8"> | |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> | |
| <title>IMPOSTER Training Results</title> | |
| <style> | |
| :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --red: #f85149; --blue: #58a6ff; --purple: #bc8cff; --yellow: #d29922; } | |
| * { margin: 0; padding: 0; box-sizing: border-box; } | |
| body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif; background: var(--bg); color: var(--text); line-height: 1.6; padding: 2rem; max-width: 900px; margin: 0 auto; } | |
| h1 { font-size: 2rem; margin-bottom: 0.5rem; } | |
| h2 { font-size: 1.4rem; margin-top: 2.5rem; margin-bottom: 1rem; color: var(--blue); border-bottom: 1px solid var(--border); padding-bottom: 0.5rem; } | |
| h3 { font-size: 1.1rem; margin-top: 1.5rem; margin-bottom: 0.5rem; color: var(--purple); } | |
| p, li { color: var(--dim); margin-bottom: 0.5rem; } | |
| .subtitle { color: var(--dim); font-size: 1rem; margin-bottom: 2rem; } | |
| .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1.2rem; margin-bottom: 1rem; } | |
| .metric-row { display: flex; gap: 1rem; flex-wrap: wrap; margin-bottom: 1rem; } | |
| .metric { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; flex: 1; min-width: 140px; text-align: center; } | |
| .metric .value { font-size: 1.8rem; font-weight: 700; color: var(--text); } | |
| .metric .label { font-size: 0.75rem; color: var(--dim); text-transform: uppercase; letter-spacing: 0.05em; } | |
| .metric.good .value { color: var(--green); } | |
| .metric.bad .value { color: var(--red); } | |
| .metric.neutral .value { color: var(--yellow); } | |
| table { width: 100%; border-collapse: collapse; margin: 1rem 0; font-size: 0.9rem; } | |
| th, td { padding: 0.5rem 0.75rem; text-align: left; border-bottom: 1px solid var(--border); } | |
| th { color: var(--dim); font-weight: 600; font-size: 0.8rem; text-transform: uppercase; letter-spacing: 0.05em; } | |
| td { color: var(--text); font-family: 'SF Mono', 'Fira Code', monospace; font-size: 0.85rem; } | |
| .bar-container { display: flex; align-items: center; gap: 0.5rem; } | |
| .bar { height: 16px; border-radius: 3px; background: var(--blue); opacity: 0.8; transition: width 0.3s; } | |
| .bar.green { background: var(--green); } | |
| .bar.red { background: var(--red); } | |
| .bar.yellow { background: var(--yellow); } | |
| .tag { display: inline-block; padding: 0.15rem 0.5rem; border-radius: 4px; font-size: 0.75rem; font-weight: 600; } | |
| .tag.success { background: #1a3a2a; color: var(--green); } | |
| .tag.fail { background: #3a1a1a; color: var(--red); } | |
| .tag.partial { background: #3a2a1a; color: var(--yellow); } | |
| code { background: #1a1f29; padding: 0.15rem 0.4rem; border-radius: 4px; font-size: 0.85rem; } | |
| .insight { border-left: 3px solid var(--yellow); padding-left: 1rem; margin: 1rem 0; color: var(--dim); font-style: italic; } | |
| ul { padding-left: 1.5rem; } | |
| </style> | |
| </head> | |
| <body> | |
| <h1>IMPOSTER Training Results</h1> | |
| <p class="subtitle">Adversarial text generation: Qwen-72B + LoRA vs DeBERTa-v3-large discriminator<br>April 2-9, 2026 — C45/C47 B200 GPU cluster</p> | |
| <div class="metric-row"> | |
| <div class="metric good"> | |
| <div class="value">-45%</div> | |
| <div class="label">Gap Reduction (vs original disc.)</div> | |
| </div> | |
| <div class="metric good"> | |
| <div class="value">46%</div> | |
| <div class="label">Gen. texts scoring "human"</div> | |
| </div> | |
| <div class="metric good"> | |
| <div class="value">9x</div> | |
| <div class="label">Disc. Loss Increase</div> | |
| </div> | |
| <div class="metric"> | |
| <div class="value">14,400</div> | |
| <div class="label">Training Steps</div> | |
| </div> | |
| </div> | |
| <h2>Final Evaluation: Trained Generator vs Original Discriminator</h2> | |
| <div class="card"> | |
| <p>The training loop co-trains the discriminator alongside the generator, making it a moving target. To measure absolute progress, we evaluated the final generator checkpoint against the <strong>original, frozen discriminator</strong> (the stage-1 DeBERTa-v3-large checkpoint, never updated during training).</p> | |
| <p>We sampled 100 documents from the corpus and removed one sentence from each. The generator filled the gap, while the original human sentence served as the comparison; the original discriminator then scored both versions.</p> | |
| </div> | |
| <div class="metric-row"> | |
| <div class="metric"> | |
| <div class="value">+13.0</div> | |
| <div class="label">Human logit (avg)</div> | |
| </div> | |
| <div class="metric neutral"> | |
| <div class="value">-0.2</div> | |
| <div class="label">Generator logit (avg)</div> | |
| </div> | |
| <div class="metric good"> | |
| <div class="value">13.2</div> | |
| <div class="label">Gap (was 24)</div> | |
| </div> | |
| </div> | |
| <table> | |
| <tr><th>Metric</th><th>Before Training</th><th>After Training</th><th>Change</th></tr> | |
| <tr><td>Gap vs original discriminator</td><td><code>~24</code></td><td><code>13.2</code></td><td><span class="tag success">-45%</span></td></tr> | |
| <tr><td>Generator texts with positive logit</td><td><code>~1%</code></td><td><code>46%</code></td><td><span class="tag success">46x more</span></td></tr> | |
| <tr><td>Generator beats human text</td><td><code>0%</code></td><td><code>4%</code></td><td><span class="tag success">from zero</span></td></tr> | |
| <tr><td>Gap < 10 (near-human)</td><td><code>~6%</code></td><td><code>32%</code></td><td><span class="tag success">5x more</span></td></tr> | |
| <tr><td>Gap < 5 (indistinguishable)</td><td><code>~2%</code></td><td><code>15%</code></td><td><span class="tag success">7x more</span></td></tr> | |
| </table> | |
| <h3>How to read these numbers</h3> | |
| <div class="card"> | |
| <p>The discriminator outputs a raw logit for each text span. <strong>Positive logit = "this looks human." Negative = "this looks machine-generated."</strong></p> | |
| <p>Before training, nearly all generator outputs scored deeply negative (avg logit -12). The discriminator had no trouble identifying them. After training:</p> | |
| <ul> | |
| <li><strong>46% of generated texts now score positive</strong> — the discriminator thinks they might be human-written. Before training this was ~1%.</li> | |
| <li><strong>The average generator logit moved from -12 to -0.2</strong> — essentially at the decision boundary. The discriminator is genuinely uncertain.</li> | |
| <li><strong>15% of generated texts are within gap < 5 of the human original</strong> — effectively indistinguishable to this discriminator.</li> | |
| <li><strong>4% of generated texts actually score higher than the human text</strong> — the generator produced a more "human-sounding" fill than the original author.</li> | |
| </ul> | |
| <p>The gap of 13.2 (down from 24) represents a 45% reduction in the discriminator's ability to separate generated from human text. The remaining gap reflects cases where the generator produces text that is fluent but contextually off — the discriminator still catches those.</p> | |
| </div> | |
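The logit-to-probability mapping described above is the standard sigmoid conversion; a minimal sketch (not code from the training repo) makes the before/after averages concrete:

```python
import math

def p_human(logit: float) -> float:
    """Probability the discriminator assigns to 'human' for a raw logit."""
    return 1.0 / (1.0 + math.exp(-logit))

# Pre-training average generator logit (-12): near-certain machine text.
print(f"{p_human(-12.0):.6f}")   # ~0.000006
# Post-training average (-0.2): essentially at the decision boundary.
print(f"{p_human(-0.2):.2f}")    # ~0.45
# Human average (+13): near-certain human.
print(f"{p_human(13.0):.6f}")
```

A logit of -0.2 means the discriminator rates the generated span only marginally more likely to be machine text than human text.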
| <h2>What is the Contrastive Gap?</h2> | |
| <div class="card"> | |
| <p><strong>gap = human_logit − mean(generated_logits)</strong></p> | |
| <p>The DeBERTa discriminator scores text spans with a raw logit. Higher logit = "looks more human." The gap measures how much better human text scores vs generator output. A gap of 0 means the discriminator can't tell them apart. Negative gap means the generator fools the discriminator.</p> | |
| </div> | |
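As a sketch, the definition above in code (the function name and the four-candidate sample are illustrative):

```python
def contrastive_gap(human_logit: float, generated_logits: list[float]) -> float:
    """gap = human_logit - mean(generated_logits); 0 means indistinguishable,
    negative means the generator out-scores the human text."""
    return human_logit - sum(generated_logits) / len(generated_logits)

# Approximately the final-eval averages reported above:
gap = contrastive_gap(13.0, [-0.3, -0.1, -0.4, 0.0])
print(round(gap, 1))  # 13.2
```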
| <h2>Gap Trajectory</h2> | |
| <p>500-step rolling averages. First merge sync at step 500; co-training at steps 2000, 4000, 6000.</p> | |
| <table> | |
| <tr><th>Steps</th><th>Avg Gap</th><th></th><th>Phase</th></tr> | |
| <tr><td>1-1000</td><td>23.3</td><td><div class="bar-container"><div class="bar red" style="width:93%"></div></div></td><td>Baseline (pre-sync)</td></tr> | |
| <tr><td>1001-2000</td><td>23.0</td><td><div class="bar-container"><div class="bar red" style="width:92%"></div></div></td><td>LoRA sync begins</td></tr> | |
| <tr><td>2001-3000</td><td>20.2</td><td><div class="bar-container"><div class="bar yellow" style="width:81%"></div></div></td><td>First improvement</td></tr> | |
| <tr><td>3001-4000</td><td>17.8</td><td><div class="bar-container"><div class="bar yellow" style="width:71%"></div></div></td><td>Best early gap</td></tr> | |
| <tr><td>4001-5000</td><td>19.6</td><td><div class="bar-container"><div class="bar yellow" style="width:78%"></div></div></td><td>Post co-training bounce</td></tr> | |
| <tr><td>5001-6000</td><td>19.6</td><td><div class="bar-container"><div class="bar yellow" style="width:78%"></div></div></td><td>Plateau</td></tr> | |
| <tr><td>6001-7000</td><td>18.4</td><td><div class="bar-container"><div class="bar yellow" style="width:74%"></div></div></td><td>Second improvement</td></tr> | |
| <tr><td>7001-8000</td><td>19.4</td><td><div class="bar-container"><div class="bar yellow" style="width:78%"></div></div></td><td>Oscillation</td></tr> | |
| <tr><td>8001-9000</td><td>18.3</td><td><div class="bar-container"><div class="bar yellow" style="width:73%"></div></div></td><td>Improving</td></tr> | |
| <tr><td>9001-10000</td><td>17.0</td><td><div class="bar-container"><div class="bar green" style="width:68%"></div></div></td><td>New low</td></tr> | |
| <tr><td>10001-11000</td><td>17.6</td><td><div class="bar-container"><div class="bar green" style="width:70%"></div></div></td><td>Stable</td></tr> | |
| <tr><td>11001-12000</td><td>17.3</td><td><div class="bar-container"><div class="bar green" style="width:69%"></div></div></td><td>Stable</td></tr> | |
| <tr><td>12001-13000</td><td>19.1</td><td><div class="bar-container"><div class="bar yellow" style="width:76%"></div></div></td><td>Co-training bounce</td></tr> | |
| <tr><td>13001-14000</td><td>18.7</td><td><div class="bar-container"><div class="bar yellow" style="width:75%"></div></div></td><td>Recovering</td></tr> | |
| </table> | |
| <h2>Early vs Late Comparison</h2> | |
| <div class="metric-row"> | |
| <div class="metric"> | |
| <div class="value">23.1</div> | |
| <div class="label">Avg Gap (first 2K steps)</div> | |
| </div> | |
| <div class="metric good"> | |
| <div class="value">18.5</div> | |
| <div class="label">Avg Gap (steps 2K-14K)</div> | |
| </div> | |
| </div> | |
| <div class="metric-row"> | |
| <div class="metric"> | |
| <div class="value">6%</div> | |
| <div class="label">Steps with gap < 10 (early)</div> | |
| </div> | |
| <div class="metric good"> | |
| <div class="value">17%</div> | |
| <div class="label">Steps with gap < 10 (late)</div> | |
| </div> | |
| </div> | |
| <div class="metric-row"> | |
| <div class="metric"> | |
| <div class="value">0</div> | |
| <div class="label">Negative gaps (early)</div> | |
| </div> | |
| <div class="metric good"> | |
| <div class="value">19</div> | |
| <div class="label">Negative gaps (late)</div> | |
| </div> | |
| </div> | |
| <p>There were 19 steps where the generator's output scored <em>higher</em> than the original human text; this never happened in the first 2,000 steps.</p> | |
| <h2>Discriminator Co-training</h2> | |
| <p>The discriminator head is retrained every 2000 steps on fresh generator output. Rising loss = harder to classify.</p> | |
| <table> | |
| <tr><th>Step</th><th>Co-training Loss</th><th></th><th>Interpretation</th></tr> | |
| <tr><td>2,000</td><td>0.079</td><td><div class="bar-container"><div class="bar green" style="width:16%"></div></div></td><td>Easy to classify</td></tr> | |
| <tr><td>4,000</td><td>0.126</td><td><div class="bar-container"><div class="bar yellow" style="width:25%"></div></div></td><td>Harder</td></tr> | |
| <tr><td>6,000</td><td>0.261</td><td><div class="bar-container"><div class="bar red" style="width:52%"></div></div></td><td>Struggling to distinguish</td></tr> | |
| </table> | |
| <div class="insight">A co-training loss of 0.26 is more than triple the 0.079 measured at step 2,000. The head still separates the two classes, but its margin is shrinking fast: the generator is producing text that the discriminator finds genuinely difficult to classify.</div> | |
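To put the loss numbers on an interpretable scale: if the co-training loss is a mean binary cross-entropy (an assumption; the report does not name the loss function), then exp(-loss) approximates the head's average confidence in the correct label, and pure chance guessing sits at loss ln 2 ≈ 0.693, not 0.5. A small sketch:

```python
import math

def head_confidence(bce_loss: float) -> float:
    """Geometric-mean probability the head assigns to the correct class,
    assuming the reported co-training loss is mean binary cross-entropy."""
    return math.exp(-bce_loss)

for step, loss in [(2000, 0.079), (4000, 0.126), (6000, 0.261)]:
    print(f"step {step}: confidence ~ {head_confidence(loss):.2f}")
```

Under that assumption the head went from roughly 92% average confidence at step 2,000 to about 77% at step 6,000.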
| <h2>Reward Signal</h2> | |
| <table> | |
| <tr><th>Metric</th><th>Early (0-2K)</th><th>Late (2K-14K)</th><th>Change</th></tr> | |
| <tr><td>Mean reward</td><td><code>-12.2</code></td><td><code>-8.2</code></td><td><span class="tag success">+33%</span></td></tr> | |
| <tr><td>Positive rewards</td><td><code>1%</code> of steps</td><td><code>7%</code> of steps</td><td><span class="tag success">7x more</span></td></tr> | |
| <tr><td>BC loss</td><td><code>0.046</code></td><td><code>0.032</code></td><td><span class="tag success">-30%</span></td></tr> | |
| <tr><td>KL divergence</td><td><code>0.02</code></td><td><code>0.19</code></td><td><span class="tag partial">LoRA diverging</span></td></tr> | |
| </table> | |
| <h2>The 5-Day Bug</h2> | |
| <div class="card"> | |
| <h3>Runs 1-4: Flat Training (Apr 2-7)</h3> | |
| <p>Four full training runs showed zero improvement. The contrastive gap was perfectly flat at ~25 for 40,000+ cumulative steps across all runs. We tried:</p> | |
| <ul> | |
| <li>Raw logits instead of sigmoid compression</li> | |
| <li>Z-score normalized advantages instead of rank-based</li> | |
| <li>Behavior cloning on human text (bc_weight 0.1 and 1.0)</li> | |
| <li>KL anchoring to reference policy</li> | |
| <li>Slower co-training with replay buffer</li> | |
| </ul> | |
| <p>None of it worked.</p> | |
| <h3>Root Cause</h3> | |
| <p>vLLM's <code>/v1/load_lora_adapter</code> API returned HTTP 400. The response body said:</p> | |
| <p><code>"The lora adapter 'imposter-lora' has already been loaded. If you want to load the adapter in place, set 'load_inplace' to True."</code></p> | |
| <p>The fix was adding one field to the JSON payload. But the training code only logged <code>"400 Bad Request"</code> without reading the response body. The generator's weights were updating via backprop, but <strong>the updated weights never reached the vLLM server for generation</strong>. Generated text was always from the base model. The discriminator always saw the same distribution. The gap could not shrink.</p> | |
| </div> | |
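In code, the fix plus the logging change that would have surfaced the bug on day one. The endpoint and the <code>load_inplace</code> field are as quoted in the error above; the other field names follow vLLM's LoRA-loading API, and the server URL is a placeholder:

```python
import json
import urllib.error
import urllib.request

VLLM_URL = "http://c45:8000"  # placeholder address for the vLLM server

def lora_swap_payload(name: str, path: str) -> dict:
    """JSON body for vLLM's /v1/load_lora_adapter endpoint."""
    return {
        "lora_name": name,
        "lora_path": path,
        "load_inplace": True,  # the one missing field behind the 5-day bug
    }

def hot_swap_lora(name: str, path: str) -> None:
    req = urllib.request.Request(
        f"{VLLM_URL}/v1/load_lora_adapter",
        data=json.dumps(lora_swap_payload(name, path)).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        urllib.request.urlopen(req, timeout=60)
    except urllib.error.HTTPError as e:
        # Log the body, not just the status line -- the 400 response
        # actually explained the problem.
        raise RuntimeError(f"{e.code}: {e.read().decode()}") from e
```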
| <h2>LoRA Serving: 9 Attempts</h2> | |
| <table> | |
| <tr><th>#</th><th>Approach</th><th>Result</th><th></th></tr> | |
| <tr><td>1</td><td>vLLM LoRA serving</td><td>45x slower</td><td><span class="tag fail">FAIL</span></td></tr> | |
| <tr><td>2</td><td>Merge + save 140GB + cron rsync</td><td>Too slow</td><td><span class="tag fail">FAIL</span></td></tr> | |
| <tr><td>3</td><td>Copy LoRA to C45 + CPU merge</td><td>Tokenizer corruption</td><td><span class="tag fail">FAIL</span></td></tr> | |
| <tr><td>4</td><td>Give up, serve base model</td><td>No training signal</td><td><span class="tag partial">GAVE UP</span></td></tr> | |
| <tr><td>5</td><td>GPU merge script on C45</td><td>0.6s merge, works</td><td><span class="tag success">OK</span></td></tr> | |
| <tr><td>6</td><td>Inline rsync + GPU merge</td><td>Implemented, superseded</td><td><span class="tag success">OK</span></td></tr> | |
| <tr><td>7</td><td>NCCL broadcast (14 hours)</td><td>RDMA version mismatch</td><td><span class="tag fail">FAIL</span></td></tr> | |
| <tr><td>8</td><td>LoRA hot-swap (no merge)</td><td>Silently broken (400)</td><td><span class="tag fail">FAIL</span></td></tr> | |
| <tr><td>9</td><td>Hot-swap + load_inplace</td><td>Works but 70x slow</td><td><span class="tag partial">SLOW</span></td></tr> | |
| </table> | |
| <p><strong>Final solution:</strong> Return to attempt 5. Merge LoRA into base weights on C45's GPUs (0.6s), save merged model (~4 min), restart vLLM without <code>--enable-lora</code>. Full base model speed: 320 tok/s.</p> | |
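The merge itself is mathematically simple: each adapted weight becomes W' = W + (α/r)·B·A, after which inference needs no LoRA kernels at all, which is why the merged model recovers full base-model speed. A toy numerical check (this is the same fold that PEFT's <code>merge_and_unload</code> applies per target module; sizes here are illustrative):

```python
import numpy as np

def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into the base weight: W' = W + (alpha / r) * B @ A.
    Once merged, the adapter is gone -- inference uses plain dense matmuls."""
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d, r = 8, 2                               # toy sizes; real modules are far wider
W = rng.standard_normal((d, d))           # base weight
A = rng.standard_normal((r, d))           # LoRA down-projection
B = rng.standard_normal((d, r))           # LoRA up-projection
x = rng.standard_normal(d)

merged = merge_lora(W, A, B, alpha=16, r=r)
# The merged weight reproduces base + adapter exactly:
assert np.allclose(merged @ x, W @ x + (16 / r) * (B @ (A @ x)))
```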
| <h2>Performance</h2> | |
| <div class="metric-row"> | |
| <div class="metric bad"> | |
| <div class="value">4</div> | |
| <div class="label">tok/s (LoRA serving)</div> | |
| </div> | |
| <div class="metric good"> | |
| <div class="value">320</div> | |
| <div class="label">tok/s (merged model)</div> | |
| </div> | |
| </div> | |
| <div class="metric-row"> | |
| <div class="metric"> | |
| <div class="value">17s</div> | |
| <div class="label">step time (LoRA)</div> | |
| </div> | |
| <div class="metric good"> | |
| <div class="value">4.5s</div> | |
| <div class="label">step time (merged)</div> | |
| </div> | |
| </div> | |
| <p>The LoRA serving slowdown traces to vLLM 0.18's Punica BGMV kernels handling 560 LoRA weight pairs (7 target modules × 80 layers) under TP-8 tensor parallelism, a known issue in vLLM's V1 engine.</p> | |
| <h2>Architecture</h2> | |
| <div class="card"> | |
| <p><strong>Training loop (GRPO + BC):</strong></p> | |
| <ul> | |
| <li>Generate 4 candidates via vLLM (merged model, C45)</li> | |
| <li>Score all candidates + human text with DeBERTa discriminator (GPU 7, C47)</li> | |
| <li>Z-score normalized advantages over generated candidates</li> | |
| <li>Policy gradient loss + KL anchor (beta=0.01) on generated candidates</li> | |
| <li>Behavior cloning loss (bc_weight=0.1) on human target text</li> | |
| <li>LoRA merge + vLLM restart every 500 steps</li> | |
| <li>Discriminator co-training every 2000 steps</li> | |
| </ul> | |
| </div> | |
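The z-score advantage step in the loop above can be sketched as follows (function name and sample logits are illustrative, not from the training repo):

```python
import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Z-score normalize per-group rewards, GRPO-style: each candidate's
    advantage is its reward relative to siblings from the same prompt."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard: identical rewards -> all-zero advantages
    return [(r - mu) / sigma for r in rewards]

# Four candidates' discriminator logits for one masked sentence:
adv = grpo_advantages([-9.0, -2.0, -14.0, -0.5])
print([round(a, 2) for a in adv])
# Advantages sum to ~0: the best candidate is pushed up, the worst down.
```

Because advantages are computed within each group of 4 candidates, the policy gets a useful gradient even while all absolute rewards are still negative, which matters here given that the mean reward stayed below zero throughout training.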
| <h2>Artifacts</h2> | |
| <table> | |
| <tr><th>Artifact</th><th>Location</th></tr> | |
| <tr><td>Final LoRA</td><td><code>gs://volta-489906-artifacts/imposter/checkpoints/generator/lora_final/</code></td></tr> | |
| <tr><td>Step checkpoints</td><td><code>gs://volta-489906-artifacts/imposter/checkpoints/generator/lora_step_{2k,4k,8k,12k,14k}/</code></td></tr> | |
| <tr><td>Discriminator head</td><td><code>gs://volta-489906-artifacts/imposter/checkpoints/discriminator/co_trained_head.pt</code></td></tr> | |
| <tr><td>Training logs</td><td><code>gs://volta-489906-artifacts/imposter/logs/</code></td></tr> | |
| <tr><td>Code</td><td><code>github.com/belisarius222/imposter</code> (if pushed)</td></tr> | |
| </table> | |
| <h2>What's Next</h2> | |
| <ul> | |
| <li><strong>Reduce LoRA target modules</strong> — Research shows o_proj + gate_proj (2-3 modules) nearly as good as all 7 but 3.5x fewer parameters. Faster merge, faster training.</li> | |
| <li><strong>Reward Interpreter</strong> — GPT-Pro-designed architecture for richer gradient signal from discriminator hidden states. Sentence-level scoring instead of span-level.</li> | |
| <li><strong>Distributed training</strong> — Currently single-process with device_map=auto (12.5% GPU utilization). QLoRA+DDP is the identified best option.</li> | |
| <li><strong>Contrastive discriminator</strong> — Replace sigmoid with pairwise margin scoring for better reward signal.</li> | |
| </ul> | |
| </body> | |
| </html> |