Both v27 attempts failed for the same root cause. Here's the deep analysis:
Attempt 1: edit → eval (score 7.1, adherence 3.0) → retry
Attempt 2: edit → eval (score 7.6, adherence 4.0) → fail
Both times the evaluator flagged Beat 6 (KEEP) and Beat 7 (KEEP) as critical instruction_adherence failures (scores 3-4/10). The instruction said "Move beats 6-7 up immediately after beat 3 sequence" but those beats stayed in place.
The beat plan correctly models the reorder:
- Beat 4 (EDIT) ← original beat 6
- Beat 5 (EDIT) ← original beat 7
- Beat 6 (KEEP) ← original beat 4
- Beat 7 (KEEP) ← original beat 5
build_beat_assignments_from_plan maps each new beat to its original paragraph range. The assignments are sorted by new beat_number. But execute_beat_actions edits paragraphs in-place at their original positions — it never physically moves paragraphs from one position to another.
So what actually happens:
- Beat 4 (EDIT, orig 6's paragraphs at position ~60-70) gets edited
- Beat 5 (EDIT, orig 7's paragraphs at position ~70-80) gets edited
- Beat 6 (KEEP, orig 4's paragraphs at position ~30-40) gets continuity check only
- Beat 7 (KEEP, orig 5's paragraphs at position ~40-50) gets continuity check only
The paragraphs never move. The reordering only exists in the beat_number labels, not in the actual paragraph positions in the text.
The unified_writer processes beats in beat_number order but edits paragraphs at their paragraph_start:paragraph_end positions. It has no concept of "move these paragraphs from position A to position B." KEEP means "run a continuity check at the original position," not "accept this beat and place it in the new order."
Fix needed: When beat_number order differs from paragraph position order (i.e., the plan reorders beats), the writer needs to physically reorder paragraphs. After all individual beat edits complete, reassemble the working array in beat_number order instead of paragraph-position order.
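The reassembly step described above could look roughly like the following. This is a minimal sketch, not the actual unified_writer code: the dict shape of an assignment, the exclusive paragraph_end, the `edited` mapping, and both function names are assumptions made for illustration. It also assumes the assignments cover every paragraph, since anything outside a beat range would be dropped.

```python
def plan_reorders(assignments):
    """Return True when beat_number order differs from original paragraph order."""
    by_beat = sorted(assignments, key=lambda a: a["beat_number"])
    starts = [a["paragraph_start"] for a in by_beat]
    return starts != sorted(starts)


def reassemble_in_beat_order(paragraphs, assignments, edited=None):
    """Lay paragraphs out in beat_number order.

    paragraphs  -- the original paragraph list
    assignments -- dicts with beat_number, paragraph_start, paragraph_end
                   (end is exclusive here; that is an assumption)
    edited      -- hypothetical map of beat_number -> replacement paragraphs
                   for beats the writer changed; KEEP beats are simply absent
    """
    edited = edited or {}
    out = []
    for a in sorted(assignments, key=lambda a: a["beat_number"]):
        original = paragraphs[a["paragraph_start"]:a["paragraph_end"]]
        # Use the edited text when the beat was changed, else the original slice.
        out.extend(edited.get(a["beat_number"], original))
    return out
```

Slicing from the untouched original list and assembling a new list avoids index drift: an edit that grows one beat cannot shift the ranges of the beats after it.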
The evaluator prompt says: "If the beat's action is KEEP but the instruction SHOULD have changed this beat's content, score 1-4." Beats 6 and 7 are KEEP, so the evaluator correctly says: "the instruction asked to move these earlier, but they're still here in the wrong position — score 3-4."
But the writer CAN'T move them — KEEP means "don't touch the content." The evaluator is judging the writer for a capability the writer doesn't have. The planner assigned KEEP to these beats because their content doesn't need changing, just their position. The evaluator conflates position with content.
Fix needed: The evaluator should not judge KEEP beats for positional failures. If the beat content is correct but in the wrong position, that's a structural issue the evaluator shouldn't penalize at the beat level.
On retry (attempt 2), the evaluator feedback says "Move beats 6-7 immediately after beat 3." This feedback is appended to the instruction. But the retry runs through the same pipeline with the same beat_assignments (KEEP for beats 6/7). The writer still can't reorder, so attempt 2 fails identically.
The retry mechanism assumes the editor can "try harder" given feedback, but when the failure is architectural (can't reorder), retrying is pure waste.
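One possible way to gate the retry is sketched below. It rests on a heuristic that is an assumption, not something the pipeline does today: if every adherence failure sits on a KEEP beat, the writer has no action that could change the outcome, so the failure is treated as architectural. The function name, the `action` key on each failure, and the `max_attempts` default are all hypothetical.

```python
def should_retry(adherence_failures, attempt, max_attempts=2):
    """Decide whether another edit attempt could plausibly help.

    adherence_failures -- dicts with at least an "action" key (assumed shape)
    attempt            -- 1-based number of the attempt that just finished
    """
    if attempt >= max_attempts:
        return False  # retry budget exhausted
    if not adherence_failures:
        return False  # nothing failed, nothing to retry
    # Only retry when at least one failing beat is one the writer can change.
    return any(f["action"] != "KEEP" for f in adherence_failures)
```

Under this heuristic, the v27 case (failures only on KEEP beats 6 and 7) would stop after attempt 1 instead of spending a second edit pass.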
In _synthesize_beat_results, if ANY beat has adherence < 6 (MIN_ADHERENCE), it forces the entire evaluation to fail:
if adherence_failures:
    min_failing = min(f["score"] for f in adherence_failures)
    adherence_score = min(adherence_score, min_failing)
So one KEEP beat with adherence=3 drags the whole adherence score to 3.0, guaranteeing failure, even though the actual edits (REPLACE beats 2-3, EDIT beats 4, 5, 9, 12) might be excellent.
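To make the clamp's effect concrete, here is a small self-contained sketch. It is not the real _synthesize_beat_results: the function name, the use of a plain mean as the pre-clamp score, the `skip_keep` flag, and the `action`/`score` dict keys are assumptions for illustration. Only the clamp itself and the MIN_ADHERENCE threshold of 6 come from the analysis above.

```python
MIN_ADHERENCE = 6  # threshold named in the analysis above


def synthesize_adherence(beat_results, skip_keep=False):
    """Combine per-beat adherence scores into one score.

    beat_results -- dicts with "action" and "score" keys (assumed shape)
    skip_keep    -- hypothetical flag: leave KEEP beats out of adherence scoring
    """
    scored = [r for r in beat_results
              if not (skip_keep and r["action"] == "KEEP")]
    if not scored:
        return None
    # Assumption: the pre-clamp score is a plain mean of the beat scores.
    adherence_score = sum(r["score"] for r in scored) / len(scored)
    adherence_failures = [r for r in scored if r["score"] < MIN_ADHERENCE]
    if adherence_failures:
        # Same clamp as in the source: the worst failing beat caps the total.
        min_failing = min(f["score"] for f in adherence_failures)
        adherence_score = min(adherence_score, min_failing)
    return adherence_score
```

With illustrative scores of 9 (REPLACE), 8 (EDIT), and 3 (KEEP), the clamp returns 3; with `skip_keep=True` the same input returns 8.5, because the KEEP beat no longer enters the failure list.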
Attempt 1: 108 → 128 paragraphs
Attempt 2: 128 → 155 paragraphs (starts from attempt 1's output!)
The retry mechanism uses _apply_edits_to_job_doc to update job_doc with attempt 1's output, so attempt 2 starts from 128 paragraphs. But the beat_assignments still reference the original paragraph ranges. The REPLACE/EDIT actions generate new content from the already-edited text, causing paragraph count to balloon. Two REPLACEs went from 8→14→30 paragraphs.
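A sketch of how the ranges might be recomputed before a retry follows. It assumes beats occupy contiguous, non-overlapping ranges starting at paragraph 0, that paragraph_end is exclusive, and that the pipeline can report how many paragraphs each edited beat now occupies. The function name and the `new_lengths` mapping are hypothetical.

```python
def rebuild_ranges(assignments, new_lengths):
    """Recompute paragraph ranges after an attempt changed beat lengths.

    assignments -- dicts with beat_number, paragraph_start, paragraph_end
    new_lengths -- hypothetical map of beat_number -> paragraph count after
                   the previous attempt; unchanged beats may be omitted
    """
    rebuilt = []
    cursor = 0
    # Walk beats in document order so each range starts where the last ended.
    for a in sorted(assignments, key=lambda a: a["paragraph_start"]):
        old_len = a["paragraph_end"] - a["paragraph_start"]
        length = new_lengths.get(a["beat_number"], old_len)
        rebuilt.append({**a,
                        "paragraph_start": cursor,
                        "paragraph_end": cursor + length})
        cursor += length
    return rebuilt
```

For example, if a REPLACE beat grew from 8 to 14 paragraphs, every later beat shifts by 6, so a second REPLACE would target the right text instead of a stale range.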
- Add REORDER capability to the writer — after all beat edits, if beat order ≠ paragraph position order, reassemble paragraphs in beat_number order
- Evaluator: don't penalize KEEP beats for structural/positional issues — only check content continuity for KEEP, not instruction adherence
- Retry: detect architectural impossibilities — if the failure is "can't reorder" or "can't add/remove beats," don't retry
- Retry: rebuild beat_assignments from scratch using the updated paragraph positions, not the original ranges