| Provider | Budget | Tier | Time | Verdict | B/W/N | Conf (1-5) | Key Findings |
|---|---|---|---|---|---|---|---|
| Kimi | 8k | T1 | 40s | pass | 0/1/3 | 4 | Rubber stamp with no real findings; the only warning concerned self-signed cert defaults. |
| Kimi | 8k | T2 | 44s | fail | 1/5/5 | 4 | Caught math/rand in PasswordFactory (CWE-338). Quality warnings on naming. |
| Kimi | 8k | T3 | 55s | fail | 8/10/5 | 3 | Hallucinated blocker on shellEscape quoting regex. Also caught math/rand and shell injection. |
| Kimi | 16k | T1 | 55s | pass | 0/0/2 | 5 | Cleanest run in the entire benchmark; only 2 notes. |
| Kimi | 16k | T2 | 48s | fail | 4/7/6 | 3 | Caught math/rand; also promoted CharsetPresenceGuarantor, Username, and Mail issues to blockers. |
| Kimi | 16k | T3 | 72s | partial-pass | 0/5/3 | 4 | Downgraded math/rand to a note ("acceptable for test data"). Flagged shell injection as a warning. |
| Kimi | 32k | T1 | 49s | fail | 3/5/7 | 4 | False blocker on shellEscape regex. Also speculative path traversal in readThrough. |
| Kimi | 32k | T2 | 33s | fail | 2/4/5 | 3 | Caught math/rand. Blocker on trustedIpsReader DEBUG log severity. |
| Kimi | 32k | T3 | 71s | fail | 2/9/7 | 4 | Caught math/rand and shell injection. Ran `go test` (constraint violation). |
| Qwen | 8k | T1 | 80s | partial-pass | 0/7/7 | 4 | Conservative. Warned on regex, dead code, naming. No blockers. |
| Qwen | 8k | T2 | 126s | fail | 2/5/5 | 4 | Caught math/rand. Also flagged a naming violation as a blocker. |
| Qwen | 8k | T3 | 98s | fail | 8/9/5 | 4 | Caught math/rand, fileClerk path traversal, shell injection, and billion laughs. Wrote a file (constraint violation). |
| Qwen | 16k | T1 | 105s | partial-pass | 0/5/8 | 4 | No blockers. Flagged shellEscape regex gap and cert strength concerns. |
| Qwen | 16k | T2 | 105s | fail | 1/5/5 | 4 | Caught math/rand. Wrote a file (constraint violation). |
| Qwen | 16k | T3 | 111s | fail | 6/8/5 | 4 | Caught math/rand, shell injection, path traversal, billion laughs. Quality blockers on test errors. |
| Qwen | 32k | T1 | 106s | fail | ~6/8/4 | 4 | Security blockers on shellEscape regex. Many quality blockers on naming/test errors. |
| Qwen | 32k | T2 | 81s | fail | 4/6/5 | 4 | Caught math/rand (primary). Also UsernameFactory/MailAddressFactory blockers. |
| Qwen | 32k | T3 | 94s | fail | 8+/8/5 | 4 | Most comprehensive T3 run; caught all certified findings. Ran `go test` (constraint violation). |
| Seed | 8k | T1 | 68s | fail | 8/8/8 | 4 | Over-escalated coherence issues. shellEscape tilde/backslash CWE-78. readThrough path concat CWE-22. |
| Seed | 8k | T2 | 58s | fail | 8/5/3 | 5 | Caught math/rand (CWE-338) but over-fragmented it into 2 blocker bullets. Also promoted error formatting, dnsLookup, and trustedIps issues to blockers. |
| Seed | 8k | T3 | 86s | pass | 0/5/5 | 5 | FALSE NEGATIVE. Demoted math/rand to a warning; shell injection and fileClerk permissions were also only warnings. Missed billion laughs, path traversal, and TOCTOU. The only false pass in the entire benchmark. |
| Seed | 16k | T1 | 88s | fail | 6/6/4 | 5 | Dual-output anomaly (two review summaries). Backtick CWE-78. SplitSeq bypass concern. |
| Seed | 16k | T2 | 67s | fail | 4/5/4 | 5 | Clean run. math/rand blocker. DSA deprecation CWE-327. assertOk naming blocker. |
| Seed | 16k | T3 | 145s | fail | 1/7/5 | 5 | Delegated to Explore subagent. math/rand correctly maintained as blocker. Shell injection warning. Lean report. |
| Seed | 32k | T1 | 37s | pass | 0/0/3 | 5 | Cleanest Seed T1. Correctly judged the backtick to pose no bypass risk. Fastest T1 run (37s). |
| Seed | 32k | T2 | 85s | fail | 2/5/5 | 4 | math/rand blocker. Novel cert NotBefore clock skew finding. No DSA. |
| Seed | 32k | T3 | 137s | fail | 4/8/6 | 5 | Strongest Seed T3. math/rand blocker. fileClerk symlink traversal (2 blockers). YAML deserialization CWE-502. Shell injection warning. Delegated to Explore subagent. |
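
Most T2 and T3 runs above converge on the math/rand finding in PasswordFactory (CWE-338); Kimi 16k T3 is the outlier that excused it as "acceptable for test data." The conventional remedy is to draw from Go's `crypto/rand`. The sketch below is illustrative only: `generatePassword` and `passwordAlphabet` are hypothetical names, not taken from the reviewed code.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// passwordAlphabet is a hypothetical character set chosen for illustration;
// it is not taken from the reviewed PasswordFactory.
const passwordAlphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

// generatePassword (hypothetical helper) returns a password of the given
// length whose characters are drawn from passwordAlphabet using crypto/rand,
// a CSPRNG, instead of math/rand (the CWE-338 pattern flagged above).
func generatePassword(length int) (string, error) {
	out := make([]byte, length)
	limit := big.NewInt(int64(len(passwordAlphabet)))
	for i := range out {
		// rand.Int returns a uniformly distributed value in [0, limit).
		n, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		out[i] = passwordAlphabet[n.Int64()]
	}
	return string(out), nil
}

func main() {
	pw, err := generatePassword(16)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(pw), pw)
}
```

Because `crypto/rand.Int` samples uniformly in [0, n), the sketch also avoids the modulo bias of reducing a random byte with `%`.
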

| Provider | Budget | Tier | Time | Verdict | B | W | N | Key Findings |
|---|---|---|---|---|---|---|---|---|
| Kimi | 8k | T1 | 31s | pass | 0 | — | — | Rubber stamp; Kimi T1 passed at every budget |
| Kimi | 8k | T2 | 43s | fail | 1 | — | — | 1 blocker |
| Kimi | 8k | T3 | 51s | pass | 0 | — | — | Pass at T3 |
| Kimi | 12k | T1 | 74s | pass | 0 | — | — | Pass |
| Kimi | 12k | T2 | 47s | fail | 4 | — | — | 4 blockers |
| Kimi | 12k | T3 | 53s | partial | 0 | — | — | Partial, 0 blockers |
| Kimi | 16k | T1 | 96s | pass | 0 | — | — | Pass, as at every Kimi T1 budget |
| Kimi | 16k | T2 | 29s | partial | 4 | — | — | Real injection findings |
| Kimi | 16k | T3 | 36s | fail | 4 | — | — | Best Kimi T3; real injection findings |
| Kimi | 48k | T3 | 36s | pass | 0 | — | — | Over-rationalized; 0 blockers at 48k |
| Qwen | 8k | T1 | 105s | fail | 1 | — | — | 1 blocker |
| Qwen | 8k | T2 | 67s | fail | 1 | — | — | 1 blocker |
| Qwen | 8k | T3 | 81s | fail | 7 | — | — | LD_PRELOAD/PATH env injection |
| Qwen | 12k | T1 | 61s | fail | 2 | — | — | 2 blockers |
| Qwen | 12k | T2 | 68s | partial | 1 | — | — | Partial, 1 blocker |
| Qwen | 12k | T3 | 119s | fail | 8 | — | — | 8 blockers, matching 16k T3 |
| Qwen | 16k | T1 | 55s | pass | 0 | — | — | Correctly assessed shellEscape as safe |
| Qwen | 16k | T2 | 43s | fail | 3 | — | — | 3 blockers |
| Qwen | 16k | T3 | 77s | fail | 8 | — | — | 8 blockers, matching 12k T3 |
| Qwen | 32k | T1 | 79s | partial | 0 | — | — | Partial, 0 blockers |
| Qwen | 32k | T2 | 84s | fail | 2 | — | — | 2 blockers |
| Qwen | 32k | T3 | 83s | fail | 11 | — | — | Strongest security result; 11 blockers |
| Haiku | native | T1 | 51s | partial | 0 | — | — | Partial, 0 blockers |
| Haiku | native | T2 | 33s | fail | 2 | — | — | Shell injection, TOCTOU, symlink |
| Haiku | native | T3 | 60s | fail | 5 | — | — | source /etc/profile attack vector (novel) |
| Sonnet | native | T1 | 204s | fail | 2 | — | — | PKI_DIR filepath.Abs exploit proof |
| Sonnet | native | T2 | 148s | fail | 2 | — | — | DSA disallowed for signing under FIPS 186-5; clock-skew vulnerability |
| Sonnet | native | T3 | 161s | fail | 3 | — | — | Argument injection via filenames starting with `-` |
| Seed | 8k | T3 | 72s | fail | 5 | 5 | 4 | 4 core blockers stable across budgets |
| Seed | 16k | T3 | 73s | fail | 5 | 5 | 4 | Env var injection CWE-88 as 5th blocker |
| Seed | 32k | T3 | 254s | fail | 4 | 4 | 4 | 3.5× the 16k run time; no accuracy gain |
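
Sonnet's T3 finding (argument injection via filenames starting with `-`) is the classic option-injection bug: a file named `-rf` handed to an external tool is parsed as flags. A minimal sketch of the two usual defenses follows, assuming hypothetical `safeFileArg` and `buildArgs` helpers that are not part of the reviewed code: an `--` end-of-options marker, plus a `./` prefix for tools that do not honor `--`.

```go
package main

import (
	"fmt"
	"strings"
)

// safeFileArg (hypothetical helper) rewrites a name that begins with "-" to
// "./-...", which most command-line tools treat as a path rather than a flag.
func safeFileArg(name string) string {
	if strings.HasPrefix(name, "-") {
		return "./" + name
	}
	return name
}

// buildArgs (hypothetical helper) places "--", the conventional
// end-of-options marker honored by getopt-style tools, before the file
// operands, and applies safeFileArg as a second layer for tools that
// ignore "--".
func buildArgs(flags, files []string) []string {
	args := append([]string{}, flags...)
	args = append(args, "--")
	for _, f := range files {
		args = append(args, safeFileArg(f))
	}
	return args
}

func main() {
	// Prints: [-k -- report.txt ./-rf]
	fmt.Println(buildArgs([]string{"-k"}, []string{"report.txt", "-rf"}))
}
```

The resulting slice would be passed to `exec.Command(tool, args...)`. Since `exec.Command` does not invoke a shell, this addresses option parsing only, not shell metacharacters.
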

| Provider | Budget | Tier | Time | Verdict | B | W | N | Key Findings |
|---|---|---|---|---|---|---|---|---|
| Kimi | 8k | T1 | 41s | pass | 0 | — | — | Rubber-stamped; 0 blockers |
| Kimi | 8k | T2 | 43s | pass | 0 | — | — | Rubber-stamped; 0 blockers |
| Kimi | 8k | T3 | 80s | pass | 0 | — | — | Rubber-stamped; 0 blockers |
| Kimi | 12k | T1 | 41s | pass | 0 | — | — | Pass |
| Kimi | 12k | T2 | 110s | fail | 7 | — | — | Peak Kimi quality; 7 blockers |
| Kimi | 12k | T3 | 85s | fail | 6 | — | — | 6 blockers |
| Kimi | 16k | T1 | 44s | fail | 12 | — | — | Hyper-critical; mostly false positives |
| Kimi | 16k | T2 | 75s | partial | 4 | — | — | 4 blockers |
| Kimi | 16k | T3 | 24s | fail | ~3 | — | — | ~3 blockers |
| Kimi | 48k | T3 | 43s | fail | ~15 | — | — | Mechanical only; variable names, ordering |
| Qwen | 8k | T1 | 47s | fail | — | — | — | DNF (did not finish) |
| Qwen | 8k | T2 | 161s | fail | 8 | — | — | 8 blockers |
| Qwen | 8k | T3 | 226s | fail | ~100 | — | — | Hyper-critical; ~100 false positives |
| Qwen | 12k | T1 | 63s | fail | ~25 | — | — | ~25 blockers |
| Qwen | 12k | T2 | 53s | fail | ~20 | — | — | ~20 blockers |
| Qwen | 12k | T3 | 91s | fail | 10 | — | — | Good balance; 10 blockers |
| Qwen | 16k | T1 | 88s | fail | ~9 | — | — | ~9 blockers |
| Qwen | 16k | T2 | 80s | fail | ~18 | — | — | ~18 blockers |
| Qwen | 16k | T3 | 126s | fail | 10 | — | — | 10 blockers |
| Qwen | 32k | T1 | 91s | fail | ~5 | — | — | ~5 blockers |
| Qwen | 32k | T2 | 97s | fail | 17 | — | — | 17 blockers |
| Qwen | 32k | T3 | 82s | partial | 0 | — | — | Most calibrated; zero false positives |
| Haiku | native | T1 | 30s | fail | 7 | — | — | 7 blockers |
| Haiku | native | T2 | 38s | fail | 10 | — | — | Applied prefix memory rule correctly |
| Haiku | native | T3 | 29s | partial | 4 | — | — | Self-rated confidence 2-3/5 (honest) |
| Sonnet | native | T1 | 134s | fail | ~12 | — | — | Methodical; precise rule citations |
| Sonnet | native | T2 | 163s | fail | ~14 | — | — | Most methodical rule walk |
| Sonnet | native | T3 | 329s | fail | 11 | — | — | Self-corrected 2 false positives mid-review |
| Seed | 8k | T3 | 121s | fail | 13 | 5 | 4 | `err` single-letter-name FPs dominate (12 of 13 blockers) |
| Seed | 16k | T3 | 166s | fail | 10 | 11 | 9 | Diverse: else statements, panic, naming |

| Provider | Budget | Tier | Time | Verdict | B | W | N | Key Findings |
|---|---|---|---|---|---|---|---|---|
| Kimi | 8k | T1 | 48s | pass | 0 | — | — | Pass |
| Kimi | 8k | T2 | 39s | pass | 0 | — | — | Pass |
| Kimi | 8k | T3 | 60s | fail | 3 | — | — | 3 blockers |
| Kimi | 12k | T1 | 130s | pass | 0 | — | — | Valley of doubt; over-rationalization |
| Kimi | 12k | T2 | 40s | partial | 0 | — | — | Partial, 0 blockers |
| Kimi | 12k | T3 | 40s | fail | 9 | — | — | Strongest single Kimi result; 9 blockers |
| Kimi | 16k | T1 | 56s | partial | 0 | — | — | Partial, 0 blockers |
| Kimi | 16k | T2 | 74s | pass | 0 | — | — | Pass |
| Kimi | 16k | T3 | 64s | partial | 4 | — | — | 4 blockers |
| Kimi | 48k | T3 | 48s | pass | 0 | — | — | Complete rubber stamp; 0 blockers |
| Qwen | 8k | T1 | 75s | fail | 8 | — | — | 8 blockers |
| Qwen | 8k | T2 | 115s | fail | 2 | — | — | 2 blockers |
| Qwen | 8k | T3 | 172s | fail | 5 | — | — | hostname -I whitespace parsing |
| Qwen | 12k | T1 | 78s | fail | 2 | — | — | 2 blockers |
| Qwen | 12k | T2 | 135s | partial | 1 | — | — | Partial, 1 blocker |
| Qwen | 12k | T3 | 153s | fail | 11 | — | — | Symlink rejection pattern; deserializer panic |
| Qwen | 16k | T1 | 80s | fail | 3 | — | — | 3 blockers |
| Qwen | 16k | T2 | 67s | partial | 4 | — | — | Partial, 4 blockers |
| Qwen | 16k | T3 | 110s | partial | 4 | — | — | Partial, 4 blockers |
| Qwen | 32k | T1 | 74s | fail | 4 | — | — | 4 blockers |
| Qwen | 32k | T2 | 116s | partial | 2 | — | — | Partial, 2 blockers |
| Qwen | 32k | T3 | 171s | partial | 0 | — | — | Self-corrected to 0 blockers |
| Haiku | native | T1 | 70s | fail | 4 | — | — | 4 blockers |
| Haiku | native | T2 | 123s | fail | 3 | — | — | CharsetPresenceGuarantor len=2 bug |
| Haiku | native | T3 | 74s | fail | 10 | — | — | stderr buffer loss when output is redirected to a file |
| Sonnet | native | T1 | 144s | fail | 3 | — | — | CNAME/NS trailing dot (novel) |
| Sonnet | native | T2 | 112s | fail | 3 | — | — | CharsetPresenceGuarantor len=2 proven |
| Sonnet | native | T3 | 224s | fail | 3 | — | — | Brotli --rm inconsistency; symlink asymmetry |
| Seed | 8k | T3 | 54s | fail | 6 | 5 | 4 | Well-distributed: 6 distinct module issues |
| Seed | 16k | T3 | 147s | fail | 2 | 13 | 7 | Severity downgrade; 4B→W, 8 new warnings |
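
Two of the novel findings in this last table are parsing edge cases: whitespace in `hostname -I` output (Qwen 8k T3) and the trailing root dot on CNAME/NS answers (Sonnet T1). The sketch below shows defensive handling for both; `parseHostnameI` and `normalizeDNSName` are hypothetical helpers, not functions from the reviewed code.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseHostnameI (hypothetical helper) splits raw `hostname -I` output into
// addresses. strings.Fields splits on any run of whitespace and ignores
// leading/trailing whitespace, so a trailing space or newline never yields
// an empty entry (a naive strings.Split(out, " ") would). Fields that do
// not parse as IP addresses are skipped.
func parseHostnameI(out string) []net.IP {
	var ips []net.IP
	for _, field := range strings.Fields(out) {
		if ip := net.ParseIP(field); ip != nil {
			ips = append(ips, ip)
		}
	}
	return ips
}

// normalizeDNSName (hypothetical helper) lowercases a DNS name and strips
// the single trailing root dot that fully qualified answers typically carry
// (for example from net.LookupCNAME or net.LookupNS), so "Mail.Example.com."
// compares equal to "mail.example.com".
func normalizeDNSName(name string) string {
	return strings.ToLower(strings.TrimSuffix(name, "."))
}

func main() {
	for _, ip := range parseHostnameI("192.168.1.10 172.17.0.1 fe80::1 \n") {
		fmt.Println(ip)
	}
	fmt.Println(normalizeDNSName("Mail.Example.com.") == normalizeDNSName("mail.example.com"))
}
```

Lowercasing is an extra step beyond the trailing-dot fix; it is included because DNS name comparison is case-insensitive.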