Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Select an option

  • Save jeremylongshore/572447584d22377b76bbbda93919d413 to your computer and use it in GitHub Desktop.

Select an option

Save jeremylongshore/572447584d22377b76bbbda93919d413 to your computer and use it in GitHub Desktop.
audit-harness — @intentsolutions/audit-harness one-pager + operator audit + changelog

@intentsolutions/audit-harness

Deterministic test-enforcement toolkit — hash-pinned policy, escape-scan refusal gates, CRAP scoring, architecture checks, and signed evidence emission, installed in the repo it governs.

A zero-runtime-dependency CLI (Node dispatcher + bash/Python scripts) that makes test-quality policy enforceable on fresh clones. Companion to the audit-tests and implement-tests Claude Code skills, and the deterministic-gates layer of the Intent Eval Platform — but usable standalone in any repo.

npm License Provenance

Links:


One-Pager

Problem

AI-assisted repositories need enforcement that travels with the code. A coding agent (or a hurried human) can lower a coverage threshold, delete a failing test, or relax an architecture rule in the same diff that adds a feature — and review does not reliably catch it. Quality gates that live on one developer's machine, or in a globally-installed skill, do not reproduce on a fresh clone, so the policy is unenforceable exactly where it matters: in someone else's CI.

Solution

Install the harness in the target repo and pin the policy. Engineer-owned files (tests/TESTING.md, feature files, architecture configs, mutation configs) are hashed into a committed .harness-hash manifest. Any diff that changes their content without a fresh, deliberate audit-harness init is caught at pre-commit and in CI and REFUSED. Around that core, the CLI ships deterministic gates (escape-scan, CRAP, architecture, bias, Gherkin lint) and read-only audit verbs (classify, conform, audit, scan, currency) that emit machine-readable gate-result/v1 rows suitable for signing as in-toto evidence.

The rule the harness enforces: policy changes must be conscious, not silent. AI agents stay useful — they can read policy and implement within constraints. What they cannot do is silently weaken the constraints.

W5

Question Answer
Who Teams running AI-assisted development that need test policy enforced in-repo; consumed by the Intent Eval Platform repos and the audit-tests / implement-tests Claude Code skills
What A 16-command CLI: hash-pin (verify/init/list), escape-scan, arch, bias, gherkin-lint, crap, emit-evidence, plus read-only audit verbs classify, conform, audit, scan, fp-rate, currency, gen-layer-applicability
When Pre-commit hooks (escape-scan + verify), CI on every PR (verify + range escape-scan), nightly self-check; audit verbs on demand
Where Installed in the target repo via npm, PyPI, crates.io, or curl-vendored into .audit-harness/ for any other ecosystem
Why Enforcement must travel with the code — fresh clones reproduce every gate, and thresholds are read from the repo's own tests/TESTING.md, never hardcoded

Stack

Layer Technology
CLI dispatcher Node ≥ 18, zero runtime dependencies (bin/audit-harness.js, thin spawn wrapper)
Gate scripts Bash + Python 3 stdlib (scripts/*.sh, scripts/*.py) — the canonical implementations
Schemas JSON Schema: audit-profile/v1 + canonical gate registry, conform/v1 content-addressed artifact schemas, currency pin relation
Evidence in-toto Statement v1 envelope; gate-result/v1 predicate (canonical schema in @intentsolutions/core)
Distribution npm (@intentsolutions/audit-harness) + PyPI (intent-audit-harness) + crates.io (intent-audit-harness) + install.sh vendoring
CI GitHub Actions — 3-Node matrix (18/20/22), pinned shellcheck v0.10.0 + ruff 0.15.4, nightly cron
Release Tag-triggered release.yml, npm publish --provenance (sigstore, OIDC)

Differentiators

  • Refuse-at-pre-commit escape detection. escape-scan classifies diff hunks against an escape grammar: exit 0 clean, exit 1 CHALLENGE (requires an engineer-approved comment), exit 2 REFUSE (pipeline halted). Threshold-lowering, test deletion, and architecture-rule bypass are caught before they merge.
  • Hash-pinned policy. Pinned files changed without re-init → HARNESS_TAMPERED (exit 2) blocks the PR. The harness self-pins its own scripts and dispatcher via .harness-hash-extra-patterns, so a silent edit to the gate that does the refusing is itself refused.
  • Golden-master suites. The raw stdout of gherkin-lint and crap-score --json is pinned byte-stable against a deliberate-failure corpus — a silent reshape of scorer output (renamed field, dropped WARN line) fails CI even when the per-row schema gate cannot see it.
  • Reproducible evidence by design. conform validates artifacts against schemas bundled in the harness (never live-fetched), records each schema's sha256 in the row's policy_hash, and uses an embedded subset validator instead of ajv so the same commit + same harness version always produce an identical verdict.
  • Read-only audit verbs. classify/conform/audit/scan/currency never write to the target repo, and audit deliberately does not execute the repo's test suite — it reports test-layer presence; execution verdicts stay in the repo's own CI.
  • Safety levers. AUDIT_HARNESS_DISABLE=1 kill-switch, AUDIT_HARNESS_TIMEOUT per-command supervision (hung gate killed, exit 124), and an engineer-owned .audit-harness.yml per-repo override.

Operator-Grade System Analysis

Architecture: Node dispatcher + polyglot scripts

The design rule is explicit: scripts are the source of truth. bin/audit-harness.js is a ~200-line dispatcher that resolves a command name to a canonical bash or Python implementation in scripts/ and spawns it with inherited stdio. There is no TypeScript port and no npm dependency — dependencies and devDependencies are empty, and the published tarball contains only bin/, scripts/, schemas/, docs/, and the license/changelog files. This is what makes the same CLI surface shippable identically through npm, PyPI, crates.io, and a curl-vendored .audit-harness/ tree.

Command surface (16 commands):

Command Script Purpose
verify / init / list harness-hash.sh Verify / (re-)pin / enumerate the hash-pinned policy manifest
escape-scan escape-scan.sh Scan a diff (--staged, --range A..B, stdin, patch file) for escape attempts
arch arch-check.sh Language-appropriate architecture-rule checker (dependency-cruiser / import-linter / ArchUnit / deptrac / arch-go)
bias bias-count.sh Count test-bias patterns (tautology, smoke-only, …)
gherkin-lint gherkin-lint.sh Advisory Gherkin quality check
crap crap-score.py CRAP (complexity × coverage) scorer — Python, Go, JS/TS, Rust backends
emit-evidence emit-evidence.sh Wrap a gate-result JSON envelope in an in-toto Statement v1 (predicate https://evals.intentsolutions.io/gate-result/v1)
classify classify.py Read-only repo classifier → audit-profile/v1 (UNION of detected classifications + resolved gate set, registry_hash recorded)
conform conform.py Read-only conformance gate-runner → gate-result/v1 rows against bundled content-addressed schemas
audit audit.py Read-only testing-depth gate-runner — per-pyramid-layer presence heuristics; --deep adds crap-score
scan scan.py Read-only security/hygiene/skill-quality gate-runner — shells out to gitleaks / osv-scanner / semgrep / syft / markdownlint / lychee; missing tool → ADVISORY indeterminate, never a false FAIL
fp-rate fp-rate.py Measures per-gate false-positive/false-negative rate over a labeled corpus — the metric gating advisory→blocking promotion (≤ 5% bar)
currency currency.py Advisory upstream-currency report from a per-upstream pin relation; no exit-code authority (always exit 0), no live-fetch
gen-layer-applicability gen-layer-applicability.py Projects the canonical gate registry into layer-applicability.md; --check fails CI on drift

All gates support --json for machine-readable gate-result envelopes pipeable to emit-evidence. The schemas/ directory ships in the package: audit-profile/ (the v1 schema, the canonical registry.v1.json gate registry, and its generated projection), conform/v1/ (skillmd-frontmatter, mcp-config, plugin-manifest, agent-frontmatter), and currency/pins.v1.json.

Exit codes (CI contract):

Exit Command Meaning
0 any Clean
1 escape-scan CHALLENGE — engineer-approved comment required
2 verify HARNESS_TAMPERED — pinned file changed without re-init
2 escape-scan REFUSE — pipeline halted
3 verify No manifest (fresh repo, not an error)
124 any gate Killed by AUDIT_HARNESS_TIMEOUT supervision (INDETERMINATE)

Within the 7-layer testing taxonomy the harness serves L1 (escape-scan in pre-commit + CI — the cheapest gate) and L3 (CRAP, architecture, bias, hash-pin — code-level correctness).

Operational reference

Install — pick the ecosystem:

# Node / JS / TS
pnpm add -D @intentsolutions/audit-harness     # or npm install --save-dev / yarn add --dev

# Python
pip install intent-audit-harness

# Rust
cargo install intent-audit-harness

# Anything else (Go, Ruby, PHP, Java, .NET, shell) — vendor the scripts
curl -sSL https://raw.githubusercontent.com/jeremylongshore/intent-audit-harness/main/install.sh | bash

The vendored path copies the scripts, the Node dispatcher, NOTICE (Apache-2.0 § 4(d) — it must travel with any distribution), and a PROVENANCE file recording source repo, version, tarball URL, and install timestamp.

Pre-commit hook:

#!/usr/bin/env sh
pnpm exec audit-harness escape-scan --staged
pnpm exec audit-harness verify

CI step:

- run: pnpm exec audit-harness verify
- run: pnpm exec audit-harness escape-scan --range origin/main..HEAD

Changing a policy threshold (the conscious path):

# 1. Edit tests/TESTING.md (e.g., coverage.line 80 → 75)
# 2. Re-init to accept the change
pnpm exec audit-harness init
# 3. Commit the regenerated manifest alongside the policy change
git add tests/TESTING.md .harness-hash
git commit -m "chore(test): lower coverage floor to 75"

Language support:

Language CRAP Arch
Python radon + coverage.py import-linter
JS/TS complexity-report + c8 dependency-cruiser
Go gocyclo + go test -cover arch-go
Rust rust-code-analysis + tarpaulin (custom)
Java/Kotlin / .NET / PHP ArchUnit / ArchUnitNET / deptrac

Security posture

  • License + distribution integrity. Apache-2.0 since v1.0.0 (0.x releases remain MIT under their original terms). A version-canonical-check CI gate enforces that all polyglot manifests (package.json canonical, version.txt, python/pyproject.toml, __init__.py, rust/Cargo.toml, Cargo.lock) agree on both version and Apache-2.0 license — drift fails the build.
  • Signed releases. npm publishing runs only through the tag-triggered release.yml with id-token: write and npm publish --provenance (sigstore: Fulcio OIDC + Rekor). The workflow verifies the pushed tag matches package.json#version before publishing.
  • Self-containment. The harness's own enforcement surface (scripts/*.sh, scripts/*.py, bin/audit-harness.js, and the extra-patterns file itself) is hash-pinned at the repo root; CI hard-fails harness-hash --verify on any byte change without a committed re-init. The harness tests itself: every PR's diff passes through its own escape-scan.
  • Pinned toolchain. shellcheck (v0.10.0) and ruff (0.15.4) are version-pinned in CI so a linter release cannot surface new findings silently against merged code.
  • No network at gate time. conform never live-fetches schemas; verdicts re-verify against the exact bundled schema version via policy_hash. currency — the one verb whose subject is inherently network-bound (upstream freshness) — is deliberately advisory-only with no exit-code authority.
  • Demonstrated catch. The scan verb's first run against this repo's own tree caught a PyPI publish token pasted as a literal in a docs file; HEAD was redacted in the same release and the rotation tracked separately (documented in the changelog — the gate worked on its own author).
  • Signed self-evidence. A CI-only emitter (ci/, excluded from the published tarball to preserve the zero-dependency guarantee) runs the real harness-hash --verify self-gate, shapes it into a kernel gate-result/v1 + Evidence Bundle, cosign-signs the canonical bytes, and publishes a manifest as a Release asset for downstream re-verification.

Current state (2026-06-12)

  • v1.1.7 on npm with sigstore provenance; tags through v1.1.7. GitHub repo renamed audit-harnessintent-audit-harness (npm package name, CLI binary, and PyPI/crates package names unchanged).
  • 13 deterministic CI suites are required checks on main. The CI workflow (ci.yml, push + PR + nightly cron) covers: dispatcher self-check across Node 18/20/22, shellcheck, ruff, Python compile-check, manifest version/license alignment, backward-compat regression, a SemVer contract suite (frozen subcommand roster, exit codes, --json stream contract, predicate URI), golden suites for classify/conform/audit/scan/currency, a crap-score join-regression suite with stubbed providers, the golden-master stdout suite, layer-applicability projection drift, and advisory FP-rate measurement.
  • Fresh fixes (2026-06-11, PR #66): the Go and JS CRAP scorer backends were repaired — kind='test' no longer passes a gocyclo ignore pattern that silently emptied the test CRAP gate; the go.mod module prefix is stripped from go tool cover paths so coverage joins gocyclo's repo-relative complexity paths; absolute c8/istanbul coverage keys are normalized repo-relative to join complexity-report paths. The CI escape-scan self-check now captures the script's own exit code (previously tee masked it, making the REFUSE branch unreachable). A stub-driven regression suite (tests/crap-score/) fails pre-fix on all three findings without requiring gocyclo/go installs. Earlier in the cycle: evidence-integrity fixes + cross-platform SHA256 + kernel schema URL (PR #64) and the SemVer contract suite (PR #65).
  • Platform role: this repo is the deterministic-gates layer of the Intent Eval Platform — it emits gate-result/v1 rows into the shared Evidence Bundle schema (canonical contracts in @intentsolutions/core); downstream tooling consumes them.

Changelog

Source: CHANGELOG.md (Keep a Changelog format), verbatim — the [Unreleased] section plus the six most recent headered releases. Releases v1.1.6 and v1.1.7 are tagged and published; their changes are recorded under [Unreleased] below pending the next header roll-up (last headered release: v1.1.5).

[Unreleased]

Added — golden-master suite for gherkin-lint + crap-score stdout shapes (iah-golden-master)

A fitness function that pins the raw stdout of the two scorers whose output is a downstream contract.

  • tests/golden/run-golden.sh captures gherkin-lint.sh (text rubric) and crap-score.py --json (gate-result envelope) stdout against a tests/fixtures/deliberate-failure/ corpus and diffs each against a checked-in golden, failing on any drift. Environment-volatile bytes are normalized out (gherkin-lint's installed-vs-awk-fallback first line; crap-score's absolute summary_path) so the golden is byte-stable across machines. CI installs no complexity provider, so the crap golden captures the deterministic no-provider envelope shape.
  • Why this and not the per-row schema gate: the schema gate validates the augmented predicate that emit-evidence produces, not the raw scorer stdout. A silent reshape of the scorer stdout — a renamed field, a dropped WARN line, changed summary wording — is a backward-compat break the schema gate cannot see. This suite is that missing guard.
  • Regenerate intentional changes with bash tests/golden/run-golden.sh --update and review the golden diff in the PR. Wired into .github/workflows/ci.yml as the golden job.

Changed — install.sh vendors NOTICE + the Node dispatcher (iah-install-sh-completeness)

The vendored-install path (non-Node repos) now ships a complete, traceable copy.

  • NOTICE is copied into .audit-harness/ — Apache-2.0 §4(d) requires the NOTICE file to travel with any distribution, and vendoring is a distribution.
  • bin/audit-harness.js (the Node CLI dispatcher) and package.json are copied into .audit-harness/bin/ + .audit-harness/ so the canonical dispatcher surface is present and its --version (which reads ../package.json) resolves in the vendored tree.
  • A PROVENANCE file records the source repo, version, tarball URL, and install timestamp so a vendored tree is traceable back to the exact release it came from.

Added — CI-only signed evidence emit for the intent-eval-dashboard (nr75.12)

The dashboard reports hub (labs.intentsolutions.io) ingests a signed report-manifest.json of kernel gate-result/v1 rows per repo. This adds audit-harness's own emit, lighting up its row.

  • ci/emit-evidence.ts + ci/assemble-manifest.ts — run the real deterministic self-gate (harness-hash --verify), shape it into a kernel gate-result/v1 + EvidenceBundle (fail-closed against @intentsolutions/core), cosign-sign the canonical bytes (Fulcio OIDC + Rekor), and assemble the manifest the dashboard re-verifies at ingest.
  • Zero-dep guarantee preserved. The emitter lives in ci/ (excluded from package.json#files) and the kernel is installed CI-only via npm i --no-savedependencies + devDependencies stay empty and the published tarball is unchanged (verified via npm pack --dry-run).
  • .github/workflows/release.yml — adds a GitHub Release on tag push + an emit-evidence job (tag-only) that publishes the manifest as a Release asset.

Added — currency advisory upstream-currency report (PP-PLAN-040 Phase 5 / E7)

The fifth verb, and deliberately the weakest: an advisory report with no exit-code authority.

  • audit-harness currency (scripts/currency.py, stdlib): reads the per-upstream-identity pin relation (schemas/currency/pins.v1.json) and reports which pins are themselves stalechecked_at older than the pin's staleness window. Each upstream (mcp-spec, skill-md-schema, claude-code, gate-result-predicate, anthropic-sdk, agentskills-spec) carries its own pinned_version + checked_at + window, so the pin's own staleness is detectable (not one opaque scalar).
  • No exit-code authority (always exit 0), no live-fetch, no auto-fix. Currency depends on upstream state — non-deterministic and network-bound — so it only reports. /sync-testing-harness consumes the report to open advisory bump PRs; it never reddens a build. --today YYYY-MM-DD makes reports reproducible.
  • tests/currency/: golden suite (3 checks) — stale/current/unknown classification, the no-exit-authority guarantee (exit 0 even when all pins are stale), and the shipped relation reporting.

Added — scan security/hygiene/skill-quality gate-runner (PP-PLAN-040 Phase 4 / E6)

The fourth read-only verb: security + hygiene + skill-quality, by orchestrating standard tools (never reimplementing them).

  • audit-harness scan [repo] (scripts/scan.py, stdlib): for every dimension: security | hygiene | skill-quality gate in the profile, emits a gate-result/v1 row. Three strategies: local (hygiene-readme README presence — deterministic), shell-out (every gate carrying a tool — gitleaks / osv-scanner / semgrep / syft / markdownlint / lychee — clean exit → PASS, findings → ADVISORY(error), tool absent → ADVISORY indeterminate), consume (skill-behavioral ingests a j-rig Evidence Bundle verdict via --jrig-verdict; the harness never runs behavioral judgment itself — no verdict → indeterminate).
  • Advisory-first; --strict (or a blocking gate) turns a finding/gap into FAIL. Kill-switch → []. Each row records metadata.method (local-presence / shell-out / consume-j-rig).
  • tests/scan/: golden suite (10 checks) with pinned-profile isolation so shell-out tool availability never makes the suite flaky.

Security note: on first run this gate caught — and this release redacts from HEAD — a PyPI publish token that had been pasted as a literal value in python/PUBLISH.md. The value remains in git history; it must be rotated at the registry (tracked separately). The doc now carries a placeholder.

Added — audit testing-depth gate-runner (PP-PLAN-040 Phase 3 / E5)

The third read-only verb: the "finish the pyramid" testing-depth diagnostic.

  • audit-harness audit [repo] (scripts/audit.py, stdlib): for every dimension: testing-depth gate in the profile, assesses the gate and emits a gate-result/v1 row. Two read-only strategies: crap-score runs the bundled crap scorer (static complexity×coverage); every pyramid layer (unit/integration/e2e/smoke/perf/a11y/contract/migration/property-based/fuzz/sanitizers) gets a per-layer presence heuristic (test dirs, framework configs, dependency markers). Layer present → PASS; absent → ADVISORY(warn) testing-depth gap; not statically assessable → ADVISORY indeterminate.
  • --fast (default) presence heuristics only (<10s); --deep adds crap-score; --strict turns a gap on a blocking gate into FAIL. Kill-switch → []. Each row records metadata.method (crap-static / presence-heuristic / delegated) for provenance.
  • Deliberately does NOT execute the repo's test suite. Running arbitrary untrusted suites is the repo's own CI's job; the harness reports coverage presence and the repo's CI test step produces the execution verdict. audit is the diagnostic, not the test runner.
  • tests/audit/: golden suite (7 checks) + has-tests/no-tests fixtures — asserts unit→PASS / gap→ADVISORY(default) / gap→FAIL(--strict), crap deep-only-in-fast, kill-switch, and gate-result/v1 validity. CI audit job.

Added — registry projection + FP-rate harness (PP-PLAN-040 Phase 0 completion: c2b + c2e)

Closes the data/safety-spine epic (E2): the registry becomes the single canonical datum and gate promotion gets a measured bar.

  • audit-harness gen-layer-applicability (scripts/gen-layer-applicability.py): projects schemas/audit-profile/registry.v1.json into schemas/audit-profile/layer-applicability.md. --write regenerates; --check fails on drift. The doc is now a projection of the registry datum, not a hand-maintained parallel source — CI gate layer-applicability-drift enforces it (c2b).
  • audit-harness fp-rate (scripts/fp-rate.py): measures each gate's false-positive / false-negative rate over a labeled corpus (tests/fixtures/conform/{valid,malformed}/). This is the metric that gates advisory→blocking promotion. --max-fp-rate X exits 1 if any gate exceeds the bar; CI runs it advisory at the 5% default bar (c2e).
  • docs/gate-promotion.md: the dedicated advisory→blocking promotion rule — FP-rate ≤ 5% bar, engineer-pinned in tests/TESTING.md, re-pinned manifest. Documents why FP-rate (not FN-rate) is the gate and how demotion/kill-switch works. docs/ now ships in the npm package (files).

Added — conform verb + bundled content-addressed schemas (PP-PLAN-040 Phase 2)

The second piece of the read-only brain: deterministic conformance, emitting Evidence Bundle rows.

  • audit-harness conform [repo] (scripts/conform.py, stdlib + PyYAML): read-only conformance gate-runner. For every dimension: conformance gate in the repo's audit-profile/v1, it locates the artifact(s) and emits a gate-result/v1 row (JSON array, stdout). Never writes, never live-fetches.
  • Bundled content-addressed schemas (schemas/conform/v1/): skillmd-frontmatter, mcp-config, plugin-manifest, agent-frontmatter — the deterministic structural floor (parses + required keys + types), distinct from the IS 100-point rubric / SAK authoring kernel (judgment, stays in /validate-*). conform records each schema's sha256 in the row's policy_hash, so a row re-verifies against the exact schema version that produced it.
  • Reproducible-by-design engine. Bundled JSON-Schemas are checked by an embedded subset validator (complete for the closed bundled schemas) rather than ajv — deliberately, because ajv's availability/version varies per machine and would make signed evidence non-reproducible. Same commit + same harness version produce an identical verdict.
  • Genuinely-external formats shell out: OpenAPI to spectral, GitHub Action to yamllint. Missing tool produces an ADVISORY indeterminate (never a false FAIL).
  • Advisory-first. A conformance violation on an enforcement: advisory gate is ADVISORY (severity error), exit 0 — logged, not blocking. --strict (or an engineer-promoted enforcement: blocking gate) turns a violation into FAIL (exit 1). Missing artifact produces NOT_APPLICABLE. Kill-switch (AUDIT_HARNESS_DISABLE=1 / .audit-harness.yml) produces an empty [], exit 0.
  • tests/conform/: golden suite (31 checks) + pass/fail fixtures (valid + malformed SKILL.md, .mcp.json, plugin manifest, agent) — asserts valid to PASS, malformed to ADVISORY (default) / FAIL (--strict), every row validates against gate-result/v1, the NOT_APPLICABLE + indeterminate paths, and policy_hash == bundled-schema sha256 + reproducible. Wired into CI (conform job).

Scope boundary: conformance kinds without a bundled schema (marketplace, hook) resolve to ADVISORY indeterminate — drop a schema into schemas/conform/v1/ to light them up, no code change. No gate execution for testing-depth/security yet (Phase 3+).

Added — classify verb + audit-profile/v1 (PP-PLAN-040 Phase 0 + Phase 1)

The first piece of the "comprehensive audit, on any repo" build: the read-only brain.

  • audit-profile/v1 schema (schemas/audit-profile/v1.schema.json): closed, versioned, hash-bearing value mirroring gate-result/v1. Four invariants: classifications are a UNION (not a winner), unresolved[] is the only Claude-refinable surface, waived ⇒ disabled (allOf-enforced), registry_hash makes a profile reproducible.
  • Canonical dimension→gate registry (schemas/audit-profile/registry.v1.json): the single datum that answers "which gates apply to repo-type X, in which dimension, at what applicability" — layer-applicability.md and TESTING.md become projections of it.
  • audit-harness classify [repo] (scripts/classify.py, stdlib-only): read-only repository classifier. Detects the UNION of repo-type + Claude-artifact classifications, resolves the gate set against the registry, records registry_hash, and emits an audit-profile/v1 value to stdout. Never writes to the repo.
  • Safety levers: INDETERMINATE result class (infra failure ≠ policy failure); dispatcher per-command supervision via AUDIT_HARNESS_TIMEOUT (kill a hung gate, exit 124); AUDIT_HARNESS_DISABLE=1 kill-switch (gate commands no-op; classify emits an all-disabled profile); engineer-owned .audit-harness.yml override (classify_pins, advisory, disable_gates, disable) — see .audit-harness.example.yml.
  • tests/classify/: golden fixture corpus (6 fixtures, authored before the classifier) + suite — golden-matches classifications, schema-validates every profile, exercises the kill-switch, the unknown/unresolved path, and override honoring. Wired into CI (classify job).
  • schemas/ now ships in the npm package (files) so the registry + schema are available to consumers on any repo.

Scope boundary: no conform verb, no gate execution yet (Phase 2+). classify is read-only and emits a profile only.

[v1.1.5] - 2026-06-03

Added — npm release pipeline (closes the publish-pipeline gap)

This is the first release published to npm via CI with Sigstore provenance. Until now the repo had no release workflow — npm was stuck at 0.1.0 while the code (and every other manifest) had advanced through 1.0.01.1.4, four minors of CHANGELOG-documented work that never reached consumers. npm install @intentsolutions/audit-harness resolved to the stale 0.1.0 tarball.

  • .github/workflows/release.yml (NEW): mirrors the provenance approach of intent-eval-core's release workflow, adapted for this zero-dependency polyglot CLI (no pnpm, no lockfile, no TS build, no coverage). Triggers on push of a v*.*.* tag and on workflow_dispatch. Sets id-token: write for npm/Sigstore OIDC. Verifies the pushed tag matches package.json#version (skipped on manual dispatch since there's no tag), runs the node bin/audit-harness.js --version self-check + the repo's escape-scan.sh --staged test script (non-blocking on no-staged-diff), then npm publish --provenance --access public. The NPM_TOKEN repo secret is already configured.

Fixed — package metadata + install.sh URLs for the intent-audit-harness repo rename

The GitHub repo was renamed audit-harnessintent-audit-harness, but the metadata still pointed at the old path.

  • package.json: homepage, repository.url, and bugs.url repointed from jeremylongshore/audit-harnessjeremylongshore/intent-audit-harness (these render on npmjs.com).
  • python/pyproject.toml + rust/Cargo.toml: project-URL fields (Homepage / Repository / Issues / Changelog / documentation) repointed to the renamed repo — these render on PyPI and crates.io.
  • python/src/intent_audit_harness/__init__.py: docstring source-link repointed.
  • README.md: the curl … install.sh line + the two "Related" skill links repointed to the renamed repo.
  • install.sh: the REPO= variable, the usage-comment URLs at the top, and the re-run hint repointed; the default VERSION bumped from the stale v0.1.0v1.1.5.

Fixed — install.sh tarball-path glob broke after the rename

The GitHub archive tarball unpacks as <repo>-<version>/, which became intent-audit-harness-1.1.5/ after the rename. The unpack-dir detection used find … -name 'audit-harness-*', and -name matches the basename with no implicit leading wildcard, so it matched nothing under the new prefix — every vendored install would have failed at "could not find unpacked dir". Changed the glob to -name '*audit-harness-*' (leading wildcard), which matches both the current intent-audit-harness-* name and legacy audit-harness-* tags. Verified against both directory names.

Added — README badge row

npm-version, License Apache-2.0, and Sigstore-provenance shields under the H1 (mirrors the intent-eval-core badge row). The "Part of the Intent Eval Platform" cross-link line is preserved.

Changed — Version bumped to v1.1.5 across all manifests

Per the version-canonical-check CI gate (v1.0.2 PR #35). package.json (canonical), version.txt, python/pyproject.toml, python/src/intent_audit_harness/__init__.py, and rust/Cargo.toml all report 1.1.5. (rust/Cargo.lock is gitignored; its working-tree entry is aligned for local cargo builds.)

Why patch, not minor

No new CLI commands, no new flags, no API change, no script behavior change. This is release-engineering + metadata: the publish pipeline that ships the existing 1.1.x code, plus URL corrections for the repo rename, plus the install.sh glob fix. The pinned policy scripts (.harness-hash) are untouched.

Verification

  • npm pack --dry-run → tarball contains bin/, scripts/, README.md, LICENSE, NOTICE, CHANGELOG.md per package.json#files
  • node bin/audit-harness.js --version1.1.5
  • bash -n install.sh → exit 0; unpack-dir glob matches intent-audit-harness-1.1.5 (and legacy audit-harness-*)
  • bash scripts/harness-hash.sh --verify → OK (no pinned files changed)

[v1.1.4] - 2026-05-25

Fixed — gherkin-lint.sh prev_blank print-every-line noise (IEP P3, Gemini #71 review chain)

Closes iah-gherkin-prev-blank-noise (bd_000-projects-o9q1, P2). The third awk block in scripts/gherkin-lint.sh (the And-at-scenario-start checker) opened with a bare prev_blank = 1 expression that awk interpreted as an always-true pattern with implicit { print } default action — flooding stdout with every line of every feature file alongside the intentional ERROR printf. prev_blank was never USED anywhere in the awk script (verified via grep). Removed both touches: the top-level expression AND the assignment in the blank-line pattern (which was also unreachable for anything that mattered, since no downstream pattern read prev_blank). The third awk block now produces ONLY the targeted ERROR line when triggered. Verified via the same deliberate-failure test from v1.1.2 AAR — output before: full feature file printed interleaved with ERROR. Output after: just the ERROR line.

Changed — gherkin-lint.sh process_awk_output() collapsed to single awk pass (Gemini #38 follow-up)

Closes iah-gherkin-single-awk-opt (bd_000-projects-vawm, P3). v1.1.2 introduced process_awk_output() with two awk subprocesses per call (one counting WARN, one counting ERROR). v1.1.4 collapses to a single awk pass via read -r w e < <(awk '/^WARN /{w++} /^ERROR /{e++} END {print w+0, e+0}' <<< "$out") per Gemini PR #39 verbatim suggestion. Halves the awk fork count (4 callsites × 2 subprocesses = 8 awk processes/feature → 4). Verified with mixed WARN+ERROR test: 2 WARNs + 1 ERROR in one feature file produces summary 2 warning(s), 1 error(s) and exit 1.

Fixed — crap-score.py exclusion sets deduplicated via EXCLUDED_DIRS constant (Gemini #71 review)

Closes iah-crap-score-exclusion-dedup (bd_000-projects-niv8, P2). Pre-v1.1.4, scripts/crap-score.py had TWO separate sets with overlapping intent but divergent contents:

  • ignore set in score_python() (line 85): had "reports" but lacked .next, .nuxt, .cache
  • prune set in main() (line 394, added v1.1.1 for --json input-hash walk): had .next, .nuxt, .cache but lacked "reports"

Asymmetry was a real bug: a repo with reports/ would skip score_python's candidate scan but its .py files DID get hashed by the input-hash walk; opposite for .next/.nuxt/.cache. Fixed by extracting a single module-level constant EXCLUDED_DIRS (union of both prior sets) referenced by both call sites. Set contents: .git, .venv, venv, node_modules, __pycache__, dist, build, target, .tox, .mypy_cache, .pytest_cache, .next, .nuxt, .cache, reports.

Changed — Shellcheck CI job version-pinned (parity with ruff v1.1.3)

Closes iah-shellcheck-version-pin (bd_000-projects-v1ds, P3). v1.1.2 (Phase A1) installed shellcheck via apt-get install -y shellcheck which pulls whatever Ubuntu's runner-image version happens to ship (currently 0.9.0). When the runner image upgrades shellcheck to 0.10.x or later, new rules activate silently and could surface findings in already-merged code. v1.1.4 pins to v0.10.0 via download from the koalaman/shellcheck GitHub releases. CI step prints shellcheck --version for audit trail. To bump: edit SHELLCHECK_VERSION env in the workflow + run shellcheck scripts/*.sh locally + commit as explicit PR. Matches the ruff version-pin pattern from v1.1.3.

Changed — Version bumped to v1.1.4 across all 5 manifests

Per the version-canonical-check CI gate (v1.0.2 PR #35). All 5 manifest locations now report 1.1.4.

Changed — .harness-hash regenerated

scripts/gherkin-lint.sh + scripts/crap-score.py modified; both are pinned. 2 of 9 pinned-file hashes change.

Why patch, not minor

Pure cleanup release: dead-code removal, perf microoptimization, bug fixes for cross-call inconsistencies, CI version pin. No new CLI commands, no new flags, no API change. Consumers re-vendor / pnpm up and get the cleaner scripts + tighter CI transparently.

Verification

  • shellcheck scripts/*.sh → exit 0 (local 0.9.0; CI will run pinned 0.10.0)
  • ruff checkAll checks passed!
  • bash -n scripts/*.sh → all pass
  • python3 -m py_compile scripts/crap-score.py + cli.py → exit 0
  • bash scripts/harness-hash.sh --verify → OK after --init
  • gherkin-lint deliberate-failure test (And-at-start): exit 1, summary correct
  • gherkin-lint mixed test (2 WARN + 1 ERROR): summary 2 warning(s), 1 error(s), exit 1
  • Output noise gone: feature-file lines no longer printed alongside ERRORs

AAR: 000-docs/009-AA-AACR-v1.1.4-cleanup-bundle-2026-05-25.md.

Not bundled (separate scope)

iah-python-wrapper-scripts-sync (bd_000-projects-65k4) remains open. The Python wrapper's python/src/intent_audit_harness/scripts/crap-score.py (and the Rust wrapper's mirror) are stale by design — install.sh sources from canonical scripts/ but wrapper packaging hasn't grown a build-time sync mechanism. Implementation requires choosing between hatch build-hook, Cargo build.rs, symlinks, or CI-enforced manual sync. Deferred to its own focused PR.

[v1.1.3] - 2026-05-24

Added — Ruff CI gate against own-code Python (IEP Convergence Debt Plan Priority 6 Phase A2)

Closes iah-ruff (bd_000-projects-x9bs, P1). New .github/workflows/ci.yml job ruff (Python lint) runs ruff check (version-pinned to 0.15.4 per the iah-shellcheck-version-pin lesson) against the own-code Python surface. Ruleset select = ["B", "E", "F"] — pyflakes (F) for dead imports + unused variables; pycodestyle errors (E) for syntax-level issues; flake8-bugbear (B) for Python-specific bugs (mutable default args, unreliable exception handling — added per Gemini PR #39 review after empirical confirmation that zero new findings fire on our codebase). Line length set to 120 (modern Python convention). Further ratchet (I import-order, UP pyupgrade, etc.) deferred to a future ratchet bead.

  • New ruff.toml at repo root: lint scope = scripts/*.py + python/src/intent_audit_harness/{__init__,__main__,cli}.py; excludes python/.venv/ + python/src/intent_audit_harness/scripts/ + rust/scripts/ (the last two are bundled-content mirrors of scripts/* — stale-sync tracked separately, see below).
  • Version pinned via pip install 'ruff==0.15.4'; CI prints ruff --version for audit trail.

Removed — 3 ruff-surfaced dead-code findings

  • scripts/crap-score.py: redundant local import hashlib, os inside the if args.json: block was shadowing the module-level import os, causing ruff F401 against the top-level (which IS used by the same block). Per Gemini PR #39 review (PEP 8 alignment), moved hashlib to module-level imports alongside the other stdlib imports; removed the local re-import entirely. The bandaid-comment explaining the local import is also gone.
  • scripts/crap-score.py: dead local variable metrics = rec.get("metrics", {}).get("cyclomatic", {}) in score_rust() (line 266; F841). Assigned but never read. The actual cyclomatic value is fetched freshly inside the loop on line 268.
  • python/src/intent_audit_harness/cli.py: dead import os at line 12 (F401). Zero os.* usages in the file.

Changed — Long-line reformat in scripts/crap-score.py

  • Line 84 ignore set literal (155 chars) reformatted into a multi-line set literal that fits 120-char limit. Cosmetic; no behavior change.

Changed — Version bumped to v1.1.3 across all 5 manifests

Per the version-canonical-check CI gate (v1.0.2 PR #35). All 5 manifest locations now report 1.1.3.

Changed — .harness-hash regenerated

scripts/crap-score.py is pinned by .harness-hash-extra-patterns; the dead-code removal + long-line reformat changes its hash. 1 of 9 pinned-file hashes change.

Why patch, not minor

Pure lint-gate addition + dead-code removal. No new CLI commands, no new flags, no API change. Consumers re-vendor / pnpm up and get the cleaner scripts + the (new for them) ruff config transparently.

Verification

  • ruff checkAll checks passed! on clean checkout
  • python3 -m py_compile scripts/crap-score.py → exit 0
  • python3 -m py_compile python/src/intent_audit_harness/cli.py → exit 0
  • shellcheck scripts/*.sh → exit 0 (no regression on Phase A1)
  • bash scripts/harness-hash.sh --verify → OK after --init
  • CI ruff job will block any future PR that introduces a Python lint finding (F401, F841, E*, etc.)

Follow-up bead filed

iah-python-wrapper-scripts-sync (new) — python/src/intent_audit_harness/scripts/crap-score.py is a stale mirror of scripts/crap-score.py, ~1 month behind canonical source. Missing the v1.1.1 --json envelope emission, the which_or_none("go") PATH guard, and the rglob-walk pruning. Same pattern likely in rust/scripts/. Either (a) build-time copy in the Python/Rust wrapper packaging, (b) symlink, or (c) hand-sync discipline with CI check. Currently excluded from ruff scope; exclusion drops once the sync mechanism ships.

AAR: 000-docs/008-AA-AACR-ruff-iep-P6-2026-05-24.md.

What unblocks next

P6 Phase A2 complete. Next-ready P6 work:

  • A3: iah-eslint-dispatcher (bd_000-projects-rnpy) — eslint coverage for bin/audit-harness.js
  • B1: iep-shared-lint-configs.audit-harness-configs/ for vendoring lint configs to consumer repos
  • Plus 2 bundleable Gemini-found fixes from v1.1.2 review: iah-gherkin-prev-blank-noise + iah-gherkin-single-awk-opt

[v1.1.2] - 2026-05-24

Changed — Shellcheck CI gate flipped from tolerant to hard-fail (IEP Convergence Debt Plan Priority 6 Phase A1)

Closes iah-shellcheck-hard-fail (bd_000-projects-4asc, P1). The shellcheck job in .github/workflows/ci.yml previously ran shellcheck scripts/*.sh || true — warnings and errors were logged but never blocked the PR. As of this release the || true suffix is removed: any shellcheck finding (warning or error) blocks the build. The locked precondition was v1.1.1 (PR #37) which addressed the 6 Gemini-flagged robustness findings — the surface was already clean enough that flipping the gate exposed exactly 3 residual dead-code findings, all fixed below.

Removed — 3 pieces of dead code surfaced by the harder shellcheck gate

  • scripts/bias-count.sh: declare -A PATTERN_COUNTS plus the per-call PATTERN_COUNTS["$label"]=$count assignment in count_pattern(). SC2034: the associative array was populated but never read. Per-pattern counts are still printed inline (line 61) and are aggregated into TOTAL_BIAS for the JSON output bias_total metadata field; the per-pattern breakdown was apparently intended for a richer JSON shape that was never wired. Restoring it would be a feature, not a fix; filed as deferred scope if a consumer asks.
  • scripts/emit-evidence.sh: INPUT_HASH_HEX="$(echo "$STATEMENT" | python3 -c ...)" (formerly line 238). SC2034: computed but never read. Vestige from an earlier cosign integration; the surrounding BLOB_FILE construction relies on ARTIFACT_NAME only.
  • scripts/gherkin-lint.sh: err() helper function. SC2317: zero call sites in the file (verified via grep -n "\berr\b" — only the definition matches). The helper was defined symmetrically with warn() but never wired up to the awk rubric or the subprocess-fallback path. Replaced with process_awk_output() helper (see Fixed section below).

Fixed — gherkin-lint.sh awk subprocess undercount (silent-failure class bug; Gemini PR #38 review)

While processing the SC2317 cleanup above, Gemini's PR #38 review surfaced a deeper bug: the gherkin-lint.sh awk-fallback path printed WARN/ERROR lines via awk printf but those subprocesses never incremented the parent shell's WARN_COUNT/ERROR_COUNT counters. The summary line said "0 warnings, 0 errors" while errors were actively being printed; the exit code stayed 0 regardless. Exactly the silent-failure class the linter exists to surface in OTHER projects.

  • New process_awk_output() helper: wraps each awk subprocess, captures its output, counts WARN / ERROR lines via inline awk ('/^WARN /{c++} END{print c+0}' — set-euo-pipefail safe, no || true needed), increments the bash counters, then re-prints. 4 awk blocks now feed through it.
  • Verification: deliberate-failure test against a feature with Scenario: ... \n And ... produces exit code 1 + summary 0 warning(s), 1 error(s) (was: exit 0 + 0 warning(s), 0 error(s) while still printing the ERROR line). Clean feature still exits 0.
  • Separate-scope finding: the third awk script contains a stray top-level prev_blank = 1 that awk treats as an always-true pattern, triggering its default print-every-line action. That's a pre-existing cosmetic issue (extra noise in script output) but not a counter bug — filed as deferred scope.

Changed — Version bumped to v1.1.2 across all 5 manifests

Per the version-canonical-check CI gate (v1.0.2 PR #35). All 5 committed manifest locations now report 1.1.2:

  • package.json
  • version.txt
  • python/pyproject.toml
  • python/src/intent_audit_harness/__init__.py
  • rust/Cargo.toml

Changed — .harness-hash regenerated

The self-pinning manifest is regenerated to capture the new script hashes (per iep-P3 iah-self-pin v1.1.0 mechanism). 3 of 9 pinned-file hashes change (the 3 modified scripts); 6 unchanged.

Why patch, not minor

Pure dead-code removal + a CI policy tightening. No new CLI commands, no new flags, no API change, no behavioral change for any consumer. Downstream consumers re-vendor (or pnpm up) and get the cleaner scripts transparently.

Verification

  • shellcheck scripts/*.sh → exit 0 on a clean checkout (verified locally before push)
  • bash -n scripts/*.sh → all pass
  • python3 -m py_compile scripts/crap-score.py → exit 0
  • bash scripts/harness-hash.sh --verify → harness-hash: OK after --init
  • CI shellcheck job will now block on any future warning — try staging cmd $var (unquoted expansion) to verify the gate fires

AAR: 000-docs/007-AA-AACR-shellcheck-hard-fail-iep-P6-2026-05-24.md.

What this unblocks in the IEP Convergence Debt Plan

P6 Phase A1 closed. Next-ready P6 work:

  • A2: iah-ruff — add Python ruff CI gate
  • A3: iah-eslint-dispatcher — add eslint coverage for bin/audit-harness.js
  • A4: iah-script-robustness-upstream (already shipped in v1.1.1; nothing more to do)

[v1.1.1] - 2026-05-23

Fixed — 6 script robustness + portability fixes (IEP Convergence Debt Plan Priority 3)

Closes iah-script-robustness-upstream (bd_000-projects-qqkq, P2). Addresses the 6 medium-severity Gemini findings surfaced when audit-harness scripts were vendored into intent-eval-lab via iep-harness-hash-platform-rollout (PR #67). All fixes are upstream-only: zero CLI surface change, zero runtime-dep change, zero policy change.

  • scripts/escape-scan.sh (mktemp leak): --staged and --range modes allocate a temp file via mktemp to capture the diff but never clean it up. Adds trap 'rm -f "$DIFF_SRC"' EXIT immediately after each mktemp so the temp file is removed on every exit path (clean exit, REFUSE, CHALLENGE, signal). Matters most when escape-scan runs as a local git hook where temp accumulation is silent.
  • scripts/crap-score.py (missing go PATH guard): score_go() called run(["go", "test", "-coverprofile=...", ...]) without first checking that go is on PATH, so on systems without Go installed the subprocess raised FileNotFoundError and aborted the whole CRAP pass. Wraps the call in the existing which_or_none("go") pattern already used for radon, gocyclo, and the downstream go tool cover invocation.
  • scripts/crap-score.py (rglob walk pruning): the --json input-hash computation walked every file under root via rglob("*"), only filtering node_modules / .venv after the directory had been traversed. Replaces with os.walk + dirs[:] = [...] in-place pruning, skipping .git, node_modules, .venv/venv, __pycache__, dist, build, target, .tox, .mypy_cache, .pytest_cache, .next, .nuxt, .cache. Major perf win on large repos; no behavioral change to the resulting hash for repos without pruned-extension files under those directories.
  • scripts/emit-evidence.sh (shell→Python path injection): python3 -c "import json, sys; print(json.load(open('$PKG_JSON'))['version'])" interpolated the shell variable directly into the Python source. Paths containing single quotes (or arbitrary characters in adversarial cases) broke the parse. Now passes $PKG_JSON via sys.argv[1]python3 -c "import json, sys; print(json.load(open(sys.argv[1]))['version'])" "$PKG_JSON" — moving the path through the safe argv channel.
  • scripts/bias-count.sh (per-file sha256sum fork): find ... -exec sha256sum {} \; spawned one sha256sum process per matched file. Changes the terminator to + so find batches arguments into one (or few) sha256sum invocations. Perf win on test suites with many files; output identical because the downstream sort | sha256sum step normalizes.
  • scripts/harness-hash.sh (cross-platform sha256sum): GNU coreutils ships sha256sum, macOS ships shasum -a 256. Adds detection at script top selecting whichever is available into a SHA256_CMD bash array, falling back with a clear error if neither is on PATH. Both produce identical <hash> <file> output, so the manifest format and downstream awk parsing are byte-equivalent. Enables engineer-local runs on macOS without forcing every contributor to install coreutils.

Changed — Version bumped to v1.1.1 across all 5 manifests

Per the version-canonical-check CI gate (added in v1.0.2 PR #35). All 5 committed manifest locations now report 1.1.1:

  • package.json
  • version.txt
  • python/pyproject.toml
  • python/src/intent_audit_harness/__init__.py
  • rust/Cargo.toml

Changed — .harness-hash regenerated

The self-pinning manifest is regenerated to capture the new script hashes (per iep-P3 iah-self-pin v1.1.0 mechanism). The 6 script edits change 4 of the 9 pinned-file hashes; --init rewrites the manifest.

Why patch, not minor

Pure bug + portability fixes. No new flags, no new commands, no policy change, no breaking change to the manifest format. Downstream consumers re-vendor (or re-install via the polyglot installers) and get the improvements transparently.

Why this matters for the platform

The scripts in this release are now vendored into intent-eval-lab (per iep-harness-hash-platform-rollout rollout 1, lab PR #67) and will land in j-rig-binary-eval next. Bug-fix patches travel via re-vendor — AUDIT_HARNESS_VERSION=v1.1.1 curl -sSL https://raw.githubusercontent.com/jeremylongshore/audit-harness/main/install.sh | bash for vendored consumers, pnpm up @intentsolutions/audit-harness for node consumers. Landing the fixes before the rollout reaches more repos avoids re-publishing buggy vendored copies that immediately need replacement.

AAR: 000-docs/006-AA-AACR-script-robustness-upstream-iep-P3-2026-05-23.md.

Sequencing impact on Priority 6 Phase A1

Priority 6 Phase A1 (iah-shellcheck-hard-fail) flips .github/workflows/ci.yml:89 from shellcheck scripts/*.sh || true to hard-fail. Per the IEP Convergence Debt Plan risk-mitigation table ("Flipping shellcheck to hard-fail breaks existing audit-harness CI — mitigation: land fixes for Gemini's 6 findings FIRST, THEN flip the gate"), this release is the explicit precondition for the shellcheck flip. Phase A1 PR opens after v1.1.1 lands on main.

[v1.1.0] - 2026-05-22

Added — Per-repo .harness-hash-extra-patterns mechanism + audit-harness self-pin (IEP Convergence Debt Plan Priority 3)

Closes iah-self-pin (bd_000-projects-itpl, P1). The harness's own policy enforcement surface (scripts/.sh + scripts/.py + bin/audit-harness.js) is now hash-pinned at the audit-harness repo root. CI's audit-harness list + harness-hash --verify self-check steps are flipped from || true exit-3 tolerance to hard-fail: any byte change to a pinned policy file without a fresh --init + commit of the regenerated .harness-hash exits 2 (HARNESS_TAMPERED) and blocks the PR.

  • scripts/harness-hash.sh: NEW — reads an optional .harness-hash-extra-patterns file at the repo root and appends its lines to the default PATTERNS array. Comments (#) + blank lines ignored. Backward-compatible: repos without the file get exactly the previous behavior — consumer repos are not affected.
  • .harness-hash-extra-patterns (NEW, audit-harness repo root): pins scripts/*.sh, scripts/*.py, bin/audit-harness.js, and the extras file itself (preventing silent edits to the self-pinning scope).
  • .harness-hash (NEW, audit-harness repo root): 9-file manifest produced by bash scripts/harness-hash.sh --init. Committed to main.
  • .github/workflows/ci.yml: audit-harness list + harness-hash --verify self-check steps drop || true suffixes. Hard-fail in place. Comment block updated.

Why minor not patch

The .harness-hash-extra-patterns mechanism is a new authored feature surface — repos that opt in get a new capability. Per SemVer, minor bump. Existing repos (zero adopters today; this is the first one) are unaffected.

Why this matters

Before this release, the audit-harness CI workflow could not enforce its own policy. The "harness tests itself" design rule (CLAUDE.md rule 5) was aspirational — audit-harness list and harness-hash --verify both exited 0 when no manifest existed (intentional tolerance to avoid false-failing every PR). A silent edit to scripts/escape-scan.sh (the gate that REFUSES threshold-lowering changes) would pass CI. That's the failure mode this release closes.

Cross-platform-rollout note

iep-harness-hash-platform-rollout (bd_000-projects-g6zu) unblocks on this release. The remaining 4 IEP repos (intent-eval-lab, j-rig-binary-eval, intent-rollout-gate — kernel already pinned) can now copy this pattern using their own .harness-hash-extra-patterns to pin per-repo policy files (CI workflow definitions, governance docs, vendored harness wrappers).

Changed — Version bumped to v1.1.0 across all 5 manifests

Per the version-canonical-check CI gate landed in v1.0.2 (PR #35). All 5 committed manifest locations now report 1.1.0.

AAR: 000-docs/005-AA-AACR-iah-self-pin-iep-P3-2026-05-22.md.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment