Deterministic test-enforcement toolkit — hash-pinned policy, escape-scan refusal gates, CRAP scoring, architecture checks, and signed evidence emission, installed in the repo it governs.
A zero-runtime-dependency CLI (Node dispatcher + bash/Python scripts) that makes test-quality policy enforceable on fresh clones. Companion to the audit-tests and implement-tests Claude Code skills, and the deterministic-gates layer of the Intent Eval Platform — but usable standalone in any repo.
Links:
- GitHub: https://github.com/jeremylongshore/intent-audit-harness
- npm: https://www.npmjs.com/package/@intentsolutions/audit-harness
AI-assisted repositories need enforcement that travels with the code. A coding agent (or a hurried human) can lower a coverage threshold, delete a failing test, or relax an architecture rule in the same diff that adds a feature — and review does not reliably catch it. Quality gates that live on one developer's machine, or in a globally-installed skill, do not reproduce on a fresh clone, so the policy is unenforceable exactly where it matters: in someone else's CI.
Install the harness in the target repo and pin the policy. Engineer-owned files (tests/TESTING.md, feature files, architecture configs, mutation configs) are hashed into a committed .harness-hash manifest. Any diff that changes their content without a fresh, deliberate audit-harness init is caught at pre-commit and in CI and REFUSED. Around that core, the CLI ships deterministic gates (escape-scan, CRAP, architecture, bias, Gherkin lint) and read-only audit verbs (classify, conform, audit, scan, currency) that emit machine-readable gate-result/v1 rows suitable for signing as in-toto evidence.
The rule the harness enforces: policy changes must be conscious, not silent. AI agents stay useful — they can read policy and implement within constraints. What they cannot do is silently weaken the constraints.
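The pin-then-verify idea can be sketched in a few lines. This is a minimal illustration, not the harness's implementation: the real work is done by scripts/harness-hash.sh, whose manifest format is not shown here, so the dictionary manifest and the helper names `init_manifest` and `verify_manifest` are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hex SHA-256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def init_manifest(root: Path, pinned: list[str]) -> dict[str, str]:
    """The deliberate step: record the current hash of every pinned file."""
    return {rel: file_sha256(root / rel) for rel in sorted(pinned)}


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return pinned paths that are missing or no longer match the manifest."""
    tampered = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or file_sha256(target) != expected:
            tampered.append(rel)
    return tampered


# Demo in a throwaway directory (hypothetical policy content).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tests").mkdir()
    policy = root / "tests" / "TESTING.md"
    policy.write_text("coverage.line: 80\n")
    manifest = init_manifest(root, ["tests/TESTING.md"])
    clean = verify_manifest(root, manifest)      # nothing changed yet
    policy.write_text("coverage.line: 75\n")     # silent threshold lowering
    tampered = verify_manifest(root, manifest)   # change without re-init

print(clean, tampered)
```

The edit to the policy file is only accepted once the manifest is regenerated, which is what makes the change a visible, reviewable event in the diff.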
| Question | Answer |
|---|---|
| Who | Teams running AI-assisted development that need test policy enforced in-repo; consumed by the Intent Eval Platform repos and the audit-tests / implement-tests Claude Code skills |
| What | A 16-command CLI: hash-pin (verify/init/list), escape-scan, arch, bias, gherkin-lint, crap, emit-evidence, plus read-only audit verbs classify, conform, audit, scan, fp-rate, currency, gen-layer-applicability |
| When | Pre-commit hooks (escape-scan + verify), CI on every PR (verify + range escape-scan), nightly self-check; audit verbs on demand |
| Where | Installed in the target repo via npm, PyPI, crates.io, or curl-vendored into .audit-harness/ for any other ecosystem |
| Why | Enforcement must travel with the code: fresh clones reproduce every gate, and thresholds are read from the repo's own tests/TESTING.md, never hardcoded |

| Layer | Technology |
|---|---|
| CLI dispatcher | Node ≥ 18, zero runtime dependencies (bin/audit-harness.js, thin spawn wrapper) |
| Gate scripts | Bash + Python 3 stdlib (scripts/*.sh, scripts/*.py) — the canonical implementations |
| Schemas | JSON Schema: audit-profile/v1 + canonical gate registry, conform/v1 content-addressed artifact schemas, currency pin relation |
| Evidence | in-toto Statement v1 envelope; gate-result/v1 predicate (canonical schema in @intentsolutions/core) |
| Distribution | npm (@intentsolutions/audit-harness) + PyPI (intent-audit-harness) + crates.io (intent-audit-harness) + install.sh vendoring |
| CI | GitHub Actions — 3-Node matrix (18/20/22), pinned shellcheck v0.10.0 + ruff 0.15.4, nightly cron |
| Release | Tag-triggered release.yml, npm publish --provenance (sigstore, OIDC) |
- Refuse-at-pre-commit escape detection. escape-scan classifies diff hunks against an escape grammar: exit 0 clean, exit 1 CHALLENGE (requires an engineer-approved comment), exit 2 REFUSE (pipeline halted). Threshold-lowering, test deletion, and architecture-rule bypass are caught before they merge.
- Hash-pinned policy. Pinned files changed without re-init → HARNESS_TAMPERED (exit 2) blocks the PR. The harness self-pins its own scripts and dispatcher via .harness-hash-extra-patterns, so a silent edit to the gate that does the refusing is itself refused.
- Golden-master suites. The raw stdout of gherkin-lint and crap-score --json is pinned byte-stable against a deliberate-failure corpus, so a silent reshape of scorer output (renamed field, dropped WARN line) fails CI even when the per-row schema gate cannot see it.
- Reproducible evidence by design. conform validates artifacts against schemas bundled in the harness (never live-fetched), records each schema's sha256 in the row's policy_hash, and uses an embedded subset validator instead of ajv so the same commit + same harness version always produce an identical verdict.
- Read-only audit verbs. classify / conform / audit / scan / currency never write to the target repo, and audit deliberately does not execute the repo's test suite: it reports test-layer presence; execution verdicts stay in the repo's own CI.
- Safety levers. AUDIT_HARNESS_DISABLE=1 kill-switch, AUDIT_HARNESS_TIMEOUT per-command supervision (hung gate killed, exit 124), and an engineer-owned .audit-harness.yml per-repo override.
The design rule is explicit: scripts are the source of truth. bin/audit-harness.js is a ~200-line dispatcher that resolves a command name to a canonical bash or Python implementation in scripts/ and spawns it with inherited stdio. There is no TypeScript port and no npm dependency — dependencies and devDependencies are empty, and the published tarball contains only bin/, scripts/, schemas/, docs/, and the license/changelog files. This is what makes the same CLI surface shippable identically through npm, PyPI, crates.io, and a curl-vendored .audit-harness/ tree.
Command surface (16 commands):
| Command | Script | Purpose |
|---|---|---|
| verify / init / list | harness-hash.sh | Verify / (re-)pin / enumerate the hash-pinned policy manifest |
| escape-scan | escape-scan.sh | Scan a diff (--staged, --range A..B, stdin, patch file) for escape attempts |
| arch | arch-check.sh | Language-appropriate architecture-rule checker (dependency-cruiser / import-linter / ArchUnit / deptrac / arch-go) |
| bias | bias-count.sh | Count test-bias patterns (tautology, smoke-only, …) |
| gherkin-lint | gherkin-lint.sh | Advisory Gherkin quality check |
| crap | crap-score.py | CRAP (complexity × coverage) scorer with Python, Go, JS/TS, and Rust backends |
| emit-evidence | emit-evidence.sh | Wrap a gate-result JSON envelope in an in-toto Statement v1 (predicate https://evals.intentsolutions.io/gate-result/v1) |
| classify | classify.py | Read-only repo classifier → audit-profile/v1 (UNION of detected classifications + resolved gate set, registry_hash recorded) |
| conform | conform.py | Read-only conformance gate-runner → gate-result/v1 rows against bundled content-addressed schemas |
| audit | audit.py | Read-only testing-depth gate-runner: per-pyramid-layer presence heuristics; --deep adds crap-score |
| scan | scan.py | Read-only security/hygiene/skill-quality gate-runner: shells out to gitleaks / osv-scanner / semgrep / syft / markdownlint / lychee; missing tool → ADVISORY indeterminate, never a false FAIL |
| fp-rate | fp-rate.py | Measures per-gate false-positive/false-negative rate over a labeled corpus; this is the metric gating advisory→blocking promotion (≤ 5% bar) |
| currency | currency.py | Advisory upstream-currency report from a per-upstream pin relation; no exit-code authority (always exit 0), no live-fetch |
| gen-layer-applicability | gen-layer-applicability.py | Projects the canonical gate registry into layer-applicability.md; --check fails CI on drift |
All gates support --json for machine-readable gate-result envelopes pipeable to emit-evidence. The schemas/ directory ships in the package: audit-profile/ (the v1 schema, the canonical registry.v1.json gate registry, and its generated projection), conform/v1/ (skillmd-frontmatter, mcp-config, plugin-manifest, agent-frontmatter), and currency/pins.v1.json.
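The wrapping step that emit-evidence performs can be pictured as follows. The four top-level keys are the standard in-toto Statement v1 fields and the predicate URI is the one given in the command table; everything else is an assumption of this sketch: the gate-result field names (`gate`, `result`), the choice of subject, and the helper name `wrap_statement` are hypothetical, since the gate-result/v1 schema itself is not reproduced in this document.

```python
import hashlib
import json

STATEMENT_TYPE = "https://in-toto.io/Statement/v1"
PREDICATE_TYPE = "https://evals.intentsolutions.io/gate-result/v1"


def wrap_statement(gate_result: dict, subject_name: str, subject_bytes: bytes) -> dict:
    """Wrap a gate-result envelope in an in-toto Statement v1."""
    return {
        "_type": STATEMENT_TYPE,
        "subject": [
            {
                "name": subject_name,
                "digest": {"sha256": hashlib.sha256(subject_bytes).hexdigest()},
            }
        ],
        "predicateType": PREDICATE_TYPE,
        "predicate": gate_result,
    }


# Hypothetical gate-result payload and subject, for illustration only.
gate_result = {"gate": "escape-scan", "result": "PASS"}
statement = wrap_statement(gate_result, "tests/TESTING.md", b"coverage.line: 80\n")
print(json.dumps(statement, indent=2, sort_keys=True))
```

The statement is what a signer such as cosign would then sign; the sketch stops before signing because that step depends on CI identity and is outside what a local example can show.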
Exit codes (CI contract):
| Exit | Command | Meaning |
|---|---|---|
| 0 | any | Clean |
| 1 | escape-scan | CHALLENGE — engineer-approved comment required |
| 2 | verify | HARNESS_TAMPERED — pinned file changed without re-init |
| 2 | escape-scan | REFUSE — pipeline halted |
| 3 | verify | No manifest (fresh repo, not an error) |
| 124 | any gate | Killed by AUDIT_HARNESS_TIMEOUT supervision (INDETERMINATE) |
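A CI wrapper that consumes this contract might translate exit codes as in the sketch below. The code-to-meaning pairs come straight from the table; the label strings `CLEAN`, `NO_MANIFEST`, and `UNKNOWN`, and the helper names, are hypothetical conveniences rather than harness output.

```python
# (command, exit code) -> meaning, taken from the exit-code table.
MEANINGS = {
    ("escape-scan", 1): "CHALLENGE",
    ("escape-scan", 2): "REFUSE",
    ("verify", 2): "HARNESS_TAMPERED",
    ("verify", 3): "NO_MANIFEST",
}

# The two outcomes the document describes as halting the pipeline or blocking the PR.
BLOCKING = {"REFUSE", "HARNESS_TAMPERED"}


def interpret(command: str, code: int) -> str:
    """Translate a harness exit code into a symbolic outcome."""
    if code == 0:
        return "CLEAN"
    if code == 124:
        return "INDETERMINATE"  # killed by timeout supervision
    return MEANINGS.get((command, code), "UNKNOWN")


def is_blocking(command: str, code: int) -> bool:
    return interpret(command, code) in BLOCKING


for command, code in [("verify", 0), ("verify", 2), ("verify", 3), ("escape-scan", 1)]:
    print(command, code, interpret(command, code), is_blocking(command, code))
```

Note that exit 3 from verify is deliberately not treated as blocking here, matching the table's "fresh repo, not an error".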
Within the 7-layer testing taxonomy the harness serves L1 (escape-scan in pre-commit + CI — the cheapest gate) and L3 (CRAP, architecture, bias, hash-pin — code-level correctness).
Install — pick the ecosystem:
# Node / JS / TS
pnpm add -D @intentsolutions/audit-harness # or npm install --save-dev / yarn add --dev
# Python
pip install intent-audit-harness
# Rust
cargo install intent-audit-harness
# Anything else (Go, Ruby, PHP, Java, .NET, shell) — vendor the scripts
curl -sSL https://raw.githubusercontent.com/jeremylongshore/intent-audit-harness/main/install.sh | bash
The vendored path copies the scripts, the Node dispatcher, NOTICE (Apache-2.0 § 4(d): it must travel with any distribution), and a PROVENANCE file recording source repo, version, tarball URL, and install timestamp.
Pre-commit hook:
#!/usr/bin/env sh
pnpm exec audit-harness escape-scan --staged
pnpm exec audit-harness verify
CI step:
- run: pnpm exec audit-harness verify
- run: pnpm exec audit-harness escape-scan --range origin/main..HEAD
Changing a policy threshold (the conscious path):
# 1. Edit tests/TESTING.md (e.g., coverage.line 80 → 75)
# 2. Re-init to accept the change
pnpm exec audit-harness init
# 3. Commit the regenerated manifest alongside the policy change
git add tests/TESTING.md .harness-hash
git commit -m "chore(test): lower coverage floor to 75"
Language support:
| Language | CRAP | Arch |
|---|---|---|
| Python | radon + coverage.py | import-linter |
| JS/TS | complexity-report + c8 | dependency-cruiser |
| Go | gocyclo + go test -cover | arch-go |
| Rust | rust-code-analysis + tarpaulin | (custom) |
| Java/Kotlin / .NET / PHP | — | ArchUnit / ArchUnitNET / deptrac |
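For readers new to the metric, the sketch below computes CRAP using the commonly published formula, CRAP(m) = comp(m)² × (1 − cov(m))³ + comp(m), where comp is cyclomatic complexity and cov is coverage as a fraction. Whether scripts/crap-score.py uses exactly this formula is an assumption; the document only describes the scorer as combining complexity and coverage from the tools in the table.

```python
def crap(complexity: int, coverage: float) -> float:
    """CRAP score for one function.

    complexity: cyclomatic complexity (>= 1)
    coverage:   covered fraction in [0.0, 1.0]
    """
    if complexity < 1:
        raise ValueError("cyclomatic complexity must be at least 1")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be a fraction between 0 and 1")
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


# Fully covered code scores its own complexity; uncovered code is penalized quadratically.
for comp, cov in [(10, 1.0), (10, 0.0), (5, 0.5)]:
    print(comp, cov, crap(comp, cov))
```

The cubic coverage term is why the join between complexity paths and coverage paths matters: a function whose coverage fails to join is scored as if it were untested.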
- License + distribution integrity. Apache-2.0 since v1.0.0 (0.x releases remain MIT under their original terms). A version-canonical-check CI gate enforces that all polyglot manifests (package.json canonical, version.txt, python/pyproject.toml, __init__.py, rust/Cargo.toml, Cargo.lock) agree on both version and Apache-2.0 license; drift fails the build.
- Signed releases. npm publishing runs only through the tag-triggered release.yml with id-token: write and npm publish --provenance (sigstore: Fulcio OIDC + Rekor). The workflow verifies the pushed tag matches package.json#version before publishing.
- Self-containment. The harness's own enforcement surface (scripts/*.sh, scripts/*.py, bin/audit-harness.js, and the extra-patterns file itself) is hash-pinned at the repo root; CI hard-fails harness-hash --verify on any byte change without a committed re-init. The harness tests itself: every PR's diff passes through its own escape-scan.
- Pinned toolchain. shellcheck (v0.10.0) and ruff (0.15.4) are version-pinned in CI so a linter release cannot surface new findings silently against merged code.
- No network at gate time. conform never live-fetches schemas; verdicts re-verify against the exact bundled schema version via policy_hash. currency, the one verb whose subject is inherently network-bound (upstream freshness), is deliberately advisory-only with no exit-code authority.
- Demonstrated catch. The scan verb's first run against this repo's own tree caught a PyPI publish token pasted as a literal in a docs file; HEAD was redacted in the same release and the rotation tracked separately (documented in the changelog; the gate worked on its own author).
- Signed self-evidence. A CI-only emitter (ci/, excluded from the published tarball to preserve the zero-dependency guarantee) runs the real harness-hash --verify self-gate, shapes it into a kernel gate-result/v1 + Evidence Bundle, cosign-signs the canonical bytes, and publishes a manifest as a Release asset for downstream re-verification.
- v1.1.7 on npm with sigstore provenance; tags through v1.1.7. GitHub repo renamed audit-harness → intent-audit-harness (npm package name, CLI binary, and PyPI/crates package names unchanged).
- 13 deterministic CI suites are required checks on main. The CI workflow (ci.yml, push + PR + nightly cron) covers: dispatcher self-check across Node 18/20/22, shellcheck, ruff, Python compile-check, manifest version/license alignment, backward-compat regression, a SemVer contract suite (frozen subcommand roster, exit codes, --json stream contract, predicate URI), golden suites for classify / conform / audit / scan / currency, a crap-score join-regression suite with stubbed providers, the golden-master stdout suite, layer-applicability projection drift, and advisory FP-rate measurement.
- Fresh fixes (2026-06-11, PR #66): the Go and JS CRAP scorer backends were repaired. kind='test' no longer passes a gocyclo ignore pattern that silently emptied the test CRAP gate; the go.mod module prefix is stripped from go tool cover paths so coverage joins gocyclo's repo-relative complexity paths; absolute c8/istanbul coverage keys are normalized repo-relative to join complexity-report paths. The CI escape-scan self-check now captures the script's own exit code (previously tee masked it, making the REFUSE branch unreachable). A stub-driven regression suite (tests/crap-score/) fails pre-fix on all three findings without requiring gocyclo/go installs. Earlier in the cycle: evidence-integrity fixes + cross-platform SHA256 + kernel schema URL (PR #64) and the SemVer contract suite (PR #65).
- Platform role: this repo is the deterministic-gates layer of the Intent Eval Platform. It emits gate-result/v1 rows into the shared Evidence Bundle schema (canonical contracts in @intentsolutions/core); downstream tooling consumes them.
Source: CHANGELOG.md (Keep a Changelog format), verbatim — the [Unreleased] section plus the six most recent headered releases. Releases v1.1.6 and v1.1.7 are tagged and published; their changes are recorded under [Unreleased] below pending the next header roll-up (last headered release: v1.1.5).
A fitness function that pins the raw stdout of the two scorers whose output is a downstream contract.
- tests/golden/run-golden.sh captures gherkin-lint.sh (text rubric) and crap-score.py --json (gate-result envelope) stdout against a tests/fixtures/deliberate-failure/ corpus and diffs each against a checked-in golden, failing on any drift. Environment-volatile bytes are normalized out (gherkin-lint's installed-vs-awk-fallback first line; crap-score's absolute summary_path) so the golden is byte-stable across machines. CI installs no complexity provider, so the crap golden captures the deterministic no-provider envelope shape.
- Why this and not the per-row schema gate: the schema gate validates the augmented predicate that emit-evidence produces, not the raw scorer stdout. A silent reshape of the scorer stdout (a renamed field, a dropped WARN line, changed summary wording) is a backward-compat break the schema gate cannot see. This suite is that missing guard.
- Regenerate intentional changes with bash tests/golden/run-golden.sh --update and review the golden diff in the PR. Wired into .github/workflows/ci.yml as the golden job.
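The normalize-then-diff pattern behind a golden-master suite looks roughly like this. The actual suite is the bash script tests/golden/run-golden.sh; this Python sketch only illustrates the technique. The sample envelopes, the `<SUMMARY_PATH>` placeholder, and the helper names are hypothetical; only the fact that an absolute summary_path is normalized out comes from the changelog entry above.

```python
import difflib
import re

# Replace only the volatile value; every other byte of the raw stdout is left untouched.
_SUMMARY_PATH = re.compile(r'("summary_path":\s*)"[^"]*"')


def normalize(raw: str) -> str:
    """Mask the machine-specific absolute path in a scorer envelope."""
    return _SUMMARY_PATH.sub(r'\1"<SUMMARY_PATH>"', raw)


def golden_drift(raw: str, golden: str) -> list[str]:
    """Unified diff between the checked-in golden and the normalized output."""
    return list(
        difflib.unified_diff(
            golden.splitlines(),
            normalize(raw).splitlines(),
            "golden",
            "actual",
            lineterm="",
        )
    )


golden = '{"gate": "crap", "summary_path": "<SUMMARY_PATH>"}'
same_shape = '{"gate": "crap", "summary_path": "/home/alice/repo/reports/crap.json"}'
reshaped = '{"gate_id": "crap", "summary_path": "/tmp/ci/reports/crap.json"}'

print(golden_drift(same_shape, golden))   # no drift: only the path differed
print(golden_drift(reshaped, golden))     # drift: a field was renamed
```

Masking with a targeted substitution rather than re-serializing the JSON keeps the comparison sensitive to whitespace and key-order changes, which is the point of pinning raw stdout.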
The vendored-install path (non-Node repos) now ships a complete, traceable copy.
- NOTICE is copied into .audit-harness/. Apache-2.0 §4(d) requires the NOTICE file to travel with any distribution, and vendoring is a distribution.
- bin/audit-harness.js (the Node CLI dispatcher) and package.json are copied into .audit-harness/bin/ + .audit-harness/ so the canonical dispatcher surface is present and its --version (which reads ../package.json) resolves in the vendored tree.
- A PROVENANCE file records the source repo, version, tarball URL, and install timestamp so a vendored tree is traceable back to the exact release it came from.
The dashboard reports hub (labs.intentsolutions.io) ingests a signed report-manifest.json of kernel gate-result/v1 rows per repo. This adds audit-harness's own emit, lighting up its row.
- ci/emit-evidence.ts + ci/assemble-manifest.ts: run the real deterministic self-gate (harness-hash --verify), shape it into a kernel gate-result/v1 + EvidenceBundle (fail-closed against @intentsolutions/core), cosign-sign the canonical bytes (Fulcio OIDC + Rekor), and assemble the manifest the dashboard re-verifies at ingest.
- Zero-dep guarantee preserved. The emitter lives in ci/ (excluded from package.json#files) and the kernel is installed CI-only via npm i --no-save; dependencies + devDependencies stay empty and the published tarball is unchanged (verified via npm pack --dry-run).
- .github/workflows/release.yml: adds a GitHub Release on tag push + an emit-evidence job (tag-only) that publishes the manifest as a Release asset.
The fifth verb, and deliberately the weakest: an advisory report with no exit-code authority.
- audit-harness currency (scripts/currency.py, stdlib): reads the per-upstream-identity pin relation (schemas/currency/pins.v1.json) and reports which pins are themselves stale, meaning checked_at is older than the pin's staleness window. Each upstream (mcp-spec, skill-md-schema, claude-code, gate-result-predicate, anthropic-sdk, agentskills-spec) carries its own pinned_version + checked_at + window, so the pin's own staleness is detectable (not one opaque scalar).
- No exit-code authority (always exit 0), no live-fetch, no auto-fix. Currency depends on upstream state, which is non-deterministic and network-bound, so it only reports. /sync-testing-harness consumes the report to open advisory bump PRs; it never reddens a build. --today YYYY-MM-DD makes reports reproducible.
- tests/currency/: golden suite (3 checks) covering stale/current/unknown classification, the no-exit-authority guarantee (exit 0 even when all pins are stale), and the shipped relation reporting.
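The staleness rule can be sketched as a pure function of the pin and an injected date, which is what makes a --today style override reproducible. The field names pinned_version and checked_at come from the entry above; the window field name `window_days`, the sample pins, and the rule that a pin with missing data is "unknown" are assumptions of this sketch, not the shipped schema.

```python
from datetime import date


def classify_pin(pin: dict, today: date) -> str:
    """Return 'stale', 'current', or 'unknown' for one upstream pin."""
    checked_at = pin.get("checked_at")
    window_days = pin.get("window_days")  # hypothetical field name
    if not checked_at or window_days is None:
        return "unknown"
    age_days = (today - date.fromisoformat(checked_at)).days
    return "stale" if age_days > window_days else "current"


# Hypothetical pins; versions and dates are illustrative only.
pins = {
    "mcp-spec": {"pinned_version": "x.y", "checked_at": "2026-01-01", "window_days": 90},
    "claude-code": {"pinned_version": "x.y", "checked_at": "2026-06-01", "window_days": 90},
    "anthropic-sdk": {"pinned_version": "x.y"},
}

today = date(2026, 6, 11)  # injected, like --today YYYY-MM-DD
report = {name: classify_pin(pin, today) for name, pin in pins.items()}
print(report)
exit_code = 0  # advisory only: the report never changes the exit code
```

Because the date is a parameter rather than a clock read, the same pin file and the same date always yield the same report.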
The fourth read-only verb: security + hygiene + skill-quality, by orchestrating standard tools (never reimplementing them).
- audit-harness scan [repo] (scripts/scan.py, stdlib): for every dimension: security | hygiene | skill-quality gate in the profile, emits a gate-result/v1 row. Three strategies: local (hygiene-readme README presence, deterministic), shell-out (every gate carrying a tool, one of gitleaks / osv-scanner / semgrep / syft / markdownlint / lychee: clean exit → PASS, findings → ADVISORY(error), tool absent → ADVISORY indeterminate), consume (skill-behavioral ingests a j-rig Evidence Bundle verdict via --jrig-verdict; the harness never runs behavioral judgment itself, so no verdict → indeterminate).
- Advisory-first; --strict (or a blocking gate) turns a finding/gap into FAIL. Kill-switch → []. Each row records metadata.method (local-presence / shell-out / consume-j-rig).
- tests/scan/: golden suite (10 checks) with pinned-profile isolation so shell-out tool availability never makes the suite flaky.
Security note: on first run this gate caught — and this release redacts from HEAD — a PyPI publish token that had been pasted as a literal value in python/PUBLISH.md. The value remains in git history; it must be rotated at the registry (tracked separately). The doc now carries a placeholder.
The third read-only verb: the "finish the pyramid" testing-depth diagnostic.
- audit-harness audit [repo] (scripts/audit.py, stdlib): for every dimension: testing-depth gate in the profile, assesses the gate and emits a gate-result/v1 row. Two read-only strategies: crap-score runs the bundled crap scorer (static complexity×coverage); every pyramid layer (unit/integration/e2e/smoke/perf/a11y/contract/migration/property-based/fuzz/sanitizers) gets a per-layer presence heuristic (test dirs, framework configs, dependency markers). Layer present → PASS; absent → ADVISORY(warn) testing-depth gap; not statically assessable → ADVISORY indeterminate.
- --fast (default) presence heuristics only (<10s); --deep adds crap-score; --strict turns a gap on a blocking gate into FAIL. Kill-switch → []. Each row records metadata.method (crap-static / presence-heuristic / delegated) for provenance.
- Deliberately does NOT execute the repo's test suite. Running arbitrary untrusted suites is the repo's own CI's job; the harness reports coverage presence and the repo's CI test step produces the execution verdict. audit is the diagnostic, not the test runner.
- tests/audit/: golden suite (7 checks) + has-tests / no-tests fixtures. Asserts unit→PASS / gap→ADVISORY(default) / gap→FAIL(--strict), crap deep-only-in-fast, kill-switch, and gate-result/v1 validity. CI audit job.
Closes the data/safety-spine epic (E2): the registry becomes the single canonical datum and gate promotion gets a measured bar.
- audit-harness gen-layer-applicability (scripts/gen-layer-applicability.py): projects schemas/audit-profile/registry.v1.json into schemas/audit-profile/layer-applicability.md. --write regenerates; --check fails on drift. The doc is now a projection of the registry datum, not a hand-maintained parallel source; CI gate layer-applicability-drift enforces it (c2b).
- audit-harness fp-rate (scripts/fp-rate.py): measures each gate's false-positive / false-negative rate over a labeled corpus (tests/fixtures/conform/{valid,malformed}/). This is the metric that gates advisory→blocking promotion. --max-fp-rate X exits 1 if any gate exceeds the bar; CI runs it advisory at the 5% default bar (c2e).
- docs/gate-promotion.md: the dedicated advisory→blocking promotion rule, with an FP-rate ≤ 5% bar, engineer-pinned in tests/TESTING.md, re-pinned manifest. Documents why FP-rate (not FN-rate) is the gate and how demotion/kill-switch works.
- docs/ now ships in the npm package (files).
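The promotion bar can be illustrated with a small calculation over a labeled corpus. This is a sketch under stated assumptions: the rate definitions used here (false positives over valid fixtures, false negatives over malformed fixtures) are the conventional ones and are not confirmed against scripts/fp-rate.py, and the function names and sample data are hypothetical. The exit-1-above-the-bar behavior and the 5% default come from the entry above.

```python
def fp_fn_rates(results: list[tuple[bool, bool]]) -> tuple[float, float]:
    """results holds (fixture_is_valid, gate_flagged_it) pairs for one gate."""
    flags_on_valid = [flagged for is_valid, flagged in results if is_valid]
    flags_on_malformed = [flagged for is_valid, flagged in results if not is_valid]
    fp_rate = sum(flags_on_valid) / len(flags_on_valid) if flags_on_valid else 0.0
    misses = sum(not flagged for flagged in flags_on_malformed)
    fn_rate = misses / len(flags_on_malformed) if flags_on_malformed else 0.0
    return fp_rate, fn_rate


def promotion_exit_code(fp_rate: float, max_fp_rate: float = 0.05) -> int:
    """Mirror of the described rule: exit 1 only when the gate exceeds the bar."""
    return 1 if fp_rate > max_fp_rate else 0


# Hypothetical corpus: 20 valid fixtures (1 wrongly flagged), 10 malformed (2 missed).
corpus = [(True, False)] * 19 + [(True, True)] + [(False, True)] * 8 + [(False, False)] * 2
fp_rate, fn_rate = fp_fn_rates(corpus)
print(fp_rate, fn_rate, promotion_exit_code(fp_rate))
```

A gate sitting exactly on the bar still passes, because the rule is "exceeds", so one wrongly flagged fixture in twenty is the most this corpus size tolerates.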
The second piece of the read-only brain: deterministic conformance, emitting Evidence Bundle rows.
- audit-harness conform [repo] (scripts/conform.py, stdlib + PyYAML): read-only conformance gate-runner. For every dimension: conformance gate in the repo's audit-profile/v1, it locates the artifact(s) and emits a gate-result/v1 row (JSON array, stdout). Never writes, never live-fetches.
- Bundled content-addressed schemas (schemas/conform/v1/): skillmd-frontmatter, mcp-config, plugin-manifest, agent-frontmatter. These are the deterministic structural floor (parses + required keys + types), distinct from the IS 100-point rubric / SAK authoring kernel (judgment, stays in /validate-*). conform records each schema's sha256 in the row's policy_hash, so a row re-verifies against the exact schema version that produced it.
- Reproducible-by-design engine. Bundled JSON-Schemas are checked by an embedded subset validator (complete for the closed bundled schemas) rather than ajv. This is deliberate: ajv's availability/version varies per machine and would make signed evidence non-reproducible. Same commit + same harness version produce an identical verdict.
- Genuinely-external formats shell out: OpenAPI to spectral, GitHub Action to yamllint. Missing tool produces an ADVISORY indeterminate (never a false FAIL).
- Advisory-first. A conformance violation on an enforcement: advisory gate is ADVISORY (severity error), exit 0: logged, not blocking. --strict (or an engineer-promoted enforcement: blocking gate) turns a violation into FAIL (exit 1). Missing artifact produces NOT_APPLICABLE. Kill-switch (AUDIT_HARNESS_DISABLE=1 / .audit-harness.yml) produces an empty [], exit 0.
- tests/conform/: golden suite (31 checks) + pass/fail fixtures (valid + malformed SKILL.md, .mcp.json, plugin manifest, agent). Asserts valid to PASS, malformed to ADVISORY (default) / FAIL (--strict), every row validates against gate-result/v1, the NOT_APPLICABLE + indeterminate paths, and policy_hash == bundled-schema sha256 + reproducible. Wired into CI (conform job).
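A structural-floor check of the kind described (required keys plus types, with the schema's hash recorded on the row) might look like the sketch below. It is not the validator in scripts/conform.py: the schema shown is a made-up stand-in for a bundled schema, the row fields other than policy_hash are hypothetical, and only the `required`, `properties`, and `type` keywords are handled.

```python
import hashlib
import json

# JSON Schema type names mapped to Python types (subset only).
TYPES = {"string": str, "array": list, "object": dict, "boolean": bool}


def validate_subset(schema: dict, doc: object) -> list[str]:
    """Check required keys and declared property types; return error strings."""
    if not isinstance(doc, dict):
        return ["document is not an object"]
    errors = [f"missing required key: {key}" for key in schema.get("required", []) if key not in doc]
    for key, spec in schema.get("properties", {}).items():
        expected = TYPES.get(spec.get("type"))
        if key in doc and expected is not None and not isinstance(doc[key], expected):
            errors.append(f"{key}: expected {spec['type']}")
    return errors


def conform_row(schema_bytes: bytes, doc: object, strict: bool = False) -> dict:
    """Produce a row whose policy_hash pins the exact schema bytes used."""
    errors = validate_subset(json.loads(schema_bytes), doc)
    result = "PASS" if not errors else ("FAIL" if strict else "ADVISORY")
    return {
        "result": result,
        "errors": errors,
        "policy_hash": hashlib.sha256(schema_bytes).hexdigest(),
    }


# Hypothetical stand-in for a bundled frontmatter schema.
schema_bytes = json.dumps(
    {"required": ["name", "description"],
     "properties": {"name": {"type": "string"}, "description": {"type": "string"}}},
    sort_keys=True,
).encode()

valid_row = conform_row(schema_bytes, {"name": "demo", "description": "ok"})
advisory_row = conform_row(schema_bytes, {"name": 3})
strict_row = conform_row(schema_bytes, {"name": 3}, strict=True)
print(valid_row["result"], advisory_row["result"], strict_row["result"])
```

Because the validator has no external dependency and the hash covers the schema bytes, the same inputs give the same row on any machine, which is the property the changelog entry is after.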
Scope boundary: conformance kinds without a bundled schema (marketplace, hook) resolve to ADVISORY indeterminate — drop a schema into schemas/conform/v1/ to light them up, no code change. No gate execution for testing-depth/security yet (Phase 3+).
The first piece of the "comprehensive audit, on any repo" build: the read-only brain.
- audit-profile/v1 schema (schemas/audit-profile/v1.schema.json): closed, versioned, hash-bearing value mirroring gate-result/v1. Four invariants: classifications are a UNION (not a winner), unresolved[] is the only Claude-refinable surface, waived ⇒ disabled (allOf-enforced), registry_hash makes a profile reproducible.
- Canonical dimension→gate registry (schemas/audit-profile/registry.v1.json): the single datum that answers "which gates apply to repo-type X, in which dimension, at what applicability"; layer-applicability.md and TESTING.md become projections of it.
- audit-harness classify [repo] (scripts/classify.py, stdlib-only): read-only repository classifier. Detects the UNION of repo-type + Claude-artifact classifications, resolves the gate set against the registry, records registry_hash, and emits an audit-profile/v1 value to stdout. Never writes to the repo.
- Safety levers: INDETERMINATE result class (infra failure ≠ policy failure); dispatcher per-command supervision via AUDIT_HARNESS_TIMEOUT (kill a hung gate, exit 124); AUDIT_HARNESS_DISABLE=1 kill-switch (gate commands no-op; classify emits an all-disabled profile); engineer-owned .audit-harness.yml override (classify_pins, advisory, disable_gates, disable). See .audit-harness.example.yml.
- tests/classify/: golden fixture corpus (6 fixtures, authored before the classifier) + suite. It golden-matches classifications, schema-validates every profile, exercises the kill-switch, the unknown/unresolved path, and override honoring. Wired into CI (classify job).
- schemas/ now ships in the npm package (files) so the registry + schema are available to consumers on any repo.
Scope boundary: no conform verb, no gate execution yet (Phase 2+). classify is read-only and emits a profile only.
This is the first release published to npm via CI with Sigstore provenance. Until now the repo had no release workflow — npm was stuck at 0.1.0 while the code (and every other manifest) had advanced through 1.0.0 → 1.1.4, four minors of CHANGELOG-documented work that never reached consumers. npm install @intentsolutions/audit-harness resolved to the stale 0.1.0 tarball.
- .github/workflows/release.yml (NEW): mirrors the provenance approach of intent-eval-core's release workflow, adapted for this zero-dependency polyglot CLI (no pnpm, no lockfile, no TS build, no coverage). Triggers on push of a v*.*.* tag and on workflow_dispatch. Sets id-token: write for npm/Sigstore OIDC. Verifies the pushed tag matches package.json#version (skipped on manual dispatch since there's no tag), runs the node bin/audit-harness.js --version self-check + the repo's escape-scan.sh --staged test script (non-blocking on no-staged-diff), then npm publish --provenance --access public. The NPM_TOKEN repo secret is already configured.
The GitHub repo was renamed audit-harness → intent-audit-harness, but the metadata still pointed at the old path.
- package.json: homepage, repository.url, and bugs.url repointed from jeremylongshore/audit-harness → jeremylongshore/intent-audit-harness (these render on npmjs.com).
- python/pyproject.toml + rust/Cargo.toml: project-URL fields (Homepage / Repository / Issues / Changelog / documentation) repointed to the renamed repo; these render on PyPI and crates.io.
- python/src/intent_audit_harness/__init__.py: docstring source-link repointed.
- README.md: the curl … install.sh line + the two "Related" skill links repointed to the renamed repo.
- install.sh: the REPO= variable, the usage-comment URLs at the top, and the re-run hint repointed; the default VERSION bumped from the stale v0.1.0 → v1.1.5.
The GitHub archive tarball unpacks as <repo>-<version>/, which became intent-audit-harness-1.1.5/ after the rename. The unpack-dir detection used find … -name 'audit-harness-*', and -name matches the basename with no implicit leading wildcard, so it matched nothing under the new prefix — every vendored install would have failed at "could not find unpacked dir". Changed the glob to -name '*audit-harness-*' (leading wildcard), which matches both the current intent-audit-harness-* name and legacy audit-harness-* tags. Verified against both directory names.
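The glob behavior is easy to reproduce outside of find. The sketch below uses Python's fnmatch, which applies the same shell-style pattern rules to a whole name, to show why the old pattern stopped matching after the rename. The legacy directory name audit-harness-0.1.0 is a hypothetical example of a pre-rename unpack directory.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "audit-harness-*"     # anchored at the start of the basename
NEW_PATTERN = "*audit-harness-*"    # leading wildcard tolerates any prefix

unpack_dirs = ["intent-audit-harness-1.1.5", "audit-harness-0.1.0"]

matches = {
    pattern: [name for name in unpack_dirs if fnmatchcase(name, pattern)]
    for pattern in (OLD_PATTERN, NEW_PATTERN)
}
print(matches)
```

The old pattern must match from the first character of the basename, so the new "intent-" prefix defeats it, while the leading wildcard accepts both names.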
npm-version, License Apache-2.0, and Sigstore-provenance shields under the H1 (mirrors the intent-eval-core badge row). The "Part of the Intent Eval Platform" cross-link line is preserved.
Per the version-canonical-check CI gate (v1.0.2 PR #35). package.json (canonical), version.txt, python/pyproject.toml, python/src/intent_audit_harness/__init__.py, and rust/Cargo.toml all report 1.1.5. (rust/Cargo.lock is gitignored; its working-tree entry is aligned for local cargo builds.)
No new CLI commands, no new flags, no API change, no script behavior change. This is release-engineering + metadata: the publish pipeline that ships the existing 1.1.x code, plus URL corrections for the repo rename, plus the install.sh glob fix. The pinned policy scripts (.harness-hash) are untouched.
- npm pack --dry-run → tarball contains bin/, scripts/, README.md, LICENSE, NOTICE, CHANGELOG.md per package.json#files
- node bin/audit-harness.js --version → 1.1.5
- bash -n install.sh → exit 0; unpack-dir glob matches intent-audit-harness-1.1.5 (and legacy audit-harness-*)
- bash scripts/harness-hash.sh --verify → OK (no pinned files changed)
Closes iah-gherkin-prev-blank-noise (bd_000-projects-o9q1, P2). The third awk block in scripts/gherkin-lint.sh (the And-at-scenario-start checker) opened with a bare prev_blank = 1 expression that awk interpreted as an always-true pattern with implicit { print } default action — flooding stdout with every line of every feature file alongside the intentional ERROR printf. prev_blank was never USED anywhere in the awk script (verified via grep). Removed both touches: the top-level expression AND the assignment in the blank-line pattern (which was also unreachable for anything that mattered, since no downstream pattern read prev_blank). The third awk block now produces ONLY the targeted ERROR line when triggered. Verified via the same deliberate-failure test from v1.1.2 AAR — output before: full feature file printed interleaved with ERROR. Output after: just the ERROR line.
Closes iah-gherkin-single-awk-opt (bd_000-projects-vawm, P3). v1.1.2 introduced process_awk_output() with two awk subprocesses per call (one counting WARN, one counting ERROR). v1.1.4 collapses to a single awk pass via read -r w e < <(awk '/^WARN /{w++} /^ERROR /{e++} END {print w+0, e+0}' <<< "$out") per Gemini PR #39 verbatim suggestion. Halves the awk fork count (4 callsites × 2 subprocesses = 8 awk processes/feature → 4). Verified with mixed WARN+ERROR test: 2 WARNs + 1 ERROR in one feature file produces summary 2 warning(s), 1 error(s) and exit 1.
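For readers who do not read awk, the single-pass count is equivalent to the Python sketch below. It mirrors the two prefix patterns in the awk one-liner quoted above; the sample linter output, the function names, and the assumption that a WARN-only run exits 0 are hypothetical, while the summary wording and exit 1 on an ERROR come from the verification note.

```python
def count_findings(out: str) -> tuple[int, int]:
    """One pass over the output, counting lines by prefix like the awk program."""
    warns = errors = 0
    for line in out.splitlines():
        if line.startswith("WARN "):
            warns += 1
        if line.startswith("ERROR "):
            errors += 1
    return warns, errors


def summarize(out: str) -> tuple[str, int]:
    """Build the summary line and an exit code (assumed: non-zero only on errors)."""
    warns, errors = count_findings(out)
    return f"{warns} warning(s), {errors} error(s)", 1 if errors else 0


# Hypothetical linter output for one feature file.
sample = "\n".join([
    "WARN scenario has no Then step",
    "WARNING-looking text that does not match the prefix",
    "ERROR And used at scenario start",
    "WARN scenario title is empty",
])
print(summarize(sample))
```

Counting both kinds in one loop is what removes the second subprocess: the output is scanned once and both totals fall out together.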
Closes iah-crap-score-exclusion-dedup (bd_000-projects-niv8, P2). Pre-v1.1.4, scripts/crap-score.py had TWO separate sets with overlapping intent but divergent contents:
- ignore set in score_python() (line 85): had "reports" but lacked .next, .nuxt, .cache
- prune set in main() (line 394, added v1.1.1 for --json input-hash walk): had .next, .nuxt, .cache but lacked "reports"
Asymmetry was a real bug: a repo with reports/ would skip score_python's candidate scan but its .py files DID get hashed by the input-hash walk; opposite for .next/.nuxt/.cache. Fixed by extracting a single module-level constant EXCLUDED_DIRS (union of both prior sets) referenced by both call sites. Set contents: .git, .venv, venv, node_modules, __pycache__, dist, build, target, .tox, .mypy_cache, .pytest_cache, .next, .nuxt, .cache, reports.
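The shape of the fix, one module-level constant consulted by every directory walk, can be sketched as follows. The set contents are copied from the entry above; the helper `walk_python_files` and the demo tree are hypothetical and stand in for the two real call sites in scripts/crap-score.py.

```python
import os
import tempfile
from pathlib import Path

# Single source of truth for pruned directories (union of the two prior sets).
EXCLUDED_DIRS = frozenset({
    ".git", ".venv", "venv", "node_modules", "__pycache__", "dist", "build",
    "target", ".tox", ".mypy_cache", ".pytest_cache", ".next", ".nuxt",
    ".cache", "reports",
})


def walk_python_files(root: Path) -> list[str]:
    """List .py files under root, pruning every excluded directory in place."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Mutating dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED_DIRS)
        for name in sorted(filenames):
            if name.endswith(".py"):
                found.append(Path(dirpath, name).relative_to(root).as_posix())
    return found


# Demo tree: one real source file plus files inside two excluded directories.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("src/a.py", "reports/r.py", ".next/n.py"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("pass\n")
    found = walk_python_files(root)

print(found)
```

If both the candidate scan and the input-hash walk call the same helper (or at least read the same constant), they can no longer disagree about which files exist.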
Closes iah-shellcheck-version-pin (bd_000-projects-v1ds, P3). v1.1.2 (Phase A1) installed shellcheck via apt-get install -y shellcheck which pulls whatever Ubuntu's runner-image version happens to ship (currently 0.9.0). When the runner image upgrades shellcheck to 0.10.x or later, new rules activate silently and could surface findings in already-merged code. v1.1.4 pins to v0.10.0 via download from the koalaman/shellcheck GitHub releases. CI step prints shellcheck --version for audit trail. To bump: edit SHELLCHECK_VERSION env in the workflow + run shellcheck scripts/*.sh locally + commit as explicit PR. Matches the ruff version-pin pattern from v1.1.3.
Per the version-canonical-check CI gate (v1.0.2 PR #35). All 5 manifest locations now report 1.1.4.
scripts/gherkin-lint.sh + scripts/crap-score.py modified; both are pinned. 2 of 9 pinned-file hashes change.
Pure cleanup release: dead-code removal, perf microoptimization, bug fixes for cross-call inconsistencies, CI version pin. No new CLI commands, no new flags, no API change. Consumers re-vendor / pnpm up and get the cleaner scripts + tighter CI transparently.
- shellcheck scripts/*.sh → exit 0 (local 0.9.0; CI will run pinned 0.10.0)
- ruff check → All checks passed!
- bash -n scripts/*.sh → all pass
- python3 -m py_compile scripts/crap-score.py + cli.py → exit 0
- bash scripts/harness-hash.sh --verify → OK after --init
- gherkin-lint deliberate-failure test (And-at-start): exit 1, summary correct
- gherkin-lint mixed test (2 WARN + 1 ERROR): summary 2 warning(s), 1 error(s), exit 1
- Output noise gone: feature-file lines no longer printed alongside ERRORs
AAR: 000-docs/009-AA-AACR-v1.1.4-cleanup-bundle-2026-05-25.md.
iah-python-wrapper-scripts-sync (bd_000-projects-65k4) remains open. The Python wrapper's python/src/intent_audit_harness/scripts/crap-score.py (and the Rust wrapper's mirror) are stale by design — install.sh sources from canonical scripts/ but wrapper packaging hasn't grown a build-time sync mechanism. Implementation requires choosing between hatch build-hook, Cargo build.rs, symlinks, or CI-enforced manual sync. Deferred to its own focused PR.
Closes iah-ruff (bd_000-projects-x9bs, P1). New .github/workflows/ci.yml job ruff (Python lint) runs ruff check (version-pinned to 0.15.4 per the iah-shellcheck-version-pin lesson) against the own-code Python surface. Ruleset select = ["B", "E", "F"] — pyflakes (F) for dead imports + unused variables; pycodestyle errors (E) for syntax-level issues; flake8-bugbear (B) for Python-specific bugs (mutable default args, unreliable exception handling — added per Gemini PR #39 review after empirical confirmation that zero new findings fire on our codebase). Line length set to 120 (modern Python convention). Further ratchet (I import-order, UP pyupgrade, etc.) deferred to a future ratchet bead.
- New ruff.toml at repo root: lint scope = scripts/*.py + python/src/intent_audit_harness/{__init__,__main__,cli}.py; excludes python/.venv/ + python/src/intent_audit_harness/scripts/ + rust/scripts/ (the last two are bundled-content mirrors of scripts/*; stale-sync tracked separately, see below).
- Version pinned via pip install 'ruff==0.15.4'; CI prints ruff --version for audit trail.
- scripts/crap-score.py: redundant local import hashlib, os inside the if args.json: block was shadowing the module-level import os, causing ruff F401 against the top-level (which IS used by the same block). Per Gemini PR #39 review (PEP 8 alignment), moved hashlib to module-level imports alongside the other stdlib imports; removed the local re-import entirely. The bandaid-comment explaining the local import is also gone.
- scripts/crap-score.py: dead local variable metrics = rec.get("metrics", {}).get("cyclomatic", {}) in score_rust() (line 266; F841). Assigned but never read. The actual cyclomatic value is fetched freshly inside the loop on line 268.
- python/src/intent_audit_harness/cli.py: dead import os at line 12 (F401). Zero os.* usages in the file.
- Line 84 ignore set literal (155 chars) reformatted into a multi-line set literal that fits 120-char limit. Cosmetic; no behavior change.
Per the version-canonical-check CI gate (v1.0.2 PR #35). All 5 manifest locations now report 1.1.3.
scripts/crap-score.py is pinned by .harness-hash-extra-patterns; the dead-code removal + long-line reformat changes its hash. 1 of 9 pinned-file hashes change.
Pure lint-gate addition + dead-code removal. No new CLI commands, no new flags, no API change. Consumers re-vendor / pnpm up and get the cleaner scripts + the (new for them) ruff config transparently.
- ruff check → All checks passed! on clean checkout
- python3 -m py_compile scripts/crap-score.py → exit 0
- python3 -m py_compile python/src/intent_audit_harness/cli.py → exit 0
- shellcheck scripts/*.sh → exit 0 (no regression on Phase A1)
- bash scripts/harness-hash.sh --verify → OK after --init
- CI ruff job will block any future PR that introduces a Python lint finding (F401, F841, E*, etc.)
iah-python-wrapper-scripts-sync (new) — python/src/intent_audit_harness/scripts/crap-score.py is a stale mirror of scripts/crap-score.py, ~1 month behind canonical source. Missing the v1.1.1 --json envelope emission, the which_or_none("go") PATH guard, and the rglob-walk pruning. Same pattern likely in rust/scripts/. Either (a) build-time copy in the Python/Rust wrapper packaging, (b) symlink, or (c) hand-sync discipline with CI check. Currently excluded from ruff scope; exclusion drops once the sync mechanism ships.
AAR: 000-docs/008-AA-AACR-ruff-iep-P6-2026-05-24.md.
P6 Phase A2 complete. Next-ready P6 work:
- A3: iah-eslint-dispatcher (bd_000-projects-rnpy), eslint coverage for bin/audit-harness.js
- B1: iep-shared-lint-configs, .audit-harness-configs/ for vendoring lint configs to consumer repos
- Plus 2 bundleable Gemini-found fixes from v1.1.2 review: iah-gherkin-prev-blank-noise + iah-gherkin-single-awk-opt
Changed — Shellcheck CI gate flipped from tolerant to hard-fail (IEP Convergence Debt Plan Priority 6 Phase A1)
Closes iah-shellcheck-hard-fail (bd_000-projects-4asc, P1). The shellcheck job in .github/workflows/ci.yml previously ran shellcheck scripts/*.sh || true — warnings and errors were logged but never blocked the PR. As of this release the || true suffix is removed: any shellcheck finding (warning or error) blocks the build. The locked precondition was v1.1.1 (PR #37) which addressed the 6 Gemini-flagged robustness findings — the surface was already clean enough that flipping the gate exposed exactly 3 residual dead-code findings, all fixed below.
- scripts/bias-count.sh: declare -A PATTERN_COUNTS plus the per-call PATTERN_COUNTS["$label"]=$count assignment in count_pattern(). SC2034: the associative array was populated but never read. Per-pattern counts are still printed inline (line 61) and are aggregated into TOTAL_BIAS for the JSON output bias_total metadata field; the per-pattern breakdown was apparently intended for a richer JSON shape that was never wired. Restoring it would be a feature, not a fix; filed as deferred scope if a consumer asks.
- scripts/emit-evidence.sh: INPUT_HASH_HEX="$(echo "$STATEMENT" | python3 -c ...)" (formerly line 238). SC2034: computed but never read. Vestige from an earlier cosign integration; the surrounding BLOB_FILE construction relies on ARTIFACT_NAME only.
- scripts/gherkin-lint.sh: err() helper function. SC2317: zero call sites in the file (verified via grep -n "\berr\b"; only the definition matches). The helper was defined symmetrically with warn() but never wired up to the awk rubric or the subprocess-fallback path. Replaced with process_awk_output() helper (see Fixed section below).
While processing the SC2317 cleanup above, Gemini's PR #38 review surfaced a deeper bug: the gherkin-lint.sh awk-fallback path printed WARN/ERROR lines via awk printf but those subprocesses never incremented the parent shell's WARN_COUNT/ERROR_COUNT counters. The summary line said "0 warnings, 0 errors" while errors were actively being printed; the exit code stayed 0 regardless. Exactly the silent-failure class the linter exists to surface in OTHER projects.
- New process_awk_output() helper: wraps each awk subprocess, captures its output, counts WARN/ERROR lines via inline awk ('/^WARN /{c++} END{print c+0}'; safe under set -euo pipefail, no || true needed), increments the bash counters, then re-prints. 4 awk blocks now feed through it.
- Verification: deliberate-failure test against a feature with Scenario: ... \n And ... produces exit code 1 + summary 0 warning(s), 1 error(s) (was: exit 0 + 0 warning(s), 0 error(s) while still printing the ERROR line). Clean feature still exits 0.
- Separate-scope finding: the third awk script contains a stray top-level prev_blank = 1 that awk treats as an always-true pattern, triggering its default print-every-line action. That's a pre-existing cosmetic issue (extra noise in script output) but not a counter bug; filed as deferred scope.
Per the version-canonical-check CI gate (v1.0.2 PR #35). All 5 committed manifest locations now report 1.1.2:
- package.json
- version.txt
- python/pyproject.toml
- python/src/intent_audit_harness/__init__.py
- rust/Cargo.toml
The self-pinning manifest is regenerated to capture the new script hashes (per iep-P3 iah-self-pin v1.1.0 mechanism). 3 of 9 pinned-file hashes change (the 3 modified scripts); 6 unchanged.
Pure dead-code removal + a CI policy tightening. No new CLI commands, no new flags, no API change, no behavioral change for any consumer. Downstream consumers re-vendor (or pnpm up) and get the cleaner scripts transparently.
- shellcheck scripts/*.sh → exit 0 on a clean checkout (verified locally before push)
- bash -n scripts/*.sh → all pass
- python3 -m py_compile scripts/crap-score.py → exit 0
- bash scripts/harness-hash.sh --verify → harness-hash: OK after --init
- CI shellcheck job will now block on any future warning; try staging cmd $var (unquoted expansion) to verify the gate fires
AAR: 000-docs/007-AA-AACR-shellcheck-hard-fail-iep-P6-2026-05-24.md.
P6 Phase A1 closed. Next-ready P6 work:
- A2: iah-ruff (add Python ruff CI gate)
- A3: iah-eslint-dispatcher (add eslint coverage for bin/audit-harness.js)
- A4: iah-script-robustness-upstream (already shipped in v1.1.1; nothing more to do)
Closes iah-script-robustness-upstream (bd_000-projects-qqkq, P2). Addresses the 6 medium-severity Gemini findings surfaced when audit-harness scripts were vendored into intent-eval-lab via iep-harness-hash-platform-rollout (PR #67). All fixes are upstream-only: zero CLI surface change, zero runtime-dep change, zero policy change.
- scripts/escape-scan.sh (mktemp leak): --staged and --range modes allocate a temp file via mktemp to capture the diff but never clean it up. Adds trap 'rm -f "$DIFF_SRC"' EXIT immediately after each mktemp so the temp file is removed on every exit path (clean exit, REFUSE, CHALLENGE, signal). Matters most when escape-scan runs as a local git hook where temp accumulation is silent.
- scripts/crap-score.py (missing go PATH guard): score_go() called run(["go", "test", "-coverprofile=...", ...]) without first checking that go is on PATH, so on systems without Go installed the subprocess raised FileNotFoundError and aborted the whole CRAP pass. Wraps the call in the existing which_or_none("go") pattern already used for radon, gocyclo, and the downstream go tool cover invocation.
- scripts/crap-score.py (rglob walk pruning): the --json input-hash computation walked every file under root via rglob("*"), only filtering node_modules/.venv after the directory had been traversed. Replaces with os.walk + dirs[:] = [...] in-place pruning, skipping .git, node_modules, .venv/venv, __pycache__, dist, build, target, .tox, .mypy_cache, .pytest_cache, .next, .nuxt, .cache. Major perf win on large repos; no behavioral change to the resulting hash for repos without pruned-extension files under those directories.
- scripts/emit-evidence.sh (shell→Python path injection): python3 -c "import json, sys; print(json.load(open('$PKG_JSON'))['version'])" interpolated the shell variable directly into the Python source. Paths containing single quotes (or arbitrary characters in adversarial cases) broke the parse. Now passes $PKG_JSON via sys.argv[1] (python3 -c "import json, sys; print(json.load(open(sys.argv[1]))['version'])" "$PKG_JSON"), moving the path through the safe argv channel.
- scripts/bias-count.sh (per-file sha256sum fork): find ... -exec sha256sum {} \; spawned one sha256sum process per matched file. Changes the terminator to + so find batches arguments into one (or few) sha256sum invocations. Perf win on test suites with many files; output identical because the downstream sort | sha256sum step normalizes.
- scripts/harness-hash.sh (cross-platform sha256sum): GNU coreutils ships sha256sum, macOS ships shasum -a 256. Adds detection at script top selecting whichever is available into a SHA256_CMD bash array, falling back with a clear error if neither is on PATH. Both produce identical <hash> <file> output, so the manifest format and downstream awk parsing are byte-equivalent. Enables engineer-local runs on macOS without forcing every contributor to install coreutils.
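The path-injection fix in the list above generalizes to any caller that builds a python3 -c invocation. The sketch below reproduces both shapes from Python rather than bash, so the function names are hypothetical and the child interpreter is sys.executable instead of python3. It is meant to show why the argv channel is safe, not to mirror emit-evidence.sh line for line.

```python
import subprocess
import sys
from pathlib import Path

# The child program is a fixed string; the path never becomes part of it.
READ_VERSION = "import json, sys; print(json.load(open(sys.argv[1]))['version'])"


def read_version_via_argv(pkg_json: Path) -> str:
    """Post-fix shape: the path travels as argv[1] and is never parsed as source."""
    result = subprocess.run(
        [sys.executable, "-c", READ_VERSION, str(pkg_json)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def read_version_via_interpolation(pkg_json: Path) -> subprocess.CompletedProcess:
    """Pre-fix shape: the path is spliced into the Python source text."""
    source = f"import json; print(json.load(open('{pkg_json}'))['version'])"
    # No check=True here: the caller inspects returncode to observe the failure.
    return subprocess.run([sys.executable, "-c", source], capture_output=True, text=True)
```

With a directory name containing a single quote, the interpolated variant produces a source string that no longer parses, while the argv variant reads the file normally.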
Per the version-canonical-check CI gate (added in v1.0.2 PR #35). All 5 committed manifest locations now report 1.1.1:
- package.json
- version.txt
- python/pyproject.toml
- python/src/intent_audit_harness/__init__.py
- rust/Cargo.toml
The self-pinning manifest is regenerated to capture the new script hashes (per iep-P3 iah-self-pin v1.1.0 mechanism). The 6 script edits change 4 of the 9 pinned-file hashes; --init rewrites the manifest.
Pure bug + portability fixes. No new flags, no new commands, no policy change, no breaking change to the manifest format. Downstream consumers re-vendor (or re-install via the polyglot installers) and get the improvements transparently.
The scripts in this release are now vendored into intent-eval-lab (per iep-harness-hash-platform-rollout rollout 1, lab PR #67) and will land in j-rig-binary-eval next. Bug-fix patches travel via re-vendor — AUDIT_HARNESS_VERSION=v1.1.1 curl -sSL https://raw.githubusercontent.com/jeremylongshore/audit-harness/main/install.sh | bash for vendored consumers, pnpm up @intentsolutions/audit-harness for node consumers. Landing the fixes before the rollout reaches more repos avoids re-publishing buggy vendored copies that immediately need replacement.
AAR: 000-docs/006-AA-AACR-script-robustness-upstream-iep-P3-2026-05-23.md.
Priority 6 Phase A1 (iah-shellcheck-hard-fail) flips .github/workflows/ci.yml:89 from shellcheck scripts/*.sh || true to hard-fail. Per the IEP Convergence Debt Plan risk-mitigation table ("Flipping shellcheck to hard-fail breaks existing audit-harness CI — mitigation: land fixes for Gemini's 6 findings FIRST, THEN flip the gate"), this release is the explicit precondition for the shellcheck flip. Phase A1 PR opens after v1.1.1 lands on main.
Added — Per-repo .harness-hash-extra-patterns mechanism + audit-harness self-pin (IEP Convergence Debt Plan Priority 3)
Closes iah-self-pin (bd_000-projects-itpl, P1). The harness's own policy enforcement surface (scripts/*.sh + scripts/*.py + bin/audit-harness.js) is now hash-pinned at the audit-harness repo root. CI's audit-harness list + harness-hash --verify self-check steps are flipped from || true exit-3 tolerance to hard-fail: any byte change to a pinned policy file without a fresh --init + commit of the regenerated .harness-hash exits 2 (HARNESS_TAMPERED) and blocks the PR.
- scripts/harness-hash.sh: NEW. Reads an optional .harness-hash-extra-patterns file at the repo root and appends its lines to the default PATTERNS array. Comments (#) + blank lines ignored. Backward-compatible: repos without the file get exactly the previous behavior, so consumer repos are not affected.
- .harness-hash-extra-patterns (NEW, audit-harness repo root): pins scripts/*.sh, scripts/*.py, bin/audit-harness.js, and the extras file itself (preventing silent edits to the self-pinning scope).
- .harness-hash (NEW, audit-harness repo root): 9-file manifest produced by bash scripts/harness-hash.sh --init. Committed to main.
- .github/workflows/ci.yml: audit-harness list + harness-hash --verify self-check steps drop || true suffixes. Hard-fail in place. Comment block updated.
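The extras-file rules in the list above (optional file, comments and blank lines ignored, lines appended to the defaults) can be sketched as follows. This is a Python illustration of the described behavior, not the bash implementation. DEFAULT_PATTERNS is a hypothetical placeholder, since the real default PATTERNS array is not listed here, and load_patterns is an illustrative name.

```python
from pathlib import Path

# Hypothetical default for illustration; the real PATTERNS array lives in harness-hash.sh.
DEFAULT_PATTERNS = ["tests/TESTING.md"]
EXTRA_PATTERNS_FILE = ".harness-hash-extra-patterns"


def load_patterns(repo_root: Path) -> list[str]:
    """Return the default patterns plus any lines from the optional extras file."""
    patterns = list(DEFAULT_PATTERNS)
    extras = repo_root / EXTRA_PATTERNS_FILE
    if not extras.is_file():
        return patterns  # no extras file: previous behavior is unchanged
    for raw in extras.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        patterns.append(line)
    return patterns
```

Listing the extras file among its own patterns, as the audit-harness repo does, means that widening or narrowing the pinned scope also changes a pinned hash.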
The .harness-hash-extra-patterns mechanism is a new authored feature surface — repos that opt in get a new capability. Per SemVer, minor bump. Existing repos (zero adopters today; this is the first one) are unaffected.
Before this release, the audit-harness CI workflow could not enforce its own policy. The "harness tests itself" design rule (CLAUDE.md rule 5) was aspirational — audit-harness list and harness-hash --verify both exited 0 when no manifest existed (intentional tolerance to avoid false-failing every PR). A silent edit to scripts/escape-scan.sh (the gate that REFUSES threshold-lowering changes) would pass CI. That's the failure mode this release closes.
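The difference between tolerant and hard-fail verification comes down to which exit code the self-check returns and whether CI masks it. A minimal sketch, assuming a manifest of "<hash> <file>" lines as described for harness-hash.sh: exit 2 (HARNESS_TAMPERED) is taken from this release's notes, while exit 3 for a missing manifest is an assumption read from the "exit-3 tolerance" wording. The verify function is hypothetical.

```python
import hashlib
from pathlib import Path

EXIT_OK = 0
EXIT_HARNESS_TAMPERED = 2  # named in the release note
EXIT_NO_MANIFEST = 3       # assumption, inferred from "exit-3 tolerance"


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify(repo_root: Path, manifest_name: str = ".harness-hash") -> int:
    """Compare each pinned file against the manifest and return an exit code."""
    manifest = repo_root / manifest_name
    if not manifest.is_file():
        return EXIT_NO_MANIFEST
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        # sha256sum-style line: hash, whitespace, then the relative path.
        expected, rel_path = line.split(maxsplit=1)
        target = repo_root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            return EXIT_HARNESS_TAMPERED
    return EXIT_OK
```

Under the old tolerance a missing manifest was masked by || true, so an edited policy file never reached the comparison. With a committed manifest and no masking, the same edit returns the tampered code.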
iep-harness-hash-platform-rollout (bd_000-projects-g6zu) unblocks on this release. The remaining 4 IEP repos (intent-eval-lab, j-rig-binary-eval, intent-rollout-gate — kernel already pinned) can now copy this pattern using their own .harness-hash-extra-patterns to pin per-repo policy files (CI workflow definitions, governance docs, vendored harness wrappers).
Per the version-canonical-check CI gate landed in v1.0.2 (PR #35). All 5 committed manifest locations now report 1.1.0.
AAR: 000-docs/005-AA-AACR-iah-self-pin-iep-P3-2026-05-22.md.