Created August 24, 2025 02:36

ChatGPT5 Claude Workflow Prompt

**PROMPT FOR LLM (copy+paste below, then append your plan)**

You are converting a **generic plan of action** into a **production-ready research/engineering `todo.md`** that is directly executable by a capable coding agent. Your output must be a single Markdown document with a fenced **XML `<workflows>`** block. Do **not** include any extra commentary, explanations, or chat—**output only the document**.

### Transformation requirements

* Be **specific** and **operational**. Replace vague goals with concrete steps, commands, checklists, acceptance gates, and explicit assumptions.
* Prefer **compact, high-signal prose**. No filler. Use short paragraphs and terse bullets.
* If information is missing, make **minimal, clearly labeled assumptions** (e.g., “Assumption: …”). Do **not** ask questions.
* Compile the spec into **executable oracles**: pre/postconditions, invariants, consumer/provider API contracts, and **metamorphic properties**. Generate both tests and **runtime guards**.
* Enforce **hermetic spin-up**: pinned toolchains/lockfiles, reproducible container image, seed data, migrations, health/readiness probes, **golden smoke flows**, and a signed **boot transcript** artifact.
* Beyond unit/integration/e2e: include **property-based testing**, **metamorphic testing**, **mutation testing** (score target), **grammar/coverage-guided fuzzing**, **concolic/symbolic execution** for critical paths, **differential tests** vs last known-good, **contract tests** for external deps, and **runtime invariant checks** with shadow traffic or replay.
* Add **static/semantic gates**: strict typing/linters, SAST/taint, API surface diffs, complexity deltas, license/OSS policy.
* Implement a **risk score** and **Gatekeeper** to decide: **AGENT_REFINE** (auto-iterate) vs **MANUAL_QA** (human exploration) vs **PROMOTE** (stage/ship).
* Use **relative improvements** (percent) when comparing methods; keep budget parity rules explicit (e.g., “±5% params/FLOPs”).
* Include **reproducibility and guardrails** (seeds, SHAs, data/index hashes, environment pins).
* Treat statistical rigor as first-class (paired bootstrap CIs, multiple-comparison control) unless the domain makes it irrelevant.
* Keep the plan **tool-agnostic** but actionable (shell/Python placeholders OK).
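
A minimal sketch of what an executable oracle can look like in practice: a seeded property check plus two metamorphic relations, written with only the Python standard library. The function `mean` and the relations chosen are hypothetical illustrations; a real suite would typically use a property-based testing library and derive its relations from the spec.

```python
import math
import random


def mean(xs: list[float]) -> float:
    """Hypothetical function under test."""
    return sum(xs) / len(xs)


def check_properties(trials: int = 500, seed: int = 1234) -> int:
    """Run seeded property and metamorphic checks; return the number of cases run."""
    rng = random.Random(seed)  # fixed seed: any failure replays exactly
    for _ in range(trials):
        xs = [rng.uniform(-1e3, 1e3) for _ in range(rng.randint(1, 50))]
        m = mean(xs)
        # Property (postcondition): the mean lies within [min, max].
        assert min(xs) - 1e-9 <= m <= max(xs) + 1e-9
        # Metamorphic relation 1: permuting the input leaves the output unchanged.
        shuffled = xs[:]
        rng.shuffle(shuffled)
        assert math.isclose(mean(shuffled), m, rel_tol=1e-9, abs_tol=1e-9)
        # Metamorphic relation 2: adding c to every input adds c to the output.
        c = rng.uniform(-100, 100)
        assert math.isclose(mean([x + c for x in xs]), m + c, rel_tol=1e-9, abs_tol=1e-6)
    return trials
```

Metamorphic relations need no expected output per input, only a known relation between two runs, which is why they can catch logic gaps that hand-written examples miss.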

### Required document structure (exact section order)

1. **Title** — `` # {{PROJECT_NAME}} — `todo.md` ``
2. **TL;DR** — one line.
3. **Invariants (do not change)** — non-negotiable constraints.
4. **Assumptions & Scope** — what you’re assuming (label uncertain items).
5. **Objectives** — 3–5 measurable goals.
6. **Risks & Mitigations** — top risks with a single mitigation each.
7. **Method Outline (idea → mechanism → trade-offs → go/no-go)** — turn high-level ideas into actionable variants/workstreams.
8. **Run Matrix** — table of variants with budgets and promotion criteria.
9. **Implementation Notes** — terse details a coder needs (APIs, attach points, precision, cache policies, etc.).
10. **Acceptance Gates** — pass/fail thresholds tied to Objectives.
11. **“Make-sure-you” Checklist** — must-do guardrails for the agent.
12. **File/Layout Plan** — directories and key files to create.
13. **Fenced XML Workflows** — **mandatory**: `building`, `running`, `tracking`, `evaluating`, `refinement`.
    * Each `<workflow>` contains ID'd steps, each with ordered `<commands>` and a **`<make_sure>`** checklist.
    * Use explicit IDs (e.g., `id="R1"`).
    * Commands may be placeholders but must be realistic and sequenced.
14. **Minimal Pseudocode (optional)** — only if it clarifies tricky parts.
15. **Next Actions (strict order)** — 3–6 concrete steps the agent executes next.
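
One way to enforce the exact section order is to lint the generated document before accepting it. The sketch below is a hypothetical checker (not part of any tool) that matches `## ` headings by distinctive substrings, skips fenced code, and treats Minimal Pseudocode as the only optional section.

```python
# Distinctive substrings of the "## " headings, in the required order.
REQUIRED_SECTIONS = [
    "Invariants", "Assumptions", "Objectives", "Risks", "Method Outline",
    "Run Matrix", "Implementation Notes", "Acceptance Gates", "Make-sure-you",
    "File/Layout", "Workflows", "Minimal Pseudocode", "Next Actions",
]
OPTIONAL_SECTIONS = {"Minimal Pseudocode"}


def section_order_errors(markdown: str) -> list[str]:
    """Return problems with title, TL;DR, and '## ' section order (empty list = OK)."""
    errors = []
    if not markdown.lstrip().startswith("# "):
        errors.append("first line is not a '# ' title")
    if "**TL;DR:**" not in markdown:
        errors.append("missing **TL;DR:** line")
    headings, in_fence = [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # ignore '## ' lines inside code fences
        elif not in_fence and line.startswith("## "):
            headings.append(line[3:].strip())
    pos = 0
    for key in REQUIRED_SECTIONS:
        # Search only after the previous match, so out-of-order sections are caught.
        idx = next((i for i in range(pos, len(headings)) if key in headings[i]), None)
        if idx is None:
            if key not in OPTIONAL_SECTIONS:
                errors.append(f"missing or out of order: {key}")
        else:
            pos = idx + 1
    return errors
```

Substring matching tolerates small heading variations (straight vs curly quotes, parenthetical suffixes) while still pinning the order.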

### Statistical & evaluation defaults (apply unless the plan dictates otherwise)

* Report paired metrics with **10k bootstrap** resamples and **BCa 95% CIs**; mark a difference significant only if the CI lower bound of the improvement is > 0 (orient lower-is-better metrics such as latency so that improvement is positive).
* Control multiple-comparison error (e.g., **FDR** within metric families).
* Maintain **budget parity** (params & FLOPs within **±5%**) across variants unless a “decoding-only” or “systems” budget is declared separately.
* Always show **two slices** if applicable (e.g., “Focused” vs “Full”); never hide slices.
* Include **latency p50/p95**, throughput, and memory/VRAM when performance matters.
* **Verification defaults:** Hermetic spin-up must pass. **Mutation score ≥ 0.80**; **property/metamorphic coverage ≥ 0.70**; **0 high/critical SAST**; **flakiness < 1%** over 100 reruns; **runtime invariants** hold over N=10k shadow requests; **API contracts** green.
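
A sketch of the paired comparison and FDR step, assuming per-item scores for baseline and candidate are aligned by index. It uses a percentile bootstrap, which is simpler than the BCa interval named above (SciPy's `scipy.stats.bootstrap` offers `method="BCa"` if BCa is required), plus the Benjamini–Hochberg step-up procedure for FDR control within one metric family. Function names are hypothetical.

```python
import random


def paired_bootstrap_ci(baseline, candidate, n_resamples=10_000, alpha=0.05, seed=0):
    """Mean paired improvement (candidate - baseline) with a percentile bootstrap CI."""
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("scores must be paired per item and non-empty")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    n = len(diffs)
    rng = random.Random(seed)
    # Resampling per-item differences keeps each (baseline, candidate) pair together.
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, lo, hi


def benjamini_hochberg(p_values, q=0.05):
    """Reject flags controlling the false discovery rate at level q (BH step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k = rank  # largest rank whose p-value clears its BH threshold
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject
```

Under the default above, a variant is starred only when the returned `lo` is greater than 0.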

### Language & style constraints

* Crisp, technical, neutral tone.
* No self-references (“As an AI…”), no questions.
* Use code fences for XML and pseudocode.
* Use placeholders like `{{MODEL_NAME}}`, `{{DATASET}}`, `{{RANK_SCHEDULE}}` when the plan lacks specifics; **label them** in Assumptions.

---

### OUTPUT TEMPLATE (fill every section)

# {{PROJECT_NAME}} — `todo.md`

**TL;DR:** {{one-sentence summary of the execution plan}}

## Invariants (do not change)

* {{Constraint 1}}
* {{Constraint 2}}
* {{Oracles are source-of-truth; contracts & properties enforced at runtime}}
* {{Hermetic spin-up required; boot transcript must verify}}
* {{…}}

## Assumptions & Scope

* **Assumption:** {{ explicit assumption }}
* **Assumption:** {{ thresholds if unspecified: T_mut=0.80, T_prop=0.70, SAST_high=0 }}
* **Scope:** {{ what’s in/out }}

## Objectives

1. {{Objective with measurable target}}
2. **Verification:** Achieve mutation ≥ {{T_mut}} and property/metamorphic coverage ≥ {{T_prop}}.
3. **Reliability:** Zero high/critical SAST; flakiness < 1% on reruns; all API contracts pass.
4. **Spin-up:** One-shot hermetic boot from clean checkout with golden smoke flows; produce signed boot transcript.
5. {{Domain KPI improvement with CI-backed threshold}}

## Risks & Mitigations

* {{Risk}} → **Mitigation:** {{one-line fix}}
* External API drift → **Mitigation:** Consumer/provider contracts + service virtualization.
* Env non-determinism → **Mitigation:** Containerized toolchain, pinned lockfiles, deterministic seeds.
* Unknown-unknown logic gaps → **Mitigation:** Metamorphic + fuzzing + runtime invariants with shadow traffic.

## Method Outline (idea → mechanism → trade-offs → go/no-go)

### Workstream/Variant A — Spec→Oracles Pipeline

* **Idea:** Compile spec to contracts/properties/metamorphic tests + runtime guards.
* **Mechanism:** DSL → codegen (pre/post/invariants), property-based suites, metamorphic relations; inject assert/monitor hooks.
* **Trade-offs:** Upfront authoring cost; stricter gates surface more refactors.
* **Go/No-Go Gate:** All generated oracles pass; property coverage ≥ {{T_prop}}.
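
As a stand-in for the DSL → codegen step, the sketch below shows only the runtime-guard half: hand-written pre/postcondition predicates wrapped around a function by a decorator. `guarded`, `ContractViolation`, and the `withdraw` example are hypothetical; a real pipeline would generate the predicates from the spec instead of writing them by hand.

```python
import functools


class ContractViolation(AssertionError):
    """Raised when a runtime guard (pre- or postcondition) fails."""


def guarded(pre=(), post=()):
    """Wrap a function with precondition and postcondition predicates.

    pre:  predicates called as check(*args, **kwargs)
    post: predicates called as check(result, *args, **kwargs)
    """
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for check in pre:
                if not check(*args, **kwargs):
                    raise ContractViolation(f"{fn.__name__}: pre {check.__name__} failed")
            result = fn(*args, **kwargs)
            for check in post:
                if not check(result, *args, **kwargs):
                    raise ContractViolation(f"{fn.__name__}: post {check.__name__} failed")
            return result
        return wrapper
    return decorate


# Hypothetical spec for a withdrawal, expressed as named predicates.
def positive_amount(balance, amount): return amount > 0
def sufficient_funds(balance, amount): return amount <= balance
def non_negative_result(result, balance, amount): return result >= 0
def conserves_money(result, balance, amount): return result + amount == balance


@guarded(pre=(positive_amount, sufficient_funds),
         post=(non_negative_result, conserves_money))
def withdraw(balance: int, amount: int) -> int:
    return balance - amount
```

Because the predicates are plain functions, the same objects could also serve as test oracles in the property suites.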

### Workstream/Variant B — Adversarial Verification

* **Idea:** Kill mutants, explore paths, and stress inputs.
* **Mechanism:** Mutation testing; grammar/coverage-guided fuzzing; concolic on parsers/auth/finance.
* **Trade-offs:** Longer CI wall time; infra complexity.
* **Go/No-Go Gate:** Mutation ≥ {{T_mut}}; no exploitable paths found at severity ≥ medium.
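
To make the mutation gate concrete, here is a toy mutation tester built on Python's `ast` module: it swaps one arithmetic or comparison operator per mutant and reports the fraction of mutants the tests kill. The operator set, `discount`, and both test suites are hypothetical; production projects would use an established mutation tool for their language.

```python
import ast

# Hypothetical unit under test.
SOURCE = """
def discount(price, qty):
    if qty > 10:
        return price * qty * 0.9
    return price * qty
"""

# A tiny subset of real mutation operators: swap one operator per mutant.
SWAPS = {ast.Gt: ast.Lt, ast.Lt: ast.Gt, ast.Mult: ast.Div, ast.Div: ast.Mult,
         ast.Add: ast.Sub, ast.Sub: ast.Add}


class MutateNth(ast.NodeTransformer):
    """Swap the n-th mutable operator (traversal order); target=-1 only counts sites."""

    def __init__(self, target: int):
        self.target, self.sites = target, 0

    def _swap(self, op):
        if type(op) in SWAPS:
            if self.sites == self.target:
                op = SWAPS[type(op)]()
            self.sites += 1
        return op

    def visit_BinOp(self, node):
        self.generic_visit(node)
        node.op = self._swap(node.op)
        return node

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [self._swap(op) for op in node.ops]
        return node


def mutation_score(source: str, run_tests) -> float:
    """Fraction of mutants killed (mutants that make run_tests return False)."""
    counter = MutateNth(-1)
    counter.visit(ast.parse(source))
    killed = 0
    for n in range(counter.sites):
        tree = ast.fix_missing_locations(MutateNth(n).visit(ast.parse(source)))
        namespace: dict = {}
        exec(compile(tree, "<mutant>", "exec"), namespace)
        if not run_tests(namespace):
            killed += 1
    return killed / counter.sites if counter.sites else 1.0


def strong_tests(ns) -> bool:
    try:
        assert ns["discount"](2.0, 5) == 10.0
        assert abs(ns["discount"](2.0, 20) - 36.0) < 1e-9
        return True
    except Exception:  # any assertion failure or crash kills the mutant
        return False


def weak_tests(ns) -> bool:
    try:
        assert ns["discount"](2.0, 5) == 10.0  # never exercises qty > 10
        return True
    except Exception:
        return False
```

The weak suite never reaches the `qty > 10` branch, so both mutants inside it survive and the score (0.5) would fail a 0.80 threshold.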

### Workstream/Variant C — Differential & Contract Testing

* **Idea:** Compare against last known-good/reference; lock external behavior.
* **Mechanism:** Golden snapshots, differential tests, Pact-like contracts validated in CI; API surface diffs.
* **Trade-offs:** Snapshot churn; contract maintenance.
* **Go/No-Go Gate:** No incompatible diffs; contracts green.
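
A minimal sketch of the two checks behind this gate, assuming JSON-like outputs: a structural diff against a golden snapshot (removed fields and type changes count as incompatible, added fields as compatible) and a consumer contract expressed as required field types. The contract format is an illustrative assumption and is not Pact's file format.

```python
# Hypothetical consumer contract: field name -> required type.
CONSUMER_CONTRACT = {"id": int, "status": str, "total_cents": int}


def compare(golden, candidate, path="$"):
    """Yield (kind, path): 'removed'/'type' are incompatible, 'value' needs review."""
    if type(golden) is not type(candidate):
        yield ("type", path)
    elif isinstance(golden, dict):
        for key, value in golden.items():
            if key not in candidate:
                yield ("removed", f"{path}.{key}")
            else:
                yield from compare(value, candidate[key], f"{path}.{key}")
        # Keys present only in the candidate are additive, hence compatible.
    elif isinstance(golden, list):
        if len(golden) != len(candidate):
            yield ("value", path)
        for i, (g, c) in enumerate(zip(golden, candidate)):
            yield from compare(g, c, f"{path}[{i}]")
    elif golden != candidate:
        yield ("value", path)


def contract_violations(response: dict, contract: dict) -> list[str]:
    """Fields the consumer requires that are missing or have the wrong type."""
    return [field for field, typ in contract.items()
            if not isinstance(response.get(field), typ)]


def is_promotable(golden, candidate, contract) -> bool:
    incompatible = [d for d in compare(golden, candidate) if d[0] in ("removed", "type")]
    return not incompatible and not contract_violations(candidate, contract)
```

Value-only diffs are surfaced for review rather than blocking, which keeps snapshot churn from stalling promotion while still flagging behavior changes.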

## Run Matrix

| ID | Method/Variant               | Budget                        | Inputs                   | Expected Gain         | Promote if…                                  |
| -- | ---------------------------- | ----------------------------- | ------------------------ | --------------------- | -------------------------------------------- |
| V1 | Spec→Oracles                 | ±5% parity / baseline compute | Spec DSL, current code   | Fewer escaped defects | Property ≥ {{T_prop}} & all invariants hold  |
| V2 | Mutation+Fuzz+Concolic       | Separate verification budget  | Corpus/grammars          | Kill weak tests       | Mutation ≥ {{T_mut}}; 0 high/crit SAST       |
| V3 | Differential+Contracts       | ±5% parity                    | Golden outputs, provider | Regression defense    | Zero incompatible diffs; contracts green     |
| V4 | Runtime Invariant Monitoring | Staging/shadow only           | Shadow traffic/replays   | Prod-parity signals   | 0 invariant breaks over N=10k requests       |

## Implementation Notes

* **APIs/Attach points:** {{paths/interfaces for injecting contracts & monitors}}
* **Precision/Quantization:** {{fp16/fp8/int8 policy}}
* **Caching/State:** {{cache windows, stickiness, invalidation}}
* **Telemetry:** Log mutation score, property coverage, fuzz/crash repros, SAST severity counts, API diffs, visual diffs, flakiness, boot transcript hash.
* **Repro:** Seeds, SHAs, container digest, dataset/index hashes, contract versions.
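
A sketch of a reproducibility manifest covering the items listed above, using only the standard library. `git_sha` and `container_digest` are passed in because how they are obtained depends on the toolchain (for example, `git rev-parse HEAD` for the commit); the field names and output path are assumptions.

```python
import hashlib
import json
import platform
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large datasets/indexes need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def repro_manifest(paths, seed: int, git_sha: str, container_digest: str) -> dict:
    """Collect the identifiers needed to reproduce a run."""
    return {
        "seed": seed,
        "git_sha": git_sha,
        "container_digest": container_digest,  # image digest recorded at build time
        "python": platform.python_version(),
        "platform": platform.platform(),
        "files": {p: sha256_file(Path(p)) for p in sorted(map(str, paths))},
    }


def write_manifest(manifest: dict, out: Path) -> None:
    """Write sorted, indented JSON so manifests diff cleanly between runs."""
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(manifest, indent=2, sort_keys=True))
```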

## Acceptance Gates

* **Spin-up:** Clean checkout → container build → migrate/seed → readiness OK → golden smokes pass → **boot transcript signed**.
* **Static:** 0 high/critical SAST; typecheck clean; license policy OK; API surface diffs acknowledged.
* **Dynamic:** Mutation ≥ {{T_mut}}; property/metamorphic coverage ≥ {{T_prop}}; fuzzing runtime ≥ {{X}} mins with 0 new medium+ crashes.
* **Differential/Contracts:** No incompatible diffs; contracts green.
* **Runtime (staging/shadow):** 0 invariant breaks over N=10k requests; error budget respected.
* **Domain KPI:** CI lower bound > 0 on {{primary_metric}} within budget parity.
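
The spin-up gate requires a signed boot transcript. The sketch below signs a canonical JSON encoding with HMAC-SHA256 under a shared key and checks that every step passed. HMAC is an assumption chosen for brevity; an asymmetric scheme (for example, GPG or Sigstore) would let verifiers hold only a public key. Step names are illustrative.

```python
import hashlib
import hmac
import json

# Spin-up steps in gate order (names are illustrative).
SPINUP_STEPS = ("clean_checkout", "container_build", "migrate_seed",
                "readiness", "golden_smokes")


def _canonical(obj) -> bytes:
    # Sorted keys and fixed separators make the signed bytes deterministic.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def sign_transcript(transcript: dict, key: bytes) -> dict:
    mac = hmac.new(key, _canonical(transcript), hashlib.sha256).hexdigest()
    return {**transcript, "signature": mac}


def verify_transcript(signed: dict, key: bytes) -> bool:
    body = {k: v for k, v in signed.items() if k != "signature"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed.get("signature", ""))


def spinup_gate(signed: dict, key: bytes) -> bool:
    """Spin-up passes only if the signature verifies and every step passed."""
    steps = signed.get("steps", {})
    return verify_transcript(signed, key) and all(
        steps.get(s) == "pass" for s in SPINUP_STEPS)
```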

## “Make-sure-you” Checklist

* Pin toolchain & deps; record env manifest and container digest.
* Generate contracts/properties from spec; commit artifacts.
* Save **boot transcript** and artifact hashes.
* Record seeds; rerun the test suite 100×; fail the run if flakiness ≥ 1%.
* Quarantine network; stub external deps unless in contract tests.
* Export metrics JSONL; persist logs/artifacts under `artifacts/`.
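
Flakiness needs an operational definition before it can gate anything. One simple assumption, sketched here: run a test N times and measure the share of runs that disagree with the majority outcome. With 100 reruns the smallest nonzero rate is 0.01, so "< 1%" effectively means zero disagreeing runs.

```python
from collections.abc import Callable


def flakiness_rate(test: Callable[[], bool], reruns: int = 100) -> float:
    """Share of reruns whose outcome disagrees with the majority outcome."""
    outcomes = [bool(test()) for _ in range(reruns)]
    passes = sum(outcomes)
    return min(passes, reruns - passes) / reruns


def is_stable(test: Callable[[], bool], reruns: int = 100, threshold: float = 0.01) -> bool:
    """True when the measured flakiness is strictly below the threshold."""
    return flakiness_rate(test, reruns) < threshold
```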

## File/Layout Plan

```
{{repo_root}}/
  spec/              # DSL + compiled contracts/properties
  contracts/         # consumer/provider specs
  src/
  tests/
    properties/
    metamorphic/
    mutation/
    differential/
    e2e/
  scripts/
    gatekeeper.py
    spinup_smoke.sh
    compute_risk.py
  artifacts/
    boot_transcript.json
    metrics/
  infra/
  analysis/
  logs/
```
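
A small scaffold script can create this layout idempotently. This sketch assumes the following nesting: test suites under `tests/`, the three scripts under `scripts/`, and `metrics/` under `artifacts/`. Files are created empty; `boot_transcript.json` is deliberately not pre-created, since an empty placeholder could be mistaken for a real transcript.

```python
from pathlib import Path

# Entries ending in "/" are directories; everything else is an empty file.
# boot_transcript.json is written by the spin-up step, so it is not pre-created.
LAYOUT = [
    "spec/", "contracts/", "src/",
    "tests/properties/", "tests/metamorphic/", "tests/mutation/",
    "tests/differential/", "tests/e2e/",
    "scripts/gatekeeper.py", "scripts/spinup_smoke.sh", "scripts/compute_risk.py",
    "artifacts/metrics/",
    "infra/", "analysis/", "logs/",
]


def scaffold(root: Path) -> list[Path]:
    """Create the layout under root; safe to re-run (existing paths are kept)."""
    paths = []
    for entry in LAYOUT:
        path = root / entry
        if entry.endswith("/"):
            path.mkdir(parents=True, exist_ok=True)
        else:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch(exist_ok=True)
        paths.append(path)
    return paths
```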

## Workflows (required)

```xml
<workflows project="{{PROJECT_SLUG}}" version="1.0">
  <!-- =============================== -->
  <!-- BUILDING: env, assets, guards -->
  <!-- =============================== -->
  <workflow name="building">
    <env id="B0">
      <desc>Set up environment and pin versions</desc>
      <commands>
        <cmd>{{create_venv}}</cmd>
        <cmd>{{install_packages}}</cmd>
        <cmd>{{container_build_with_lockfiles}}</cmd>
        <cmd>{{record_env_manifest}}</cmd>
      </commands>
      <make_sure>
        <item>{{GPU/CPU visibility test}}</item>
        <item>{{lockfile / hashes saved}}</item>
      </make_sure>
    </env>
    <assets id="B1">
      <desc>Fetch models/data/indexes or domain assets</desc>
      <commands>
        <cmd>{{download_or_prepare_assets}}</cmd>
        <cmd>{{verify_licenses_and_hashes}}</cmd>
      </commands>
      <make_sure>
        <item>{{asset SHAs recorded}}</item>
      </make_sure>
    </assets>
    <contracts id="B2">
      <desc>Compile spec to contracts/properties/metamorphic tests</desc>
      <commands>
        <cmd>{{compile_spec_to_contracts}}</cmd>
        <cmd>{{generate_property_tests}}</cmd>
        <cmd>{{inject_runtime_guards}}</cmd>
      </commands>
      <make_sure>
        <item>{{oracles generated and versioned}}</item>
      </make_sure>
    </contracts>
    <static id="B3">
      <desc>Enable static/semantic guardrails</desc>
      <commands>
        <cmd>{{run_typecheck_linters}}</cmd>
        <cmd>{{run_SAST_taint}}</cmd>
        <cmd>{{api_surface_diff}}</cmd>
        <cmd>{{complexity_delta_check}}</cmd>
      </commands>
      <make_sure>
        <item>{{abort_on_high/critical findings}}</item>
      </make_sure>
    </static>
    <spinup id="B4">
      <desc>Hermetic boot; produce boot transcript</desc>
      <commands>
        <cmd>{{container_run_clean_checkout}}</cmd>
        <cmd>{{apply_migrations_and_seed}}</cmd>
        <cmd>{{readiness_and_health_checks}}</cmd>
        <cmd>{{run_golden_smokes}}</cmd>
        <cmd>{{write_boot_transcript_json}}</cmd>
      </commands>
      <make_sure>
        <item>{{transcript signed with env digest}}</item>
      </make_sure>
    </spinup>
  </workflow>
  <!-- =============================== -->
  <!-- RUNNING: verification battery -->
  <!-- =============================== -->
  <workflow name="running">
    <baseline id="R0">
      <desc>Run baseline under parity</desc>
      <commands>
        <cmd>{{train_or_build_baseline}}</cmd>
        <cmd>{{evaluate_baseline}}</cmd>
      </commands>
      <make_sure>
        <item>{{same attach points / budgets}}</item>
      </make_sure>
    </baseline>
    <contracts id="R1">
      <desc>API consumer/provider contracts</desc>
      <commands>
        <cmd>{{start_service_virtualization}}</cmd>
        <cmd>{{run_contract_tests}}</cmd>
      </commands>
      <make_sure>
        <item>{{no contract breaks}}</item>
      </make_sure>
    </contracts>
    <properties id="R2">
      <desc>Property & metamorphic tests</desc>
      <commands>
        <cmd>{{run_property_tests_with_seeds}}</cmd>
        <cmd>{{run_metamorphic_suites}}</cmd>
      </commands>
      <make_sure>
        <item>{{report property coverage}}</item>
      </make_sure>
    </properties>
    <fuzz_symbolic id="R3">
      <desc>Grammar-guided fuzzing + concolic on critical paths</desc>
      <commands>
        <cmd>{{fuzz_parsers_and_endpoints}}</cmd>
        <cmd>{{concolic_on_auth_and_money}}</cmd>
      </commands>
      <make_sure>
        <item>{{crashes minimized; repros archived}}</item>
      </make_sure>
    </fuzz_symbolic>
    <mutation id="R4">
      <desc>Mutation testing for adequacy</desc>
      <commands>
        <cmd>{{generate_mutants}}</cmd>
        <cmd>{{run_mutation_suite}}</cmd>
        <cmd>{{compute_mutation_score}}</cmd>
      </commands>
      <make_sure>
        <item>{{mutation score ≥ T_mut}}</item>
      </make_sure>
    </mutation>
    <differential id="R5">
      <desc>Golden snapshots & diffs vs last known-good</desc>
      <commands>
        <cmd>{{capture_golden_outputs}}</cmd>
        <cmd>{{run_differential_tests}}</cmd>
        <cmd>{{visual_diff_if_UI}}</cmd>
      </commands>
      <make_sure>
        <item>{{no incompatible diffs}}</item>
      </make_sure>
    </differential>
    <runtime id="R6">
      <desc>Staging with shadow traffic; runtime invariants</desc>
      <commands>
        <cmd>{{deploy_to_staging}}</cmd>
        <cmd>{{mirror_or_replay_traffic}}</cmd>
        <cmd>{{monitor_invariant_breaks}}</cmd>
      </commands>
      <make_sure>
        <item>{{0 invariant breaks over N requests}}</item>
      </make_sure>
    </runtime>
  </workflow>
  <!-- =============================== -->
  <!-- TRACKING: collect & compute -->
  <!-- =============================== -->
  <workflow name="tracking">
    <harvest id="T1">
      <desc>Consolidate metrics/artifacts; compute statistics</desc>
      <commands>
        <cmd>{{collect_logs_to_jsonl}}</cmd>
        <cmd>{{score_outputs}}</cmd>
        <cmd>{{paired_bootstrap_and_FDR}}</cmd>
        <cmd>{{summarize_SAST_and_static}}</cmd>
        <cmd>{{summarize_mutation_property_coverage}}</cmd>
        <cmd>{{summarize_flakiness_visual_diffs}}</cmd>
        <cmd>{{hash_and_store_boot_transcript}}</cmd>
      </commands>
      <make_sure>
        <item>{{CI policy applied; stars only when CI lower bound > 0}}</item>
      </make_sure>
    </harvest>
    <risk id="T2">
      <desc>Compute risk score R and decision features</desc>
      <commands>
        <cmd>{{compute_risk.py --delta_loc --novelty --ext_dep_delta --one_minus_mutation --flakiness --static_severity}}</cmd>
        <cmd>{{emit_decision_features_json}}</cmd>
      </commands>
      <make_sure>
        <item>{{features normalized to [0,1]}}</item>
      </make_sure>
    </risk>
  </workflow>
  <!-- =============================== -->
  <!-- EVALUATING: promotion rules -->
  <!-- =============================== -->
  <workflow name="evaluating">
    <gatekeeper id="E1">
      <desc>Apply gates and decide: AGENT_REFINE vs MANUAL_QA vs PROMOTE</desc>
      <commands>
        <cmd>{{apply_acceptance_gates}}</cmd>
        <cmd>{{compute_R_and_compare_thresholds}}</cmd>
        <cmd>{{route_decision}}</cmd>
        <cmd>{{generate_tables_and_figures}}</cmd>
      </commands>
      <make_sure>
        <item>{{no promotion without CI-backed wins and gates met}}</item>
      </make_sure>
    </gatekeeper>
  </workflow>
  <!-- =============================== -->
  <!-- REFINEMENT: next iteration -->
  <!-- =============================== -->
  <workflow name="refinement">
    <agent_refine id="N1">
      <desc>Auto-iterate with obligation-driven prompts (if routed)</desc>
      <commands>
        <cmd>{{create_actionable_prompt_from_failures}}</cmd>
        <cmd>{{schedule_agent_build_and_verify}}</cmd>
      </commands>
      <make_sure>
        <item>{{prompts include concrete obligations & thresholds}}</item>
      </make_sure>
    </agent_refine>
    <manual_qa id="N2">
      <desc>Human exploration handoff (if routed)</desc>
      <commands>
        <cmd>{{open_tracking_dashboard}}</cmd>
        <cmd>{{attach_repros_boot_transcript_contracts}}</cmd>
      </commands>
      <make_sure>
        <item>{{clear owner; rollback/kill-switch ready}}</item>
      </make_sure>
    </manual_qa>
  </workflow>
</workflows>
```
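
Before accepting generated output, the fenced `<workflows>` block can be parsed and checked structurally. This hypothetical validator uses `xml.etree.ElementTree` to confirm that the five mandatory workflows exist, that step IDs are present and unique, and that every step has at least one `<cmd>` and one `<make_sure>` item. Extracting the XML from the Markdown fence is left out.

```python
import xml.etree.ElementTree as ET

REQUIRED_WORKFLOWS = ("building", "running", "tracking", "evaluating", "refinement")


def workflow_errors(xml_text: str) -> list[str]:
    """Check workflow names, unique step ids, and non-empty commands/make_sure."""
    root = ET.fromstring(xml_text)  # comments are dropped by the default parser
    errors, seen_ids = [], set()
    names = [wf.get("name") for wf in root.findall("workflow")]
    errors += [f"missing workflow: {n}" for n in REQUIRED_WORKFLOWS if n not in names]
    for wf in root.findall("workflow"):
        for step in wf:  # direct children, e.g. <env id="B0">
            sid = step.get("id")
            label = sid or f"{wf.get('name')}/{step.tag}"
            if sid is None:
                errors.append(f"{label}: missing id")
            elif sid in seen_ids:
                errors.append(f"{label}: duplicate id")
            else:
                seen_ids.add(sid)
            if step.find("commands/cmd") is None:
                errors.append(f"{label}: no <cmd> in <commands>")
            if step.find("make_sure/item") is None:
                errors.append(f"{label}: empty <make_sure>")
    return errors
```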

## Minimal Pseudocode (optional)

```python
# Gatekeeper decision (feature weights are configurable)
RISK_FEATURES = ("delta_loc", "novelty", "ext_dep_delta",
                 "one_minus_mutation", "flakiness", "static_severity")

def decide(metrics, weights=None, T_mut=0.80, T_prop=0.70, T_manual=0.50):
    # Hard gates: any failure routes straight back to the agent.
    if not metrics["hermetic_spinup_pass"]: return "AGENT_REFINE: fix spin-up"
    if metrics["sast_high_critical"] > 0: return "AGENT_REFINE: resolve SAST"
    if metrics["mutation"] < T_mut: return "AGENT_REFINE: raise mutation"
    if metrics["prop_cov"] < T_prop: return "AGENT_REFINE: add properties"
    # Risk score R: weighted sum of features normalized to [0, 1].
    features = {**metrics, "one_minus_mutation": 1 - metrics["mutation"]}
    # Assumption: equal weights when none are supplied.
    weights = weights or {f: 1 / len(RISK_FEATURES) for f in RISK_FEATURES}
    R = sum(weights[f] * features[f] for f in RISK_FEATURES)
    if R >= T_manual: return "MANUAL_QA"
    return "PROMOTE"
```
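
The Gatekeeper assumes risk features already normalized to [0, 1]. One possible normalization for `compute_risk.py` is min-max scaling with clipping against reference ranges; the ranges below are placeholder assumptions to be calibrated per repository.

```python
def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max scale into [0, 1], clipping values outside the reference range."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


# Placeholder reference ranges (assumptions; calibrate on repository history).
REFERENCE_RANGES = {
    "delta_loc": (0, 2000),      # lines changed in the candidate
    "ext_dep_delta": (0, 10),    # external dependencies added or changed
    "static_severity": (0, 20),  # severity-weighted static findings
}


def risk_features(raw: dict) -> dict:
    """Map raw measurements to the [0, 1] features used by the Gatekeeper."""
    feats = {k: normalize(raw[k], *bounds) for k, bounds in REFERENCE_RANGES.items()}
    # Already rates or scores in [0, 1]; clip defensively.
    feats["novelty"] = normalize(raw["novelty"], 0.0, 1.0)
    feats["flakiness"] = normalize(raw["flakiness"], 0.0, 1.0)
    return feats
```

The mutation-derived feature is left out here because the decision function computes `1 - mutation` from the mutation score directly.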

## Next Actions (strict order)

1. Normalize plan into Objectives and Acceptance Gates; set T_mut/T_prop thresholds.
2. Define or import spec DSL; generate contracts/properties/metamorphic tests.
3. Implement spin-up script and golden smokes; emit boot transcript.
4. Wire mutation/fuzz/concolic and differential test harnesses; add metrics logging.
5. Add Gatekeeper and risk computation; connect to CI promotion step.

---

### HOW TO USE

* Append your rough plan after this prompt, between the markers `=== PLAN START ===` and `=== PLAN END ===`.
* The model must map plan elements into the template above, filling placeholders and inventing **only** minimal, labeled assumptions where needed.
* The model must output **only** the final Markdown document (no extra commentary).

**END OF PROMPT**
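
For scripted use, the plan can be appended programmatically. A minimal sketch; the function name is hypothetical, and in practice the prompt text would typically be read from a file.

```python
def build_prompt(prompt_text: str, plan_text: str) -> str:
    """Append a rough plan to the prompt, wrapped in the expected markers."""
    return (f"{prompt_text.rstrip()}\n\n"
            "=== PLAN START ===\n"
            f"{plan_text.strip()}\n"
            "=== PLAN END ===\n")
```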