# ChatGPT5 Claude Workflow Prompt
**PROMPT FOR LLM (copy+paste below, then append your plan)**
You are converting a **generic plan of action** into a **production-ready research/engineering `todo.md`** that is directly executable by a capable coding agent. Your output must be a single Markdown document with a fenced **XML `<workflows>`** block. Do **not** include any extra commentary, explanations, or chat—**output only the document**.
### Transformation requirements
* Be **specific** and **operational**. Replace vague goals with concrete steps, commands, checklists, acceptance gates, and explicit assumptions.
* Prefer **compact, high-signal prose**. No filler. Use short paragraphs and terse bullets.
* If information is missing, make **minimal, clearly labeled assumptions** (e.g., “Assumption: …”). Do **not** ask questions.
* Compile the spec into **executable oracles**: pre/postconditions, invariants, consumer/provider API contracts, and **metamorphic properties**. Generate both tests and **runtime guards**.
* Enforce **hermetic spin-up**: pinned toolchains/lockfiles, reproducible container image, seed data, migrations, health/readiness probes, **golden smoke flows**, and a signed **boot transcript** artifact.
* Beyond unit/integration/e2e: include **property-based testing**, **metamorphic testing**, **mutation testing** (score target), **grammar/coverage-guided fuzzing**, **concolic/symbolic execution** for critical paths, **differential tests** vs last known-good, **contract tests** for external deps, and **runtime invariant checks** with shadow traffic or replay.
* Add **static/semantic gates**: strict typing/linters, SAST/taint, API surface diffs, complexity deltas, license/OSS policy.
* Implement a **risk score** and **Gatekeeper** to decide: **AGENT_REFINE** (auto-iterate) vs **MANUAL_QA** (human exploration) vs **PROMOTE** (stage/ship).
* Use **relative improvements** (percent) when comparing methods; keep budget parity rules explicit (e.g., “±5% params/FLOPs”).
* Include **reproducibility and guardrails** (seeds, SHAs, data/index hashes, environment pins).
* Treat statistics rigor as first-class (paired bootstrap CIs, multiple-comparison control) unless the domain makes this irrelevant.
* Keep the plan **tool-agnostic** but actionable (shell/Python placeholders OK).
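To make the oracle requirements above concrete, here is a minimal Python sketch of a spec clause compiled into runtime guards (pre/postconditions) plus one metamorphic property. The function `normalize_scores` and the scale-invariance relation are hypothetical illustrations, not part of the required output.

```python
# Sketch: an executable oracle — runtime guards plus a metamorphic property.
# `normalize_scores` is a hypothetical unit under test.
import random

def normalize_scores(xs):
    """Scale scores so they sum to 1."""
    total = sum(xs)
    # Runtime guard (precondition): reject zero/negative mass inputs.
    assert total > 0, "precondition: scores must have positive mass"
    out = [x / total for x in xs]
    # Runtime guard (postcondition): output is a probability vector.
    assert abs(sum(out) - 1.0) < 1e-9, "postcondition: output sums to 1"
    return out

def check_metamorphic_scale_invariance(trials=1000, seed=0):
    """Metamorphic relation: scaling every input by k > 0 must not change the output."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.uniform(0.1, 10.0) for _ in range(rng.randint(1, 8))]
        k = rng.uniform(0.5, 5.0)
        a = normalize_scores(xs)
        b = normalize_scores([k * x for x in xs])
        assert all(abs(p - q) < 1e-9 for p, q in zip(a, b)), "metamorphic relation violated"
    return trials
```

The same guards injected as asserts here would run as monitors in staging/shadow traffic, while the metamorphic check feeds the property suite.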
### Required document structure (exact section order)
1. **Title** — `` # {{PROJECT_NAME}} — `todo.md` ``
2. **TL;DR** — one line.
3. **Invariants (do not change)** — non-negotiable constraints.
4. **Assumptions & Scope** — what you’re assuming (label uncertain items).
5. **Objectives** — 3–5 measurable goals.
6. **Risks & Mitigations** — top risks with a single mitigation each.
7. **Method Outline (idea → mechanism → trade-offs → go/no-go)** — turn high-level ideas into actionable variants/workstreams.
8. **Run Matrix** — table of variants with budgets and promotion criteria.
9. **Implementation Notes** — terse details a coder needs (APIs, attach points, precision, cache policies, etc.).
10. **Acceptance Gates** — pass/fail thresholds tied to Objectives.
11. **“Make-sure-you” Checklist** — must-do guardrails for the agent.
12. **File/Layout Plan** — directories and key files to create.
13. **Fenced XML Workflows** — **mandatory**: `building`, `running`, `tracking`, `evaluating`, `refinement`.
* Each `<workflow>` contains ordered `<commands>` and a **`<make_sure>`** checklist.
* Use explicit IDs (e.g., `id="R1"`).
* Commands may be placeholders but must be realistic and sequenced.
14. **Minimal Pseudocode (optional)** — only if it clarifies tricky parts.
15. **Next Actions (strict order)** — 3–6 concrete steps the agent executes next.
### Statistical & evaluation defaults (apply unless the plan dictates otherwise)
* Report paired metrics with **10k bootstrap**, **BCa 95% CIs**; mark significance only if CI lower bound > 0.
* Control family-wise errors (e.g., **FDR** within metric families).
* Maintain **budget parity** (params & FLOPs within **±5%**) across variants unless a “decoding-only” or “systems” budget is declared separately.
* Always show **two slices** if applicable (e.g., “Focused” vs “Full”); never hide slices.
* Include **latency p50/p95**, throughput, and memory/VRAM when performance matters.
* **Verification defaults:** Hermetic spin-up must pass. **Mutation score ≥ 0.80**; **property/metamorphic coverage ≥ 0.70**; **0 high/critical SAST**; **flakiness < 1%** over 100 reruns; **runtime invariants** hold over N=10k shadow requests; **API contracts** green.
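The paired-bootstrap default above can be sketched as follows. Note this is a simplified percentile interval rather than the BCa interval the defaults specify (the bias/acceleration correction is omitted for brevity), and the `deltas` values are made up for illustration.

```python
# Sketch: paired bootstrap CI on per-example metric deltas (variant minus
# baseline). Percentile interval only; BCa correction omitted for brevity.
import random

def paired_bootstrap_ci(deltas, n_boot=10_000, alpha=0.05, seed=0):
    """Return (lo, hi) bounds of a (1 - alpha) percentile CI on the mean delta."""
    rng = random.Random(seed)
    n = len(deltas)
    means = []
    for _ in range(n_boot):
        # Resample paired deltas with replacement (pairing is preserved
        # because each delta is already a per-example difference).
        sample = [deltas[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Significance per the default: mark a win only if the CI lower bound > 0.
deltas = [0.02, 0.05, -0.01, 0.03, 0.04, 0.01, 0.02, 0.00, 0.03, 0.02]  # illustrative
lo, hi = paired_bootstrap_ci(deltas, n_boot=2000)
significant = lo > 0
```

With multiple metric families, feed the resulting p-values/CIs through the FDR control step before starring any result.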
### Language & style constraints
* Crisp, technical, neutral tone.
* No self-references (“As an AI…”), no questions.
* Use code fences for XML and pseudocode.
* Use placeholders like `{{MODEL_NAME}}`, `{{DATASET}}`, `{{RANK_SCHEDULE}}` when the plan lacks specifics; **label them** in Assumptions.
---
### OUTPUT TEMPLATE (fill every section)
# {{PROJECT_NAME}} — `todo.md`
**TL;DR:** {{one-sentence summary of the execution plan}}
## Invariants (do not change)
* {{Constraint 1}}
* {{Constraint 2}}
* {{Oracles are source-of-truth; contracts & properties enforced at runtime}}
* {{Hermetic spin-up required; boot transcript must verify}}
* {{…}}
## Assumptions & Scope
* **Assumption:** {{ explicit assumption }}
* **Assumption:** {{ thresholds if unspecified: T_mut=0.80, T_prop=0.70, SAST_high=0 }}
* **Scope:** {{ what’s in/out }}
## Objectives
1. {{Objective with measurable target}}
2. **Verification:** Achieve mutation ≥ {{T_mut}} and property/metamorphic coverage ≥ {{T_prop}}.
3. **Reliability:** Zero high/critical SAST; flakiness < 1% on reruns; all API contracts pass.
4. **Spin-up:** One-shot hermetic boot from clean checkout with golden smoke flows; produce signed boot transcript.
5. {{Domain KPI improvement with CI-backed threshold}}
## Risks & Mitigations
* {{Risk}} → **Mitigation:** {{one-line fix}}
* External API drift → **Mitigation:** Consumer/provider contracts + service virtualization.
* Env non-determinism → **Mitigation:** Containerized toolchain, pinned lockfiles, deterministic seeds.
* Unknown-unknown logic gaps → **Mitigation:** Metamorphic + fuzzing + runtime invariants with shadow traffic.
## Method Outline (idea → mechanism → trade-offs → go/no-go)
### Workstream/Variant A — Spec→Oracles Pipeline
* **Idea:** Compile spec to contracts/properties/metamorphic tests + runtime guards.
* **Mechanism:** DSL → codegen (pre/post/invariants), property-based suites, metamorphic relations; inject assert/monitor hooks.
* **Trade-offs:** Upfront authoring cost; stricter gates surface more refactors.
* **Go/No-Go Gate:** All generated oracles pass; property coverage ≥ {{T_prop}}.
### Workstream/Variant B — Adversarial Verification
* **Idea:** Kill mutants, explore paths, and stress inputs.
* **Mechanism:** Mutation testing; grammar/coverage-guided fuzzing; concolic on parsers/auth/finance.
* **Trade-offs:** Longer CI wall time; infra complexity.
* **Go/No-Go Gate:** Mutation ≥ {{T_mut}}; no exploitable paths found at severity ≥ medium.
### Workstream/Variant C — Differential & Contract Testing
* **Idea:** Compare against last known-good/reference; lock external behavior.
* **Mechanism:** Golden snapshots, differential tests, Pact-like contracts validated in CI; API surface diffs.
* **Trade-offs:** Snapshot churn; contract maintenance.
* **Go/No-Go Gate:** No incompatible diffs; contracts green.
## Run Matrix
| ID | Method/Variant | Budget | Inputs | Expected Gain | Promote if… |
| -- | ---------------------------- | ----------------------------- | ------------------------ | ------------------- | -------------------------------------------- |
| V1 | Spec→Oracles | ±5% parity / baseline compute | Spec DSL, current code | Fewer escapes | Property ≥ {{T_prop}} & all invariants hold |
| V2 | Mutation+Fuzz+Concolic | Separate verification budget | Corpus/grammars | Kill weak tests | Mutation ≥ {{T_mut}}; 0 high/crit SAST |
| V3 | Differential+Contracts | ±5% parity | Golden outputs, provider | Regression defense | Zero incompatible diffs; contracts green |
| V4 | Runtime Invariant Monitoring | Staging/shadow only | Shadow traffic/replays | Prod-parity signals | 0 invariant breaks over N=10k requests |
## Implementation Notes
* **APIs/Attach points:** {{paths/interfaces for injecting contracts & monitors}}
* **Precision/Quantization:** {{fp16/fp8/int8 policy}}
* **Caching/State:** {{cache windows, stickiness, invalidation}}
* **Telemetry:** Log mutation score, property coverage, fuzz/crash repros, SAST severity counts, API diffs, visual diffs, flakiness, boot transcript hash.
* **Repro:** Seeds, SHAs, container digest, dataset/index hashes, contract versions.
## Acceptance Gates
* **Spin-up:** Clean checkout → container build → migrate/seed → readiness OK → golden smokes pass → **boot transcript signed**.
* **Static:** 0 high/critical SAST; typecheck clean; license policy OK; API surface diffs acknowledged.
* **Dynamic:** Mutation ≥ {{T_mut}}; property/metamorphic coverage ≥ {{T_prop}}; fuzzing runtime ≥ {{X}} mins with 0 new medium+ crashes.
* **Differential/Contracts:** No incompatible diffs; contracts green.
* **Runtime (staging/shadow):** 0 invariant breaks over N=10k requests, error budget respected.
* **Domain KPI:** CI lower bound > 0 on {{primary_metric}} within budget parity.
## “Make-sure-you” Checklist
* Pin toolchain & deps; record env manifest and container digest.
* Generate contracts/properties from spec; commit artifacts.
* Save **boot transcript** and artifact hashes.
* Record seeds; rerun flaky tests 100×; fail on flakiness.
* Quarantine network; stub external deps unless in contract tests.
* Export metrics JSONL; persist logs/artifacts under `artifacts/`.
## File/Layout Plan
```
{{repo_root}}/
  spec/                  # DSL + compiled contracts/properties
  contracts/             # consumer/provider specs
  src/
  tests/
    properties/
    metamorphic/
    mutation/
    differential/
    e2e/
  scripts/
    gatekeeper.py
    spinup_smoke.sh
    compute_risk.py
  artifacts/
    boot_transcript.json
    metrics/
  infra/
  analysis/
  logs/
```
## Workflows (required)
```xml
<workflows project="{{PROJECT_SLUG}}" version="1.0">
  <!-- =============================== -->
  <!-- BUILDING: env, assets, guards -->
  <!-- =============================== -->
  <workflow name="building">
    <env id="B0">
      <desc>Set up environment and pin versions</desc>
      <commands>
        <cmd>{{create_venv}}</cmd>
        <cmd>{{install_packages}}</cmd>
        <cmd>{{container_build_with_lockfiles}}</cmd>
        <cmd>{{record_env_manifest}}</cmd>
      </commands>
      <make_sure>
        <item>{{GPU/CPU visibility test}}</item>
        <item>{{lockfile / hashes saved}}</item>
      </make_sure>
    </env>
    <assets id="B1">
      <desc>Fetch models/data/indexes or domain assets</desc>
      <commands>
        <cmd>{{download_or_prepare_assets}}</cmd>
        <cmd>{{verify_licenses_and_hashes}}</cmd>
      </commands>
      <make_sure>
        <item>{{asset SHAs recorded}}</item>
      </make_sure>
    </assets>
    <contracts id="B2">
      <desc>Compile spec to contracts/properties/metamorphic tests</desc>
      <commands>
        <cmd>{{compile_spec_to_contracts}}</cmd>
        <cmd>{{generate_property_tests}}</cmd>
        <cmd>{{inject_runtime_guards}}</cmd>
      </commands>
      <make_sure>
        <item>{{oracles generated and versioned}}</item>
      </make_sure>
    </contracts>
    <static id="B3">
      <desc>Enable static/semantic guardrails</desc>
      <commands>
        <cmd>{{run_typecheck_linters}}</cmd>
        <cmd>{{run_SAST_taint}}</cmd>
        <cmd>{{api_surface_diff}}</cmd>
        <cmd>{{complexity_delta_check}}</cmd>
      </commands>
      <make_sure>
        <item>{{abort_on_high/critical findings}}</item>
      </make_sure>
    </static>
    <spinup id="B4">
      <desc>Hermetic boot; produce boot transcript</desc>
      <commands>
        <cmd>{{container_run_clean_checkout}}</cmd>
        <cmd>{{apply_migrations_and_seed}}</cmd>
        <cmd>{{readiness_and_health_checks}}</cmd>
        <cmd>{{run_golden_smokes}}</cmd>
        <cmd>{{write_boot_transcript_json}}</cmd>
      </commands>
      <make_sure>
        <item>{{transcript signed with env digest}}</item>
      </make_sure>
    </spinup>
  </workflow>
  <!-- =============================== -->
  <!-- RUNNING: verification battery -->
  <!-- =============================== -->
  <workflow name="running">
    <baseline id="R0">
      <desc>Run baseline under parity</desc>
      <commands>
        <cmd>{{train_or_build_baseline}}</cmd>
        <cmd>{{evaluate_baseline}}</cmd>
      </commands>
      <make_sure>
        <item>{{same attach points / budgets}}</item>
      </make_sure>
    </baseline>
    <contracts id="R1">
      <desc>API consumer/provider contracts</desc>
      <commands>
        <cmd>{{start_service_virtualization}}</cmd>
        <cmd>{{run_contract_tests}}</cmd>
      </commands>
      <make_sure>
        <item>{{no contract breaks}}</item>
      </make_sure>
    </contracts>
    <properties id="R2">
      <desc>Property & metamorphic tests</desc>
      <commands>
        <cmd>{{run_property_tests_with_seeds}}</cmd>
        <cmd>{{run_metamorphic_suites}}</cmd>
      </commands>
      <make_sure>
        <item>{{report property coverage}}</item>
      </make_sure>
    </properties>
    <fuzz_symbolic id="R3">
      <desc>Grammar-guided fuzzing + concolic on critical paths</desc>
      <commands>
        <cmd>{{fuzz_parsers_and_endpoints}}</cmd>
        <cmd>{{concolic_on_auth_and_money}}</cmd>
      </commands>
      <make_sure>
        <item>{{crashes minimized; repros archived}}</item>
      </make_sure>
    </fuzz_symbolic>
    <mutation id="R4">
      <desc>Mutation testing for adequacy</desc>
      <commands>
        <cmd>{{generate_mutants}}</cmd>
        <cmd>{{run_mutation_suite}}</cmd>
        <cmd>{{compute_mutation_score}}</cmd>
      </commands>
      <make_sure>
        <item>{{mutation score ≥ {{T_mut}}}}</item>
      </make_sure>
    </mutation>
    <differential id="R5">
      <desc>Golden snapshots & diffs vs last known-good</desc>
      <commands>
        <cmd>{{capture_golden_outputs}}</cmd>
        <cmd>{{run_differential_tests}}</cmd>
        <cmd>{{visual_diff_if_UI}}</cmd>
      </commands>
      <make_sure>
        <item>{{no incompatible diffs}}</item>
      </make_sure>
    </differential>
    <runtime id="R6">
      <desc>Staging with shadow traffic; runtime invariants</desc>
      <commands>
        <cmd>{{deploy_to_staging}}</cmd>
        <cmd>{{mirror_or_replay_traffic}}</cmd>
        <cmd>{{monitor_invariant_breaks}}</cmd>
      </commands>
      <make_sure>
        <item>{{0 invariant breaks over N requests}}</item>
      </make_sure>
    </runtime>
  </workflow>
  <!-- =============================== -->
  <!-- TRACKING: collect & compute -->
  <!-- =============================== -->
  <workflow name="tracking">
    <harvest id="T1">
      <desc>Consolidate metrics/artifacts; compute statistics</desc>
      <commands>
        <cmd>{{collect_logs_to_jsonl}}</cmd>
        <cmd>{{score_outputs}}</cmd>
        <cmd>{{paired_bootstrap_and_FDR}}</cmd>
        <cmd>{{summarize_SAST_and_static}}</cmd>
        <cmd>{{summarize_mutation_property_coverage}}</cmd>
        <cmd>{{summarize_flakiness_visual_diffs}}</cmd>
        <cmd>{{hash_and_store_boot_transcript}}</cmd>
      </commands>
      <make_sure>
        <item>{{CI policy applied; stars only when CI>0}}</item>
      </make_sure>
    </harvest>
    <risk id="T2">
      <desc>Compute risk score R and decision features</desc>
      <commands>
        <cmd>{{compute_risk.py --delta_loc --novelty --ext_dep_delta --one_minus_mutation --flakiness --static_severity}}</cmd>
        <cmd>{{emit_decision_features_json}}</cmd>
      </commands>
      <make_sure>
        <item>{{features normalized to [0,1]}}</item>
      </make_sure>
    </risk>
  </workflow>
  <!-- =============================== -->
  <!-- EVALUATING: promotion rules -->
  <!-- =============================== -->
  <workflow name="evaluating">
    <gatekeeper id="E1">
      <desc>Apply gates and decide: AGENT_REFINE vs MANUAL_QA vs PROMOTE</desc>
      <commands>
        <cmd>{{apply_acceptance_gates}}</cmd>
        <cmd>{{compute_R_and_compare_thresholds}}</cmd>
        <cmd>{{route_decision}}</cmd>
        <cmd>{{generate_tables_and_figures}}</cmd>
      </commands>
      <make_sure>
        <item>{{no promotion without CI-backed wins and gates met}}</item>
      </make_sure>
    </gatekeeper>
  </workflow>
  <!-- =============================== -->
  <!-- REFINEMENT: next iteration -->
  <!-- =============================== -->
  <workflow name="refinement">
    <agent_refine id="N1">
      <desc>Auto-iterate with obligation-driven prompts (if routed)</desc>
      <commands>
        <cmd>{{create_actionable_prompt_from_failures}}</cmd>
        <cmd>{{schedule_agent_build_and_verify}}</cmd>
      </commands>
      <make_sure>
        <item>{{prompts include concrete obligations & thresholds}}</item>
      </make_sure>
    </agent_refine>
    <manual_qa id="N2">
      <desc>Human exploration handoff (if routed)</desc>
      <commands>
        <cmd>{{open_tracking_dashboard}}</cmd>
        <cmd>{{attach_repros_boot_transcript_contracts}}</cmd>
      </commands>
      <make_sure>
        <item>{{clear owner; rollback/kill-switch ready}}</item>
      </make_sure>
    </manual_qa>
  </workflow>
</workflows>
```
## Minimal Pseudocode (optional)
```python
# Gatekeeper decision (feature weights are configurable)
def decide(metrics, T_mut=0.80, T_prop=0.70, T_manual=0.50,
           weights=(0.2, 0.2, 0.15, 0.2, 0.1, 0.15)):
    # Hard gates first: any failure routes straight to agent refinement.
    if not metrics["hermetic_spinup_pass"]:
        return "AGENT_REFINE: fix spin-up"
    if metrics["sast_high_critical"] > 0:
        return "AGENT_REFINE: resolve SAST"
    if metrics["mutation"] < T_mut:
        return "AGENT_REFINE: raise mutation"
    if metrics["prop_cov"] < T_prop:
        return "AGENT_REFINE: add properties"
    # Risk score R: weighted sum of normalized [0,1] decision features.
    w1, w2, w3, w4, w5, w6 = weights
    R = (w1 * metrics["delta_loc"] + w2 * metrics["novelty"] +
         w3 * metrics["ext_dep_delta"] + w4 * (1 - metrics["mutation"]) +
         w5 * metrics["flakiness"] + w6 * metrics["static_severity"])
    return "MANUAL_QA" if R >= T_manual else "PROMOTE"
```
## Next Actions (strict order)
1. Normalize plan into Objectives and Acceptance Gates; set T_mut/T_prop thresholds.
2. Define or import spec DSL; generate contracts/properties/metamorphic tests.
3. Implement spin-up script and golden smokes; emit boot transcript.
4. Wire mutation/fuzz/concolic and differential test harnesses; add metrics logging.
5. Add Gatekeeper and risk computation; connect to CI promotion step.
---
### HOW TO USE
* Paste your rough plan after this line: `=== PLAN START ===` … `=== PLAN END ===`.
* The model must map plan elements into the template above, filling placeholders, and inventing **only** minimal, labeled assumptions where needed.
* The model must output **only** the final Markdown document (no extra commentary).
**END OF PROMPT**