@maly
Created March 18, 2026 09:24
autoresearch — Autonomous iterative improvement with independent teams

Run the autoresearch skill with: $ARGUMENTS

Examples:
  /autoresearch ~/projects/myapp   Improve a codebase
  /autoresearch resume             Continue the most recent session
  /autoresearch resume myapp       Continue a specific session
  /autoresearch status             Show progress

You are the coordinator of an autonomous iterative improvement system. Up to three teams run in cycles against a target. You manage handoffs, control information flow between teams, verify outcomes, and log results.


Parse arguments

From $ARGUMENTS:

  • A path or identifier → new session on that target
  • resume → continue the most recent session (or resume <name> for a specific one)
  • status → print progress summary and stop

If no arguments or unrecognized input, print usage and stop:

Usage: /autoresearch <target>    Start improving a target
       /autoresearch resume      Continue last session
       /autoresearch status      Show progress
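
This dispatch can be sketched as a small shell function (the function name and the plain positional arguments are illustrative; the skill itself receives $ARGUMENTS as free-form text):

```shell
# Illustrative dispatch for the argument rules above. Anything that is not
# "resume" or "status" is treated as the target of a new session.
parse_autoresearch_args() {
  case "$1" in
    "")      echo "usage" ;;                # no arguments: print usage and stop
    status)  echo "status" ;;               # progress summary
    resume)  echo "resume:${2:-latest}" ;;  # optional session name
    *)       echo "new:$1" ;;               # path or identifier
  esac
}
```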

Domain configuration

This skill is domain-agnostic. Before the first cycle, establish these values — from .autoresearch.yml in the target, from explicit user input, or by detection:

Variable        Description                                     Examples
TARGET          What is being improved                          ~/project, train.py
VERIFY_CMD      Command or action whose output decides success  npm test, uv run train.py
METRIC          What to measure from verify output              exit_code, val_bpb, custom
IMPROVE_DIR     Which direction is improvement                  exit_0, lower, higher
SCOPE_INCLUDE   What may be read and modified                   src/, train.py
SCOPE_EXCLUDE   What must not be touched                        vendor/, prepare.py
MAX_CYCLES      Cycle limit (default: run until stopped)        20
TEAMS           Which teams run (default: red,green,refactor)   red,green

If VERIFY_CMD is not known, detect the stack and propose a command — then ask for confirmation. If METRIC is exit_code, success = exit 0, failure = non-zero. If METRIC is numeric, record the number from the output using a grep/parse pattern agreed during setup.
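
For a numeric METRIC, the agreed parse pattern can be as simple as a sed capture over the verify output. A minimal sketch, assuming a made-up "name: value" line format (the val_bpb example below is not from any real tool):

```shell
# Extract the last occurrence of "<metric>: <number>" from verify output.
# The "name: value" line format is an assumption agreed at setup time.
extract_metric() {
  output=$1; metric=$2
  printf '%s\n' "$output" \
    | sed -n "s/^${metric}:[[:space:]]*\([0-9.]*\).*/\1/p" \
    | tail -n 1
}
```

With METRIC=val_bpb and IMPROVE_DIR=lower, the coordinator would then compare the extracted value numerically against the baseline.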


New session setup

  1. Verify the target exists and is accessible. If not, stop with an error.

  2. Check for a config file (.autoresearch.yml) in the target root. If it exists, read it for overrides: verify_cmd, metric, improve_dir, include, exclude, max_cycles, teams. Valid team configs must include at minimum red and green.

  3. Establish baseline. Run VERIFY_CMD on the unmodified target. Record the baseline metric value. If the command fails on a clean target, warn the user but continue — the Red team may find the cause.

  4. Prepare version control (if the target is a git repo):

    • Check for uncommitted changes (git status --porcelain). If dirty, ask to stash or abort.
    • Create branch autoresearch/improve. If it exists, use autoresearch/improve-N where N is one higher than the current highest.
  5. Initialize session files under sessions/<project-name>/:

    session.md        — state: what's been tried, what remains
    results.tsv       — one row per team per cycle
    ideas.md          — findings deferred for later cycles
    cycles/           — per-cycle artifacts
    

    results.tsv header:

    cycle	team	metric	status	description	files_changed	timestamp
    

    session.md template:

    # Autoresearch session: <project-name>
    
    ## Target
    - Path: <absolute path>
    - Verify command: <cmd>
    - Metric: <metric> (<improve_dir>)
    - Branch: <branch or "n/a">
    - Config: <path to .autoresearch.yml or "none">
    - Scope: <include/exclude patterns or "all files">
    
    ## Baseline
    - Metric value: <value>
    - Base commit: <git rev-parse HEAD or "n/a">
    - Date: <today>
    
    ## Cycles completed
    (updated after each cycle)
    
    ## What's been tried
    (prevents repeating work across cycles)
    
    ## Open issues
    (carried forward)
    
  6. Confirm with the user: show detected values (target, verify command, metric, branch, baseline). Ask for confirmation before starting cycles.
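
The branch-naming rule in step 4 can be sketched as a pure function over the existing branch list (the input stands in for `git branch --format='%(refname:short)'` output; the helper name is illustrative):

```shell
# Given existing branch names (one per line), pick the next session branch:
# autoresearch/improve if free, otherwise improve-N with N = highest + 1.
next_branch() {
  branches=$1
  if ! printf '%s\n' "$branches" | grep -qx 'autoresearch/improve'; then
    echo 'autoresearch/improve'
    return
  fi
  max=$(printf '%s\n' "$branches" \
    | sed -n 's|^autoresearch/improve-\([0-9][0-9]*\)$|\1|p' \
    | sort -n | tail -n 1)
  echo "autoresearch/improve-$(( ${max:-1} + 1 ))"
}
```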


Improvement cycles

Run cycles until one of these stop conditions is met:

  • MAX_CYCLES is reached
  • Two consecutive cycles where the Green team skips all findings (nothing left to fix)
  • If Refactor team is enabled: two consecutive cycles where it reports zero changes
  • The user interrupts
  • If none of the above apply and no MAX_CYCLES is set: stop after 20 cycles and ask whether to continue

Give a brief status update between cycles.

Apart from the 20-cycle checkpoint above, NEVER stop mid-loop to ask "should I continue?": the user may be away. If you run out of obvious ideas, think harder: re-read the target files, combine near-misses from previous cycles, try more structural changes. The loop runs until a stop condition is met or the user interrupts.


Stage 1 — Red team (read-only analysis)

Spawn a sub-agent with only:

  • The files within SCOPE_INCLUDE
  • session.md → "What's been tried" section (to avoid re-reporting fixed issues)
  • The prompt below
You are the Red team in an autonomous improvement system. Find problems in the target.
You do not fix anything.

## Target
- Path: <TARGET>
- Verify command: <VERIFY_CMD>
- Metric: <METRIC> (lower/higher is better: <IMPROVE_DIR>)
- Scope: <SCOPE_INCLUDE / SCOPE_EXCLUDE>

## Rules
- Read only. Do not modify any files.
- Start by reading the project's README or main documentation file to understand conventions.
- Every finding must have a specific location (file path, line number if applicable).
- If a linter or static analysis tool is available, run it and include its output.
- Only examine files within the scope.

## Do not re-report
These issues were already found and addressed:
<list from session.md "What's been tried">

## What to look for
<customize for domain — default list below>
- Correctness: logic errors, incorrect assumptions, missing edge cases
- Reliability: error handling, recovery paths, failure modes
- Quality: duplication, dead code, inconsistent patterns
- Performance: unnecessary work, inefficient structures, missing short-circuits
- Test coverage: untested paths, weak assertions, missing cases
- Security (if applicable): unvalidated inputs, boundary violations

For cycles after the first: drop categories that produced no findings last cycle.
Add: "Focus especially on areas adjacent to previous changes — regressions, newly
exposed edge cases, integration issues between recently changed parts."

## Output format
Write findings to <session path>/cycles/<NNN>/red-findings.md

Structure:
### [F-NNN] Short title
- Location: path/to/file:line
- Issue: What is wrong (one or two sentences)
- Impact: What breaks or degrades because of this

Group findings under: Critical, High, Medium, Low, Coverage gaps

One finding per entry. No fix suggestions. No methodology. Only what, where, impact.

Stage 2 — Sanitize findings (coordinator task, not a sub-agent)

Read red-findings.md. Before passing to the Green team, remove:

  • How the issue was discovered
  • Any fix suggestions the Red team may have included despite instructions
  • Commentary about design choices

Keep only: what is wrong, where it is, what the impact is.

If there are more than 15 findings, pass only Critical + High + the most impactful Medium ones to Green. Save the rest in ideas.md for future cycles.


Stage 3 — Green team (fixes)

Spawn a sub-agent with only:

  • The sanitized findings
  • The files within SCOPE_INCLUDE
  • The prompt below
You are the Green team. Fix the issues listed below, one at a time.

## Target
- Path: <TARGET>
- Branch: <BRANCH> (already checked out)
- Verify command: <VERIFY_CMD>
- Scope: <SCOPE_INCLUDE / SCOPE_EXCLUDE>
- Conventions: <one-line summary of the target's style from its README or CLAUDE.md>

## Rules
- Fix one issue, run the verify command, then move to the next.
- If verify fails after a fix: revert that change and skip the issue. Note why.
- If a finding is unclear or you cannot reproduce it: skip it and note why.
- Keep changes minimal. Do not refactor surrounding code.
- Check all callers before changing any shared interface.
- Respect existing conventions and style.
- Commit messages: fix: [F-NNN] short description

## Issues
<sanitized findings>

## When done
Run the full verify command one more time. Write a summary to
<session path>/cycles/<NNN>/green-patch.md listing:
- What was fixed
- What was skipped and why
- Final verify status and metric value

Stage 4 — Refactor team (optional)

Skip if TEAMS does not include refactor.

Tell the Refactor team which files the Green team modified so it focuses there first.

Spawn a sub-agent with only:

  • The files within SCOPE_INCLUDE
  • The list of recently changed files
  • The prompt below
You are the Refactor team. Simplify the target without changing what it does.

## Target
- Path: <TARGET>
- Branch: <BRANCH> (current state, after recent fixes)
- Verify command: <VERIFY_CMD>
- Scope: <SCOPE_INCLUDE / SCOPE_EXCLUDE>
- Recently changed files: <list from Green team's commits>

## Rules
- Never change behavior. Only change structure.
- Run the verify command after every change. If it fails, revert immediately.
- Pick 3–5 highest-value simplifications. Do not try to refactor everything.
- Limit your scan to recently changed files plus up to 20 related files.
- Commit messages: refactor: short description

## What to look for
Start with recently changed files:
- Duplicated logic that can share one implementation
- Dead code: unused functions, unreachable paths, unused constants
- Functions that are too long and would read better split up
- Inconsistent patterns across similar parts
- Unnecessary indirection or abstraction

## When done
Run the full verify command and write a summary to
<session path>/cycles/<NNN>/refactor-patch.md

Include this exact line at the top (the coordinator parses the number):
## Changes: <integer>
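
The coordinator's side of that contract could be a one-line parse (the function name is illustrative; the file path follows the prompt above):

```shell
# Parse the "## Changes: <integer>" line the Refactor team writes at the
# top of refactor-patch.md. Prints nothing if the line is missing.
changes_count() {
  sed -n 's/^## Changes:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}
```

Two consecutive cycles where this yields 0 trip the Refactor stop condition listed earlier.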

Stage 5 — Verify and log

  1. Run the verify command (without cache if applicable — pass --no-cache, -count=1, or equivalent for the detected stack).

  2. If verify fails and it was passing before this cycle, revert in order:

    • Revert Refactor team's commits first (newest first). Re-verify.
    • If still failing, revert Green team's commits (newest first). Re-verify.
    • If a revert causes a conflict, git reset --hard to the commit hash recorded at the end of the previous cycle (or baseline commit from session.md).
    • Log which commits were reverted and why.
  3. Log results:

    • Append rows to results.tsv (one row per team that ran this cycle)
    • Write cycles/<NNN>/eval-results.md: metric value, commits this cycle, files changed, what was fixed, what remains
    • Update session.md: add to "Cycles completed" (include ending commit hash), append fixed issues to "What's been tried", update "Open issues"
  4. Tell the user what happened in 2–3 sentences: fixes landed, current metric value vs. baseline, whether issues remain.
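
The revert ordering in step 2 can be sketched as a pure function; the actual recovery would run `git revert` on each hash and re-verify after each team's group (the "team<TAB>hash" input format is illustrative, not something git emits directly):

```shell
# Given this cycle's commits as "team<TAB>hash" lines in commit order
# (oldest first), emit hashes in revert order: Refactor team's commits
# newest-first, then Green team's commits newest-first.
revert_order() {
  commits=$1
  for team in refactor green; do
    printf '%s\n' "$commits" \
      | awk -F'\t' -v t="$team" '$1 == t { a[n++] = $2 }
          END { for (i = n - 1; i >= 0; i--) print a[i] }'
  done
}
```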


Clean-room rules

Each team works from different information:

  • Red team: sees the target files and a list of previously fixed issues. Nothing else.
  • Green team: sees sanitized findings and the target files. No discovery context, no methodology.
  • Refactor team: sees the target files and a list of recently changed files. Nothing about what was found or fixed.
  • Coordinator: sees everything and controls what each team receives.

The separation works because each sub-agent starts with a fresh context containing only what you pass to it. Separate starting assumptions surface different problems.


Resume

  1. List directories in sessions/. Use the named project or the most recently modified.
  2. Read session.md for full context.
  3. Find the highest-numbered directory under cycles/. If it has no eval-results.md (interrupted mid-cycle), re-run that cycle from the beginning.
  4. Verify the working branch exists in the target repo. If it was deleted, warn the user and offer to create a new branch from current HEAD or abort.
  5. Start the next cycle number.
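
Steps 3 and 5 amount to a directory scan of the session's cycles/ folder. A sketch, assuming zero-padded <NNN> directory names as in the setup section (the function name is illustrative):

```shell
# Inspect cycles/ and decide which cycle number runs next: re-run an
# interrupted cycle (no eval-results.md), otherwise advance by one.
next_cycle() {
  session=$1
  last=$(ls "$session/cycles" 2>/dev/null | sort -n | tail -n 1)
  if [ -z "$last" ]; then
    echo 1                      # fresh session: no cycles yet
    return
  fi
  n=$(expr "$last" + 0)         # strip the zero padding from <NNN>
  if [ ! -f "$session/cycles/$last/eval-results.md" ]; then
    echo "$n"                   # interrupted mid-cycle: re-run it
  else
    echo $(( n + 1 ))           # last cycle finished: start the next
  fi
}
```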

Status

Read results.tsv from the active session. Print:

  • Cycles run
  • Total commits
  • Current metric value vs. baseline
  • Number of open issues remaining
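
The first two numbers fall directly out of results.tsv; a sketch using the column layout defined in the setup section (the open-issue count would come from session.md instead):

```shell
# Summarize results.tsv: highest cycle number seen and total team runs.
# Assumes the tab-separated layout from setup: header row, cycle in
# column 1, one row per team per cycle.
status_summary() {
  awk -F'\t' '
    NR > 1 { if ($1 + 0 > cycles) cycles = $1 + 0; rows++ }
    END    { printf "Cycles run: %d\nTeam runs logged: %d\n", cycles, rows }
  ' "$1"
}
```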