@MuhammadYossry
Last active May 5, 2026 17:33
The Minimal Agent Specification (MAS) Draft

The Minimal Agent Specification (MAS)

One File. Few Tokens. Any Agent.

While working on AgentOS I came across this problem: You have 20 specialist agents. Your orchestrator needs to know: Who are they? What can they do? Where's their state?

You don't want to load 20 full personas into memory. You don't want to parse 20,000 tokens of backstory just to route a simple task.

You want a business card. A tiny header file that tells you everything you need before you decide to have a conversation.

That's the Minimal Agent Specification.


What It Is

A single file — .agent — in every agent's directory. Small enough to scan hundreds in seconds. Rich enough to route tasks intelligently.

identity: orion
domain: backend-engineering

capabilities:
  - run_sql
  - http_request
  - read_file

state:
  skills: skills.md
  goals: goals.md

budget:
  max_skills: 20

That's it. The orchestrator reads this file and immediately knows:

  • Who you are (identity)
  • What you're good at (capabilities)
  • Where your memory lives (state pointers)
  • How much you can handle (budget)
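To make "reads this file and immediately knows" concrete, here is a minimal sketch of a reader for the card. It is not part of the spec: `parse_agent_card` is a hypothetical helper that handles only the YAML-like subset shown above (top-level pairs, indented pairs, and dash lists), using nothing beyond the standard library.

```python
from typing import Any


def parse_agent_card(text: str) -> dict[str, Any]:
    """Parse the small YAML-like subset used by the .agent examples."""
    card: dict[str, Any] = {}
    section = None  # name of the section currently being filled
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if not line.strip():
            continue
        indented = line[0].isspace()
        body = line.strip()
        if not indented:
            key, _, value = body.partition(":")
            value = value.strip()
            if value:  # plain "key: value" pair
                card[key.strip()] = value
                section = None
            else:  # "key:" opens a section; its type is set by the first child
                section = key.strip()
                card[section] = None
        elif section is not None:
            if body.startswith("- "):  # list item
                if card[section] is None:
                    card[section] = []
                card[section].append(body[2:].strip())
            else:  # nested "key: value" pair
                key, _, value = body.partition(":")
                if card[section] is None:
                    card[section] = {}
                value = value.strip()
                card[section][key.strip()] = int(value) if value.isdigit() else value
    return card


EXAMPLE = """\
identity: orion
domain: backend-engineering

capabilities:
  - run_sql
  - http_request
  - read_file

state:
  skills: skills.md
  goals: goals.md

budget:
  max_skills: 20
"""

card = parse_agent_card(EXAMPLE)
print(card["identity"], card["capabilities"], card["budget"])
```

A real orchestrator would more likely use a full YAML parser; the point is only that the card is small enough to read with a few dozen lines of code.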

Why It Matters

Before MAS: Orchestrator loads every agent's full persona. 10 agents × 2000 tokens = 20,000 tokens before the first task. Slow. Expensive. Fragile.

After MAS: Orchestrator scans .agent files. 10 agents × 125 tokens = 1,250 tokens. Then loads only the agents that match the task.

Task: "Run anomaly detection on April events"

Scan phase: iris has "run_sql" capability → load iris
           orion has "run_sql" → also a candidate
           planner has no matching capability → skip

Load phase: Now load full personas for iris and orion only

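In this example the scan phase reads 1,250 tokens instead of 20,000, a 16x reduction. Loading the two matched personas adds roughly 4,000 tokens on top, and the savings grow with the number of registered agents, since each extra agent costs about 125 tokens instead of a full persona.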
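The scan phase above can be sketched as a simple filter. This assumes the orchestrator has already decided which capability the task needs (here `run_sql`) and has the cards in memory as a name-to-capabilities mapping; the capability lists below are illustrative, not taken from real agents.

```python
def scan(cards: dict[str, list[str]], needed: str) -> list[str]:
    """Return the names of agents whose capability list contains `needed`."""
    return [name for name, caps in cards.items() if needed in caps]


# Illustrative cards; only the capability lists matter for the scan.
cards = {
    "iris": ["run_sql", "detect_anomalies"],
    "orion": ["run_sql", "http_request", "read_file"],
    "planner": ["orchestrate_agents"],
}

candidates = scan(cards, "run_sql")
print(candidates)  # ['iris', 'orion']
```

Only the agents in `candidates` go on to the load phase; `planner` is never loaded.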


The Agent Text Size

The MAS is designed for the real world. Every word is a token. No waste.

| Component | Tokens (approx) |
|-----------|-----------------|
| Identity + domain | 10 |
| Capabilities (5 items) | 35 |
| State pointers (4 items) | 35 |
| Budget (4 items) | 25 |
| Formatting (spacing, dashes, newlines) | 20 |
| Total | ~125 tokens |

For comparison:

  • A tweet is ~35 tokens
  • A typical email is ~200 tokens
  • A full agent persona is 1,000–3,000 tokens

125 tokens is the sweet spot. Small enough to scan hundreds of agents in one batch. Rich enough to make intelligent routing decisions.


What You Get

For the orchestrator: A registry that requires no database. Just a filesystem and 125 tokens per agent.

For the agent: A stable identity that doesn't change when you learn new skills. Your .agent says who you are. Your skills.md says what you've learned.

For the operator: One file to edit when an agent's domain or capabilities change. No hunting through prompts.

For the system: The ability to discover, validate, and route to agents without loading their full context. This is how you scale from 5 agents to 500.


The One-Page Specification

# .agent — place this file in every agent directory

identity: <unique name>              # required
domain: <area of expertise>          # required

capabilities:                        # list what this agent can do
  - read_file                        # max 10 items
  - run_sql
  - http_request

state:                               # where learning lives
  skills: skills.md                  # required
  goals: goals.md
  rewards: rewards.md

budget:                              # hard limits
  max_skills: 20                     # default
  max_goals: 5                       # default

The Rule

No agent is discoverable without a .agent file.
No orchestrator should load a full persona before reading it.
125 tokens is the contract.

That's it. A business card for every agent. A discovery layer that needs no database. A scale enabler for multi-agent systems.

Your agents are only as useful as your ability to find them. The Minimal Agent Specification makes sure you always can.

MuhammadYossry commented May 5, 2026

Agent Bootstrapping Kit: LLM Skills via RL-Inspired Rewards

A minimal, file-based framework where an LLM agent earns its own skills, sets its own goals, and prunes its own knowledge — using only a reward signal and hard budget constraints.

No hand-crafted skill files. No operator teaching. Just task outcomes, self-scored rewards, and emergent capability.



The Core Idea

Most agent systems start with operator-authored skills — static, instructional, and fragile.

This kit inverts that:

  • The agent starts with no skills (only a persona, constraints, and an MAS business card)
  • After each task, the agent self-scores a reward: +1 (success, reusable), 0 (partial), -1 (failure)
  • A skill is only recorded if it has at least one +1 outcome with evidence
  • Skills compete for a fixed budget — low-evidence skills are pruned when space runs out
  • Goals emerge automatically from repeated 0 or -1 outcomes
  • Discovery happens via the Minimal Agent Specification — 125 tokens to know everything needed for routing

What you get: an agent that learns what works for the tasks it actually receives, not what someone guessed upfront.


How It Works (The RL Analogy)

| RL Concept | Agent Kit Equivalent |
|------------|----------------------|
| Environment | The task + file system |
| Agent policy | Skills + goals + reflections |
| Action | Executing a task |
| Reward | Self-scored +1/0/-1 (written to rewards.md) |
| Value function | reward_evidence per skill |
| Policy update | Adding/pruning skills in skills.md |
| Intrinsic motivation | Goals derived from gaps (reflections.md) |
| Budget constraint | Skill cap = policy complexity penalty |
| Discovery service | MAS (.agent file) |

No gradients. No backprop. Just file-based experience replay.


File Architecture

agent-bootstrap/
├── system_prompt.md          ← paste into LLM system prompt
├── .agent                    ← MAS business card (125 tokens)
├── agent/
│   ├── persona.md            ← OPERATOR (read-only, full context)
│   ├── constraints.md        ← OPERATOR (read-only)
│   ├── skills.md             ← AGENT (earned skills, pruned)
│   ├── goals.md              ← AGENT (gaps + progress)
│   ├── rewards.md            ← AGENT (rolling log, last 30 entries)
│   ├── reflections.md        ← AGENT (failures + pattern detection)
│   └── skills/               ← AGENT (deep skill files)

New: .agent at the root — the business card that makes discovery possible.

Ownership rule:

  • Operator owns persona.md, constraints.md, and .agent (including its capabilities list)
  • Agent owns the learning files (skills.md, goals.md, rewards.md, reflections.md, skills/)
  • Agent cannot modify .agent, so its identity stays stable even as its skills change
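One way to enforce the ownership rule in a tool layer is a path guard, sketched below. The function name `agent_may_write` and the idea of checking paths before each write are assumptions for illustration; the immutable set mirrors the operator-owned files named in this kit.

```python
from pathlib import PurePosixPath

# Operator-owned files, relative to the agent root.
IMMUTABLE = {".agent", "agent/persona.md", "agent/constraints.md", "system_prompt.md"}


def agent_may_write(path: str) -> bool:
    """True if `path` (relative to the agent root) is an agent-owned learning file."""
    p = PurePosixPath(path)
    if p.is_absolute() or ".." in p.parts:
        return False  # refuse anything that could escape the agent root
    if str(p) in IMMUTABLE:
        return False
    return p.parts[:1] == ("agent",)  # everything else under agent/ is writable


print(agent_may_write("agent/skills.md"))   # True
print(agent_may_write(".agent"))            # False
```

Filesystem permissions would be a sturdier enforcement mechanism; the guard only shows where the line between operator and agent files falls.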

The Minimal Agent Specification (MAS)

One File. 125 Tokens. Any Agent.

You have 20 specialist agents. Your orchestrator needs to know: Who are they? What can they do? Where's their state?

You don't want to load 20 full personas into memory. You don't want to parse 20,000 tokens of backstory just to route a simple task.

You want a business card. A tiny header file that tells you everything you need before you decide to have a conversation.

What It Is

A single file — .agent — in every agent's directory. Small enough to scan hundreds in seconds. Rich enough to route tasks intelligently.

# .agent — place this file in EVERY agent directory

identity: orion
domain: backend-engineering

capabilities:
  - run_sql
  - http_request
  - read_file

state:
  skills: agent/skills.md
  goals: agent/goals.md
  rewards: agent/rewards.md

budget:
  max_skills: 20
  max_goals: 5

That's it. The orchestrator reads this file and immediately knows:

  • Who you are (identity)
  • What domain you handle (domain)
  • What you're good at (capabilities)
  • Where your memory lives (state pointers)
  • How much you can handle (budget)

The Token Budget

| Component | Tokens (approx) |
|-----------|-----------------|
| Identity + domain | 10 |
| Capabilities (3 items) | 25 |
| State pointers (3 items) | 30 |
| Budget (2 items) | 20 |
| Formatting (spacing, dashes, newlines) | 40 |
| Total | ~125 tokens |

For comparison:

  • A tweet is ~35 tokens
  • A typical email is ~200 tokens
  • A full agent persona is 1,000–3,000 tokens

125 tokens is the sweet spot. Small enough to scan hundreds of agents in one batch. Rich enough to make intelligent routing decisions.

Discovery Workflow

Orchestrator receives: "Run anomaly detection on April events"

SCAN PHASE (reads all .agent files, 125 tokens each):
  iris.agent: capabilities includes "run_sql", "detect_anomalies" → MATCH
  orion.agent: capabilities includes "run_sql" → MATCH (secondary)
  planner.agent: capabilities includes "orchestrate" → NO MATCH
  writer.agent: capabilities includes "write_docs" → NO MATCH

LOAD PHASE (full personas only for matches):
  Load /iris/persona.md (1800 tokens)
  Load /orion/persona.md (2000 tokens)

ROUTE PHASE:
  Primary: iris (domain = analytics)
  Secondary: orion (can assist with SQL if needed)

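Total tokens for discovery: 4 agents × 125 = 500 tokens
Total for loading matches: 1,800 + 2,000 = 3,800 tokens
Total if loading all 4 personas: roughly 8,000 tokens (assuming ~2,000 each)
Savings: 16x for the scan itself, about 2x end to end; the gap widens as more agents are registered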
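The totals in this workflow are simple enough to check in code. The sketch below counts both the scan and the load phase; the iris and orion persona sizes come from the workflow, while the planner and writer sizes (2,000 tokens each) are assumptions, since the text does not give them.

```python
CARD_TOKENS = 125  # per-agent scan cost used throughout the text


def cost_with_mas(n_agents: int, matched_personas: list[int]) -> int:
    """Scan every card, then load full personas only for the matches."""
    return n_agents * CARD_TOKENS + sum(matched_personas)


def cost_without_mas(all_personas: list[int]) -> int:
    """Load every persona up front."""
    return sum(all_personas)


# iris and orion sizes come from the workflow; planner and writer are assumed.
personas = {"iris": 1800, "orion": 2000, "planner": 2000, "writer": 2000}
with_mas = cost_with_mas(len(personas), [personas["iris"], personas["orion"]])
without_mas = cost_without_mas(list(personas.values()))
print(with_mas, without_mas)  # 4300 7800
```

With only four agents and two matches the end-to-end gain is modest. The ratio improves as the registry grows while the number of matches stays small.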

The MAS Specification (Complete)

# .agent — required file in every agent directory
# Size: ~125 tokens maximum
# Purpose: Discovery and routing, NOT full context

identity: <unique name>              # required, 1-20 chars
domain: <area of expertise>          # required, e.g. one of:
                                     # backend, frontend, data, ml, devops, security,
                                     # analytics, research, writing, qa, orchestration

capabilities:                        # max 10 items, verb-noun format
  - read_file                        # examples:
  - run_sql                          # - read_<type>
  - http_request                     # - write_<type>
  - detect_anomalies                 # - run_<tool>
  - orchestrate_agents               # - search_<source>

state:                               # paths relative to agent root
  skills: agent/skills.md            # required — where earned capabilities live
  goals: agent/goals.md              # optional — where gaps are tracked
  rewards: agent/rewards.md          # optional — outcome history

budget:                              # hard limits
  max_skills: 20                     # default if omitted
  max_goals: 5                       # default if omitted
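The limits in the specification lend themselves to a small validator. The sketch below assumes the card has already been parsed into a dict; `validate_card` and the regular expression for the verb-noun format are illustrative choices, not part of MAS.

```python
import re

DEFAULT_BUDGET = {"max_skills": 20, "max_goals": 5}
# Assumption: "verb-noun format" means lowercase words joined by underscores.
CAPABILITY_RE = re.compile(r"^[a-z]+(_[a-z]+)+$")


def validate_card(card: dict) -> tuple[dict, list[str]]:
    """Return (card with budget defaults applied, list of problems found)."""
    problems = []
    identity = card.get("identity", "")
    if not 1 <= len(identity) <= 20:
        problems.append("identity must be 1-20 characters")
    if not card.get("domain"):
        problems.append("domain is required")
    caps = card.get("capabilities") or []
    if len(caps) > 10:
        problems.append("at most 10 capabilities allowed")
    problems += [
        f"capability {c!r} is not in verb_noun form"
        for c in caps
        if not CAPABILITY_RE.match(c)
    ]
    if not (card.get("state") or {}).get("skills"):
        problems.append("state.skills is required")
    budget = {**DEFAULT_BUDGET, **(card.get("budget") or {})}
    return {**card, "budget": budget}, problems


fixed, problems = validate_card({
    "identity": "orion",
    "domain": "backend",
    "capabilities": ["run_sql", "http_request"],
    "state": {"skills": "agent/skills.md"},
})
print(problems, fixed["budget"])
```

The validator does not check the domain against a fixed list, because the examples in this document use values such as backend-engineering alongside the shorter names.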

Why This Belongs in Part 0

The bootstrapping kit is about starting from zero. MAS is how agents become discoverable at zero cost. Without MAS, you have isolated learners. With MAS, you have a directory of specialists that orchestration can route to.


Agent Workflow (One Task)

graph TD
    A[Orchestrator reads .agent] --> B{MAS capabilities match task?}
    B -->|Yes| C[Load full persona.md]
    B -->|No| D[Skip agent]
    C --> E[Execute task using current skills]
    E --> F[Self-score reward +1/0/-1]
    F --> G{reward = +1 and reusable?}
    G -->|Yes| H[Add/update skill in skills.md]
    G -->|No| I{reward = 0 or -1?}
    I -->|Yes| J[Write reflection → consider goal]
    H --> K[Append to rewards.md]
    J --> K
    I -->|No| K
    K --> L[Git commit with semantic message]

Notice: The .agent file is read first, before any expensive loading. This is the key to scaling.


The Reward Signal

Each task appends to rewards.md:

## [2026-04-28 14:32] Query user database with joins
- outcome: success
- reward: +1
- what worked: used parameterized join with explicit indexes
- what failed or was slow: nothing
- skill_update: parameterized-joins
- goal_progress: none

Rules enforced by the agent itself (from constraints.md):

  • Reward scores must be honest — not inflated to avoid pruning
  • A skill without at least one +1 cannot exist in skills.md
  • Partial/failure outcomes may trigger goals but never create skills
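A rolling rewards log along these lines can be sketched as follows. `append_reward` is a hypothetical helper; it assumes a one-to-one mapping from outcome to reward (success +1, partial 0, failure -1) and the 30-entry window used elsewhere in this kit.

```python
import re

MAX_ENTRIES = 30  # rolling window for rewards.md


def append_reward(log: str, timestamp: str, task: str, outcome: str,
                  worked: str, failed: str, skill_update: str = "none",
                  goal_progress: str = "none") -> str:
    """Append one entry to the rewards log text and keep the last MAX_ENTRIES."""
    reward = {"success": "+1", "partial": "0", "failure": "-1"}[outcome]
    entry = (
        f"## [{timestamp}] {task}\n"
        f"- outcome: {outcome}\n"
        f"- reward: {reward}\n"
        f"- what worked: {worked}\n"
        f"- what failed or was slow: {failed}\n"
        f"- skill_update: {skill_update}\n"
        f"- goal_progress: {goal_progress}\n"
    )
    # Split the log into a header and one chunk per "## [" entry heading.
    header, *entries = re.split(r"(?m)^(?=## \[)", log)
    entries = [e.rstrip("\n") + "\n" for e in entries] + [entry]
    return header + "\n".join(entries[-MAX_ENTRIES:])


log = "# Rewards Log\n\n"
for i in range(35):
    log = append_reward(log, f"2026-04-28 10:{i:02d}", f"task {i}",
                        "success", "worked first time", "nothing")
print(log.count("## ["))  # 30
```

The honesty rule cannot be enforced in code like this; the helper only guarantees that the format and the window size stay consistent.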

Skill Lifecycle: Earn, Update, Prune

Skill Entry Format (in skills.md)

### parameterized-joins
- trigger: any SQL query with user-supplied filters
- approach: build column list dynamically, use execute with params dict
- reward_evidence: +1 on 3 tasks (2026-04-28, 2026-04-27, 2026-04-26)
- last_updated: 2026-04-28
→ see agent/skills/parameterized-joins.md

Pruning Rule (from constraints.md)

Before adding any entry that would exceed the budget, remove the weakest entry first. For skills: remove the one with the lowest reward_evidence score.

Example: Cap = 20 skills. Agent has 20, wants to add a 21st.
It computes an evidence score (the total number of +1 outcomes) for each skill, drops the lowest-scoring skill, then adds the new one.

This is loosely analogous to policy complexity regularization: weak skills die, strong skills survive.
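The pruning rule can be sketched in a few lines. This assumes skills are held as a mapping from skill name to its count of +1 outcomes; `add_skill` is a hypothetical helper, and ties are broken in favour of dropping the oldest entry.

```python
def add_skill(skills: dict[str, int], name: str, evidence: int,
              cap: int = 20) -> dict[str, int]:
    """Add a skill (name -> count of +1 outcomes), pruning the weakest at cap."""
    if evidence < 1:
        raise ValueError("a skill needs at least one +1 outcome")
    updated = dict(skills)
    if name not in updated and len(updated) >= cap:
        weakest = min(updated, key=updated.get)  # ties: earliest inserted is dropped
        del updated[weakest]
    updated[name] = evidence
    return updated


skills = {f"skill-{i}": i + 1 for i in range(20)}  # 20 skills, evidence 1..20
skills = add_skill(skills, "parameterized-joins", 3)
print(len(skills), "skill-0" in skills)  # 20 False
```

Note that the rule as written always makes room for the newcomer, even if the dropped skill had more evidence than the new one.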


Goal Emergence

A goal is set when:

  1. A task returns partial or failure
  2. The same gap appears in reflections.md more than once
  3. The agent identifies a capability that would unlock a new task class

Goal Entry Format

### handle-api-rate-limiting
- why: failed 3 tasks due to 429 errors (partial outcomes)
- success_criteria: implement exponential backoff + jitter, verified by 3 consecutive +1 tasks with rate-limited APIs
- status: active
- evidence: rewards.md entries 2026-04-25, 2026-04-26, 2026-04-27

Goals are not aspirational — they are responses to demonstrated gaps.

Active goal cap: 5 (from constraints.md). Abandoned goals are archived.
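Goal emergence can be sketched as a frequency count over gaps. The sketch assumes each reflection has already been reduced to a short gap label, and it treats "the same gap appears more than once" as the trigger; `propose_goals` is a hypothetical helper.

```python
from collections import Counter


def propose_goals(gaps: list[str], active: list[str], cap: int = 5) -> list[str]:
    """Return gaps seen more than once that are not yet goals, within the cap."""
    counts = Counter(gaps)
    room = max(cap - len(active), 0)  # free slots under the active goal cap
    repeated = [g for g, n in counts.most_common() if n > 1 and g not in active]
    return repeated[:room]


gaps = ["rate-limiting", "auth", "rate-limiting", "pagination",
        "rate-limiting", "auth"]
print(propose_goals(gaps, active=[]))  # ['rate-limiting', 'auth']
```

A gap that shows up only once, like `pagination` here, stays a reflection and does not become a goal.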


Governance via Scarcity

| File | Max entries | Pruning rule |
|------|-------------|--------------|
| skills.md (top-level) | 20 | Drop lowest reward_evidence |
| skills/ (deep files) | 10 | Same |
| goals.md (active) | 5 | Drop abandoned or 10+ tasks no activity |
| rewards.md | last 30 entries | Prune older |
| reflections.md | last 15 entries | Prune older |
| .agent capabilities | 10 | Operator controlled; agent cannot change |

Why scarcity works:
Without a budget, the agent hoards low-value skills. With a budget, it is forced to keep what actually earns rewards — exactly how RL agents discard low-value actions.


Complete File Templates

.agent (operator-owned, read-only to agent, ~125 tokens)
identity: orion
domain: backend-engineering

capabilities:
  - run_sql
  - http_request
  - read_file

state:
  skills: agent/skills.md
  goals: agent/goals.md
  rewards: agent/rewards.md

budget:
  max_skills: 20
  max_goals: 5
agent/persona.md (operator-owned, read-only)
# Persona

**Name:** Orion
**Domain:** Backend engineering — Python, PostgreSQL, API design
**Tone:** Terse, technical, no preamble. Shows its work.

**Core mandate:** Help engineers ship production-quality backend systems faster. Prioritize correctness and observability over cleverness.

**Not:** A documentation writer, a frontend helper, or a project manager.
agent/constraints.md (operator-owned, read-only)
# Constraints

## File budget

| File | Max entries |
|------|-------------|
| `agent/skills.md` | 20 |
| `agent/skills/` (deep files) | 10 |
| `agent/goals.md` (active goals) | 5 |
| `agent/rewards.md` | Keep last 30 |
| `agent/reflections.md` | Keep last 15 |

**Pruning rule:** Before adding an entry that would exceed a budget, remove the weakest entry first.

## Behavioral constraints
- Never claim a skill without ≥1 observed +1 outcome
- Reward scores must be honest
- Never store user data, credentials, or PII
- Never modify `.agent` — your identity is stable

## Immutable files
- `.agent`
- `agent/persona.md`
- `agent/constraints.md`
- `system_prompt.md`
agent/skills.md (agent-owned, starts with hints only)
# Skills

**Current count:** 0 / 20

## Suggested first skills to attempt (hints from operator)
- **Hint:** tasks often involve parsing structured data from APIs with pagination
- **Hint:** common failure modes include missing auth context and rate limiting
- **Hint:** a likely quick win: reusable retry wrapper with exponential backoff

*(Agent earns real skills below via +1 outcomes)*
agent/goals.md (agent-owned, starts empty)
# Goals

**Active goals:** 0 / 5

## How goals get set
- Receive `partial` or `failure` outcome
- Same gap appears in `reflections.md` more than once
- Capability would unlock new task class
agent/rewards.md (agent-owned rolling log)
# Rewards Log

*(One entry per task, pruned to last 30)*

## [2026-04-28 14:32] Example task
- outcome: success
- reward: +1
- what worked: parameterized query with indexes
- what failed: nothing
- skill_update: parameterized-joins
agent/reflections.md (agent-owned failure notes)
# Reflections

*(Pruned to last 15 entries)*

## [2026-04-28] Rate limit failures
- pattern: 3 tasks failed with 429 errors
- hypothesis: no backoff strategy
- adjustment: will implement exponential backoff and set goal
system_prompt.md (paste into LLM system prompt)
# Agent System Prompt

## On startup: read your files
1. `.agent` — your identity and capabilities (READ ONLY)
2. `agent/persona.md`
3. `agent/constraints.md` (READ ONLY)
4. `agent/skills.md`
5. `agent/goals.md`
6. `agent/rewards.md` (last 20 entries)
7. `agent/reflections.md`

**Never modify `.agent`.** Your capabilities are declared by the operator. Your skills are earned by you. The two files serve different purposes.

## After each task: append to rewards.md with reward signal
- outcome: success | partial | failure
- reward: +1 | 0 | -1
- skill_update: <name or "none">
- goal_progress: <goal or "none">

## Update skills.md only if:
- Reward = +1 AND approach is reusable AND non-obvious
- At budget: prune lowest reward_evidence first

## Update goals.md if:
- Pattern of 0/-1 outcomes
- Same gap appears in reflections.md > once

## Hard constraints (from constraints.md)
- Never edit persona.md, constraints.md, or .agent
- Never claim unearned skills
- Reward scores must be honest

Setup in 4 Steps

1. Define the business card (MAS)

  • Edit .agent (125 tokens)
  • Set identity, domain, capabilities, state pointers, budget

2. Define full persona and constraints

  • Edit agent/persona.md (domain, tone, anti-scope)
  • Edit agent/constraints.md (budgets, hard rules)

3. Add 2–3 hints (optional but useful)

  • In agent/skills.md under "Suggested first skills to attempt"
  • These are not skills — they're domain pointers so the agent knows where to look for first wins

4. Paste system_prompt.md into your LLM system prompt field

  • Grant the agent read/write access to the agent/ directory (but NOT to .agent)
  • First task will create missing files

That's it. No further teaching.


What You Don’t Do

| Don’t | Why |
|-------|-----|
| Pre-populate skills.md with real skills | Defeats the entire learning mechanism; the agent must earn each skill |
| Allow agent to edit .agent | Capabilities are declared by operator, not self-assigned |
| Edit rewards.md or reflections.md manually | Those are the agent's ground truth; tampering breaks the signal |
| Raise budgets arbitrarily | Scarcity is the governance layer; high budgets = hoarding |
| Add skills "just in case" | The agent discovers what it needs; you don't predict it |
| Load full personas without checking .agent first | Defeats the discovery layer; 10x to 100x token waste |

Why This Works

Traditional agent prompt engineering is static curriculum design — you guess what the agent will need and write instructions.

This kit is an emergent curriculum driven by rewards: the agent only keeps what survives contact with real tasks. The budget constraint turns skill maintenance into a competitive optimization problem: weak skills die, strong skills survive.

The Minimal Agent Specification adds discoverability — orchestrators can find the right agent without loading everything.

The result: an agent system that scales from 1 to 100 agents without paying for a full persona per agent. Discovery cost still grows linearly with the number of agents, but at only about 125 tokens each (scan .agent files), and loading cost is proportional only to the number of matches.


The MAS Rule

No agent is discoverable without a .agent file.
No orchestrator should load a full persona before reading it.
125 tokens is the contract.

That's it. A business card for every agent. A discovery layer that needs no database. A scale enabler for multi-agent systems.

Your agents are only as useful as your ability to find them. The Minimal Agent Specification makes sure you always can.


---

# How MAS Integrates Across All Four Parts

## Part 0 (Bootstrapping Kit) — Discovery Foundation
- `.agent` defines WHAT the agent can do (capabilities)
- `skills.md` defines HOW the agent learned to do it
- Separation of concerns: identity vs. earned capability

## Part 1 (Filesystem) — State Management
- MAS points to state files: `skills: agent/skills.md`
- Orchestrator knows where to find learning without guessing
- Budget declared in MAS enforces governance

## Part 2 (Inter-Agent Communication) — Routing
- Orchestrator scans `.agent` files to find agents with matching capabilities
- Then uses IAC primitives (queues, handoffs, inboxes) to coordinate
- MAS becomes the **routing table** for the IAC layer
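
The routing-table idea can be sketched as an inverted index built once from the scanned cards. `build_routing_table` is a hypothetical helper, and the capability lists are illustrative.

```python
from collections import defaultdict


def build_routing_table(cards: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert agent -> capabilities into capability -> agents."""
    table = defaultdict(list)
    for agent, caps in cards.items():
        for cap in caps:
            table[cap].append(agent)
    return dict(table)


cards = {
    "iris": ["detect_anomalies", "run_sql"],
    "orion": ["run_sql"],
    "writer": ["write_report"],
}
table = build_routing_table(cards)
print(table["run_sql"])  # ['iris', 'orion']
```

With the table in hand, routing a task that needs several capabilities is a lookup per capability instead of a rescan of every card.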

## Part 3 (Git Memory) — History
- MAS is tracked in git (operator changes to capabilities are versioned)
- Agent cannot change MAS — identity is immutable in history
- Rollback an agent = restore old `.agent` + associated state

---

## The Unified Discovery Flow

```text
Task arrives: "Analyze April anomalies and generate report"

SCAN: Read all .agent files (125 tokens each)
  iris.agent: capabilities: detect_anomalies, run_sql → MATCH
  orion.agent: capabilities: run_sql → MATCH
  writer.agent: capabilities: write_report → MATCH

ROUTE: Create collaboration
  iris (primary) ← anomaly detection
  orion (secondary) ← SQL support if needed
  writer (tertiary) ← report generation after analysis

LOAD: Full personas only for matched agents (iris, orion, writer)
EXECUTE: Use IAC primitives (handoffs, shared segments)
COMMIT: Git records the collaboration

Total discovery cost: N_agents × 125 tokens
Total load cost: M_matches × persona_size
```

This is how you build an Agent OS that actually scales.


Final: The One-Page AgentOS Quick Reference

# .agent — 125 tokens. Every agent. No exceptions.

identity: <name>
domain: <backend|data|analytics|ml|ops|security|frontend|qa|writing>

capabilities:        # max 10, verb-noun
  - read_<type>
  - run_<tool>
  - detect_<pattern>

state:               # where learning lives
  skills: agent/skills.md

budget:              # hard limits
  max_skills: 20

The rule: Orchestrators scan first. Load only matches. Pay about 125 tokens per agent for discovery and full persona cost only for matches.

A generated design of what AgentOS might look like:
[image: generated AgentOS design]
