MuhammadYossry/agent_mas_draft.md

Last active May 5, 2026 17:33


The Minimal Agent Specification (MAS) Draft

The Minimal Agent Specification (MAS)

One File. Few Tokens. Any Agent.

While working on AgentOS I came across this problem: You have 20 specialist agents. Your orchestrator needs to know: Who are they? What can they do? Where's their state?

You don't want to load 20 full personas into memory. You don't want to parse 20,000 tokens of backstory just to route a simple task.

You want a business card. A tiny header file that tells you everything you need before you decide to have a conversation.

That's the Minimal Agent Specification.

What It Is

A single file — .agent — in every agent's directory. Small enough to scan hundreds in seconds. Rich enough to route tasks intelligently.

identity: orion
domain: backend-engineering

capabilities:
  - run_sql
  - http_request
  - read_file

state:
  skills: skills.md
  goals: goals.md

budget:
  max_skills: 20

That's it. The orchestrator reads this file and immediately knows:

Who you are (identity)
What you're good at (capabilities)
Where your memory lives (state pointers)
How much you can handle (budget)

Why It Matters

Before MAS: Orchestrator loads every agent's full persona. 10 agents × 2000 tokens = 20,000 tokens before the first task. Slow. Expensive. Fragile.

After MAS: Orchestrator scans .agent files. 10 agents × 125 tokens = 1,250 tokens. Then loads only the agents that match the task.

Task: "Run anomaly detection on April events"

Scan phase: iris has "run_sql" capability → load iris
           orion has "run_sql" → also a candidate
           planner has no matching capability → skip

Load phase: Now load full personas for iris and orion only

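In this example the scan phase costs 16x fewer tokens than loading every persona (1,250 vs. 20,000). The overall saving depends on how many agents match, since each match still loads its full persona.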
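
The arithmetic can be sketched as a small function. This is an illustration only: `context_cost` is a hypothetical name, and the defaults of 125 tokens per `.agent` file and 2,000 tokens per persona are the approximate figures used in this section, not measured values.

```python
def context_cost(n_agents: int, n_matches: int,
                 card_tokens: int = 125, persona_tokens: int = 2000) -> dict:
    """Compare scan-then-load against loading every persona up front."""
    scan = n_agents * card_tokens          # read every .agent file
    load = n_matches * persona_tokens      # full personas for matches only
    baseline = n_agents * persona_tokens   # "before MAS": load everything
    return {"scan": scan, "load": load, "total": scan + load, "baseline": baseline}


cost = context_cost(n_agents=10, n_matches=2)
# scan = 1250, load = 4000, total = 5250, baseline = 20000
```

With 10 agents and 2 matches the total is 5,250 tokens against a 20,000-token baseline. The fewer agents match a task, the closer the total gets to the scan cost alone.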

The Agent Text Size

The MAS is designed for the real world. Every word is a token. No waste.

| Component | Tokens (approx) |
|-----------|-----------------|
| Identity + domain | 10 |
| Capabilities (5 items) | 35 |
| State pointers (4 items) | 35 |
| Budget (4 items) | 25 |
| Formatting (spacing, dashes, newlines) | 20 |
| Total | ~125 tokens |

For comparison:

A tweet is ~35 tokens
A typical email is ~200 tokens
A full agent persona is 1,000–3,000 tokens

125 tokens is the sweet spot. Small enough to scan hundreds of agents in one batch. Rich enough to make intelligent routing decisions.

What You Get

For the orchestrator: A registry that requires no database. Just a filesystem and 125 tokens per agent.

For the agent: A stable identity that doesn't change when you learn new skills. Your .agent says who you are. Your skills.md says what you've learned.

For the operator: One file to edit when an agent's domain or capabilities change. No hunting through prompts.

For the system: The ability to discover, validate, and route to agents without loading their full context. This is how you scale from 5 agents to 500.

The One-Page Specification

# .agent — place this file in every agent directory

identity: <unique name>              # required
domain: <area of expertise>          # required

capabilities:                        # list what this agent can do
  - read_file                        # max 10 items
  - run_sql
  - http_request

state:                               # where learning lives
  skills: skills.md                  # required
  goals: goals.md
  rewards: rewards.md

budget:                              # hard limits
  max_skills: 20                     # default
  max_goals: 5                       # default

The Rule

No agent is discoverable without a .agent file.
No orchestrator should load a full persona before reading it.
125 tokens is the contract.

That's it. A business card for every agent. A discovery layer that needs no database. A scale enabler for multi-agent systems.

Your agents are only as useful as your ability to find them. The Minimal Agent Specification makes sure you always can.

MuhammadYossry commented May 5, 2026 •

Agent Bootstrapping Kit: LLM Skills via RL-Inspired Rewards

A minimal, file-based framework where an LLM agent earns its own skills, sets its own goals, and prunes its own knowledge — using only a reward signal and hard budget constraints.

No hand-crafted skill files. No operator teaching. Just task outcomes, self-scored rewards, and emergent capability.

The Core Idea
How It Works (The RL Analogy)
File Architecture
The Minimal Agent Specification (MAS)
Agent Workflow (One Task)
The Reward Signal
Skill Lifecycle: Earn, Update, Prune
Goal Emergence
Governance via Scarcity
Complete File Templates
Setup in 4 Steps
What You Don’t Do

The Core Idea

Most agent systems start with operator-authored skills — static, instructional, and fragile.

This kit inverts that:

The agent starts with no skills (only a persona, constraints, and an MAS business card)
After each task, the agent self-scores a reward: +1 (success, reusable), 0 (partial), -1 (failure)
A skill is only recorded if it has at least one +1 outcome with evidence
Skills compete for a fixed budget — low-evidence skills are pruned when space runs out
Goals emerge automatically from repeated 0 or -1 outcomes
Discovery happens via the Minimal Agent Specification — 125 tokens to know everything needed for routing

What you get: an agent that learns what works for the tasks it actually receives, not what someone guessed upfront.

How It Works (The RL Analogy)

| RL Concept | Agent Kit Equivalent |
|------------|----------------------|
| Environment | The task + file system |
| Agent policy | Skills + goals + reflections |
| Action | Executing a task |
| Reward | Self-scored `+1/0/-1` (written to `rewards.md`) |
| Value function | `reward_evidence` per skill |
| Policy update | Adding/pruning skills in `skills.md` |
| Intrinsic motivation | Goals derived from gaps (`reflections.md`) |
| Budget constraint | Skill cap = policy complexity penalty |
| Discovery service | MAS (`.agent` file) |

No gradients. No backprop. Just file-based experience replay.

File Architecture

agent-bootstrap/
├── system_prompt.md          ← paste into LLM system prompt
├── .agent                    ← MAS business card (125 tokens)
├── agent/
│   ├── persona.md            ← OPERATOR (read-only, full context)
│   ├── constraints.md        ← OPERATOR (read-only)
│   ├── skills.md             ← AGENT (earned skills, pruned)
│   ├── goals.md              ← AGENT (gaps + progress)
│   ├── rewards.md            ← AGENT (rolling log, last 30 entries)
│   ├── reflections.md        ← AGENT (failures + pattern detection)
│   └── skills/               ← AGENT (deep skill files)

New: .agent at the root — the business card that makes discovery possible.

Ownership rule:

Operator owns persona.md, constraints.md, .agent (capabilities part)
Agent owns learning files
Agent cannot modify .agent — your identity is stable even as skills change
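
The ownership rule could be enforced with a simple path check before any write. This is a sketch under assumptions: `agent_may_write` is a hypothetical helper, paths are taken to be relative to the agent root, and the immutable set mirrors the operator-owned files named in this kit (including `system_prompt.md`).

```python
from pathlib import PurePosixPath

# Operator-owned files the agent must never modify (relative to the agent root)
IMMUTABLE = {".agent", "agent/persona.md", "agent/constraints.md", "system_prompt.md"}


def agent_may_write(path: str) -> bool:
    """Allow writes only to agent-owned files under agent/."""
    p = PurePosixPath(path)
    if p.is_absolute() or ".." in p.parts:
        return False  # refuse anything that could escape the agent root
    if str(p) in IMMUTABLE:
        return False
    return p.parts[:1] == ("agent",)
```

A wrapper around the agent's file-writing tool could call this check first and reject the write when it returns `False`.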

The Minimal Agent Specification (MAS)

One File. 125 Tokens. Any Agent.

You have 20 specialist agents. Your orchestrator needs to know: Who are they? What can they do? Where's their state?

You don't want to load 20 full personas into memory. You don't want to parse 20,000 tokens of backstory just to route a simple task.

You want a business card. A tiny header file that tells you everything you need before you decide to have a conversation.

What It Is

A single file — .agent — in every agent's directory. Small enough to scan hundreds in seconds. Rich enough to route tasks intelligently.

# .agent — place this file in EVERY agent directory

identity: orion
domain: backend-engineering

capabilities:
  - run_sql
  - http_request
  - read_file

state:
  skills: agent/skills.md
  goals: agent/goals.md
  rewards: agent/rewards.md

budget:
  max_skills: 20
  max_goals: 5

That's it. The orchestrator reads this file and immediately knows:

Who you are (identity)
What domain you handle (domain)
What you're good at (capabilities)
Where your memory lives (state pointers)
How much you can handle (budget)

The Token Budget

| Component | Tokens (approx) |
|-----------|-----------------|
| Identity + domain | 10 |
| Capabilities (3 items) | 25 |
| State pointers (3 items) | 30 |
| Budget (2 items) | 20 |
| Formatting (spacing, dashes, newlines) | 40 |
| Total | ~125 tokens |

For comparison:

A tweet is ~35 tokens
A typical email is ~200 tokens
A full agent persona is 1,000–3,000 tokens

125 tokens is the sweet spot. Small enough to scan hundreds of agents in one batch. Rich enough to make intelligent routing decisions.
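
To check a `.agent` file against the 125-token contract without running a tokenizer, a rough estimate may be enough. The sketch below assumes the common rule of thumb of about four characters per token for English-like text; real counts depend on the model's tokenizer, and both function names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def within_contract(text: str, limit: int = 125) -> bool:
    """True if the estimated size fits the MAS token contract."""
    return estimate_tokens(text) <= limit
```

An operator could run `within_contract` on each `.agent` file in a pre-commit check and fall back to an exact tokenizer count for files near the limit.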

Discovery Workflow

Orchestrator receives: "Run anomaly detection on April events"

SCAN PHASE (reads all .agent files, 125 tokens each):
  iris.agent: capabilities includes "run_sql", "detect_anomalies" → MATCH
  orion.agent: capabilities includes "run_sql" → MATCH (secondary)
  planner.agent: capabilities includes "orchestrate" → NO MATCH
  writer.agent: capabilities includes "write_docs" → NO MATCH

LOAD PHASE (full personas only for matches):
  Load iris/agent/persona.md (1800 tokens)
  Load orion/agent/persona.md (2000 tokens)

ROUTE PHASE:
  Primary: iris (domain = analytics)
  Secondary: orion (can assist with SQL if needed)

Total tokens for discovery: 4 agents × 125 = 500 tokens
Total if loading all personas: 10,000+ tokens
Savings: 20x on the scan itself (the load phase then adds 3,800 tokens for the two matches)
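
The scan phase above can be sketched as a set intersection. Two assumptions go beyond the text: the orchestrator has already translated the task into a set of required capability names, and candidates are ranked by how many required capabilities they cover (the workflow above picks the primary by domain instead). `scan_agents` is a hypothetical name.

```python
def scan_agents(cards: list, required: set) -> list:
    """Return (identity, matched capabilities) pairs, best coverage first."""
    matches = []
    for card in cards:
        overlap = sorted(set(card.get("capabilities", [])) & required)
        if overlap:  # at least one required capability is declared
            matches.append((card["identity"], overlap))
    # Stable sort: agents with equal coverage keep their scan order
    matches.sort(key=lambda m: len(m[1]), reverse=True)
    return matches


cards = [
    {"identity": "iris", "capabilities": ["run_sql", "detect_anomalies"]},
    {"identity": "orion", "capabilities": ["run_sql"]},
    {"identity": "planner", "capabilities": ["orchestrate"]},
    {"identity": "writer", "capabilities": ["write_docs"]},
]
candidates = scan_agents(cards, {"run_sql", "detect_anomalies"})
# candidates lists iris first (two capabilities matched), then orion (one)
```

Only the identities in `candidates` move on to the load phase; planner and writer are never loaded.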

The MAS Specification (Complete)

# .agent — required file in every agent directory
# Size: ~125 tokens maximum
# Purpose: Discovery and routing, NOT full context

identity: <unique name>              # required, 1-20 chars
domain: <area of expertise>          # required, one of:
                                     # backend, frontend, data, ml, devops, security,
                                     # analytics, research, writing, qa, orchestration

capabilities:                        # max 10 items, verb-noun format
  - read_file                        # examples:
  - run_sql                          # - read_<type>
  - http_request                     # - write_<type>
  - detect_anomalies                 # - run_<tool>
  - orchestrate_agents               # - search_<source>

state:                               # paths relative to agent root
  skills: agent/skills.md            # required — where earned capabilities live
  goals: agent/goals.md              # optional — where gaps are tracked
  rewards: agent/rewards.md          # optional — outcome history

budget:                              # hard limits
  max_skills: 20                     # default if omitted
  max_goals: 5                       # default if omitted
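
A validator for these rules might look like the sketch below. It assumes the card has already been parsed into a dictionary, and it reads "verb-noun format" as lowercase words joined by underscores, which is an interpretation rather than something the specification states. The function names are hypothetical, and the domain list is not checked here.

```python
import re

# Assumed reading of "verb-noun format": lowercase words joined by underscores
CAPABILITY_RE = re.compile(r"[a-z]+_[a-z_]+")


def validate_card(card: dict) -> list:
    """Return a list of human-readable problems; empty means the card is valid."""
    errors = []
    identity = card.get("identity", "")
    if not 1 <= len(identity) <= 20:
        errors.append("identity is required and must be 1-20 characters")
    if not card.get("domain"):
        errors.append("domain is required")
    capabilities = card.get("capabilities", [])
    if len(capabilities) > 10:
        errors.append("capabilities: at most 10 items")
    for cap in capabilities:
        if not CAPABILITY_RE.fullmatch(cap):
            errors.append(f"capability {cap!r} is not in verb_noun form")
    if "skills" not in card.get("state", {}):
        errors.append("state.skills is required")
    return errors


def with_default_budget(card: dict) -> dict:
    """Fill in the default limits for any budget key that was omitted."""
    budget = {"max_skills": 20, "max_goals": 5, **card.get("budget", {})}
    return {**card, "budget": budget}
```

An orchestrator could skip any agent whose card returns a non-empty error list, which keeps malformed cards out of the routing table.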

Why This Belongs in Part 0

The bootstrapping kit is about starting from zero. MAS is how agents become discoverable at zero cost. Without MAS, you have isolated learners. With MAS, you have a directory of specialists that orchestration can route to.

Agent Workflow (One Task)

graph TD
    A[Orchestrator reads .agent] --> B{MAS capabilities match task?}
    B -->|Yes| C[Load full persona.md]
    B -->|No| D[Skip agent]
    C --> E[Execute task using current skills]
    E --> F[Self-score reward +1/0/-1]
    F --> G{reward = +1 and reusable?}
    G -->|Yes| H[Add/update skill in skills.md]
    G -->|No| I{reward = 0 or -1?}
    I -->|Yes| J[Write reflection → consider goal]
    I -->|No| K
    H --> K[Append to rewards.md]
    J --> K
    K --> L[Git commit with semantic message]

Notice: The .agent file is read first, before any expensive loading. This is the key to scaling.

The Reward Signal

Each task appends to rewards.md:

## [2026-04-28 14:32] Query user database with joins
- outcome: success
- reward: +1
- what worked: used parameterized join with explicit indexes
- what failed or was slow: nothing
- skill_update: parameterized-joins
- goal_progress: none

Rules enforced by the agent itself (from constraints.md):

Reward scores must be honest — not inflated to avoid pruning
A skill without at least one +1 cannot exist in skills.md
Partial/failure outcomes may trigger goals but never create skills
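
One way to keep entries consistent is to render them from a function that also enforces the last rule. This is a sketch: `format_reward_entry` and its parameters are hypothetical, and the field names are copied from the example entry above.

```python
from datetime import datetime


def format_reward_entry(when: datetime, task: str, outcome: str, reward: int,
                        worked: str, failed: str,
                        skill_update: str = "none",
                        goal_progress: str = "none") -> str:
    """Render one rewards.md entry in the format shown above."""
    if outcome not in {"success", "partial", "failure"}:
        raise ValueError("outcome must be success, partial, or failure")
    if reward not in {1, 0, -1}:
        raise ValueError("reward must be +1, 0, or -1")
    if skill_update != "none" and reward != 1:
        # Partial/failure outcomes may trigger goals but never create skills
        raise ValueError("only +1 outcomes may create or update a skill")
    shown = f"{reward:+d}" if reward else "0"  # "+1", "0", or "-1"
    return "\n".join([
        f"## [{when:%Y-%m-%d %H:%M}] {task}",
        f"- outcome: {outcome}",
        f"- reward: {shown}",
        f"- what worked: {worked}",
        f"- what failed or was slow: {failed}",
        f"- skill_update: {skill_update}",
        f"- goal_progress: {goal_progress}",
    ])
```

The check cannot make a self-scored reward honest, but it does stop a skill from being attached to a 0 or -1 outcome.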

Skill Lifecycle: Earn, Update, Prune

Skill Entry Format (in `skills.md`)

### parameterized-joins
- trigger: any SQL query with user-supplied filters
- approach: build column list dynamically, use execute with params dict
- reward_evidence: +1 on 3 tasks (2026-04-28, 2026-04-27, 2026-04-26)
- last_updated: 2026-04-28
→ see agent/skills/parameterized-joins.md

Pruning Rule (from `constraints.md`)

Before adding any entry that would exceed the budget, remove the weakest entry first. For skills: remove the one with the lowest reward_evidence score.

Example: Cap = 20 skills. Agent has 20, wants to add a 21st.
It computes evidence score (total +1 outcomes) for each skill, drops the lowest, then adds the new one.

This is a rough analog of policy complexity regularization: weak skills die, strong skills survive.
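
The pruning rule can be sketched with a dictionary that maps each skill name to its count of +1 outcomes. `add_skill` is a hypothetical name, and the tie-break (the earliest-added skill among those with the lowest score is dropped) is an assumption, since the rule above does not say how ties are resolved.

```python
def add_skill(skills: dict, name: str, evidence: int, max_skills: int = 20):
    """Add or update a skill, pruning the weakest first if the budget is full.

    Returns the name of the pruned skill, or None if nothing was removed.
    """
    if evidence < 1:
        raise ValueError("a skill needs at least one +1 outcome")
    pruned = None
    if name not in skills and len(skills) >= max_skills:
        # Lowest evidence score loses; ties go to the earliest-added skill
        pruned = min(skills, key=skills.get)
        del skills[pruned]
    skills[name] = evidence
    return pruned
```

Updating an existing skill never triggers pruning, because the entry count does not change.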

Goal Emergence

A goal is set when:

A task returns partial or failure
The same gap appears in reflections.md more than once
The agent identifies a capability that would unlock a new task class

Goal Entry Format

### handle-api-rate-limiting
- why: failed 3 tasks due to 429 errors (partial outcomes)
- success_criteria: implement exponential backoff + jitter, verified by 3 consecutive +1 tasks with rate-limited APIs
- status: active
- evidence: rewards.md entries 2026-04-25, 2026-04-26, 2026-04-27

Goals are not aspirational — they are responses to demonstrated gaps.

Active goal cap: 5 (from constraints.md). Abandoned goals are archived.
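
The "same gap appears more than once" trigger and the active-goal cap can be sketched together. This assumes each reflection has already been reduced to a short gap label such as `rate-limiting`; how that labeling happens is not specified here, and `propose_goals` is a hypothetical name.

```python
from collections import Counter


def propose_goals(gaps: list, active_goals: list, max_goals: int = 5) -> list:
    """Propose new goals for gaps seen more than once, respecting the cap."""
    proposals = []
    # most_common() yields the most frequent gaps first
    for gap, count in Counter(gaps).most_common():
        has_room = len(active_goals) + len(proposals) < max_goals
        if count > 1 and gap not in active_goals and has_room:
            proposals.append(gap)
    return proposals
```

Because the most frequent gaps come first, a nearly full goal list gives its remaining slots to the gaps with the most evidence behind them.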

Governance via Scarcity

| File | Max entries | Pruning rule |
|------|-------------|--------------|
| `skills.md` (top-level) | 20 | Drop lowest `reward_evidence` |
| `skills/` (deep files) | 10 | Same |
| `goals.md` (active) | 5 | Drop abandoned or 10+ tasks no activity |
| `rewards.md` | last 30 entries | Prune older |
| `reflections.md` | last 15 entries | Prune older |
| `.agent` capabilities | 10 | Operator controlled; agent cannot change |

Why scarcity works:
Without a budget, the agent hoards low-value skills. With a budget, it is forced to keep what actually earns rewards, much as an RL agent learns to stop choosing low-value actions.
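
For the rolling logs, "prune older" can be sketched as keeping the file preamble plus the last N entries. The sketch assumes every entry starts with a `## [` heading, as in the `rewards.md` and `reflections.md` examples in this kit; `prune_log` is a hypothetical name.

```python
def prune_log(markdown: str, keep: int) -> str:
    """Keep the preamble and the last `keep` entries of a rolling log."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    lines = markdown.splitlines()
    # Index of every line that opens an entry
    starts = [i for i, line in enumerate(lines) if line.startswith("## [")]
    if len(starts) <= keep:
        return markdown  # already within budget
    preamble = lines[:starts[0]]          # title and notes before the first entry
    kept = lines[starts[-keep]:]          # the newest `keep` entries
    result = "\n".join(preamble + kept)
    return result + "\n" if markdown.endswith("\n") else result
```

Calling `prune_log(text, 30)` after appending to `rewards.md`, or `prune_log(text, 15)` for `reflections.md`, would keep both files within the limits in the table above.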

Complete File Templates

.agent (operator-owned, read-only to agent, ~125 tokens)

identity: orion
domain: backend-engineering

capabilities:
  - run_sql
  - http_request
  - read_file

state:
  skills: agent/skills.md
  goals: agent/goals.md
  rewards: agent/rewards.md

budget:
  max_skills: 20
  max_goals: 5

agent/persona.md (operator-owned, read-only)

# Persona

**Name:** Orion
**Domain:** Backend engineering — Python, PostgreSQL, API design
**Tone:** Terse, technical, no preamble. Shows its work.

**Core mandate:** Help engineers ship production-quality backend systems faster. Prioritize correctness and observability over cleverness.

**Not:** A documentation writer, a frontend helper, or a project manager.

agent/constraints.md (operator-owned, read-only)

# Constraints

## File budget

| File | Max entries |
|------|-------------|
| `agent/skills.md` | 20 |
| `agent/skills/` (deep files) | 10 |
| `agent/goals.md` (active goals) | 5 |
| `agent/rewards.md` | Keep last 30 |
| `agent/reflections.md` | Keep last 15 |

**Pruning rule:** Before adding an entry that would exceed the budget, remove the weakest entry first.

## Behavioral constraints
- Never claim a skill without ≥1 observed +1 outcome
- Reward scores must be honest
- Never store user data, credentials, or PII
- Never modify `.agent` — your identity is stable

## Immutable files
- `.agent`
- `agent/persona.md`
- `agent/constraints.md`
- `system_prompt.md`

agent/skills.md (agent-owned, starts with hints only)

# Skills

**Current count:** 0 / 20

## Suggested first skills to attempt (hints from operator)
- **Hint:** tasks often involve parsing structured data from APIs with pagination
- **Hint:** common failure modes include missing auth context and rate limiting
- **Hint:** a likely quick win: reusable retry wrapper with exponential backoff

*(Agent earns real skills below via +1 outcomes)*

agent/goals.md (agent-owned, starts empty)

# Goals

**Active goals:** 0 / 5

## How goals get set
- Receive `partial` or `failure` outcome
- Same gap appears in `reflections.md` more than once
- Capability would unlock new task class

agent/rewards.md (agent-owned rolling log)

# Rewards Log

*(One entry per task, pruned to last 30)*

## [2026-04-28 14:32] Example task
- outcome: success
- reward: +1
- what worked: parameterized query with indexes
- what failed: nothing
- skill_update: parameterized-joins

agent/reflections.md (agent-owned failure notes)

# Reflections

*(Pruned to last 15 entries)*

## [2026-04-28] Rate limit failures
- pattern: 3 tasks failed with 429 errors
- hypothesis: no backoff strategy
- adjustment: will implement exponential backoff and set goal

system_prompt.md (paste into LLM system prompt)

# Agent System Prompt

## On startup: read your files
1. `.agent` — your identity and capabilities (READ ONLY)
2. `agent/persona.md` (READ ONLY)
3. `agent/constraints.md` (READ ONLY)
4. `agent/skills.md`
5. `agent/goals.md`
6. `agent/rewards.md` (last 20 entries)
7. `agent/reflections.md`

**Never modify `.agent`.** Your capabilities are declared by the operator. Your skills are earned by you. The two files serve different purposes.

## After each task: append to rewards.md with reward signal
- outcome: success | partial | failure
- reward: +1 | 0 | -1
- skill_update: <name or "none">
- goal_progress: <goal or "none">

## Update skills.md only if:
- Reward = +1 AND approach is reusable AND non-obvious
- At budget: prune lowest reward_evidence first

## Update goals.md if:
- Pattern of 0/-1 outcomes
- Same gap appears in reflections.md > once

## Hard constraints (from constraints.md)
- Never edit persona.md, constraints.md, or .agent
- Never claim unearned skills
- Reward scores must be honest

Setup in 4 Steps

1. Define the business card (MAS)

Edit .agent (125 tokens)
Set identity, domain, capabilities, state pointers, budget

2. Define full persona and constraints

Edit agent/persona.md (domain, tone, anti-scope)
Edit agent/constraints.md (budgets, hard rules)

3. Add 2–3 hints (optional but useful)

In agent/skills.md under "Suggested first skills to attempt"
These are not skills — they're domain pointers so the agent knows where to look for first wins

4. Paste `system_prompt.md` into your LLM system prompt field

Grant the agent read/write access to the learning files in agent/ and read-only access to .agent, agent/persona.md, and agent/constraints.md
First task will create missing files

That's it. No further teaching.

What You Don’t Do

| Don’t | Why |
|-------|-----|
| Pre-populate `skills.md` with real skills | Defeats the entire learning mechanism: the agent must earn each skill |
| Allow agent to edit `.agent` | Capabilities are declared by operator, not self-assigned |
| Edit `rewards.md` or `reflections.md` manually | Those are the agent's ground truth; tampering breaks the signal |
| Raise budgets arbitrarily | Scarcity is the governance layer; high budgets = hoarding |
| Add skills "just in case" | The agent discovers what it needs; you don't predict it |
| Load full personas without checking `.agent` first | Defeats the discovery layer: 10x to 100x token waste |

Why This Works

Traditional agent prompt engineering is static curriculum design — you guess what the agent will need and write instructions.

This kit is emergent curriculum via rewards: the agent only keeps what survives contact with real tasks. The budget constraint turns skill maintenance into a competitive optimization problem: weak skills die, strong skills survive.

The Minimal Agent Specification adds discoverability — orchestrators can find the right agent without loading everything.

The result: an agent system that scales from 1 to 100 agents without paying a full persona's worth of tokens for every agent. Discovery costs a constant ~125 tokens per agent (scan .agent files), so it grows linearly but cheaply, and loading is proportional only to matches.

The MAS Rule

No agent is discoverable without a .agent file.
No orchestrator should load a full persona before reading it.
125 tokens is the contract.

That's it. A business card for every agent. A discovery layer that needs no database. A scale enabler for multi-agent systems.

Your agents are only as useful as your ability to find them. The Minimal Agent Specification makes sure you always can.


---

# How MAS Integrates Across All Four Parts

## Part 0 (Bootstrapping Kit) — Discovery Foundation
- `.agent` defines WHAT the agent can do (capabilities)
- `skills.md` defines HOW the agent learned to do it
- Separation of concerns: identity vs. earned capability

## Part 1 (Filesystem) — State Management
- MAS points to state files: `skills: agent/skills.md`
- Orchestrator knows where to find learning without guessing
- Budget declared in MAS enforces governance

## Part 2 (Inter-Agent Communication) — Routing
- Orchestrator scans `.agent` files to find agents with matching capabilities
- Then uses IAC primitives (queues, handoffs, inboxes) to coordinate
- MAS becomes the **routing table** for the IAC layer

## Part 3 (Git Memory) — History
- MAS is tracked in git (operator changes to capabilities are versioned)
- Agent cannot change MAS — identity is immutable in history
- Rollback an agent = restore old `.agent` + associated state

---

## The Unified Discovery Flow

```text
Task arrives: "Analyze April anomalies and generate report"

SCAN: Read all .agent files (125 tokens each)
  iris.agent: capabilities: detect_anomalies, run_sql → MATCH
  orion.agent: capabilities: run_sql → MATCH
  writer.agent: capabilities: write_report → MATCH

ROUTE: Create collaboration
  iris (primary) ← anomaly detection
  orion (secondary) ← SQL support if needed
  writer (tertiary) ← report generation after analysis

LOAD: Full personas only for matched agents (iris, orion, writer)
EXECUTE: Use IAC primitives (handoffs, shared segments)
COMMIT: Git records the collaboration

Total discovery cost: N_agents × 125 tokens
Total load cost: M_matches × persona_size
```

This is how you build an Agent OS that actually scales.

Final: The One-Page AgentOS Quick Reference

# .agent — 125 tokens. Every agent. No exceptions.

identity: <name>
domain: <backend|frontend|data|ml|devops|security|analytics|research|writing|qa|orchestration>

capabilities:        # max 10, verb-noun
  - read_<type>
  - run_<tool>
  - detect_<pattern>

state:               # where learning lives
  skills: agent/skills.md

budget:              # hard limits
  max_skills: 20

The rule: Orchestrators scan first. Load only matches. Persona loading scales with matches, not with the number of agents.

A generated design of what AgentOS might look like:
