While working on AgentOS I came across this problem: You have 20 specialist agents. Your orchestrator needs to know: Who are they? What can they do? Where's their state?
You don't want to load 20 full personas into memory. You don't want to parse 20,000 tokens of backstory just to route a simple task.
You want a business card. A tiny header file that tells you everything you need before you decide to have a conversation.
That's the Minimal Agent Specification.
A single file — .agent — in every agent's directory. Small enough to scan hundreds in seconds. Rich enough to route tasks intelligently.
    identity: orion
    domain: backend-engineering
    capabilities:
      - run_sql
      - http_request
      - read_file
    state:
      skills: skills.md
      goals: goals.md
    budget:
      max_skills: 20
That's it. The orchestrator reads this file and immediately knows:
- Who you are (identity)
- What you're good at (capabilities)
- Where your memory lives (state pointers)
- How much you can handle (budget)
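Reading the card takes very little code. Here is a minimal sketch of a reader, assuming nested keys are indented under `state:` and `budget:` and that `capabilities` is the only list. `AgentCard`, `parse_agent_card`, and `SAMPLE` are names I made up for the example, not part of the spec.

```python
from dataclasses import dataclass, field


@dataclass
class AgentCard:
    identity: str = ""
    domain: str = ""
    capabilities: list = field(default_factory=list)
    state: dict = field(default_factory=dict)
    budget: dict = field(default_factory=dict)


def parse_agent_card(text: str) -> AgentCard:
    """Parse the small YAML subset used by .agent files (assumed layout)."""
    card = AgentCard()
    section = None  # top-level block we are currently inside
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List items only count inside the capabilities block.
            if section == "capabilities":
                card.capabilities.append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        indented = line[0].isspace()
        if not indented and value == "":
            section = key  # opens capabilities / state / budget
        elif not indented:
            section = None
            if key in ("identity", "domain"):
                setattr(card, key, value)
        elif section == "state":
            card.state[key] = value
        elif section == "budget":
            card.budget[key] = int(value)
    return card


SAMPLE = """\
identity: orion
domain: backend-engineering
capabilities:
  - run_sql
  - http_request
  - read_file
state:
  skills: skills.md
  goals: goals.md
budget:
  max_skills: 20
"""

card = parse_agent_card(SAMPLE)
print(card.identity, card.capabilities)
```

A real deployment would probably use a proper YAML library. The hand-rolled reader is only here to show how little the orchestrator has to understand before routing.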
Before MAS: Orchestrator loads every agent's full persona. 10 agents × 2000 tokens = 20,000 tokens before the first task. Slow. Expensive. Fragile.
After MAS: Orchestrator scans .agent files. 10 agents × 125 tokens = 1,250 tokens. Then loads only the agents that match the task.
Task: "Run anomaly detection on April events"
Scan phase: iris has "run_sql" capability → load iris
orion has "run_sql" → also a candidate
planner has no matching capability → skip
Load phase: Now load full personas for iris and orion only
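The scan alone is 16x cheaper here (1,250 vs 20,000 tokens). Depending on persona size and how many agents match, the reduction in context loading is roughly 10x to 100x.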
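The scan-then-load flow above fits in a few lines. This is a sketch under assumptions: the `CARDS` registry is hypothetical (iris's `read_file` and planner's `write_plan` are invented for illustration), and the token figures are the round numbers from the Before/After comparison.

```python
# Hypothetical registry: agent name -> capabilities read from its .agent file.
CARDS = {
    "iris": ["run_sql", "read_file"],
    "orion": ["run_sql", "http_request", "read_file"],
    "planner": ["write_plan"],
}

CARD_TOKENS = 125      # per .agent file
PERSONA_TOKENS = 2000  # per full persona, as in the "Before MAS" example


def route(required: str, cards: dict) -> list:
    """Scan phase: keep only agents whose card lists the capability."""
    return [name for name, caps in cards.items() if required in caps]


def context_cost(n_agents: int, n_matches: int) -> dict:
    """Token cost of loading everything vs. scanning, then loading matches."""
    scan_only = n_agents * CARD_TOKENS
    return {
        "load_all": n_agents * PERSONA_TOKENS,
        "scan_only": scan_only,
        "scan_then_load": scan_only + n_matches * PERSONA_TOKENS,
    }


matches = route("run_sql", CARDS)
print(matches)                         # ['iris', 'orion']
print(context_cost(10, len(matches)))
```

With two matches out of ten agents, scan plus load comes to 5,250 tokens against 20,000 for loading everyone. The saving grows as the share of matching agents shrinks.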
The MAS is designed for the real world. Every word is a token. No waste.
| Component | Tokens (approx) |
|---|---|
| Identity + domain | 10 |
| Capabilities (5 items) | 35 |
| State pointers (4 items) | 35 |
| Budget (4 items) | 25 |
| Formatting (spacing, dashes, newlines) | 20 |
| Total | ~125 tokens |
For comparison:
- A tweet is ~35 tokens
- A typical email is ~200 tokens
- A full agent persona is 1,000–3,000 tokens
125 tokens is the sweet spot. Small enough to scan hundreds of agents in one batch. Rich enough to make intelligent routing decisions.
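If you want to keep cards honest about the budget, a rough check is enough. The sketch below assumes about 4 characters per token, which is only a rule of thumb; real tokenizers vary, so treat the result as an estimate. The 8,000-token window in the usage line is a made-up figure.

```python
import math

TOKEN_BUDGET = 125
CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def within_budget(text: str, budget: int = TOKEN_BUDGET) -> bool:
    """True if the estimated size of a .agent file fits the budget."""
    return estimate_tokens(text) <= budget


def cards_per_batch(window_tokens: int, budget: int = TOKEN_BUDGET) -> int:
    """How many budget-sized cards fit in one scan of the given window."""
    return window_tokens // budget


print(cards_per_batch(8000))  # 64
```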
For the orchestrator: A registry that requires no database. Just a filesystem and 125 tokens per agent.
For the agent: A stable identity that doesn't change when you learn new skills. Your .agent says who you are. Your skills.md says what you've learned.
For the operator: One file to edit when an agent's domain or capabilities change. No hunting through prompts.
For the system: The ability to discover, validate, and route to agents without loading their full context. This is how you scale from 5 agents to 500.
    # .agent: place this file in every agent directory
    identity: <unique name>       # required
    domain: <area of expertise>   # required
    capabilities:                 # list what this agent can do
      - read_file                 # max 10 items
      - run_sql
      - http_request
    state:                        # where learning lives
      skills: skills.md           # required
      goals: goals.md
      rewards: rewards.md
    budget:                       # hard limits
      max_skills: 20              # default
      max_goals: 5                # default
No agent is discoverable without a .agent file.
No orchestrator should load a full persona before reading it.
125 tokens is the contract.
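The required fields, the 10-capability limit, and the budget defaults from the spec can be checked mechanically. A minimal sketch, assuming the card has already been parsed into a plain dict; `validate_card` and `with_default_budget` are hypothetical helper names.

```python
DEFAULT_BUDGET = {"max_skills": 20, "max_goals": 5}
MAX_CAPABILITIES = 10


def validate_card(card: dict) -> list:
    """Return a list of problems; an empty list means the card is valid."""
    problems = []
    for key in ("identity", "domain"):
        if not card.get(key):
            problems.append(f"missing required field: {key}")
    if len(card.get("capabilities", [])) > MAX_CAPABILITIES:
        problems.append(f"more than {MAX_CAPABILITIES} capabilities")
    if not card.get("state", {}).get("skills"):
        problems.append("missing required field: state.skills")
    return problems


def with_default_budget(card: dict) -> dict:
    """Fill in budget defaults without mutating the input card."""
    budget = {**DEFAULT_BUDGET, **card.get("budget", {})}
    return {**card, "budget": budget}


good = {
    "identity": "orion",
    "domain": "backend-engineering",
    "capabilities": ["run_sql"],
    "state": {"skills": "skills.md"},
}
print(validate_card(good))                  # []
print(with_default_budget(good)["budget"])  # {'max_skills': 20, 'max_goals': 5}
```

An orchestrator could run this during the scan and skip any agent whose card fails, which is one way to enforce the first rule.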
That's it. A business card for every agent. A discovery layer that needs no database. A scale enabler for multi-agent systems.
Your agents are only as useful as your ability to find them. The Minimal Agent Specification makes sure you always can.
Agent Bootstrapping Kit: LLM Skills via RL-Inspired Rewards
A minimal, file-based framework where an LLM agent earns its own skills, sets its own goals, and prunes its own knowledge — using only a reward signal and hard budget constraints.
The Core Idea
Most agent systems start with operator-authored skills — static, instructional, and fragile.
This kit inverts that:
- The agent scores every task itself: +1 (success, reusable), 0 (partial), -1 (failure)
- Goals come from 0 or -1 outcomes
What you get: an agent that learns what works for the tasks it actually receives, not what someone guessed upfront.
How It Works (The RL Analogy)
- Reward: +1/0/-1 (written to rewards.md)
- reward_evidence per skill
- skills.md
- reflections.md
- .agent file
No gradients. No backprop. Just file-based experience replay.
File Architecture
New: .agent at the root, the business card that makes discovery possible.
Ownership rule:
- Operator-owned: persona.md, constraints.md, .agent (capabilities part)
- .agent: your identity is stable even as skills change
The Minimal Agent Specification (MAS)
One File. 125 Tokens. Any Agent.
Why This Belongs in Part 0
The bootstrapping kit is about starting from zero. MAS is how agents become discoverable at zero cost. Without MAS, you have isolated learners. With MAS, you have a directory of specialists that orchestration can route to.
Agent Workflow (One Task)
    graph TD
        A[Orchestrator reads .agent] --> B{MAS capabilities match task?}
        B -->|Yes| C[Load full persona.md]
        B -->|No| D[Skip agent]
        C --> E[Execute task using current skills]
        E --> F[Self-score reward +1/0/-1]
        F --> G{reward = +1 and reusable?}
        G -->|Yes| H[Add/update skill in skills.md]
        G -->|No| I{reward = 0 or -1?}
        I -->|Yes| J[Write reflection → consider goal]
        H --> K[Append to rewards.md]
        J --> K
        K --> L[Git commit with semantic message]
Notice: The .agent file is read first, before any expensive loading. This is the key to scaling.
The Reward Signal
Each task appends an entry to rewards.md.
Rules enforced by the agent itself (from constraints.md):
- A skill without a +1 outcome cannot exist in skills.md
Skill Lifecycle: Earn, Update, Prune
Skill Entry Format (in skills.md)
Pruning Rule (from constraints.md)
Example: Cap = 20 skills. Agent has 20, wants to add a 21st.
It computes evidence score (total +1 outcomes) for each skill, drops the lowest, then adds the new one.
This is the exact analog of policy complexity regularization — weak skills die, strong skills survive.
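The pruning step is simple enough to sketch. This version assumes skills are held as a mapping from skill name to evidence score (the count of +1 outcomes); the function name `add_skill` and the tie-breaking behavior are my own choices, not something the kit prescribes.

```python
def add_skill(skills: dict, name: str, evidence: int, cap: int = 20) -> dict:
    """Add a skill; if the cap is reached, drop the lowest-evidence one first.

    `skills` maps skill name -> evidence score (total +1 outcomes).
    Returns a new dict and leaves the input untouched.
    """
    updated = dict(skills)
    if name not in updated and len(updated) >= cap:
        # On ties, min() returns the earliest-added skill, so that one is dropped.
        weakest = min(updated, key=updated.get)
        del updated[weakest]
    updated[name] = evidence
    return updated


# 20 skills with evidence 1..20, then a 21st arrives.
skills = {f"skill_{i}": i + 1 for i in range(20)}
after = add_skill(skills, "new_skill", 1)
print(len(after), "skill_0" in after, "new_skill" in after)  # 20 False True
```

Note that this follows the rule as written: the weakest existing skill is dropped even when the newcomer has no more evidence than it did.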
Goal Emergence
A goal is set when:
- a task outcome is partial or failure
- the same problem appears in reflections.md more than once
Goal Entry Format
Goals are not aspirational — they are responses to demonstrated gaps.
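One way to read "responses to demonstrated gaps" in code: a gap becomes a goal only once it has shown up repeatedly in partial or failed tasks. The sketch below is my interpretation, not the kit's actual logic. The `(outcome, gap)` tuple shape, the gap labels, and `propose_goals` are all hypothetical, and the cap of 5 is the kit's active-goal limit.

```python
from collections import Counter


def propose_goals(reflections: list, active: list, cap: int = 5) -> list:
    """Turn repeated gaps into goals, respecting the active-goal cap.

    `reflections` is a list of (outcome, gap) tuples, where outcome is
    "partial" or "failure" and gap is a short label for what went wrong.
    """
    counts = Counter(
        gap for outcome, gap in reflections if outcome in ("partial", "failure")
    )
    goals = list(active)
    for gap, seen in counts.most_common():
        # Only gaps seen more than once qualify, and only while under the cap.
        if seen > 1 and gap not in goals and len(goals) < cap:
            goals.append(gap)
    return goals


notes = [
    ("failure", "slow joins"),
    ("partial", "slow joins"),
    ("failure", "auth tokens"),
]
print(propose_goals(notes, active=[]))  # ['slow joins']
```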
Active goal cap: 5 (from constraints.md). Abandoned goals are archived.
Governance via Scarcity
- skills.md (top-level), pruned by reward_evidence
- skills/ (deep files)
- goals.md (active)
- rewards.md
- reflections.md
- .agent capabilities
Why scarcity works:
Without a budget, the agent hoards low-value skills. With a budget, it is forced to keep what actually earns rewards — exactly how RL agents discard low-value actions.
Complete File Templates
- .agent (operator-owned, read-only to agent, ~125 tokens)
- agent/persona.md (operator-owned, read-only)
- agent/constraints.md (operator-owned, read-only)
- agent/skills.md (agent-owned, starts with hints only)
- agent/goals.md (agent-owned, starts empty)
- agent/rewards.md (agent-owned rolling log)
- agent/reflections.md (agent-owned failure notes)
- system_prompt.md (paste into LLM system prompt)
Setup in 4 Steps
1. Define the business card (MAS)
- .agent (125 tokens)
2. Define full persona and constraints
- agent/persona.md (domain, tone, anti-scope)
- agent/constraints.md (budgets, hard rules)
3. Add 2–3 hints (optional but useful)
- agent/skills.md under "Suggested first skills to attempt"
4. Paste system_prompt.md into your LLM system prompt field
- Give the agent write access to the agent/ directory (but NOT to .agent)
That's it. No further teaching.
What You Don’t Do
- Pre-fill skills.md with real skills
- Let the agent edit .agent
- Edit rewards.md or reflections.md manually
- Load a full persona without reading .agent first
Why This Works
Traditional agent prompt engineering is static curriculum design — you guess what the agent will need and write instructions.
This kit is emergent curriculum via rewards — the agent only keeps what survives contact with real tasks. The budget constraint turns skill maintenance into a competitive optimization problem: weak skills die, strong skills replicate.
The Minimal Agent Specification adds discoverability — orchestrators can find the right agent without loading everything.
The result: an agent system that scales from 1 to 100 agents without paying a full persona's worth of tokens per agent. Discovery still scans every .agent file, so it grows with the number of agents, but at about 125 tokens each, and loading is proportional only to matches.
The MAS Rule
That's it. A business card for every agent. A discovery layer that needs no database. A scale enabler for multi-agent systems.
Your agents are only as useful as your ability to find them. The Minimal Agent Specification makes sure you always can.
This is how you build an Agent OS that actually scales.
Final: The One-Page AgentOS Quick Reference
The rule: Orchestrators scan first. Load only matches. Scale linearly with tasks, not agents.
A generated design of what AgentOS might look like:
