- Draft version
"Make it reproducible, or it never happened."
Prerequisites:
- Part 0: Agent Bootstrapping Kit
- Part 1: Agent Filesystem
- Part 2: Inter‑Agent Communication
- Part 3: Git as Agent Memory
- Introduction
- Why Docker Matters for AgentOS
- Prerequisites
- Project Layout: The Files You'll Own
- Step 1 — Define Your Agent(s) in
agents.yaml - Step 2 — Write the Dockerfile
- Step 3 — Build the Image Locally
- Step 4 — Prepare Your Data Directory
- Step 5 — Run Your Agent Container
- Step 6 — Feed Tasks and Watch Your Agent Learn
- Model‑Agnostic Design
- Customization Patterns
- Observing Your Agent's Growth
- Next Steps
- Complete Examples
- Closing Thoughts
In Parts 0–3 we built a complete blueprint for an agent that earns its skills, prunes its memory, and records its own history — all through plain files and git commits. But a blueprint is only half the story. To run an agent reliably, across machines, without hidden dependencies or "works on my laptop" surprises, we need a container.
This article, Part 4, turns the blueprint into a concrete, runnable system. By the end you'll have:
- A Dockerfile that packages the AgentOS runtime — model‑agnostic and vendor‑neutral.
- A declarative way to define your own agents (personas, domains, goals, LLM provider).
- A local image build workflow you control — no external registry required.
- A single command to launch an agent that learns from real tasks, with proven end‑to‑end flow.
- Three complete, diverse examples: a code reviewer, a legal‑document analyst, and a creative writing coach.
No vendor lock‑in. No assumptions about which LLM you use. Just files, Python, git, and a Dockerfile.
| Concern | Without Docker | With Docker |
|---|---|---|
| Reproducibility | "But it worked in my venv…" | Image built from a locked‑in definition |
| Isolation | Agents share host filesystem, tools, secrets | Each container has its own filesystem and network |
| Portability | pip, Python version, OS quirks | docker run anywhere |
| Observability | Logs in disparate places | docker logs, mounted volumes |
| LLM provider swap | Rewrite integration code per machine | Provider config in environment variables — image unchanged |
AgentOS already treats agent state as a filesystem contract. Docker adds an execution contract — the runtime environment is as deterministic as the state layout.
- Docker installed (≥ 20.10 recommended). Verify with
docker --version. - An LLM API key — any provider. AgentOS works with OpenAI, Anthropic, local models via Ollama, or any HTTP‑accessible LLM.
- git installed on your host (used to inspect agent history from outside the container, optional but useful).
- The source files listed below.
Create a project directory. You'll place six files inside:
my-agent-project/
├── Dockerfile
├── agents.yaml # ← you define your agents here
├── agent.py # model‑agnostic agent runtime
├── bootstrap.py # filesystem initializer (idempotent)
├── main.py # container entrypoint
└── requirements.txt # httpx, pyyaml
The four Python files implement the entire runtime. Their full source was shown in the previous section — here's a quick summary of what each does:
| File | Purpose |
|---|---|
requirements.txt |
Only two dependencies: httpx>=0.28.1 and pyyaml>=6.0.3. No vendor SDKs. |
bootstrap.py |
Creates the standard file tree (persona.md, constraints.md, skills.md, goals.md, rewards.md, reflections.md, queue.md, system_prompt.md) and initializes a git repo. Idempotent — safe to call on every start. |
agent.py |
Reads tasks from queue.md, builds prompts from agent state files, calls any OpenAI‑compatible endpoint via httpx, parses structured responses, updates skills/goals/rewards, commits to git. |
main.py |
Entrypoint. Reads agents.yaml, bootstraps each agent, then loops forever checking each agent's queue for new tasks. |
AgentOS discovers agents through a single YAML file mounted into the container. Every agent you want to run gets an entry here.
agents.yaml:
agents:
- name: orion
persona: "Backend engineer specializing in Python, PostgreSQL, and API design. Terse, technical, shows its work. Prioritizes correctness over cleverness."
domain: "Backend engineering"
tone: "Terse, technical, no preamble."
hints:
- "Tasks often involve parsing structured API data with pagination."
- "Common failure: missing auth context or rate limiting."
- "Quick win: reusable retry wrapper with exponential backoff."
goals:
- "Handle streaming SSE endpoints robustly"
- "Build a query‑parameterisation skill for all SQL tasks"What each field does:
| Field | Purpose | Example |
|---|---|---|
name |
Unique agent identifier (used for its subdirectory) | orion |
persona |
Identity and mandate, written into persona.md |
Backend engineer… |
domain |
Area of expertise | Backend engineering |
tone |
Communication style | Terse, technical |
hints |
Seed pointers — not skills, just areas to explore | Reusable retry wrapper… |
goals |
Optional; seed goals. Real goals emerge from failures. | Handle streaming SSE endpoints |
Provider configuration goes in environment variables, not in YAML. This keeps the agent definition portable across providers.
Dockerfile:
FROM python:3.12-slim
LABEL org.opencontainers.image.title="AgentOS"
LABEL org.opencontainers.image.description="Self-improving LLM agent runtime — model‑agnostic"
# ── System dependencies ─────────────────────────────────────────
RUN apt-get update && \
apt-get install -y --no-install-recommends git && \
rm -rf /var/lib/apt/lists/*
# ── Create non‑root user ─────────────────────────────────────────
RUN useradd --create-home --shell /bin/bash agentos
# ── Application directory ────────────────────────────────────────
WORKDIR /app
# ── Python dependencies ──────────────────────────────────────────
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# ── Copy runtime files ───────────────────────────────────────────
COPY main.py bootstrap.py agent.py ./
# ── Data volume mount point ──────────────────────────────────────
RUN mkdir -p /data && chown -R agentos:agentos /data /app
VOLUME /data
# ── Runtime configuration ────────────────────────────────────────
USER agentos
ENV DATA_DIR=/data
ENV POLL_INTERVAL=5
# LLM defaults (override at runtime) — OpenAI‑compatible
ENV LLM_BASE_URL=https://api.openai.com/v1
ENV LLM_MODEL=gpt-4o
ENV LLM_MAX_TOKENS=2048
ENV LLM_TIMEOUT=60
ENTRYPOINT ["python", "-u", "main.py"]You build the image yourself — no remote registry needed. This keeps you in full control.
# From your project directory (where the Dockerfile lives)
cd my-agent-project
docker build -t agentos:latest .First build takes 30–60 seconds. Subsequent builds are fast — Docker caches layers unless you change the Python source files.
Verify the image exists:
docker images agentosOutput:
REPOSITORY TAG IMAGE ID CREATED SIZE
agentos latest abc123def456 2 minutes ago 180MB
AgentOS stores all persistent state in a host directory mounted at /data. Create it before running:
mkdir -p my-agent-data
cp agents.yaml my-agent-data/The directory must contain agents.yaml at its root — the runtime reads this on startup.
What goes where:
my-agent-data/ ← mounted to /data in container
├── agents.yaml ← agent definitions (required)
└── agents/ ← created automatically by bootstrap
└── orion/ ← one subdirectory per agent
├── persona.md
├── constraints.md
├── skills.md
├── goals.md
├── rewards.md
├── reflections.md
├── queue.md
├── system_prompt.md
├── skills/
├── iac/
└── .git/ ← agent's git history
The agent subdirectories are created automatically on first run — you only need to provide agents.yaml.
Supply your LLM API key and mount the data directory. The provider is configured via environment variables.
With OpenAI:
docker run -d \
--name agentos-orion \
-e LLM_API_KEY="sk-..." \
-e LLM_BASE_URL="https://api.openai.com/v1" \
-e LLM_MODEL="gpt-4o" \
-v $(pwd)/my-agent-data:/data \
agentos:latestWith Anthropic (via compatible endpoint):
docker run -d \
--name agentos-orion \
-e LLM_API_KEY="sk-ant-..." \
-e LLM_BASE_URL="https://api.anthropic.com/v1" \
-e LLM_MODEL="claude-sonnet-4-20250514" \
-v $(pwd)/my-agent-data:/data \
agentos:latestWith Ollama (local, no cloud costs):
docker run -d \
--name agentos-local \
-e LLM_API_KEY="ollama" \
-e LLM_BASE_URL="http://host.docker.internal:11434/v1" \
-e LLM_MODEL="llama3.1:8b" \
-e LLM_MAX_TOKENS=4096 \
-v $(pwd)/my-agent-data:/data \
agentos:latestWith Groq:
docker run -d \
--name agentos-groq \
-e LLM_API_KEY="gsk_..." \
-e LLM_BASE_URL="https://api.groq.com/openai/v1" \
-e LLM_MODEL="llama-3.1-70b-versatile" \
-v $(pwd)/my-agent-data:/data \
agentos:latestVerify it started:
docker ps --filter name=agentos
docker logs agentos-orionYou should see:
2026-04-29T10:00:00 INFO [bootstrap] Initialised git repo at /data/agents/orion
2026-04-29T10:00:00 INFO [agentos] Agent 'orion' ready at /data/agents/orion
2026-04-29T10:00:00 INFO [agentos] AgentOS running with 1 agent(s). Polling every 5s.
Tasks arrive by writing a task block into the agent's queue.md file. The agent polls every few seconds, picks up the first task, processes it, and removes it from the queue.
Write a task block into the queue file in your mounted data directory:
cat >> my-agent-data/agents/orion/queue.md << 'EOF'
## first-task
task: List your active constraints and explain how the reward system works.
priority: normal
created: 2026-04-29T10:05:00Z
EOFImportant: The leading blank line before ## first-task ensures the block is properly separated from any existing content.
Each task block in queue.md must follow this structure:
## <unique-task-id>
task: <description of what the agent should do>
priority: normal|high|low
created: <ISO 8601 timestamp>The ## header marks the start of a task block. The runtime splits the file on ## headers and processes the first complete block it finds. After processing, it removes the block from the file.
docker logs -f agentos-orionYou'll see the agent spring into action:
2026-04-29T10:05:05 INFO [agent] [orion] Processing task: first-task
2026-04-29T10:05:12 INFO [agent] [orion] Response:
The active constraints are: skills.md capped at 20 entries,
goals.md capped at 5 active goals, rewards.md keeps last 30 entries,
and reflections.md keeps last 15 entries. The reward system uses
+1 for successful reusable outcomes, 0 for partial, -1 for failures.
Skills are only earned from +1 outcomes...
After processing, inspect the files:
# Check the reward was recorded
cat my-agent-data/agents/orion/rewards.md
# Check if a skill was earned (if the task was a +1)
cat my-agent-data/agents/orion/skills.md
# View the agent's git history
git -C my-agent-data/agents/orion log --onelineExample git log:
* a3f9c12 orion(task): first-task
* 8b2e01a init: agent filesystem
The agent grows through repeated task cycles. Queue several tasks to watch it build skills:
cat >> my-agent-data/agents/orion/queue.md << 'EOF'
## paginated-api
task: Write a Python function that fetches all pages from a paginated REST API endpoint. The API returns a 'next' field in the response body. Handle rate limits gracefully.
priority: high
created: 2026-04-29T10:10:00Z
EOF
cat >> my-agent-data/agents/orion/queue.md << 'EOF'
## sql-parameterization
task: Write a function that builds a parameterized SQL SELECT query from a table name, a list of column names, and a dict of WHERE conditions. Prevent SQL injection.
priority: high
created: 2026-04-29T10:15:00Z
EOFAfter each task completes, check the agent's growing skill library:
cat my-agent-data/agents/orion/skills.mdOver multiple tasks, you'll see skills accumulate (and old ones pruned when the budget fills) — the agent evolving exactly as designed in Part 0.
Stop the container:
docker stop agentos-orionAll state is on the host in my-agent-data/. Restart with the same volume mount:
docker start agentos-orionThe agent resumes polling its queue. Any tasks you wrote to queue.md while stopped will be picked up immediately.
When you modify the Python source files, rebuild and restart:
docker build -t agentos:v2 .
docker stop agentos-orion
docker rm agentos-orion
docker run -d \
--name agentos-orion \
-e LLM_API_KEY="sk-..." \
-v $(pwd)/my-agent-data:/data \
agentos:v2Your agent's skills, goals, rewards, and git history survive because they live on the mounted volume — not inside the container.
The runtime makes zero assumptions about which LLM you use. The contract is simple:
- Endpoint: Any URL that accepts
POST /chat/completionswith OpenAI‑compatible JSON. - Auth:
Bearer <token>in theAuthorizationheader. - Response: Standard
{"choices": [{"message": {"content": "..."}}]}format.
This covers: OpenAI, Anthropic (via adapter), Ollama, Groq, Together AI, Fireworks, DeepInfra, vLLM, llama.cpp, LiteLLM proxy, and any self‑hosted model behind an OpenAI‑compatible wrapper.
To switch providers, change three environment variables. The image and config stay the same.
agents:
- name: iris
persona: "Data analyst specialized in anomaly detection on time‑series data. Communicates with concise summaries."
domain: "Data analytics"
tone: "Query‑oriented, evidence‑first"
hints:
- "Most tasks involve SQL with outlier detection."
- "Always validate data completeness before analysis."
goals:
- "Automatically validate data completeness before analysis"agents:
- name: orion
persona: ...
domain: "Backend engineering"
hints: [...]
- name: iris
persona: ...
domain: "Data analytics"
hints: [...]
- name: nova
persona: ...
domain: "Technical writing"
hints: [...]All share the container but have independent file trees, git histories, and queues. Coordination via files (handoffs, shared segments) is covered in Part 2.
The /data volume is your agent's long‑term memory. Build a new image locally and restart:
docker build -t agentos:v2 .
docker stop agentos-v1 && docker rm agentos-v1
docker run -d --name agentos-v2 -v $(pwd)/my-agent-data:/data agentos:v2The agent wakes up with all earned skills, reward logs, and git history intact.
Everything is plain files and git. No special tools required.
# Current skills
docker exec agentos-orion cat /data/agents/orion/skills.md
# Reward history
docker exec agentos-orion cat /data/agents/orion/rewards.md
# Learning timeline (from host)
git -C my-agent-data/agents/orion log --oneline --graph
# Live activity stream
docker logs -f agentos-orion
# Check queue for pending tasks
cat my-agent-data/agents/orion/queue.md
# Count skills earned
docker exec agentos-orion grep -c "^### " /data/agents/orion/skills.md
# See the last 5 reward outcomes
docker exec agentos-orion grep "^-\s*reward:" /data/agents/orion/rewards.md | tail -5- Add a REST API to push tasks programmatically instead of writing to queue files.
- Implement IAC primitives (Part 2) — handoff files, shared segments, queues for multi‑agent collaboration.
- Scale with Docker Compose — per‑agent containers, shared network, optional shared git remote.
- Harden for production — healthchecks, resource limits, read‑only root filesystem, secret management.
- Add a watchdog — script that monitors
rewards.mdand alerts on sustained-1streaks.
Goal: An agent that reviews code snippets, identifies bugs and anti‑patterns, and builds a library of recurring issues with detection heuristics.
agents.yaml:
agents:
- name: revu
persona: "Senior code reviewer. Knows Python, TypeScript, and Go. Identifies logic errors, security vulnerabilities, and style violations. Cites specific lines. Suggests concrete fixes with before/after diffs. Never approves without evidence."
domain: "Code review"
tone: "Direct, evidence‑based, line‑specific"
hints:
- "Common issues: missing input validation, race conditions, SQL injection vectors, resource leaks."
- "Suggestions should include before/after code blocks."
- "Track recurring anti‑patterns across reviews."
goals:
- "Build a library of recurring anti‑patterns with detection heuristics"
- "Learn framework‑specific security pitfalls (Django, Express, Gin)"Run (Ollama must be running locally):
mkdir -p revu-data
cp agents.yaml revu-data/
docker run -d \
--name revu \
-e LLM_API_KEY="ollama" \
-e LLM_BASE_URL="http://host.docker.internal:11434/v1" \
-e LLM_MODEL="llama3.1:8b" \
-e LLM_MAX_TOKENS=4096 \
-v $(pwd)/revu-data:/data \
agentos:latestFeed a review task:
cat >> revu-data/agents/revu/queue.md << 'EOF'
## review-flask-endpoint
task: |
Review the following Python code for security issues and bugs.
```python
@app.route('/user/<username>')
def get_user(username):
query = f"SELECT * FROM users WHERE name = '{username}'"
result = db.execute(query)
return jsonify(result.fetchone())Provide specific line-by-line findings with fixes. priority: high created: 2026-04-29T10:30:00Z EOF
**Watch it learn:**
```bash
docker logs -f revu
Goal: An agent that reviews contract clauses, flags risky language, and builds a knowledge base of problematic patterns across documents.
agents.yaml:
agents:
- name: lex
persona: "Contract analyst specializing in software licensing agreements, NDAs, and service level agreements. Identifies ambiguous language, missing clauses, and unfavorable terms. Cites specific sections. Suggests neutral alternative language. Conservative by default — flags anything uncertain."
domain: "Legal document review"
tone: "Precise, conservative, section‑cited"
hints:
- "Common risks: broad indemnification, missing termination clauses, vague SLAs."
- "Always quote the original clause before suggesting alternatives."
- "Track clause patterns that recur across documents."
goals:
- "Build a catalog of high‑risk clause patterns"
- "Learn to flag jurisdiction‑specific issues (GDPR, CCPA, EU AI Act)"Run (with OpenAI):
mkdir -p lex-data
cp agents.yaml lex-data/
docker run -d \
--name lex \
-e LLM_API_KEY="sk-..." \
-e LLM_BASE_URL="https://api.openai.com/v1" \
-e LLM_MODEL="gpt-4o" \
-v $(pwd)/lex-data:/data \
agentos:latestFeed a contract review:
cat >> lex-data/agents/lex/queue.md << 'EOF'
## review-nda-clause
task: |
Review this NDA clause and identify risks:
"The Receiving Party agrees to hold all Confidential Information
in strict confidence for a period of three (3) years from the
date of disclosure, except for information that is independently
developed, which shall remain confidential indefinitely."
Flag any ambiguities or one-sided terms.
priority: high
created: 2026-04-29T11:00:00Z
EOFGoal: An agent that provides developmental feedback on fiction drafts, learns an author's voice over time, and builds skill modules for specific craft elements (dialogue, pacing, description).
agents.yaml:
agents:
- name: scribe
persona: "Developmental editor and writing coach. Specializes in fiction — novels and short stories. Focuses on structure, pacing, character voice, and emotional resonance. Gives specific, actionable feedback with examples. Encouraging but honest. Remembers the author's recurring patterns."
domain: "Creative writing"
tone: "Encouraging, specific, craft‑focused"
hints:
- "Common feedback areas: show‑don't‑tell, dialogue tags, pacing in action scenes."
- "Always cite specific passages with line references."
- "Build a profile of the author's voice and recurring habits."
goals:
- "Track an author's voice patterns across submissions"
- "Build skill modules for: dialogue mechanics, scene pacing, sensory description"Run (with Groq — fast and cost‑effective):
mkdir -p scribe-data
cp agents.yaml scribe-data/
docker run -d \
--name scribe \
-e LLM_API_KEY="gsk_..." \
-e LLM_BASE_URL="https://api.groq.com/openai/v1" \
-e LLM_MODEL="llama-3.1-70b-versatile" \
-v $(pwd)/scribe-data:/data \
agentos:latestFeed a manuscript excerpt:
cat >> scribe-data/agents/scribe/queue.md << 'EOF'
## feedback-chapter1
task: |
Provide developmental feedback on this opening passage:
"The door creaked open. Sarah walked into the room. She was
scared. The room was dark and cold. She saw a shadow move in
the corner. She screamed and ran away."
Focus on: show-don't-tell, sensory detail, pacing.
priority: normal
created: 2026-04-29T12:00:00Z
EOFYou've now turned the AgentOS concept into a physical system:
- The filesystem stores its mind.
- The git repository remembers its history.
- The Docker container gives it a home — built locally, under your control.
- The
agents.yamldefines its purpose. - The environment variables decouple the LLM provider from the agent's identity.
I'm interested in benchmarking before the series continues with advanced patterns
Solving the Tensions
Reward Honesty – The Audit Hook
Problem: The agent self‑scores
+1/0/-1with no verification. Nothing stops inflation.Add an optional
reward_auditsection. When enabled, after the agent writes its reward entry, a separate lightweight process (e.g., a deterministic rule‑based checker script) produces anaudit_score. The two scores are recorded together inrewards.md. The agent can see the audit score in its next context — creating social pressure for honesty. The audit does not override the agent’s own score, but divergence is logged and can be used in pruning or goal creation.Complexity note: This doubles LLM calls per task and adds latency. For production, you might run the audit asynchronously. For prototyping, you can start with
method: "rule_based"(e.g., check for contradictory statements in the agent’s output). The design flags this as optional; operators can disable it.IAC Backpressure – Throughput Classes & TTL
Problem: No semantic backpressure. A slow agent’s inbox can fill with stale messages.
Each agent’s
persona.mdnow includes athroughput_class: slow|medium|fast. The orchestrator (or the queue dispatcher) respects this when enqueuing tasks. Additionally, every message in an inbox or handoff file gets attl_secondsfield. If a message expires before being read, it is moved to a dead‑letter file and an escalation commit is made.Implementation note: The agent runtime now checks TTL before processing the next message. Expired messages are not processed — they are archived with a special
status: expiredand trigger a reflection.Skill Decay –
last_success& Recency PruningProblem: Skills never lose standing. Old skills with many early successes block new, more relevant skills.
Each skill entry in
skills.mdgains alast_successtimestamp. The pruning algorithm changes from “lowestreward_evidence” to a weighted score:If
days_since_last_success > decay_halflife_days, the skill’s effective evidence drops below 1, making it vulnerable to pruning even if its totalreward_evidenceis high.Complexity note: Decay requires storing
last_successand updating it on every +1 outcome. This is a small change toagent.py. The operator can setdecay_halflife_daysto a very large number to effectively disable decay.