Skip to content

Instantly share code, notes, and snippets.

@MuhammadYossry
Created April 29, 2026 15:41
Show Gist options
  • Select an option

  • Save MuhammadYossry/150ad6b1590cee8a8b517caf8aa2c561 to your computer and use it in GitHub Desktop.

Select an option

Save MuhammadYossry/150ad6b1590cee8a8b517caf8aa2c561 to your computer and use it in GitHub Desktop.
Part 4: Containerizing AgentOS — Build, Customize, and Deploy Your Self‑Learning Agent(Draft)

Part 4: Containerizing AgentOS — Build, Customize, and Deploy Your Self‑Learning Agent

  • Draft version

"Make it reproducible, or it never happened."

Prerequisites:


Table of Contents


Introduction

In Parts 0–3 we built a complete blueprint for an agent that earns its skills, prunes its memory, and records its own history — all through plain files and git commits. But a blueprint is only half the story. To run an agent reliably, across machines, without hidden dependencies or "works on my laptop" surprises, we need a container.

This article, Part 4, turns the blueprint into a concrete, runnable system. By the end you'll have:

  • A Dockerfile that packages the AgentOS runtime — model‑agnostic and vendor‑neutral.
  • A declarative way to define your own agents (personas, domains, goals, LLM provider).
  • A local image build workflow you control — no external registry required.
  • A single command to launch an agent that learns from real tasks, with proven end‑to‑end flow.
  • Three complete, diverse examples: a code reviewer, a legal‑document analyst, and a creative writing coach.

No vendor lock‑in. No assumptions about which LLM you use. Just files, Python, git, and a Dockerfile.


Why Docker Matters for AgentOS

Concern Without Docker With Docker
Reproducibility "But it worked in my venv…" Image built from a locked‑in definition
Isolation Agents share host filesystem, tools, secrets Each container has its own filesystem and network
Portability pip, Python version, OS quirks docker run anywhere
Observability Logs in disparate places docker logs, mounted volumes
LLM provider swap Rewrite integration code per machine Provider config in environment variables — image unchanged

AgentOS already treats agent state as a filesystem contract. Docker adds an execution contract — the runtime environment is as deterministic as the state layout.


Prerequisites

  • Docker installed (≥ 20.10 recommended). Verify with docker --version.
  • An LLM API key — any provider. AgentOS works with OpenAI, Anthropic, local models via Ollama, or any HTTP‑accessible LLM.
  • git installed on your host (used to inspect agent history from outside the container, optional but useful).
  • The source files listed below.

Project Layout: The Files You'll Own

Create a project directory. You'll place six files inside:

my-agent-project/
├── Dockerfile
├── agents.yaml           # ← you define your agents here
├── agent.py              # model‑agnostic agent runtime
├── bootstrap.py          # filesystem initializer (idempotent)
├── main.py               # container entrypoint
└── requirements.txt      # httpx, pyyaml

The four Python files implement the entire runtime. Their full source was shown in the previous section — here's a quick summary of what each does:

File Purpose
requirements.txt Only two dependencies: httpx>=0.28.1 and pyyaml>=6.0.3. No vendor SDKs.
bootstrap.py Creates the standard file tree (persona.md, constraints.md, skills.md, goals.md, rewards.md, reflections.md, queue.md, system_prompt.md) and initializes a git repo. Idempotent — safe to call on every start.
agent.py Reads tasks from queue.md, builds prompts from agent state files, calls any OpenAI‑compatible endpoint via httpx, parses structured responses, updates skills/goals/rewards, commits to git.
main.py Entrypoint. Reads agents.yaml, bootstraps each agent, then loops forever checking each agent's queue for new tasks.

Step 1 — Define Your Agent(s) in agents.yaml

AgentOS discovers agents through a single YAML file mounted into the container. Every agent you want to run gets an entry here.

agents.yaml:

agents:
  - name: orion
    persona: "Backend engineer specializing in Python, PostgreSQL, and API design. Terse, technical, shows its work. Prioritizes correctness over cleverness."
    domain: "Backend engineering"
    tone: "Terse, technical, no preamble."
    hints:
      - "Tasks often involve parsing structured API data with pagination."
      - "Common failure: missing auth context or rate limiting."
      - "Quick win: reusable retry wrapper with exponential backoff."
    goals:
      - "Handle streaming SSE endpoints robustly"
      - "Build a query‑parameterisation skill for all SQL tasks"

What each field does:

Field Purpose Example
name Unique agent identifier (used for its subdirectory) orion
persona Identity and mandate, written into persona.md Backend engineer…
domain Area of expertise Backend engineering
tone Communication style Terse, technical
hints Seed pointers — not skills, just areas to explore Reusable retry wrapper…
goals Optional; seed goals. Real goals emerge from failures. Handle streaming SSE endpoints

Provider configuration goes in environment variables, not in YAML. This keeps the agent definition portable across providers.


Step 2 — Write the Dockerfile

Dockerfile:

FROM python:3.12-slim

LABEL org.opencontainers.image.title="AgentOS"
LABEL org.opencontainers.image.description="Self-improving LLM agent runtime — model‑agnostic"

# ── System dependencies ─────────────────────────────────────────
RUN apt-get update && \
    apt-get install -y --no-install-recommends git && \
    rm -rf /var/lib/apt/lists/*

# ── Create non‑root user ─────────────────────────────────────────
RUN useradd --create-home --shell /bin/bash agentos

# ── Application directory ────────────────────────────────────────
WORKDIR /app

# ── Python dependencies ──────────────────────────────────────────
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# ── Copy runtime files ───────────────────────────────────────────
COPY main.py bootstrap.py agent.py ./

# ── Data volume mount point ──────────────────────────────────────
RUN mkdir -p /data && chown -R agentos:agentos /data /app
VOLUME /data

# ── Runtime configuration ────────────────────────────────────────
USER agentos
ENV DATA_DIR=/data
ENV POLL_INTERVAL=5

# LLM defaults (override at runtime) — OpenAI‑compatible
ENV LLM_BASE_URL=https://api.openai.com/v1
ENV LLM_MODEL=gpt-4o
ENV LLM_MAX_TOKENS=2048
ENV LLM_TIMEOUT=60

ENTRYPOINT ["python", "-u", "main.py"]

Step 3 — Build the Image Locally

You build the image yourself — no remote registry needed. This keeps you in full control.

# From your project directory (where the Dockerfile lives)
cd my-agent-project
docker build -t agentos:latest .

First build takes 30–60 seconds. Subsequent builds are fast — Docker caches layers unless you change the Python source files.

Verify the image exists:

docker images agentos

Output:

REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
agentos      latest    abc123def456   2 minutes ago   180MB

Step 4 — Prepare Your Data Directory

AgentOS stores all persistent state in a host directory mounted at /data. Create it before running:

mkdir -p my-agent-data
cp agents.yaml my-agent-data/

The directory must contain agents.yaml at its root — the runtime reads this on startup.

What goes where:

my-agent-data/               ← mounted to /data in container
├── agents.yaml              ← agent definitions (required)
└── agents/                  ← created automatically by bootstrap
    └── orion/               ← one subdirectory per agent
        ├── persona.md
        ├── constraints.md
        ├── skills.md
        ├── goals.md
        ├── rewards.md
        ├── reflections.md
        ├── queue.md
        ├── system_prompt.md
        ├── skills/
        ├── iac/
        └── .git/            ← agent's git history

The agent subdirectories are created automatically on first run — you only need to provide agents.yaml.


Step 5 — Run Your Agent Container

Supply your LLM API key and mount the data directory. The provider is configured via environment variables.

With OpenAI:

docker run -d \
  --name agentos-orion \
  -e LLM_API_KEY="sk-..." \
  -e LLM_BASE_URL="https://api.openai.com/v1" \
  -e LLM_MODEL="gpt-4o" \
  -v $(pwd)/my-agent-data:/data \
  agentos:latest

With Anthropic (via compatible endpoint):

docker run -d \
  --name agentos-orion \
  -e LLM_API_KEY="sk-ant-..." \
  -e LLM_BASE_URL="https://api.anthropic.com/v1" \
  -e LLM_MODEL="claude-sonnet-4-20250514" \
  -v $(pwd)/my-agent-data:/data \
  agentos:latest

With Ollama (local, no cloud costs):

docker run -d \
  --name agentos-local \
  -e LLM_API_KEY="ollama" \
  -e LLM_BASE_URL="http://host.docker.internal:11434/v1" \
  -e LLM_MODEL="llama3.1:8b" \
  -e LLM_MAX_TOKENS=4096 \
  -v $(pwd)/my-agent-data:/data \
  agentos:latest

With Groq:

docker run -d \
  --name agentos-groq \
  -e LLM_API_KEY="gsk_..." \
  -e LLM_BASE_URL="https://api.groq.com/openai/v1" \
  -e LLM_MODEL="llama-3.1-70b-versatile" \
  -v $(pwd)/my-agent-data:/data \
  agentos:latest

Verify it started:

docker ps --filter name=agentos
docker logs agentos-orion

You should see:

2026-04-29T10:00:00 INFO [bootstrap] Initialised git repo at /data/agents/orion
2026-04-29T10:00:00 INFO [agentos] Agent 'orion' ready at /data/agents/orion
2026-04-29T10:00:00 INFO [agentos] AgentOS running with 1 agent(s). Polling every 5s.

Step 6 — Feed Tasks and Watch Your Agent Learn

Tasks arrive by writing a task block into the agent's queue.md file. The agent polls every few seconds, picks up the first task, processes it, and removes it from the queue.

Enqueue a task

Write a task block into the queue file in your mounted data directory:

cat >> my-agent-data/agents/orion/queue.md << 'EOF'

## first-task
task: List your active constraints and explain how the reward system works.
priority: normal
created: 2026-04-29T10:05:00Z
EOF

Important: The leading blank line before ## first-task ensures the block is properly separated from any existing content.

Task block format

Each task block in queue.md must follow this structure:

## <unique-task-id>
task: <description of what the agent should do>
priority: normal|high|low
created: <ISO 8601 timestamp>

The ## header marks the start of a task block. The runtime splits the file on ## headers and processes the first complete block it finds. After processing, it removes the block from the file.

Watch the agent process the task

docker logs -f agentos-orion

You'll see the agent spring into action:

2026-04-29T10:05:05 INFO [agent] [orion] Processing task: first-task
2026-04-29T10:05:12 INFO [agent] [orion] Response:
The active constraints are: skills.md capped at 20 entries,
goals.md capped at 5 active goals, rewards.md keeps last 30 entries,
and reflections.md keeps last 15 entries. The reward system uses
+1 for successful reusable outcomes, 0 for partial, -1 for failures.
Skills are only earned from +1 outcomes...

Verify the agent updated its state

After processing, inspect the files:

# Check the reward was recorded
cat my-agent-data/agents/orion/rewards.md

# Check if a skill was earned (if the task was a +1)
cat my-agent-data/agents/orion/skills.md

# View the agent's git history
git -C my-agent-data/agents/orion log --oneline

Example git log:

* a3f9c12 orion(task): first-task
* 8b2e01a init: agent filesystem

Feed a series of tasks

The agent grows through repeated task cycles. Queue several tasks to watch it build skills:

cat >> my-agent-data/agents/orion/queue.md << 'EOF'

## paginated-api
task: Write a Python function that fetches all pages from a paginated REST API endpoint. The API returns a 'next' field in the response body. Handle rate limits gracefully.
priority: high
created: 2026-04-29T10:10:00Z
EOF

cat >> my-agent-data/agents/orion/queue.md << 'EOF'

## sql-parameterization
task: Write a function that builds a parameterized SQL SELECT query from a table name, a list of column names, and a dict of WHERE conditions. Prevent SQL injection.
priority: high
created: 2026-04-29T10:15:00Z
EOF

After each task completes, check the agent's growing skill library:

cat my-agent-data/agents/orion/skills.md

Over multiple tasks, you'll see skills accumulate (and old ones pruned when the budget fills) — the agent evolving exactly as designed in Part 0.

Stopping and restarting

Stop the container:

docker stop agentos-orion

All state is on the host in my-agent-data/. Restart with the same volume mount:

docker start agentos-orion

The agent resumes polling its queue. Any tasks you wrote to queue.md while stopped will be picked up immediately.

Rebuilding with a new image version

When you modify the Python source files, rebuild and restart:

docker build -t agentos:v2 .
docker stop agentos-orion
docker rm agentos-orion
docker run -d \
  --name agentos-orion \
  -e LLM_API_KEY="sk-..." \
  -v $(pwd)/my-agent-data:/data \
  agentos:v2

Your agent's skills, goals, rewards, and git history survive because they live on the mounted volume — not inside the container.


Model‑Agnostic Design

The runtime makes zero assumptions about which LLM you use. The contract is simple:

  • Endpoint: Any URL that accepts POST /chat/completions with OpenAI‑compatible JSON.
  • Auth: Bearer <token> in the Authorization header.
  • Response: Standard {"choices": [{"message": {"content": "..."}}]} format.

This covers: OpenAI, Anthropic (via adapter), Ollama, Groq, Together AI, Fireworks, DeepInfra, vLLM, llama.cpp, LiteLLM proxy, and any self‑hosted model behind an OpenAI‑compatible wrapper.

To switch providers, change three environment variables. The image and config stay the same.


Customization Patterns

Pattern A — Specialize an Agent for a Domain

agents:
  - name: iris
    persona: "Data analyst specialized in anomaly detection on time‑series data. Communicates with concise summaries."
    domain: "Data analytics"
    tone: "Query‑oriented, evidence‑first"
    hints:
      - "Most tasks involve SQL with outlier detection."
      - "Always validate data completeness before analysis."
    goals:
      - "Automatically validate data completeness before analysis"

Pattern B — Multiple Agents in One Container

agents:
  - name: orion
    persona: ...
    domain: "Backend engineering"
    hints: [...]
  - name: iris
    persona: ...
    domain: "Data analytics"
    hints: [...]
  - name: nova
    persona: ...
    domain: "Technical writing"
    hints: [...]

All share the container but have independent file trees, git histories, and queues. Coordination via files (handoffs, shared segments) is covered in Part 2.

Pattern C — Persistent State Across Upgrades

The /data volume is your agent's long‑term memory. Build a new image locally and restart:

docker build -t agentos:v2 .
docker stop agentos-v1 && docker rm agentos-v1
docker run -d --name agentos-v2 -v $(pwd)/my-agent-data:/data agentos:v2

The agent wakes up with all earned skills, reward logs, and git history intact.


Observing Your Agent's Growth

Everything is plain files and git. No special tools required.

# Current skills
docker exec agentos-orion cat /data/agents/orion/skills.md

# Reward history
docker exec agentos-orion cat /data/agents/orion/rewards.md

# Learning timeline (from host)
git -C my-agent-data/agents/orion log --oneline --graph

# Live activity stream
docker logs -f agentos-orion

# Check queue for pending tasks
cat my-agent-data/agents/orion/queue.md

# Count skills earned
docker exec agentos-orion grep -c "^### " /data/agents/orion/skills.md

# See the last 5 reward outcomes
docker exec agentos-orion grep "^-\s*reward:" /data/agents/orion/rewards.md | tail -5

Next Steps

  • Add a REST API to push tasks programmatically instead of writing to queue files.
  • Implement IAC primitives (Part 2) — handoff files, shared segments, queues for multi‑agent collaboration.
  • Scale with Docker Compose — per‑agent containers, shared network, optional shared git remote.
  • Harden for production — healthchecks, resource limits, read‑only root filesystem, secret management.
  • Add a watchdog — script that monitors rewards.md and alerts on sustained -1 streaks.

Complete Examples

Example A — Code Review Agent (with Ollama, fully local)

Goal: An agent that reviews code snippets, identifies bugs and anti‑patterns, and builds a library of recurring issues with detection heuristics.

agents.yaml:

agents:
  - name: revu
    persona: "Senior code reviewer. Knows Python, TypeScript, and Go. Identifies logic errors, security vulnerabilities, and style violations. Cites specific lines. Suggests concrete fixes with before/after diffs. Never approves without evidence."
    domain: "Code review"
    tone: "Direct, evidence‑based, line‑specific"
    hints:
      - "Common issues: missing input validation, race conditions, SQL injection vectors, resource leaks."
      - "Suggestions should include before/after code blocks."
      - "Track recurring anti‑patterns across reviews."
    goals:
      - "Build a library of recurring anti‑patterns with detection heuristics"
      - "Learn framework‑specific security pitfalls (Django, Express, Gin)"

Run (Ollama must be running locally):

mkdir -p revu-data
cp agents.yaml revu-data/

docker run -d \
  --name revu \
  -e LLM_API_KEY="ollama" \
  -e LLM_BASE_URL="http://host.docker.internal:11434/v1" \
  -e LLM_MODEL="llama3.1:8b" \
  -e LLM_MAX_TOKENS=4096 \
  -v $(pwd)/revu-data:/data \
  agentos:latest

Feed a review task:

cat >> revu-data/agents/revu/queue.md << 'EOF'

## review-flask-endpoint
task: |
  Review the following Python code for security issues and bugs.
  
  ```python
  @app.route('/user/<username>')
  def get_user(username):
      query = f"SELECT * FROM users WHERE name = '{username}'"
      result = db.execute(query)
      return jsonify(result.fetchone())

Provide specific line-by-line findings with fixes. priority: high created: 2026-04-29T10:30:00Z EOF


**Watch it learn:**

```bash
docker logs -f revu

Example B — Legal Document Analyst (with OpenAI)

Goal: An agent that reviews contract clauses, flags risky language, and builds a knowledge base of problematic patterns across documents.

agents.yaml:

agents:
  - name: lex
    persona: "Contract analyst specializing in software licensing agreements, NDAs, and service level agreements. Identifies ambiguous language, missing clauses, and unfavorable terms. Cites specific sections. Suggests neutral alternative language. Conservative by default — flags anything uncertain."
    domain: "Legal document review"
    tone: "Precise, conservative, section‑cited"
    hints:
      - "Common risks: broad indemnification, missing termination clauses, vague SLAs."
      - "Always quote the original clause before suggesting alternatives."
      - "Track clause patterns that recur across documents."
    goals:
      - "Build a catalog of high‑risk clause patterns"
      - "Learn to flag jurisdiction‑specific issues (GDPR, CCPA, EU AI Act)"

Run (with OpenAI):

mkdir -p lex-data
cp agents.yaml lex-data/

docker run -d \
  --name lex \
  -e LLM_API_KEY="sk-..." \
  -e LLM_BASE_URL="https://api.openai.com/v1" \
  -e LLM_MODEL="gpt-4o" \
  -v $(pwd)/lex-data:/data \
  agentos:latest

Feed a contract review:

cat >> lex-data/agents/lex/queue.md << 'EOF'

## review-nda-clause
task: |
  Review this NDA clause and identify risks:
  
  "The Receiving Party agrees to hold all Confidential Information
  in strict confidence for a period of three (3) years from the
  date of disclosure, except for information that is independently
  developed, which shall remain confidential indefinitely."
  
  Flag any ambiguities or one-sided terms.
priority: high
created: 2026-04-29T11:00:00Z
EOF

Example C — Creative Writing Coach (with Groq)

Goal: An agent that provides developmental feedback on fiction drafts, learns an author's voice over time, and builds skill modules for specific craft elements (dialogue, pacing, description).

agents.yaml:

agents:
  - name: scribe
    persona: "Developmental editor and writing coach. Specializes in fiction — novels and short stories. Focuses on structure, pacing, character voice, and emotional resonance. Gives specific, actionable feedback with examples. Encouraging but honest. Remembers the author's recurring patterns."
    domain: "Creative writing"
    tone: "Encouraging, specific, craft‑focused"
    hints:
      - "Common feedback areas: show‑don't‑tell, dialogue tags, pacing in action scenes."
      - "Always cite specific passages with line references."
      - "Build a profile of the author's voice and recurring habits."
    goals:
      - "Track an author's voice patterns across submissions"
      - "Build skill modules for: dialogue mechanics, scene pacing, sensory description"

Run (with Groq — fast and cost‑effective):

mkdir -p scribe-data
cp agents.yaml scribe-data/

docker run -d \
  --name scribe \
  -e LLM_API_KEY="gsk_..." \
  -e LLM_BASE_URL="https://api.groq.com/openai/v1" \
  -e LLM_MODEL="llama-3.1-70b-versatile" \
  -v $(pwd)/scribe-data:/data \
  agentos:latest

Feed a manuscript excerpt:

cat >> scribe-data/agents/scribe/queue.md << 'EOF'

## feedback-chapter1
task: |
  Provide developmental feedback on this opening passage:
  
  "The door creaked open. Sarah walked into the room. She was
  scared. The room was dark and cold. She saw a shadow move in
  the corner. She screamed and ran away."
  
  Focus on: show-don't-tell, sensory detail, pacing.
priority: normal
created: 2026-04-29T12:00:00Z
EOF

Closing Thoughts

You've now turned the AgentOS concept into a physical system:

  • The filesystem stores its mind.
  • The git repository remembers its history.
  • The Docker container gives it a home — built locally, under your control.
  • The agents.yaml defines its purpose.
  • The environment variables decouple the LLM provider from the agent's identity.

I'm interested in benchmarking before the series continues with advanced patterns

@MuhammadYossry
Copy link
Copy Markdown
Author

MuhammadYossry commented Apr 29, 2026

Solving the Tensions

Reward Honesty – The Audit Hook

Problem: The agent self‑scores +1/0/-1 with no verification. Nothing stops inflation.

Add an optional reward_audit section. When enabled, after the agent writes its reward entry, a separate lightweight process (e.g., a deterministic rule‑based checker script) produces an audit_score. The two scores are recorded together in rewards.md. The agent can see the audit score in its next context — creating social pressure for honesty. The audit does not override the agent’s own score, but divergence is logged and can be used in pruning or goal creation.

Complexity note: This doubles LLM calls per task and adds latency. For production, you might run the audit asynchronously. For prototyping, you can start with method: "rule_based" (e.g., check for contradictory statements in the agent’s output). The design flags this as optional; operators can disable it.

IAC Backpressure – Throughput Classes & TTL

Problem: No semantic backpressure. A slow agent’s inbox can fill with stale messages.

Each agent’s persona.md now includes a throughput_class: slow|medium|fast. The orchestrator (or the queue dispatcher) respects this when enqueuing tasks. Additionally, every message in an inbox or handoff file gets a ttl_seconds field. If a message expires before being read, it is moved to a dead‑letter file and an escalation commit is made.

iac:
  default_message_ttl: 300  # seconds
  dead_letter_path: "/data/shared/iac/dead_letters/"
  backpressure:
    enabled: true
    max_queue_depth_per_agent: 10  # per throughput_class

Implementation note: The agent runtime now checks TTL before processing the next message. Expired messages are not processed — they are archived with a special status: expired and trigger a reflection.

Skill Decay – last_success & Recency Pruning

Problem: Skills never lose standing. Old skills with many early successes block new, more relevant skills.

Each skill entry in skills.md gains a last_success timestamp. The pruning algorithm changes from “lowest reward_evidence” to a weighted score:

effective_evidence = reward_evidence * decay(last_success)
decay = 1 - (days_since_last_success / decay_halflife_days)

If days_since_last_success > decay_halflife_days, the skill’s effective evidence drops below 1, making it vulnerable to pruning even if its total reward_evidence is high.

skills:
  max_entries: 20
  decay_halflife_days: 14
  recency_weight: 0.7   # 70% weight on evidence, 30% on recency? (simpler: use half‑life)

Complexity note: Decay requires storing last_success and updating it on every +1 outcome. This is a small change to agent.py. The operator can set decay_halflife_days to a very large number to effectively disable decay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment