@MuhammadYossry
Created May 1, 2026 11:04
Example: AgentResearchOS — Custom Runtime Design

Prompt template instantiation for: Agent-Native Research Artifact (ARA) system
Paradigm: ARA-first, paper-as-compiled-view, failure-trace as first-class memory
Key constraints: Docker-sandboxed execution · User-supplied model · Zero documentation burden


Template Variable Decisions

Before the design: the template variables as resolved, with explicit justifications for any adaptation or drop.

| Variable | Value | Notes |
| --- | --- | --- |
| AGENT_NAME | ResearchOS | Orchestrator + sandbox pool collective |
| PRIMARY_PURPOSE | Distill research sessions into executable ARA artifacts | Narrowed from "do research" to the Live Research Manager pattern |
| DOMAIN | Computational research & ML experimentation | Bounded to code-producing research; avoids wet-lab or clinical scope |
| TARGET_USERS | Researchers + coding agents that consume ARAs as baselines | Dual audience: human authors + downstream agent readers |
| TASK_EXAMPLES | See §11 | |
| RISK_LEVEL | medium | Results influence downstream agents; hallucinated claims propagate |
| AUTONOMY_LEVEL | semi-autonomous | Human gates on world-model commits and claim promotion |
| TOOLS_ALLOWED | Python/shell in sandbox, git, file I/O, citations API | Sandboxed; no network from worker containers |
| TOOLS_FORBIDDEN | Direct internet from workers, external DB writes, model self-modification | |
| DATA_SOURCES | Local repos, ARA artifacts, PDF corpus (via ARA Compiler) | |
| SUCCESS_METRICS | Reproduction rate, claim coverage, trace completeness, peer seal pass rate | |
| DEPLOYMENT_ENV | Local laptop → cloud-portable | Single docker compose up launch |
| BUDGET_PRIORITY | balanced | Routing defined by user; no model hardcoded |
| PRIVACY_REQUIREMENTS | Researcher controls what leaves the container | All execution airgapped inside sandbox |
| HUMAN_REVIEW_POINTS | Claim promotion, world-model merge, ARA Seal submission | |
| MULTI_AGENT_REQUIRED | yes | Orchestrator + LRM distiller + sandbox workers |
| LONG_TERM_MEMORY | yes | Git as longitudinal memory; ARA artifact store |
| LEARNING_ALLOWED | constrained | Skills + failure traces; no open-ended world-model generation |
| OUTPUT_STYLE | Executable ARA artifact + compiled human-readable view | |

Template sections adapted or dropped:

  • §4 Custom Agent Memory Model: user preferences dropped (irrelevant); cases renamed to failure traces (the core ARA insight); world model scoped to a research landscape (known results, retracted claims, active hypotheses per domain).
  • §7 Model Routing: all model names replaced with env-var references; no default hardcoded.
  • §8 Security: expanded with sandbox network policy table; container-level isolation is the primary security primitive, not prompt-level rules.

1. Executive Summary

ResearchOS recasts research session outputs from narrative PDFs into Agent-Native Research Artifacts (ARAs): structured, executable knowledge packages with four interlocking layers — claims, code, failure traces, and raw evidence. A Live Research Manager (LRM) agent sits on top of any coding session and distills the conversation into the ARA in the background. A Compiler agent ingests legacy PDFs/repos into the same format. A Seal agent runs automated verification before human review.

The architecture is not a replacement for scientific judgment. It is a substrate that makes judgment auditable, reproducible, and composable — science that compounds like software.

Why this architecture fits:

  • Failure traces are the primary differentiator. The filesystem keeps dead ends as ranked, attributed evidence — not narrative prose.
  • Every code execution runs in an isolated Docker sandbox. Reproducibility is structural, not aspirational.
  • The model is a user-supplied variable. ResearchOS routes by task difficulty, not by product preference.
  • Git is the longitudinal memory. Every claim promotion, trace update, and world-model merge is a semantic commit with a machine-readable header.
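
The design does not spell out what a "machine-readable header" looks like. The following is a minimal sketch under an assumed format: a `type(scope): summary` subject line followed by `Key: value` trailer lines. The event names, the `LearningEvent` class, and the trailer keys are all hypothetical, not part of the design above.

```python
from dataclasses import dataclass, field

# Hypothetical event types, taken from the three learning events named above.
ALLOWED_EVENTS = {"claim-promotion", "trace-update", "world-model-merge"}


@dataclass
class LearningEvent:
    event: str      # one of ALLOWED_EVENTS
    ara_slug: str   # the <paper-slug> directory the event belongs to
    summary: str    # short human-readable subject
    trailers: dict = field(default_factory=dict)  # assumed Key: value metadata


def format_commit_message(ev: LearningEvent) -> str:
    """Build a commit message whose first line can be parsed by a script."""
    if ev.event not in ALLOWED_EVENTS:
        raise ValueError(f"unknown event type: {ev.event}")
    subject = f"{ev.event}({ev.ara_slug}): {ev.summary}"
    if not ev.trailers:
        return subject
    # Sort trailers so the same event always yields the same message.
    trailer_lines = [f"{key}: {value}" for key, value in sorted(ev.trailers.items())]
    return "\n".join([subject, "", *trailer_lines])
```

A fixed subject grammar lets `git log --format=%s` output be filtered by event type without opening any files.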

2. Runtime Shape

Choice: Orchestrator + worker pool (multi-container)

┌─────────────────────────────────────────────────────┐
│  docker compose                                      │
│                                                      │
│  ┌─────────────┐   ┌──────────┐   ┌──────────────┐  │
│  │ orchestrator│   │   lrm    │   │    compiler  │  │
│  │  (main.py)  │◄──│ distiller│   │  (pdf→ARA)   │  │
│  └──────┬──────┘   └──────────┘   └──────────────┘  │
│         │  spawns                                    │
│  ┌──────▼──────────────────────────────────────┐     │
│  │  sandbox worker pool  (ephemeral containers) │     │
│  │  sandbox-1  sandbox-2  sandbox-3  ...        │     │
│  │  each: python + git + no network             │     │
│  └──────────────────────────────────────────────┘    │
│                                                      │
│  shared volume: /data  (ARA store + git repo)        │
└─────────────────────────────────────────────────────┘

Why multi-container:

  • Code execution must be isolated. Each sandbox is ephemeral and network-gapped; a crashed or infinite experiment cannot affect the orchestrator or the ARA store.
  • The LRM and Compiler are independent concerns. They read from the session queue and write to the ARA store; they should be independently restartable.
  • Horizontal scaling is trivial: increase SANDBOX_POOL_SIZE in .env.

3. Purpose-Built Filesystem Layout

Only files that earn their place in a research system are included. Generic agent directories (customers/, inbox/outbox/) are dropped.

/data/
├── .git/                          # longitudinal memory — ALL learning events
│
├── ara/                           # primary artifact store
│   ├── index.md                   # registry of all ARAs with status + claim counts
│   ├── <paper-slug>/
│   │   ├── ara.yaml               # ARA manifest (version, authors, claim list, seal status)
│   │   ├── claims/
│   │   │   ├── <claim-id>.yaml    # structured claim: text, evidence refs, confidence
│   │   │   └── ...
│   │   ├── code/
│   │   │   ├── reproduce.sh       # single-command reproduction entry point
│   │   │   ├── environment.yaml   # conda/pip exact pin
│   │   │   └── src/               # research code (may be symlink to repo)
│   │   ├── traces/
│   │   │   ├── failures/          # dead ends, ranked by hours spent + reason failed
│   │   │   ├── decisions/         # judgment calls with rationale
│   │   │   └── checkpoints/       # intermediate results (not in paper)
│   │   ├── evidence/
│   │   │   ├── raw/               # raw outputs: logs, CSVs, model checkpoints
│   │   │   └── figures/           # generated figures with provenance
│   │   └── views/
│   │       ├── paper.md           # compiled human-readable narrative (generated)
│   │       └── review-packet.md   # seal submission view
│
├── agents/
│   ├── orchestrator/
│   │   ├── persona.md             # read-only
│   │   ├── constraints.md         # read-only: budget caps, sandboxing rules
│   │   ├── skills.md              # earned orchestration patterns (max 20)
│   │   ├── goals.md               # recurring gaps
│   │   ├── rewards.md             # rolling log (last 30)
│   │   └── reflections.md         # failure patterns (last 15)
│   ├── lrm/                       # Live Research Manager
│   │   ├── persona.md
│   │   ├── constraints.md
│   │   ├── skills.md              # distillation patterns
│   │   └── session-queue.md       # FIFO of sessions to distill
│   └── compiler/
│       ├── persona.md
│       ├── constraints.md
│       ├── skills.md              # PDF→ARA conversion patterns
│       └── compile-queue.md
│
├── research-landscape/            # scoped world model — NOT open-ended
│   ├── index.md
│   ├── known-results/             # verified claims from imported ARAs
│   ├── retracted/                 # flagged claims with reason
│   └── hypotheses/                # confidence-gated; require tool-backed evidence
│
├── shared/
│   ├── locks/                     # semaphore files for concurrent writes
│   ├── sandbox-results/           # sandbox workers write here; orchestrator reads
│   └── proposals/                 # world-model update proposals awaiting human gate
│
└── system/
    ├── task-queue.md              # incoming research tasks
    ├── routing-policy.yaml        # model routing rules (no model names hardcoded)
    └── seal-policy.yaml           # automated check thresholds
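
As an illustration of the per-ARA subtree above, here is a minimal sketch that creates an empty skeleton for a new artifact. The function name `scaffold_ara` and the manifest keys are assumptions; the layout only says that ara.yaml holds version, authors, claim list, and seal status.

```python
from pathlib import Path

# Subdirectories of one ARA, copied from the layout above.
ARA_SUBDIRS = [
    "claims",
    "code/src",
    "traces/failures",
    "traces/decisions",
    "traces/checkpoints",
    "evidence/raw",
    "evidence/figures",
    "views",
]


def scaffold_ara(ara_root: Path, slug: str) -> Path:
    """Create the directory skeleton for one ARA; safe to call repeatedly."""
    ara_dir = ara_root / slug
    for sub in ARA_SUBDIRS:
        (ara_dir / sub).mkdir(parents=True, exist_ok=True)
    manifest = ara_dir / "ara.yaml"
    if not manifest.exists():
        # Hypothetical manifest keys; the real schema is not given in this document.
        manifest.write_text(
            f"slug: {slug}\nversion: 1\nauthors: []\nclaims: []\nseal_status: unsealed\n"
        )
    return ara_dir
```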

4. Custom Agent Memory Model

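Five memory types are kept, each adapted from the template. Three template types do not survive as standalone types: one is dropped outright, one is renamed, and one is merged into the ARA's views/.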

Kept (adapted)

Skills — earned research patterns, not just code tricks. Examples: "structured ablation before full run", "cache intermediate tensors to evidence/raw/", "always pin random seeds before logging a claim". Budget: 20 entries. Evidence threshold: ≥2 reproduced +1 outcomes.

Verified Facts — promoted claims from the research landscape. Must have: a tool-backed evidence file reference, a confidence score ≥0.85, and no active contradiction in retracted/. These are the epistemic spine of the ARA.

Failure Traces — the core ARA differentiator. Every dead end is a first-class memory object: what was tried, why it failed, how long it took, and a discount_after timestamp (because a failed approach in 2022 may be worth retrying with a 2026 model). The trace is the "ranked menu of what to try and what not to" mentioned in the ARA paper. Traces are never pruned — they are timestamped and discounted, not deleted.
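
A failure trace as described here could be modelled roughly as follows. This is a sketch, not the ARA schema: the field names are assumptions, and the ranking rule (undiscounted traces first, then most recent) is one reading of "timestamped and discounted, not deleted".

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FailureTrace:
    trace_id: str
    what_was_tried: str
    why_it_failed: str
    hours_spent: float
    attempted_by: str          # attribution: who tried it
    model_tier: str            # attribution: which tier was in use
    created_at: datetime
    discount_after: datetime   # after this instant the trace counts for less

    def is_discounted(self, now: datetime) -> bool:
        return now >= self.discount_after


def rank_traces(traces: list, now: datetime) -> list:
    """Order traces for a reader: still-current ones first, newest first within each group."""
    return sorted(
        traces,
        key=lambda t: (t.is_discounted(now), -t.created_at.timestamp()),
    )
```

Discounted traces stay in the returned list. They only move to the back, which matches the rule that traces are never pruned.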

Hypotheses — patterns with weak evidence. Confidence-gated: cannot become a verified fact without tool-backed confirmation. Human review required before world-model merge.
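
The promotion rule for hypotheses can be sketched as a function that lists what still blocks promotion. The `Hypothesis` class and `promotion_blockers` are hypothetical names, and the retraction check is simplified to an ID lookup; detecting a real contradiction would need more than that.

```python
from dataclasses import dataclass
from pathlib import Path

CONFIDENCE_FLOOR = 0.85  # matches the verified-fact threshold stated above


@dataclass
class Hypothesis:
    claim_id: str
    confidence: float
    evidence_path: Path          # tool-backed evidence file
    human_approved: bool = False


def promotion_blockers(h: Hypothesis, retracted_ids: set) -> list:
    """Return the reasons a hypothesis cannot yet become a verified fact."""
    blockers = []
    if h.confidence < CONFIDENCE_FLOOR:
        blockers.append(f"confidence {h.confidence} is below {CONFIDENCE_FLOOR}")
    if not (h.evidence_path.is_file() and h.evidence_path.stat().st_size > 0):
        blockers.append("evidence file is missing or empty")
    if h.claim_id in retracted_ids:
        # Simplification: a real check would compare claim content, not only IDs.
        blockers.append("claim is listed in retracted/")
    if not h.human_approved:
        blockers.append("human review has not signed off")
    return blockers
```

An empty list means the gate is open. Returning reasons rather than a boolean gives the human reviewer something concrete to act on.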

Procedures — reusable experimental protocols (data split strategy, evaluation harness, baseline selection). Separate from skills because they are domain-specific sequences, not heuristics.

Dropped

User Preferences — ResearchOS serves researchers and downstream agents equally. Personalisation at the memory layer adds drift risk with no reproducibility benefit.

Cases — renamed to Failure Traces (above). The original "cases" framing implied positive examples; the ARA insight is that negative cases are what compounds.

Templates — merged into views/ within the ARA artifact. A compiled paper view is a generated output, not a persistent memory object.


5. Verification Architecture

Three levels; the Seal runs all three before human review.

Level 1 — Structural Integrity
  ├── ara.yaml schema validation (jsonschema)
  ├── all claim IDs referenced in code comments
  ├── evidence files exist and are non-empty
  ├── reproduce.sh is executable and environment.yaml is pinned
  └── no missing cross-references between claims and traces

Level 2 — Argumentative Rigor  
  ├── dual-model review: two model calls (routing policy decides tiers)
  │     each produces: claim-support verdict + confidence + contradiction flags
  ├── hypothesis-to-verified-fact gate: confidence ≥ 0.85 + tool evidence
  ├── retraction cross-check: no promoted claim contradicts retracted/
  └── logical consistency scan (claim → evidence → figure chain)

Level 3 — Execution Reproducibility
  ├── sandbox run: `reproduce.sh` executed in isolated container
  ├── output diffed against stored evidence/raw/ (numeric tolerance configurable)
  ├── runtime logged and compared to claimed compute budget
  └── random seed audit: all seeds must be logged before first model call
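
The Level 3 output diff could look like the sketch below, assuming both the stored evidence and the reproduced run have already been parsed into metric-name to float mappings (the parsing step and the function name `diff_outputs` are assumptions). It uses a relative tolerance, in line with the percentage divergence quoted in the example tasks.

```python
import math


def diff_outputs(stored: dict, reproduced: dict, rel_tol: float = 1e-4) -> dict:
    """Compare reproduced metrics against stored evidence; return only the mismatches."""
    problems = {}
    for key, expected in stored.items():
        if key not in reproduced:
            problems[key] = "missing from reproduced output"
        elif not math.isclose(reproduced[key], expected, rel_tol=rel_tol):
            problems[key] = f"expected {expected!r}, got {reproduced[key]!r}"
    for key in reproduced.keys() - stored.keys():
        problems[key] = "not present in stored evidence"
    return problems
```

Note that `math.isclose` with only `rel_tol` treats an expected value of exactly zero as requiring an exact match; a configurable absolute tolerance would be a natural addition.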

Human review gates (not automated):

  • Claim promotion from hypothesis to verified fact in research-landscape
  • ARA Seal submission
  • World-model merge (proposals/ → known-results/)
  • Any trace discount_after date extension

6. Learning Policy

Constrained skill + failure-trace learning. No open-ended world-model generation.

Rules:

learning_policy:
  skills:
    allowed: true
    budget: 20
    evidence_threshold: 2   # minimum +1 outcomes before a skill is written
    pruning: lowest reward_evidence when budget full
    owner: orchestrator + lrm (each have separate budgets)

  failure_traces:
    allowed: true
    budget: unlimited        # traces are never pruned, only discounted
    discount_after: 18months # agent weights recent traces higher
    attribution: required    # who tried it, when, with what model tier

  verified_facts:
    allowed: true
    gate: human_review + tool_evidence
    confidence_floor: 0.85

  hypotheses:
    allowed: true
    max_age_without_evidence: 90days  # auto-expire to retracted/ if no evidence

  world_model_autonomous_generation:
    allowed: false           # agents propose; humans merge
    reason: "LLMs hallucinate entities and overgeneralize from little data"

  self_modification:
    allowed: false
    reason: "persona.md and constraints.md are read-only"
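
The skills rules in this policy (evidence threshold, fixed budget, prune the lowest reward_evidence) can be sketched as a single admission function. The `Skill` class and `admit_skill` are hypothetical names; how reward_evidence is computed is not defined in this document.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    positive_outcomes: int   # number of +1 outcomes observed
    reward_evidence: float   # score used for pruning; its derivation is assumed


def admit_skill(skills: list, candidate: Skill, budget: int = 20,
                evidence_threshold: int = 2) -> list:
    """Return the new skill list after trying to add a candidate."""
    if candidate.positive_outcomes < evidence_threshold:
        return list(skills)  # not enough evidence yet; nothing changes
    updated = [*skills, candidate]
    if len(updated) > budget:
        # Budget is full: drop the entry with the lowest reward_evidence.
        # This may be the candidate itself if it is the weakest.
        weakest = min(updated, key=lambda s: s.reward_evidence)
        updated.remove(weakest)
    return updated
```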

7. Model Routing Policy

No model names hardcoded anywhere in the codebase. All routing is via system/routing-policy.yaml, populated from environment variables at startup.

# system/routing-policy.yaml (generated from env at startup)
routing:
  tiers:
    fast:
      env_var: RESEARCHOS_MODEL_FAST
      use_for:
        - structural integrity checks (Level 1 Seal)
        - session distillation (routine claims)
        - failure trace formatting
        - skill pruning decisions

    balanced:
      env_var: RESEARCHOS_MODEL_BALANCED
      use_for:
        - argumentative rigor review (Level 2 Seal, first pass)
        - hypothesis confidence scoring
        - ARA compiled paper view generation
        - procedure extraction from session logs

    strong:
      env_var: RESEARCHOS_MODEL_STRONG
      use_for:
        - dual-model review second pass
        - novel claim evaluation (novelty score > threshold)
        - contradiction detection across research landscape
        - world-model merge proposals

  novelty_threshold: 0.7      # above this → always use strong tier
  budget_override: false      # never downgrade strong-tier tasks for cost
  fallback: balanced          # if a tier env_var is unset

This means the same ResearchOS image runs on any provider's model or local inference — the user sets three env vars, and routing follows the declared policy.
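
Tier resolution under this policy might be sketched as follows. The function names are assumptions; the env var names, the novelty threshold, and the fallback tier are taken from the policy above.

```python
import os
from typing import Mapping, Optional

TIER_ENV_VARS = {
    "fast": "RESEARCHOS_MODEL_FAST",
    "balanced": "RESEARCHOS_MODEL_BALANCED",
    "strong": "RESEARCHOS_MODEL_STRONG",
}
NOVELTY_THRESHOLD = 0.7
FALLBACK_TIER = "balanced"


def choose_tier(requested: str, novelty: Optional[float] = None) -> str:
    """Escalate to the strong tier when the novelty score exceeds the threshold."""
    if requested not in TIER_ENV_VARS:
        raise ValueError(f"unknown tier: {requested}")
    if novelty is not None and novelty > NOVELTY_THRESHOLD:
        return "strong"
    return requested


def resolve_model(tier: str, env: Mapping = os.environ) -> str:
    """Look up the model identifier for a tier, falling back to the balanced tier."""
    model = env.get(TIER_ENV_VARS[tier], "").strip()
    if model:
        return model
    fallback = env.get(TIER_ENV_VARS[FALLBACK_TIER], "").strip()
    if not fallback:
        raise RuntimeError(f"{TIER_ENV_VARS[tier]} and the fallback tier are both unset")
    return fallback
```

No model name appears in the code: callers get back whatever string the user placed in the environment.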


8. Security Policy

Risk level: medium. Primary threat: a research agent that promotes hallucinated claims into the shared research landscape, which downstream agents treat as ground truth.

Network policy (the sandbox is the primary control)

Container          Outbound network     Notes
─────────────────────────────────────────────────────
orchestrator       LLM API only         via egress proxy; no raw internet
lrm                LLM API only
compiler           LLM API + citations  citations API is the only external data source
sandbox-worker     NONE                 fully airgapped; reads/writes via /data volume only

Worker containers are launched with --network none. All inputs are written to the shared volume before launch; all outputs are read back after exit.
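
A sketch of how the orchestrator might assemble that launch, mirroring the flags in the Dockerfile.sandbox launch comment and the limits in agents.yaml. The function names and the per-run container name are assumptions. The timeout path kills the container by name, because a timeout on the `docker` CLI process alone would not stop the container.

```python
import subprocess
from typing import Optional


def build_sandbox_command(run_dir: str, container_name: str,
                          image: str = "researchos-sandbox:latest",
                          data_volume: str = "researchos-data",
                          mem_limit: str = "8g", cpus: str = "4.0") -> list:
    """Assemble the docker run argv for one airgapped sandbox worker."""
    return [
        "docker", "run", "--rm",
        "--name", container_name,
        "--network", "none",            # no network stack beyond loopback
        "--memory", mem_limit,
        "--cpus", cpus,
        "-v", f"{data_volume}:/data:ro",
        "-v", f"{run_dir}:/workspace",  # assumed to be the writable run directory
        image,
    ]


def run_sandbox(run_dir: str, container_name: str,
                timeout_seconds: int = 3600) -> Optional[int]:
    """Run one sandbox; return its exit code, or None if it was killed on timeout."""
    cmd = build_sandbox_command(run_dir, container_name)
    try:
        return subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        subprocess.run(["docker", "kill", container_name], capture_output=True)
        return None
```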

Secret handling

.env (never committed)
  RESEARCHOS_MODEL_FAST=...
  RESEARCHOS_MODEL_BALANCED=...
  RESEARCHOS_MODEL_STRONG=...
  LLM_API_KEY=...
  CITATIONS_API_KEY=...

Secrets injected at runtime via env; never written to /data or .git
Pre-commit hook rejects commits containing key patterns (regex scan)
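
The regex scan could be sketched like this. The patterns are illustrative assumptions, not the project's actual list: they match the two key variables named above when they carry a value, plus PEM private-key headers.

```python
import re

# Illustrative patterns only; a real hook would maintain its own list.
SECRET_PATTERNS = [
    re.compile(r"\b(LLM_API_KEY|CITATIONS_API_KEY)\s*[=:]\s*\S+"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def find_secrets(text: str) -> list:
    """Return (line_number, pattern) pairs for lines that look like secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in SECRET_PATTERNS:
            if pattern.search(line):
                hits.append((lineno, pattern.pattern))
                break  # one hit per line is enough to reject the commit
    return hits
```

A hook script would feed this the output of `git diff --cached` and exit non-zero when the list is non-empty. An empty assignment such as `LLM_API_KEY=` in env.example does not match, so the example file stays committable.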

PII and logging

  • Researcher names are stored only in ara.yaml author fields, not in logs.
  • Sandbox stdout/stderr is captured to evidence/raw/<run-id>.log — not streamed to external services.
  • rewards.md and reflections.md contain no user-identifying information.

Approval gates

| Action | Gate |
| --- | --- |
| hypothesis → verified fact | human review required |
| proposals/ → known-results/ | human review required |
| ARA Seal submission | human sign-off on Level 2 + 3 results |
| sandbox reproduce.sh execution | automatic (sandboxed) |
| failure trace discount_after extension | human review |

9. Docker Project Files

Dockerfile

FROM python:3.12-slim

# System deps
RUN apt-get update && apt-get install -y --no-install-recommends \
    git curl jq \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Python deps
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# App code
COPY src/ ./src/
COPY agents.yaml .
COPY system/ ./system/

# Git config for semantic commits
RUN git config --global user.email "researchos@local" \
 && git config --global user.name "ResearchOS"

# Init /data if not already a git repo (handled at entrypoint)
COPY entrypoint.sh .
RUN chmod +x entrypoint.sh

ENTRYPOINT ["./entrypoint.sh"]

entrypoint.sh

#!/bin/bash
set -e

# Initialize /data git repo on first run
if [ ! -d /data/.git ]; then
  git -C /data init
  git -C /data commit --allow-empty -m "init: ResearchOS data store initialized"
fi

# Generate routing-policy.yaml from env
python src/generate_routing_policy.py

# Start main task loop
exec python src/main.py

docker-compose.yml

services:
  orchestrator:
    build: .
    image: researchos:latest
    volumes:
      - researchos-data:/data
      - /var/run/docker.sock:/var/run/docker.sock  # spawn sandbox containers
    env_file: .env
    environment:
      - AGENT_ROLE=orchestrator
      - SANDBOX_IMAGE=researchos-sandbox:latest
      - SANDBOX_POOL_SIZE=3

  lrm:
    image: researchos:latest
    volumes:
      - researchos-data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=lrm
    depends_on:
      - orchestrator

  compiler:
    image: researchos:latest
    volumes:
      - researchos-data:/data
    env_file: .env
    environment:
      - AGENT_ROLE=compiler
    depends_on:
      - orchestrator

volumes:
  researchos-data:
    driver: local

Dockerfile.sandbox

# Sandbox worker — airgapped, ephemeral, no LLM calls
FROM python:3.12-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace

# Sandbox gets a copy of the ARA's environment.yaml at runtime
COPY sandbox_runner.py .

ENTRYPOINT ["python", "sandbox_runner.py"]
# Launched by orchestrator with:
#   docker run --rm --network none \
#     -v researchos-data:/data:ro \
#     -v <run-dir>:/workspace \
#     researchos-sandbox:latest

agents.yaml

agents:
  orchestrator:
    persona: agents/orchestrator/persona.md
    constraints: agents/orchestrator/constraints.md
    skills_budget: 20
    reward_window: 30
    reflection_window: 15
    task_queue: system/task-queue.md

  lrm:
    persona: agents/lrm/persona.md
    constraints: agents/lrm/constraints.md
    skills_budget: 20
    session_queue: agents/lrm/session-queue.md
    distillation_mode: background  # zero burden on researcher

  compiler:
    persona: agents/compiler/persona.md
    constraints: agents/compiler/constraints.md
    skills_budget: 20
    compile_queue: agents/compiler/compile-queue.md
    output_format: ara_v1

sandbox:
  image: researchos-sandbox:latest
  network: none
  mem_limit: 8g
  cpu_limit: 4.0
  timeout_seconds: 3600
  results_path: shared/sandbox-results/

env.example

# Model routing — fill in your own model identifiers; no defaults enforced
RESEARCHOS_MODEL_FAST=
RESEARCHOS_MODEL_BALANCED=
RESEARCHOS_MODEL_STRONG=

# API keys
LLM_API_KEY=
CITATIONS_API_KEY=

# Sandbox pool
SANDBOX_POOL_SIZE=3
SANDBOX_MEM_LIMIT=8g

# ARA store
ARA_STORE_PATH=/data/ara
RESEARCH_LANDSCAPE_PATH=/data/research-landscape

# Seal thresholds
SEAL_L1_REQUIRED=true
SEAL_L2_CONFIDENCE_FLOOR=0.85
SEAL_L3_NUMERIC_TOLERANCE=1e-4

# Git identity for semantic commits
GIT_AUTHOR_EMAIL=researchos@local
GIT_AUTHOR_NAME=ResearchOS

Startup

# Build both images
docker build -t researchos:latest .
docker build -f Dockerfile.sandbox -t researchos-sandbox:latest .

# Copy and fill env
cp env.example .env
# ... edit .env with your model identifiers and API keys ...

# Launch
docker compose up

# Queue a task (from host); printf is used because plain echo does not expand \n in bash
printf '%s\n' '- id: task-001' '  type: compile' '  source: /data/pdfs/attention-is-all-you-need.pdf' \
  >> /path/to/researchos-data/system/task-queue.md

10. Observability

Metrics are research-specific. Generic "task success rate" is replaced with signal that reveals whether the agent is actually improving science throughput.

| Metric | Source | Target |
| --- | --- | --- |
| Claim coverage rate | claims/ count vs. paper section count | > 85% |
| Trace completeness | failure entries per ARA | ≥ 3 per major experiment |
| Reproduction pass rate | Seal Level 3 auto-run | > 90% |
| Claim promotion lag | hypothesis created → verified fact | < 7 days median |
| Hallucination incidents | Level 2 dual-model disagreement rate | < 5% |
| Seal rejection rate | L1/L2/L3 auto-fail before human review | Track trend; rising = distiller regression |
| Skill accumulation curve | skills.md entries over time | Plateau = healthy; churn = policy problem |
| Failure trace discount rate | traces past discount_after date | Flag for human review |
| Sandbox crash rate | non-zero exit from sandbox workers | < 2% |
| Model tier usage ratio | fast : balanced : strong call counts | Track cost vs. claim quality correlation |

All metrics are derived from files already in /data — no external observability stack required in v1. A simple python src/metrics.py command generates a system/metrics-report.md on demand.
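
A file-derived metrics pass could be as small as the sketch below, which covers only claim and failure-trace counts. The function names and the report line format are assumptions; src/metrics.py itself is not shown in this document.

```python
from pathlib import Path


def _count_files(directory: Path, pattern: str) -> int:
    """Count regular files matching a glob pattern; a missing directory counts as zero."""
    if not directory.is_dir():
        return 0
    return sum(1 for p in directory.glob(pattern) if p.is_file())


def collect_metrics(ara_root: Path) -> dict:
    """Walk the ARA store and count claims and failure traces per artifact."""
    metrics = {}
    for manifest in sorted(ara_root.glob("*/ara.yaml")):
        ara_dir = manifest.parent
        metrics[ara_dir.name] = {
            "claims": _count_files(ara_dir / "claims", "*.yaml"),
            "failure_traces": _count_files(ara_dir / "traces" / "failures", "*"),
        }
    return metrics


def render_report(metrics: dict) -> str:
    """Render one plain-text line per ARA, suitable for a metrics report file."""
    lines = [
        f"{slug}: {m['claims']} claims, {m['failure_traces']} failure traces"
        for slug, m in metrics.items()
    ]
    return "\n".join(lines)
```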


11. Example Tasks (10 Realistic)

  1. Distill a 4-hour coding session into a new ARA for a gradient checkpointing experiment — LRM extracts 3 claims, 2 dead ends, 1 confirmed performance gain.

  2. Compile a legacy PDF ("Attention Is All You Need") into an ARA: extract structured claims, map figures to evidence files, flag missing reproduction code.

  3. Run a Seal check on an existing ARA — Level 3 catches that reproduce.sh produces output that diverges from evidence/raw/ by 2.3% (outside tolerance).

  4. Rank failure traces for a new researcher starting on transformer quantization — retrieve all traces tagged quantization, sorted by recency and discount_after status.

  5. Propose a world-model update: after three independent +1 reproductions of a new result, draft a proposals/ entry for human gate review.

  6. Extend an existing ARA with a new ablation result — add claim, attach sandbox output as evidence, update reproduce.sh, commit with semantic header.

  7. Detect a contradiction: a new claim conflicts with a known-results/ entry — flag both, write a hypotheses/ entry for the disagreement, queue for dual-model review.

  8. Generate a compiled paper view from an ARA — produce views/paper.md as a narrative document from structured claims, evidence, and trace summaries.

  9. Expire a stale hypothesis: a hypotheses/ entry with no new evidence in 90 days is auto-moved to retracted/ with reason evidence_timeout.

  10. Horizontal scale test: spin 5 sandbox workers simultaneously, each reproducing a different ARA; results written back to shared/sandbox-results/ without collision (lock files enforced).
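
For the lock files mentioned in task 10, one common approach is exclusive file creation, sketched below. The helper name `ara_lock` and the `<slug>.lock` naming are assumptions. The sketch does not handle stale locks left behind by a crashed holder.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def ara_lock(locks_dir: Path, slug: str):
    """Hold an exclusive lock on one ARA for the duration of a with-block."""
    lock_path = locks_dir / f"{slug}.lock"
    # O_CREAT | O_EXCL makes creation atomic: it raises FileExistsError
    # if another writer already holds the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))  # record the holder for debugging
        yield lock_path
    finally:
        lock_path.unlink(missing_ok=True)
```

A writer that receives `FileExistsError` can back off and retry. Because the lock is taken before the `try`, a failed acquisition never deletes the other writer's lock file.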


12. Risks & Failure Modes

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| LRM hallucinates a claim not in the session | Medium | High: propagates to research landscape | Dual-model Level 2 check; human gate on promotion |
| Sandbox reproduce.sh hangs indefinitely | Medium | Low: orchestrator unblocked | timeout_seconds in agents.yaml; container killed on expiry |
| Git history grows unbounded | Low | Medium: slow clones | git gc cron in orchestrator; large binary artifacts stored via Git LFS |
| Failure trace discount_after not updated | Medium | Medium: agent avoids valid approaches | Human review queue for expiring traces |
| Strong-tier model unavailable (API outage) | Low | High: Seal Level 2 blocked | Fallback to balanced + flag for delayed human review |
| Retracted claim re-promoted by compiler | Low | High | Cross-check against retracted/ before any promotion |
| Sandbox worker escapes network isolation | Very low | Critical | --network none is Docker-enforced, not policy-enforced |
| Two agents write to same ARA simultaneously | Medium | Medium | Lock files in shared/locks/ before any ARA write |
| Skill budget full with low-quality skills | Low | Low | Pruning on lowest reward_evidence; periodic human audit |
| Trace discount_after becomes a ceiling for strong agents | Medium | Low in v1 | Add provenance tags to traces; successors can selectively discount |

13. Recommended v1 Scope (2 weeks)

Ship exactly these seven items, nothing else:

Week 1

  • Docker images build and docker compose up succeeds with a test ARA.
  • LRM distiller: takes a session transcript (plain text), produces a valid ara.yaml + at least one claim file + at least one failure trace. No world-model writes yet.
  • Sandbox worker: runs reproduce.sh from an existing ARA, writes output to shared/sandbox-results/, exits cleanly.

Week 2

  • Seal Level 1 (structural integrity) runs automatically after every LRM distillation.
  • Semantic git commits: every claim addition and trace write produces a correctly formatted commit.
  • metrics.py generates a system/metrics-report.md covering claim count, trace count, and sandbox pass rate.
  • env.example fully documented; first-run experience requires only filling in API keys and model names.

Explicitly out of scope for v1:

  • ARA Compiler (PDF ingestion) — add in v2
  • Seal Level 2 and 3 — add in v2
  • Research landscape / world model — add in v2
  • Horizontal sandbox scaling — add in v2

14. v2 Expansion Plan

Ordered by value delivered per effort, based on the ARA paper's empirical findings.

High value, low risk

  • Seal Level 3 (sandbox reproduction as gating check before any human review).
  • ARA Compiler: PDF → ARA for legacy literature. Start with arXiv HTML format; PDF is harder.
  • Failure trace discount scoring: weight traces by age × model-tier-at-time-of-failure. This is the mechanism that prevents the trace from becoming a ceiling for strong agents.

Medium value, medium risk

  • Research landscape (known-results + hypotheses): start read-only (populated by compiler), add write path only after Level 2 Seal is stable.
  • Horizontal sandbox scaling: SANDBOX_POOL_SIZE > 1 with lock-file coordination.
  • Seal Level 2 dual-model review: add once you have enough ARAs to calibrate the confidence floor.

Longer term

  • ARA-native peer review protocol: reviewers attach signed review packets to artifacts, not to compiled paper views.
  • Forking: fork an ARA repository (git itself has no fork command, so this means a clone or a hosting-platform fork) as a starting point for a new experiment, with provenance preserved.
  • Provenance tags on failure traces: let downstream agents query "which traces were generated under model-tier X and base compute Y" and selectively discount.
  • (Human+AI)² Research Network: multiple ResearchOS instances sharing a common research landscape via a git remote, with merge-request style human gates on world-model updates.

Architecture follows the AgentOS blueprint: filesystem as state, rewards as signal, Git as institutional memory, Docker as reproducible runtime. The ARA is not a document format — it is the primary research object. The paper is a view.
