Skip to content

Instantly share code, notes, and snippets.

@jasonacox
Last active May 16, 2026 22:47
Show Gist options
  • Select an option

  • Save jasonacox/3b5ba35b867858646a152f2381690ce3 to your computer and use it in GitHub Desktop.

Select an option

Save jasonacox/3b5ba35b867858646a152f2381690ce3 to your computer and use it in GitHub Desktop.
How OpenClaw Works

How OpenClaw Works

A friendly guide to the magic inside the OpenClaw system — how it manages what it knows, what it remembers, and why talking to it feels different from every other AI tool you have used.


The big picture

Most AI tools are request/response systems. You send a message. The model picks the last N turns of conversation, runs an inference, and answers. When the conversation ends, it forgets everything.

OpenClaw is built differently. It is a persistent AI agent — an agent that lives on a Gateway daemon running on your machine (or a server), connects to your real chat apps (WhatsApp, Telegram, iMessage, Discord, Slack, and more), and maintains a continuous identity, memory, and personality across every conversation it has. It has a workspace on disk where it saves things. It consolidates what it has learned overnight. It notices you have an interview tomorrow and checks in afterward even though you never asked for a reminder.

Think of it less as a chat widget and more as a staff member who has their own desk, their own notes, and who gradually builds a model of who you are and what you need.


The three layers of memory

OpenClaw's memory is not magic. It is plain Markdown files on disk, searched with a fast hybrid index. But the architecture around those files is what makes it feel alive.

Layer 1 — The conversation (session)

The most immediate layer is the current session transcript, stored as a JSONL file. This is what you would call "working memory": everything said so far in this conversation, along with every tool call and result.

Sessions reset on a configurable schedule (daily at 4am by default, or after an idle period, or manually with /new). When they reset, the conversation history clears — but anything important was already saved to the next layer.

Layer 2 — Daily notes (short-term memory)

Inside the agent's workspace there is a memory/ directory. Every day the agent can write a memory/YYYY-MM-DD.md file. This is the working layer: raw observations, session summaries, things the agent noticed, things you told it.

Today's and yesterday's daily notes are automatically loaded into context at the start of a session. Everything older is indexed in a SQLite database and retrieved on demand using memory_search when something relevant comes up.

The search is hybrid: it combines vector embeddings (semantic meaning, so "the gateway machine" matches "server running OpenClaw") with BM25 keyword ranking (exact terms, so config keys and error codes surface reliably). If you have an OpenAI, Gemini, Voyage, or Mistral key, embeddings kick in automatically.

Layer 3 — Long-term memory (MEMORY.md)

MEMORY.md is the curated, durable layer. It holds facts and preferences that should be available at the start of every private session: how you like to work, standing decisions, important context about your life or projects.

It is not a raw transcript or a giant dump of everything the agent ever encountered. It is a compact summary that fits comfortably in the system prompt budget. The agent manages it — adding durable facts, removing stale ones — with help from the dreaming system (see below).


Active memory: surfacing the right thing at the right moment

Even with a great memory system, something subtle can go wrong: the agent only searches memory when it decides to. And sometimes it makes the decision too late — after it has already started composing a generic response.

Active memory solves this by running a fast, dedicated sub-agent before the main reply. The moment your message arrives, a second (cheaper, faster) model scans the memory index for anything relevant to the current conversation. The results are injected as a hidden prefix into the main model's context before it even starts thinking about your question.

The effect: the agent's reply already has the relevant context baked in, without you having to say "remember when..." It just knows.


Dreaming: consolidating what matters

Remembering everything is not the goal. Remembering the right things is.

Every night at 3am (configurable), the dreaming system runs a three-phase background sweep:

  1. Light phase — ingests recent daily notes and search traces, deduplicates them, and stages candidates.
  2. REM phase — extracts recurring themes and reflective signals.
  3. Deep phase — scores every candidate and promotes the strongest ones to MEMORY.md.

The scoring model weighs six signals: how often something surfaced (frequency), how relevant it was when retrieved, how diverse the query contexts were, how recent it is, how many days it recurred, and how concept-dense it is. Only entries that clear minimum thresholds on score, recall count, and query diversity get promoted.

The result is a MEMORY.md that stays lean and relevant. It fills with things that genuinely mattered — not everything you once mentioned.

There is also a Dream Diary (DREAMS.md) — a human-readable narrative summary written after each sweep. You can read it, browse it in the Control UI, and use it to understand what the agent thinks has been important recently. The diary is for you, not for the model.


Commitments: the follow-ups you never asked for

Reminders are for things you explicitly schedule. Commitments are for the things you said in passing that the agent recognized as a future moment worth noting.

When commitments are enabled, after each agent reply a silent background pass looks for natural follow-up opportunities in the conversation:

  • "I have a big presentation tomorrow" → check in after.
  • "I've been so stressed lately" → ask how things are going in a few days.
  • "I'll deal with that later" → gently flag it hasn't come back up.

The agent stores these with a due window and delivers them through the heartbeat system when the time comes — in the same channel, in the same conversational context, as if the conversation just naturally continued. It never echoes back immediately; it waits.

This is what turns an AI assistant into something that feels like it actually pays attention.


Context: what the model sees each turn

Every time the agent answers you, OpenClaw builds a complete system prompt from scratch. This prompt is not just a raw chat history dump. It is a carefully assembled document with sections:

  • Tools: what capabilities the agent has right now.
  • Execution guidance: how to follow through, how to recover, when to delegate.
  • Project Context: your workspace files (AGENTS.md, SOUL.md, IDENTITY.md, USER.md) injected verbatim.
  • Skills list: metadata about available skills the agent can load on demand.
  • Session-specific context: current channel rules, heartbeat state, runtime info.

The stable sections (workspace files, tool schemas, skills metadata) live above a prompt cache boundary. This means when you use Anthropic Claude, the stable prefix is cached server-side — you pay to rewrite it only once, not every turn. The dynamic sections (channel context, current time, heartbeat) are below the boundary and change per-turn.

When the context window fills up: compaction

Every model has a hard limit on how much it can hold at once. When a long conversation approaches that limit, OpenClaw compacts the oldest turns:

  1. A model (optionally a cheaper one configured specifically for this) reads the older portions and writes a compact summary.
  2. The summary is saved into the session transcript.
  3. Recent turns stay verbatim.
  4. The full original history is never deleted — it stays on disk.

Before compacting, the agent is reminded to save anything important to the memory files. This ensures that a context reset does not cause amnesia.

Pruning: keeping tool noise out of the cache

Long-running sessions accumulate a lot of tool output — file reads, exec results, web searches. Most of that is not worth re-reading twenty turns later.

OpenClaw does a lighter-weight prune pass (in-memory only, does not touch disk) between compaction cycles: old tool results are soft-trimmed (head + tail kept, middle replaced with ...) or hard-cleared (replaced with a placeholder). This keeps the context lean and makes prompt-cache reuse efficient.


Personality: the agent has an actual voice

This is where OpenClaw diverges most sharply from typical AI setups.

SOUL.md

Every agent has a SOUL.md file in its workspace. It is the agent's voice — injected into every session's system prompt. Not a list of safety rules. Not a life story. A tight, opinionated description of how the agent should feel to talk to:

  • Tone (blunt, warm, dry, formal).
  • Opinions (commits to takes instead of hedging everything with "it depends").
  • Brevity rules (one sentence when that's enough).
  • Humor (allowed, not forced).
  • What not to do (no "Great question!", no sycophancy, no corporate mush).

The guide in the documentation literally includes a prompt called "The Molty prompt" that rewrites SOUL.md to give the agent strong opinions, natural wit, swearing when it lands, and the standing rule: "Be the assistant you'd actually want to talk to at 2am."

IDENTITY.md, USER.md, AGENTS.md

  • IDENTITY.md: the agent's name, vibe, and emoji.
  • USER.md: who you are and how the agent should address you.
  • AGENTS.md: operating rules — what the agent should do, what it should prioritize, how it should handle edge cases.

These are all just files. You edit them. The agent also edits them (with your permission) as it learns things.

Continuity through heartbeat

The heartbeat system runs a mini agent turn on a schedule even when you are not actively talking to the agent. It uses HEARTBEAT.md as a tiny checklist. It delivers due commitments. It can send a morning message, a reminder, a nudge. It is the mechanism that makes the agent feel like it exists between conversations rather than only while you are looking at it.


How the agent loop actually runs

When a message arrives from any channel:

  1. The channel routes it to the correct agent via bindings (a mapping of channel accounts to agent workspaces).
  2. The message enters a per-session queue (one active run per session at a time; no collisions).
  3. The agent resolves its model and auth profile, loads its skills snapshot, and builds the full system prompt.
  4. Active memory fires first (if enabled) — a fast recall sub-agent surfaces relevant notes before the main model runs.
  5. The main model runs via pi-agent-core, streaming deltas back immediately.
  6. Tools execute as needed (file reads, web searches, exec, message sends, etc.).
  7. The finished reply is shaped (NO_REPLY tokens filtered, duplicates removed, chunked to channel limits) and delivered.
  8. The transcript is written to disk under the session JSONL file.

Mid-run messages are handled gracefully. By default they steer the active run (injected after the current tool-call block finishes) rather than starting a second parallel run. You can change this per session with /queue.


Multiple agents, one Gateway

One Gateway can host multiple fully isolated agents. Each agent has:

  • Its own workspace and personality files.
  • Its own session history.
  • Its own auth profiles and model preferences.
  • Its own channel accounts (different Telegram bots, different WhatsApp numbers).

Bindings route incoming messages to the right agent. Two different people can talk to two different bots, each with a separate brain, running from the same daemon on the same server. Neither can see the other's conversation.

For heavier workloads, a main agent can spawn sub-agents via sessions_spawn. Sub-agents get their own sessions and run in parallel. When a sub-agent finishes, it pushes its result back to the parent — no polling loops.


What makes this different from other agentic tools

Most agentic tools OpenClaw
Stateless API wrappers Persistent daemon with disk-backed state
You talk to it through a web UI You talk to it through WhatsApp, iMessage, Telegram, etc.
Memory is a RAG database you manage Memory is a three-tier file system the agent manages
Personality is a system prompt you paste Personality lives in versioned files (SOUL.md) you and the agent both edit
Forgets everything after context fills Compacts old turns, saves durable facts to memory before resetting
Reactive (responds when spoken to) Proactive (heartbeat, commitments, dreaming, active memory)
One model, one provider 35+ providers, fallback chains, per-turn model switching
Single flat context Pluggable context engine with semantic recall injection
One conversation at a time Multi-agent routing, parallel specialist lanes, sub-agent spawning

The deeper difference is architectural philosophy. OpenClaw treats the agent as a persistent entity with a home directory, a daily rhythm, a personality that evolves, and memory that consolidates while it sleeps. The chat app is just the interface. The agent exists independently of any particular conversation.


Key files in the workspace

~/.openclaw/workspace/
├── AGENTS.md          ← operating rules and notes
├── SOUL.md            ← personality, tone, voice
├── IDENTITY.md        ← name, vibe, emoji
├── USER.md            ← who you are
├── TOOLS.md           ← local tool conventions
├── MEMORY.md          ← curated long-term facts
├── DREAMS.md          ← dream diary (human reading)
├── HEARTBEAT.md       ← heartbeat checklist
└── memory/
    ├── 2026-05-15.md  ← yesterday's notes
    └── 2026-05-16.md  ← today's notes

~/.openclaw/
├── openclaw.json      ← gateway config
├── agents/
│   └── main/
│       └── sessions/  ← session transcripts (JSONL)
└── memory/
    └── main.sqlite    ← memory search index
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment