@patleeman
Last active June 4, 2026 08:02
How OpenAI Codex implements the /goal slash command -- persisted long-running task objectives

Codex /goal Implementation

How OpenAI's Codex CLI implements the /goal slash command for persisted long-running task objectives.

The /goal command sets a persisted objective for a long-running task. It's a five-layer system: a SQLite table stores one goal per thread with status (active/paused/budget_limited/complete), optional token budget, and running usage counters. The app-server exposes thread/goal/set/get/clear JSON-RPC methods. The model sees three tools — create_goal, update_goal(complete), and get_goal — but can't pause/resume; those are system-controlled. A runtime event bus hooks into turn lifecycle to track token + wall-clock deltas for accounting, auto-pauses on interrupt, auto-reactivates paused goals on thread resume, and injects budget-limit steering into the model's response stream. The TUI handles the /goal slash command and displays goal state in the status bar.

Author: etraut-openai | 5 PRs, ~15K additions, landed in ~10 days (Apr 16–25, 2026)


Architecture: 5-PR Stack

PR 1: Persistence foundation (#18073)

Feature flag: Feature::Goals under Stage::UnderDevelopment, default-off.

SQLite schema (migration 0029_thread_goals.sql):

CREATE TABLE thread_goals (
    thread_id TEXT PRIMARY KEY NOT NULL REFERENCES threads(id) ON DELETE CASCADE,
    goal_id TEXT NOT NULL,
    objective TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('active', 'paused', 'budget_limited', 'complete')),
    token_budget INTEGER,
    tokens_used INTEGER NOT NULL DEFAULT 0,
    time_used_seconds INTEGER NOT NULL DEFAULT 0,
    created_at_ms INTEGER NOT NULL,
    updated_at_ms INTEGER NOT NULL
);

4 statuses:

  • active — in progress, system accounts usage
  • paused — stopped by user/system, usage not tracked
  • budget_limited — token budget exhausted (terminal, but in-flight usage still accounted)
  • complete — objective achieved (terminal)

State runtime APIs (in state/src/runtime/goals.rs):

  • get_thread_goal — read current goal
  • replace_thread_goal — upsert with new goal_id (resets usage)
  • insert_thread_goal — create-only (ON CONFLICT DO NOTHING)
  • update_thread_goal — partial status/budget update
  • pause_active_thread_goal — status='active' → 'paused'
  • delete_thread_goal — remove
  • account_thread_goal_usage — atomically add time + tokens, auto-set budget_limited

Key design: stale update protection. Each replacement generates a new goal_id UUID. Callers pass expected_goal_id; if the current goal_id doesn't match, the update is silently ignored. This prevents an old, in-flight accounting call from clobbering a newly replaced goal.
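The stale-update guard can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the Codex implementation: the table is trimmed to the columns that matter here, and the helper names `replace_goal` and `account_tokens` are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Trimmed, illustrative version of thread_goals (no foreign key, fewer columns).
conn.execute(
    """CREATE TABLE thread_goals (
        thread_id TEXT PRIMARY KEY NOT NULL,
        goal_id TEXT NOT NULL,
        objective TEXT NOT NULL,
        tokens_used INTEGER NOT NULL DEFAULT 0
    )"""
)

def replace_goal(thread_id, objective):
    """Hypothetical helper: store a goal under a fresh goal_id, resetting usage."""
    goal_id = str(uuid.uuid4())
    conn.execute(
        "INSERT OR REPLACE INTO thread_goals "
        "(thread_id, goal_id, objective, tokens_used) VALUES (?, ?, ?, 0)",
        (thread_id, goal_id, objective),
    )
    return goal_id

def account_tokens(thread_id, expected_goal_id, delta):
    """Add usage only if the stored goal_id still matches the caller's.

    A mismatch updates zero rows, so a stale call is dropped without an error.
    """
    cur = conn.execute(
        "UPDATE thread_goals SET tokens_used = tokens_used + ? "
        "WHERE thread_id = ? AND goal_id = ?",
        (delta, thread_id, expected_goal_id),
    )
    return cur.rowcount == 1

old_id = replace_goal("thr_1", "first objective")
new_id = replace_goal("thr_1", "second objective")    # user replaces the goal
stale_applied = account_tokens("thr_1", old_id, 500)  # in-flight call for the old goal
fresh_applied = account_tokens("thr_1", new_id, 120)
tokens_used = conn.execute(
    "SELECT tokens_used FROM thread_goals WHERE thread_id = 'thr_1'"
).fetchone()[0]
```

Putting the goal_id comparison in the WHERE clause makes the check and the write a single statement, so there is no window between reading the current goal_id and applying the update.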

Budget auto-enforcement via SQL CASE statements: setting token_budget on an active goal whose tokens_used already exceeds the limit immediately transitions to budget_limited.
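A sketch of how a CASE expression can fold the budget check into the accounting write itself. The SQL below is an assumption written for illustration, not the statement from the PR: the `>=` threshold, the `WHERE` filter on status, and the helper name `account_usage` are all guesses consistent with the status descriptions above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed, illustrative schema; the real table has more columns.
conn.execute(
    """CREATE TABLE thread_goals (
        thread_id TEXT PRIMARY KEY NOT NULL,
        status TEXT NOT NULL
            CHECK(status IN ('active', 'paused', 'budget_limited', 'complete')),
        token_budget INTEGER,
        tokens_used INTEGER NOT NULL DEFAULT 0,
        time_used_seconds INTEGER NOT NULL DEFAULT 0
    )"""
)

# SQLite evaluates every right-hand expression against the pre-update row,
# so "tokens_used + :tokens" inside the CASE is the new running total.
ACCOUNT_SQL = """
UPDATE thread_goals
SET tokens_used = tokens_used + :tokens,
    time_used_seconds = time_used_seconds + :seconds,
    status = CASE
        WHEN status = 'active'
             AND token_budget IS NOT NULL
             AND tokens_used + :tokens >= token_budget  -- assumed threshold
        THEN 'budget_limited'
        ELSE status
    END
WHERE thread_id = :thread_id
  AND status IN ('active', 'budget_limited')  -- assumed: paused/complete untouched
"""

def account_usage(thread_id, tokens, seconds):
    """Hypothetical helper: add usage and return (status, tokens, seconds)."""
    conn.execute(
        ACCOUNT_SQL,
        {"thread_id": thread_id, "tokens": tokens, "seconds": seconds},
    )
    return conn.execute(
        "SELECT status, tokens_used, time_used_seconds "
        "FROM thread_goals WHERE thread_id = ?",
        (thread_id,),
    ).fetchone()

conn.execute(
    "INSERT INTO thread_goals (thread_id, status, token_budget) "
    "VALUES ('thr_1', 'active', 1000)"
)
first = account_usage("thr_1", 400, 30)   # still under budget
second = account_usage("thr_1", 700, 45)  # crosses the budget in the same write
```

Because the status change happens in the same UPDATE as the counter increment, two concurrent accounting calls cannot both observe "under budget" and then jointly overshoot without the status changing.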


PR 2: App-server API (#18074)

Three experimental JSON-RPC methods on the app-server protocol:

| Method | Purpose |
| --- | --- |
| thread/goal/set | Create, replace, or update goal. New objective = replace (reset usage). Same non-terminal objective = update (preserve usage). |
| thread/goal/get | Fetch current goal. Returns goal: null when none. |
| thread/goal/clear | Delete the goal. Returns cleared: bool. |

Two experimental server notifications:

| Method | Triggers |
| --- | --- |
| thread/goal/updated | Any goal change; includes full ThreadGoal + optional turnId |
| thread/goal/cleared | Goal removed; includes threadId |

Protocol types (ThreadGoal):

{
  "threadId": "thr_xxx",
  "objective": "...",
  "status": "active",
  "tokenBudget": 200000,
  "tokensUsed": 0,
  "timeUsedSeconds": 0,
  "createdAt": 1776272400,
  "updatedAt": 1776272400
}

Replacement semantics via thread/goal/set:

  • { objective: "new goal" } → replaces goal, resets usage to zero
  • { objective: "existing goal" } → same objective = update status/budget, preserve usage
  • { status: "paused" } → pause existing goal
  • { tokenBudget: 50000 } → change budget (may immediately budget-limit)
  • tokenBudget: null → remove budget; tokenBudget omitted → don't change
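The replacement rules above, including the three-way distinction between an omitted tokenBudget, an explicit null, and a number, can be sketched in Python. This is a hedged model of the semantics, not the server code: `Goal`, `apply_set`, and the `UNSET` sentinel are hypothetical names, treating a terminal goal with the same objective as a replacement is inferred from the "same non-terminal objective" wording, and the `>=` budget threshold is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Optional

UNSET = object()  # "field omitted", distinct from an explicit None (JSON null)
TERMINAL = ("budget_limited", "complete")

@dataclass(frozen=True)
class Goal:
    objective: str
    status: str = "active"
    token_budget: Optional[int] = None
    tokens_used: int = 0

def apply_set(current, objective=UNSET, status=UNSET, token_budget=UNSET):
    """Model of thread/goal/set: replace, update in place, or adjust fields."""
    if current is None and objective is UNSET:
        raise ValueError("no goal to update")  # assumed error behavior
    replacing = objective is not UNSET and (
        current is None
        or objective != current.objective
        or current.status in TERMINAL  # inferred, see lead-in
    )
    # A replacement starts from a fresh goal, so usage resets to zero.
    goal = Goal(objective=objective) if replacing else current
    if status is not UNSET:
        goal = replace(goal, status=status)
    if token_budget is not UNSET:  # None removes the budget; UNSET keeps it
        goal = replace(goal, token_budget=token_budget)
    over_budget = (
        goal.token_budget is not None and goal.tokens_used >= goal.token_budget
    )
    if goal.status == "active" and over_budget:
        goal = replace(goal, status="budget_limited")
    return goal

g = apply_set(None, objective="ship it", token_budget=1000)
g = replace(g, tokens_used=600)  # pretend some work has been accounted
```

With `g` holding 600 used tokens, `apply_set(g, objective="ship it")` leaves usage intact, `apply_set(g, objective="other")` resets it, and `apply_set(g, token_budget=500)` moves the goal straight to budget_limited.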

Marked as #[experimental("thread/goal/set")] etc. — only available when feature flag is enabled.


PR 3: Model tools (#18075)

Three tools exposed to the model (not the full action space — intentional asymmetry):

| Tool | Args | Behavior |
| --- | --- | --- |
| create_goal | { objective, token_budget? } | Fails if goal already exists; creates new active goal |
| update_goal | { status: "complete" } | Can only mark complete; pause/resume/budget-limit are system-controlled |
| get_goal | none | Returns current goal or null |
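The asymmetry in the tool table can be made concrete with a small in-memory model. This sketch only mirrors the behaviors listed above; the class `GoalTools`, the exception `GoalToolError`, and the error messages are hypothetical and say nothing about how the Rust handlers are structured.

```python
class GoalToolError(Exception):
    """Raised when a model tool call is rejected."""

class GoalTools:
    """In-memory stand-in for the three model-facing goal tools."""

    def __init__(self):
        self.goal = None

    def create_goal(self, objective, token_budget=None):
        # Create-only: an existing goal is never overwritten by the model.
        if self.goal is not None:
            raise GoalToolError("a goal already exists for this thread")
        self.goal = {
            "objective": objective,
            "status": "active",
            "token_budget": token_budget,
        }
        return self.goal

    def update_goal(self, status):
        # The only transition the model may request is completion.
        if status != "complete":
            raise GoalToolError("the model may only set status to 'complete'")
        if self.goal is None:
            raise GoalToolError("no goal to update")
        self.goal["status"] = "complete"
        return self.goal

    def get_goal(self):
        return self.goal  # None when no goal is set

tools = GoalTools()
tools.create_goal("migrate the test suite", token_budget=10000)
try:
    tools.update_goal("paused")
    pause_rejected = False
except GoalToolError:
    pause_rejected = True
try:
    tools.create_goal("a second goal")
    duplicate_rejected = False
except GoalToolError:
    duplicate_rejected = True
tools.update_goal("complete")
```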

Design principle: The model can start a goal and declare it complete, but pause/resume/budget transitions are controlled by the user or the system runtime. The tool spec explicitly says:

"Create a goal only when explicitly requested by the user or system/developer instructions; do not infer goals from ordinary tasks."

The tool response includes a completionBudgetReport field when a budgeted goal is marked complete:

{
  "goal": { ... },
  "remainingTokens": 6750,
  "completionBudgetReport": "Goal achieved. Report final budget usage to the user: tokens used: 3250 of 10000; time used: 75 seconds."
}
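A sketch of how such a response could be assembled from a ThreadGoal-shaped dict. The function name `completion_response` is hypothetical, and clamping remainingTokens at zero is an assumption; only the field names and the report wording come from the example above.

```python
def completion_response(goal):
    """Build a tool response; add the budget report only for budgeted, completed goals."""
    response = {"goal": goal}
    budget = goal.get("tokenBudget")
    if budget is not None and goal["status"] == "complete":
        used = goal["tokensUsed"]
        response["remainingTokens"] = max(budget - used, 0)  # assumed clamp
        response["completionBudgetReport"] = (
            "Goal achieved. Report final budget usage to the user: "
            f"tokens used: {used} of {budget}; "
            f"time used: {goal['timeUsedSeconds']} seconds."
        )
    return response

budgeted = completion_response(
    {"status": "complete", "tokenBudget": 10000,
     "tokensUsed": 3250, "timeUsedSeconds": 75}
)
unbudgeted = completion_response(
    {"status": "complete", "tokenBudget": None,
     "tokensUsed": 3250, "timeUsedSeconds": 75}
)
```

Phrasing the report as an instruction ("Report final budget usage to the user") is what nudges the model to surface the numbers in its final message.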

Each tool handler delegates to Session methods that validate, persist, emit events, and update runtime accounting state.


PR 4: Core Runtime (#18076)

The goal lifecycle engine. Lives in core/src/goals.rs.

GoalRuntimeEvent enum — the event bus hooking into session lifecycle:

| Event | Behavior |
| --- | --- |
| TurnStarted | Captures active goal_id + token usage baseline. Plan mode skips. |
| ToolCompleted | Accounts token + wall-clock deltas. May inject budget-limit steering. |
| ToolCompletedGoal | Same but suppresses budget-limit steering (avoid double-reporting). |
| TurnFinished | Final accounting, no-tool continuation suppression logic. |
| TaskAborted(Interrupted) | Pauses active goal. |
| ThreadResumed | Reactivates paused goal (paused → active). |
| MaybeContinueIfIdle | Starts auto-continuation turn with continuation prompt. |
| ExternalMutationStarting | Best-effort accounting before external set/clear. |
| ExternalSet { status } | Apply external status (active = maybe continue, budget_limited = clear runtime state). |
| ExternalClear | Clear runtime accounting state. |
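Two of these events change goal status directly: an interrupt pauses an active goal and a thread resume reactivates a paused one. A minimal sketch of just those transitions follows; the Python enum and the `apply_event` function are hypothetical stand-ins for the Rust GoalRuntimeEvent handling, and every other event is deliberately left out.

```python
from enum import Enum, auto

class GoalEvent(Enum):
    TASK_ABORTED_INTERRUPTED = auto()
    THREAD_RESUMED = auto()

def apply_event(status, event):
    """Return the goal status after a lifecycle event; unrelated states pass through."""
    if event is GoalEvent.TASK_ABORTED_INTERRUPTED and status == "active":
        return "paused"
    if event is GoalEvent.THREAD_RESUMED and status == "paused":
        return "active"
    return status

after_interrupt = apply_event("active", GoalEvent.TASK_ABORTED_INTERRUPTED)
after_resume = apply_event(after_interrupt, GoalEvent.THREAD_RESUMED)
```

Terminal statuses are unaffected in this sketch: resuming a thread whose goal is complete or budget_limited leaves the status as it was.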

Accounting model:

Two concurrent snapshots per thread:

  1. Turn accounting (GoalTurnAccountingSnapshot): tracks last accounted token usage per turn
  2. Wall-clock accounting (GoalWallClockAccountingSnapshot): tracks elapsed real time

Delta is computed: current - last_accounted, then pushed to SQLite atomically. A Semaphore(1) serializes accounting updates.
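The delta computation can be sketched as follows. This is an illustrative model only: the class `TurnAccounting` is hypothetical, a `threading.Semaphore(1)` stands in for the async semaphore, and a plain integer stands in for the SQLite write.

```python
import threading

class TurnAccounting:
    """Tracks the last accounted token total and pushes only the difference."""

    def __init__(self, baseline_tokens):
        self._lock = threading.Semaphore(1)      # one accounting update at a time
        self._last_accounted = baseline_tokens   # baseline captured at turn start
        self.persisted_tokens = 0                # stand-in for tokens_used in SQLite

    def account(self, current_total_tokens):
        with self._lock:
            delta = current_total_tokens - self._last_accounted
            if delta <= 0:
                return 0  # nothing new since the last update
            self.persisted_tokens += delta
            self._last_accounted = current_total_tokens
            return delta

acct = TurnAccounting(baseline_tokens=1000)
deltas = [acct.account(1400), acct.account(1400), acct.account(1900)]
```

Tracking the last accounted total, rather than re-adding the session total each time, is what keeps repeated tool completions within one turn from double-counting usage.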

Budget limit steering:

When accounting crosses the token budget, the runtime injects a budget_limiting item into the model's response stream. This steering is suppressed on:

  • The completion turn (don't tell model "out of budget" when it just finished)
  • Subsequent tool completions after the first steering (tracked via budget_limit_reported_goal_id)
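The two suppression rules can be sketched as a small decision function. Only the field name budget_limit_reported_goal_id comes from the description above; the class `BudgetSteering` and the method `should_inject` are hypothetical.

```python
class BudgetSteering:
    """Decides whether to inject budget-limit steering after a tool completes."""

    def __init__(self):
        self.budget_limit_reported_goal_id = None

    def should_inject(self, goal_id, status, is_completion_tool):
        if status != "budget_limited":
            return False
        if is_completion_tool:
            return False  # the model just finished; do not say "out of budget"
        if self.budget_limit_reported_goal_id == goal_id:
            return False  # already steered once for this goal
        self.budget_limit_reported_goal_id = goal_id
        return True

steering = BudgetSteering()
first = steering.should_inject("goal-1", "budget_limited", is_completion_tool=False)
repeat = steering.should_inject("goal-1", "budget_limited", is_completion_tool=False)
on_completion = BudgetSteering().should_inject(
    "goal-2", "budget_limited", is_completion_tool=True
)
```

Keying the "already reported" flag on goal_id means a replaced goal, which gets a new id, can receive its own steering message later.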

Continuation suppression:

A no-tool continuation turn suppresses the next automatic continuation (avoids infinite loop). User action, tool calls, or external mutations reset suppression.


PR 5: TUI UX (#18077)

Slash command registered in SlashCommand::Goal:

Goal => "set or view the current goal for a long-running task",
  • supports_inline_args() → true
  • available_during_task() → true
  • Renders in command popup

Goal connector in chatwidget.rs:

  • Renders goal objective + status in the thread status bar
  • Handles thread/goal/updated and thread/goal/cleared notifications
  • Shows elapsed time and token usage

Continuation prompts (templates/goals/):

  • continuation.md — "Continue working toward your goal: {objective}..."
  • budget_limit.md — "You're approaching the token budget for your goal..."

Thread resume ordering:

  1. Emit goal snapshot notification
  2. Apply goal resume runtime effects (activate paused goal)
  3. Send resume response + replay
  4. Maybe continue active goal if idle

Key Design Decisions

  1. Model can't pause/resume. Only create_goal, update_goal(complete), and the read-only get_goal are model-facing. Pause/resume/budget-limit are system-controlled. Pause comes from user interrupts; resume from thread re-entry; budget-limit from accounting.

  2. goal_id versioning. Every replacement generates a new UUID. Stale accounting calls from old goal versions are silently dropped. Prevents races between in-flight tool completions and user-initiated goal replacements.

  3. Atomic budget enforcement. SQL CASE statements handle budget limits inline with normal writes — no separate check-and-set race condition.

  4. Completing a goal auto-reports budget. The tool response includes a human-readable budget summary so the model naturally reports "I used 3,250 of 10,000 tokens" to the user.

  5. Auto-continuation is conservative. If a continuation turn produces zero tool calls (just chat), it suppresses the next auto-continuation. Prevents stubborn loops.

How the loop stops

The auto-continuation loop is fundamentally one-shot — it doesn't re-fire indefinitely. Stop conditions:

  1. Goal reaches terminal status: complete or budget_limited. No active goal = no continuation.
  2. No-tool suppression: if a continuation turn produces zero tool calls, the runtime sets continuation_suppressed = true. The next MaybeContinueIfIdle event checks this and skips. User action, tool calls, or external mutations reset it.
  3. Semaphore guard: continuation_lock is a Semaphore(1). If a continuation is already in-flight, subsequent events bail immediately.
  4. Mode check: Plan mode ignores goals entirely. No continuation.
  5. Idle check: maybe_continue_goal_if_idle only fires if the thread has no active turn. If the user or another agent is already busy, it's a no-op.

The runtime doesn't loop: it fires one continuation per trigger (thread resume, external set to active). If that turn does real tool work and doesn't mark the goal complete, the next continuation only comes from another explicit trigger.
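The five stop conditions can be combined into one predicate. This is a simplified model under stated assumptions: `ThreadState`, `should_continue`, and `on_continuation_turn_finished` are hypothetical names, and a boolean flag stands in for the continuation_lock semaphore.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ThreadState:
    goal_status: Optional[str]            # None when the thread has no goal
    plan_mode: bool = False
    has_active_turn: bool = False
    continuation_in_flight: bool = False  # stand-in for continuation_lock
    continuation_suppressed: bool = False

def should_continue(state):
    """True only if every stop condition listed above is clear."""
    if state.goal_status != "active":
        return False  # no goal, paused, or terminal
    if state.continuation_suppressed:
        return False  # previous continuation made no tool calls
    if state.continuation_in_flight:
        return False  # another continuation already holds the lock
    if state.plan_mode:
        return False  # Plan mode ignores goals
    if state.has_active_turn:
        return False  # thread is busy
    return True

def on_continuation_turn_finished(state, tool_calls):
    """Release the guard and suppress the next continuation after a no-tool turn."""
    state.continuation_in_flight = False
    state.continuation_suppressed = tool_calls == 0

state = ThreadState(goal_status="active")
before = should_continue(state)
on_continuation_turn_finished(state, tool_calls=0)  # a chat-only continuation
after_no_tool_turn = should_continue(state)
state.continuation_suppressed = False               # e.g. the user sends a message
after_user_action = should_continue(state)
```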
