Codex /goal Implementation

How OpenAI's Codex CLI implements the /goal slash command for persisted long-running task objectives.

The /goal command sets a persisted objective for a long-running task. It is a five-layer system. A SQLite table stores one goal per thread with a status (active/paused/budget_limited/complete), an optional token budget, and running usage counters. The app-server exposes thread/goal/set, thread/goal/get, and thread/goal/clear JSON-RPC methods. The model sees three tools (create_goal, update_goal(complete), and get_goal) but cannot pause or resume a goal; those transitions are system-controlled. A runtime event bus hooks into the turn lifecycle to account token and wall-clock deltas, auto-pauses the goal on interrupt, reactivates a paused goal on thread resume, and injects budget-limit steering into the model's response stream. The TUI handles the /goal slash command and displays goal state in the status bar.

Author: etraut-openai | 5 PRs, ~15K additions, landed in ~10 days (Apr 16–25, 2026)

Architecture: 5-PR Stack

PR 1: Persistence foundation (#18073)

Feature flag: Feature::Goals under Stage::UnderDevelopment, default-off.

SQLite schema (migration 0029_thread_goals.sql):

CREATE TABLE thread_goals (
    thread_id TEXT PRIMARY KEY NOT NULL REFERENCES threads(id) ON DELETE CASCADE,
    goal_id TEXT NOT NULL,
    objective TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('active', 'paused', 'budget_limited', 'complete')),
    token_budget INTEGER,
    tokens_used INTEGER NOT NULL DEFAULT 0,
    time_used_seconds INTEGER NOT NULL DEFAULT 0,
    created_at_ms INTEGER NOT NULL,
    updated_at_ms INTEGER NOT NULL
);

4 statuses:

active — in progress, system accounts usage
paused — stopped by user/system, usage not tracked
budget_limited — token budget exhausted (terminal, but in-flight usage still accounted)
complete — objective achieved (terminal)

State runtime APIs (in state/src/runtime/goals.rs):

get_thread_goal — read current goal
replace_thread_goal — upsert with new goal_id (resets usage)
insert_thread_goal — create-only (ON CONFLICT DO NOTHING)
update_thread_goal — partial status/budget update
pause_active_thread_goal — status='active' → 'paused'
delete_thread_goal — remove
account_thread_goal_usage — atomically add time + tokens, auto-set budget_limited

Key design: stale update protection. Each replacement generates a new goal_id UUID. Callers pass expected_goal_id; if the current goal_id doesn't match, the update is silently ignored. This prevents an old, in-flight accounting call from clobbering a newly replaced goal.
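The goal_id guard can be illustrated with a small in-memory SQLite sketch. It is not the Codex implementation: the table is trimmed to the columns that matter here, and replace_goal and account_tokens are hypothetical stand-ins for replace_thread_goal and account_thread_goal_usage.

```python
import sqlite3
import uuid


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with only the columns this sketch needs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE thread_goals (
               thread_id TEXT PRIMARY KEY NOT NULL,
               goal_id TEXT NOT NULL,
               objective TEXT NOT NULL,
               tokens_used INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def replace_goal(conn: sqlite3.Connection, thread_id: str, objective: str) -> str:
    """Upsert the goal under a fresh goal_id and reset usage (hypothetical helper)."""
    goal_id = str(uuid.uuid4())
    conn.execute(
        """INSERT INTO thread_goals (thread_id, goal_id, objective, tokens_used)
           VALUES (?, ?, ?, 0)
           ON CONFLICT(thread_id) DO UPDATE SET
               goal_id = excluded.goal_id,
               objective = excluded.objective,
               tokens_used = 0""",
        (thread_id, goal_id, objective),
    )
    return goal_id


def account_tokens(conn, thread_id: str, expected_goal_id: str, delta: int) -> bool:
    """Add usage only if the stored goal_id still matches the caller's view."""
    cur = conn.execute(
        """UPDATE thread_goals SET tokens_used = tokens_used + ?
           WHERE thread_id = ? AND goal_id = ?""",
        (delta, thread_id, expected_goal_id),
    )
    return cur.rowcount == 1  # False means the update was silently ignored


conn = make_db()
old_id = replace_goal(conn, "thr_1", "ship feature A")
new_id = replace_goal(conn, "thr_1", "ship feature B")  # user replaces the goal

stale_applied = account_tokens(conn, "thr_1", old_id, 500)  # in-flight call for old goal
fresh_applied = account_tokens(conn, "thr_1", new_id, 120)
tokens_used = conn.execute(
    "SELECT tokens_used FROM thread_goals WHERE thread_id = ?", ("thr_1",)
).fetchone()[0]
```

Putting the goal_id in the WHERE clause makes the comparison and the write a single statement, so a stale caller cannot slip in between a check and an update.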

Budget auto-enforcement via SQL CASE statements: setting token_budget on an active goal whose tokens_used already exceeds the limit immediately transitions to budget_limited.
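A minimal sketch of what CASE-based enforcement could look like, using Python's sqlite3 module. The two SQL statements are illustrative, not copied from the migration or from goals.rs. Two assumptions are baked in: the threshold is "used >= budget", and usage is accounted only for active and budget_limited goals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE thread_goals (
           thread_id TEXT PRIMARY KEY NOT NULL,
           status TEXT NOT NULL
               CHECK(status IN ('active', 'paused', 'budget_limited', 'complete')),
           token_budget INTEGER,
           tokens_used INTEGER NOT NULL DEFAULT 0,
           time_used_seconds INTEGER NOT NULL DEFAULT 0
       )"""
)

# Assumed shape of the accounting write: usage and status change in one statement.
# SQLite evaluates the right-hand sides against the pre-update row values.
ACCOUNT_USAGE = """
UPDATE thread_goals SET
    tokens_used = tokens_used + :tokens,
    time_used_seconds = time_used_seconds + :seconds,
    status = CASE
        WHEN status = 'active'
             AND token_budget IS NOT NULL
             AND tokens_used + :tokens >= token_budget
        THEN 'budget_limited'
        ELSE status
    END
WHERE thread_id = :thread_id AND status IN ('active', 'budget_limited')
"""

# Assumed shape of a budget change: an already-exhausted goal is limited at once.
SET_BUDGET = """
UPDATE thread_goals SET
    token_budget = :budget,
    status = CASE
        WHEN status = 'active' AND :budget IS NOT NULL AND tokens_used >= :budget
        THEN 'budget_limited'
        ELSE status
    END
WHERE thread_id = :thread_id
"""


def snapshot(thread_id: str):
    """Return (status, tokens_used) for one thread."""
    return conn.execute(
        "SELECT status, tokens_used FROM thread_goals WHERE thread_id = ?",
        (thread_id,),
    ).fetchone()


conn.execute(
    "INSERT INTO thread_goals (thread_id, status, token_budget) "
    "VALUES ('thr_1', 'active', 1000)"
)
conn.execute(ACCOUNT_USAGE, {"thread_id": "thr_1", "tokens": 400, "seconds": 10})
after_first = snapshot("thr_1")
conn.execute(ACCOUNT_USAGE, {"thread_id": "thr_1", "tokens": 700, "seconds": 5})
after_second = snapshot("thr_1")

conn.execute(
    "INSERT INTO thread_goals (thread_id, status, tokens_used) "
    "VALUES ('thr_2', 'active', 900)"
)
conn.execute(SET_BUDGET, {"thread_id": "thr_2", "budget": 500})
after_budget = snapshot("thr_2")
```

Here the second accounting call pushes thr_1 past its budget and flips the status in the same write, and lowering the budget on thr_2 limits it immediately.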

PR 2: App-server API (#18074)

Three experimental JSON-RPC methods on the app-server protocol:

| Method | Purpose |
| --- | --- |
| `thread/goal/set` | Create, replace, or update goal. New `objective` = replace (reset usage). Same non-terminal objective = update (preserve usage). |
| `thread/goal/get` | Fetch current goal. Returns `goal: null` when none. |
| `thread/goal/clear` | Delete the goal. Returns `cleared: bool`. |

Two experimental server notifications:

| Method | Triggers |
| --- | --- |
| `thread/goal/updated` | Any goal change; includes full `ThreadGoal` + optional `turnId` |
| `thread/goal/cleared` | Goal removed; includes `threadId` |

Protocol types (ThreadGoal):

{
  "threadId": "thr_xxx",
  "objective": "...",
  "status": "active",
  "tokenBudget": 200000,
  "tokensUsed": 0,
  "timeUsedSeconds": 0,
  "createdAt": 1776272400,
  "updatedAt": 1776272400
}

Replacement semantics via thread/goal/set:

{ objective: "new goal" } → replaces goal, resets usage to zero
{ objective: "existing goal" } → same objective = update status/budget, preserve usage
{ status: "paused" } → pause existing goal
{ tokenBudget: 50000 } → change budget (may immediately budget-limit)
tokenBudget: null → remove budget; tokenBudget omitted → don't change
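The rules above hinge on telling "tokenBudget is null" apart from "tokenBudget was omitted". One way to model that is a sentinel value, sketched below. Goal, UNSET, and apply_set are hypothetical names. The sketch also assumes that a replaced goal starts without a budget unless one is supplied, and that setting the same objective on a terminal goal counts as a replacement.

```python
from dataclasses import dataclass, replace
from typing import Optional


class _Unset:
    """Sentinel type: the field was not present in the request at all."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET = _Unset()
TERMINAL = {"complete", "budget_limited"}


@dataclass(frozen=True)
class Goal:
    objective: str
    status: str = "active"
    token_budget: Optional[int] = None
    tokens_used: int = 0


def apply_set(current, *, objective=None, status=None, token_budget=UNSET) -> Goal:
    """Apply a thread/goal/set-style request to the current goal (illustrative only)."""
    is_replace = objective is not None and (
        current is None
        or objective != current.objective
        or current.status in TERMINAL
    )
    if is_replace:
        # New objective: fresh goal, usage counters start from zero.
        budget = None if token_budget is UNSET else token_budget
        goal = Goal(objective=objective, status=status or "active", token_budget=budget)
    else:
        if current is None:
            raise ValueError("no goal to update")
        # Same objective or no objective: update in place, keep usage.
        goal = replace(
            current,
            status=status or current.status,
            token_budget=current.token_budget if token_budget is UNSET else token_budget,
        )
    # A budget at or below current usage limits the goal immediately (assumed >=).
    if (
        goal.status == "active"
        and goal.token_budget is not None
        and goal.tokens_used >= goal.token_budget
    ):
        goal = replace(goal, status="budget_limited")
    return goal


g = Goal("migrate db", token_budget=200000, tokens_used=60000)
same = apply_set(g, objective="migrate db", status="paused")  # update, usage kept
lowered = apply_set(g, token_budget=50000)                    # budget-limits at once
cleared = apply_set(g, token_budget=None)                     # explicit null removes budget
untouched = apply_set(g, status="paused")                     # omitted budget is unchanged
fresh = apply_set(g, objective="write docs")                  # replace, usage reset
```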

Marked as #[experimental("thread/goal/set")] etc. — only available when feature flag is enabled.

PR 3: Model tools (#18075)

Three tools exposed to the model (not the full action space — intentional asymmetry):

| Tool | Args | Behavior |
| --- | --- | --- |
| `create_goal` | `{ objective, token_budget? }` | Fails if goal already exists; creates new active goal |
| `update_goal` | `{ status: "complete" }` | Can only mark complete; pause/resume/budget-limit are system-controlled |
| `get_goal` | none | Returns current goal or null |

Design principle: The model can start a goal and declare it complete, but pause/resume/budget transitions are controlled by the user or the system runtime. The tool spec explicitly says:

"Create a goal only when explicitly requested by the user or system/developer instructions; do not infer goals from ordinary tasks."

Tool response includes a completionBudgetReport field (alongside remainingTokens) when a budgeted goal is marked complete:

{
  "goal": { ... },
  "remainingTokens": 6750,
  "completionBudgetReport": "Goal achieved. Report final budget usage to the user: tokens used: 3250 of 10000; time used: 75 seconds."
}
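The response shape above can be reproduced with a few lines of Python. This is a sketch derived only from the example payload: completion_response is a hypothetical helper, and clamping remainingTokens at zero is an assumption.

```python
def completion_response(goal: dict) -> dict:
    """Build a tool response for a goal, adding budget fields when a budget exists."""
    response = {"goal": goal}
    budget = goal.get("tokenBudget")
    if budget is not None:
        used = goal["tokensUsed"]
        # Assumption: never report a negative remainder after an overshoot.
        response["remainingTokens"] = max(budget - used, 0)
        if goal["status"] == "complete":
            response["completionBudgetReport"] = (
                "Goal achieved. Report final budget usage to the user: "
                f"tokens used: {used} of {budget}; "
                f"time used: {goal['timeUsedSeconds']} seconds."
            )
    return response


budgeted = completion_response(
    {"status": "complete", "tokenBudget": 10000, "tokensUsed": 3250, "timeUsedSeconds": 75}
)
unbudgeted = completion_response(
    {"status": "complete", "tokenBudget": None, "tokensUsed": 3250, "timeUsedSeconds": 75}
)
```

A goal without a budget gets no report, which matches the statement that the report is tied to budgeted goals.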

Each tool handler delegates to Session methods that validate, persist, emit events, and update runtime accounting state.
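The asymmetry between what the model may do and what the system controls can be sketched as a tiny in-memory handler. GoalTools and GoalToolError are hypothetical names, and the real handlers delegate to Session methods rather than holding state themselves.

```python
class GoalToolError(Exception):
    """Raised when the model asks for something the goal tools do not allow."""


class GoalTools:
    """In-memory stand-in for the three model-facing goal tools."""

    def __init__(self) -> None:
        self.goal = None

    def create_goal(self, objective: str, token_budget=None) -> dict:
        if self.goal is not None:
            raise GoalToolError("a goal already exists for this thread")
        self.goal = {
            "objective": objective,
            "status": "active",
            "token_budget": token_budget,
            "tokens_used": 0,
        }
        return self.goal

    def update_goal(self, status: str) -> dict:
        # The only transition the model may request is completion.
        if status != "complete":
            raise GoalToolError("the model can only mark a goal complete")
        if self.goal is None:
            raise GoalToolError("no goal to update")
        self.goal["status"] = "complete"
        return self.goal

    def get_goal(self):
        return self.goal


tools = GoalTools()
empty = tools.get_goal()
tools.create_goal("migrate the database", token_budget=10000)

errors = []
for attempt in (lambda: tools.create_goal("another goal"),
                lambda: tools.update_goal("paused")):
    try:
        attempt()
    except GoalToolError as exc:
        errors.append(str(exc))

final = tools.update_goal("complete")
```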

PR 4: Core Runtime (#18076)

The goal lifecycle engine. Lives in core/src/goals.rs.

GoalRuntimeEvent enum — the event bus hooking into session lifecycle:

| Event | Behavior |
| --- | --- |
| `TurnStarted` | Captures active goal_id + token usage baseline. Plan mode skips. |
| `ToolCompleted` | Accounts token + wall-clock deltas. May inject budget-limit steering. |
| `ToolCompletedGoal` | Same but suppresses budget-limit steering (avoid double-reporting). |
| `TurnFinished` | Final accounting, no-tool continuation suppression logic. |
| `TaskAborted(Interrupted)` | Pauses active goal. |
| `ThreadResumed` | Reactivates paused goal (paused → active). |
| `MaybeContinueIfIdle` | Starts auto-continuation turn with continuation prompt. |
| `ExternalMutationStarting` | Best-effort accounting before external set/clear. |
| `ExternalSet { status }` | Apply external status (active = maybe continue, budget_limited = clear runtime state). |
| `ExternalClear` | Clear runtime accounting state. |

Accounting model:

Two concurrent snapshots per thread:

Turn accounting — GoalTurnAccountingSnapshot: tracks last accounted token usage per turn
Wall-clock accounting — GoalWallClockAccountingSnapshot: tracks elapsed real time

Delta is computed: current - last_accounted, then pushed to SQLite atomically. A Semaphore(1) serializes accounting updates.
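The delta computation can be sketched with a snapshot guarded by a one-permit semaphore. GoalAccountant is a hypothetical class, an integer field stands in for the SQLite row, and clamping negative deltas to zero is an assumption rather than documented behavior.

```python
import threading


class GoalAccountant:
    """Tracks the last accounted token total and persists only the difference."""

    def __init__(self) -> None:
        self._lock = threading.Semaphore(1)  # serializes accounting updates
        self._last_accounted = 0
        self.persisted_tokens = 0  # stand-in for tokens_used in SQLite

    def account(self, current_total: int) -> int:
        """Push current_total - last_accounted to storage and return the delta."""
        with self._lock:
            delta = max(current_total - self._last_accounted, 0)
            if delta:
                self.persisted_tokens += delta
                self._last_accounted = current_total
            return delta


# Sequential calls: the repeated total of 250 contributes nothing the second time.
acct = GoalAccountant()
deltas = [acct.account(total) for total in (100, 250, 250)]

# Concurrent calls: whatever the interleaving, nothing is counted twice.
racing = GoalAccountant()
threads = [
    threading.Thread(target=racing.account, args=(total,))
    for total in (100, 250, 400)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the baseline only moves forward, the persisted total ends at the largest usage value seen, regardless of which thread wins the semaphore first.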

Budget limit steering:

When accounting crosses the token budget, the runtime injects a budget_limiting item into the model's response stream. This steering is suppressed on:

The completion turn (don't tell model "out of budget" when it just finished)
Subsequent tool completions after the first steering (tracked via budget_limit_reported_goal_id)
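The two suppression rules can be expressed as a small decision helper. SteeringTracker and should_inject are hypothetical names; only the budget_limit_reported_goal_id field name comes from the text above.

```python
from typing import Optional


class SteeringTracker:
    """Decides whether to inject budget-limit steering after an accounting step."""

    def __init__(self) -> None:
        # goal_id for which steering has already been injected, if any.
        self.budget_limit_reported_goal_id: Optional[str] = None

    def should_inject(self, goal_id: str, status: str, is_completion_turn: bool) -> bool:
        if status != "budget_limited":
            return False
        if is_completion_turn:
            # The model just finished; do not tell it that it ran out of budget.
            return False
        if self.budget_limit_reported_goal_id == goal_id:
            # Already reported for this goal; avoid repeating on later tool calls.
            return False
        self.budget_limit_reported_goal_id = goal_id
        return True


tracker = SteeringTracker()
decisions = [
    tracker.should_inject("goal-a", "active", False),          # under budget
    tracker.should_inject("goal-a", "budget_limited", False),  # first crossing
    tracker.should_inject("goal-a", "budget_limited", False),  # later tool completion
    tracker.should_inject("goal-b", "budget_limited", True),   # completion turn
    tracker.should_inject("goal-b", "budget_limited", False),  # new goal, first report
]
```

Keying the "already reported" flag on goal_id means a replaced goal gets its own steering message without any explicit reset.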

Continuation suppression:

A no-tool continuation turn suppresses the next automatic continuation (avoids infinite loop). User action, tool calls, or external mutations reset suppression.
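The suppression flag behaves like a one-bit state machine. The sketch below uses hypothetical method names; the transitions are the ones described in the paragraph above.

```python
class ContinuationState:
    """One-bit state: whether the next automatic continuation is suppressed."""

    def __init__(self) -> None:
        self.continuation_suppressed = False

    def on_continuation_turn_finished(self, tool_calls: int) -> None:
        # A continuation turn that only chatted suppresses the next one;
        # a turn that made tool calls clears any suppression.
        self.continuation_suppressed = tool_calls == 0

    def on_user_action(self) -> None:
        self.continuation_suppressed = False

    def on_external_mutation(self) -> None:
        self.continuation_suppressed = False

    def may_continue(self) -> bool:
        return not self.continuation_suppressed


state = ContinuationState()
history = [state.may_continue()]           # initially allowed

state.on_continuation_turn_finished(tool_calls=0)
history.append(state.may_continue())       # suppressed after a no-tool turn

state.on_user_action()
history.append(state.may_continue())       # user input resets suppression

state.on_continuation_turn_finished(tool_calls=3)
history.append(state.may_continue())       # real tool work keeps it allowed
```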

PR 5: TUI UX (#18077)

Slash command registered in SlashCommand::Goal:

Goal => "set or view the current goal for a long-running task",

supports_inline_args() → true
available_during_task() → true
Renders in command popup

Goal connector in chatwidget.rs:

Renders goal objective + status in the thread status bar
Handles thread/goal/updated and thread/goal/cleared notifications
Shows elapsed time and token usage

Continuation prompts (templates/goals/):

continuation.md — "Continue working toward your goal: {objective}..."
budget_limit.md — "You're approaching the token budget for your goal..."

Thread resume ordering:

1. Emit goal snapshot notification
2. Apply goal resume runtime effects (activate paused goal)
3. Send resume response + replay
4. Maybe continue active goal if idle

Key Design Decisions

Model can't pause/resume. The model-facing tools are create_goal, update_goal(complete), and the read-only get_goal. Pause, resume, and budget-limit transitions are system-controlled: pause comes from user interrupts, resume from thread re-entry, and budget-limit from accounting.
goal_id versioning. Every replacement generates a new UUID. Stale accounting calls from old goal versions are silently dropped. Prevents races between in-flight tool completions and user-initiated goal replacements.
Atomic budget enforcement. SQL CASE statements handle budget limits inline with normal writes — no separate check-and-set race condition.
Completing a goal auto-reports budget. The tool response includes a human-readable budget summary so the model naturally reports "I used 3,250 of 10,000 tokens" to the user.
Auto-continuation is conservative. If a continuation turn produces zero tool calls (just chat), it suppresses the next auto-continuation. Prevents stubborn loops.

How the loop stops

The auto-continuation loop is fundamentally one-shot — it doesn't re-fire indefinitely. Stop conditions:

Goal reaches terminal status: complete or budget_limited. No active goal means no continuation.
No-tool suppression: if a continuation turn produces zero tool calls, the runtime sets continuation_suppressed = true. The next MaybeContinueIfIdle event checks this and skips. User action, tool calls, or external mutations reset it.
Semaphore guard: continuation_lock is a Semaphore(1). If a continuation is already in-flight, subsequent events bail immediately.
Mode check: Plan mode ignores goals entirely, so no continuation starts.
Idle check: maybe_continue_goal_if_idle only fires if the thread has no active turn. If the user or another agent is already busy, it's a no-op.

The runtime doesn't loop: it fires one continuation per trigger (thread resume, external set to active). If that turn does real tool work and doesn't mark the goal complete, the next continuation only comes from another explicit trigger.
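Taken together, the stop conditions amount to a chain of guards evaluated once per trigger. The function below is a sketch of that chain, not the body of maybe_continue_goal_if_idle: continuation_decision and its parameters are hypothetical, and the order of the checks is an assumption.

```python
import threading
from typing import Optional, Tuple

# One permit: at most one continuation may be in flight at a time.
continuation_lock = threading.Semaphore(1)


def continuation_decision(
    goal_status: Optional[str],
    plan_mode: bool,
    has_active_turn: bool,
    continuation_suppressed: bool,
) -> Tuple[bool, str]:
    """Return (start_continuation, reason) for a single trigger."""
    if plan_mode:
        return False, "plan mode ignores goals"
    if goal_status != "active":
        return False, "no active goal"
    if has_active_turn:
        return False, "thread is busy"
    if continuation_suppressed:
        return False, "suppressed after a no-tool turn"
    if not continuation_lock.acquire(blocking=False):
        return False, "continuation already in flight"
    # The caller is expected to release continuation_lock when the turn ends.
    return True, "start continuation turn"


first = continuation_decision("active", False, False, False)
second = continuation_decision("active", False, False, False)  # lock still held
continuation_lock.release()                                    # first turn ended

after_complete = continuation_decision("complete", False, False, False)
while_busy = continuation_decision("active", False, True, False)
in_plan_mode = continuation_decision("active", True, False, False)
```

Each trigger gets exactly one pass through the guards, which is why a goal that stays active without a new trigger does not keep spawning turns.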

