@usirin
Created April 12, 2026 04:36
Parallelism-Preserving UI for Agentic Work — mixed-audience research survey
title: Parallelism-Preserving UI for Agentic Work
date: 2026-04-11
type: research
status: draft
variant: mixed-audience

What This Document Is

A research survey for a mixed team -- five engineers, four designers -- exploring what it looks like to supervise autonomous agents doing product development work. The premise is settled ground from prior research 1, 2: chat interfaces flatten parallel, branching, long-running agent work into serial transcripts, and the result is that humans lose the ability to supervise effectively. The question here is not whether chat fails but what replaces it, and specifically what replaces it for a team where half the people think in spatial compositions and half think in data structures.

The core finding across roughly 160 surveyed sources is convergence. Five mature infrastructure domains -- Kubernetes controllers, workflow orchestrators, version control, reactive notebooks, Erlang/OTP supervision trees -- independently arrived at the same architectural shape for "supervise a declarative, long-running, parallel system." The vocabulary differs. The shape does not. And no shipping agent UI has adopted it yet.

The second finding is that the most underexplored territory is not visualization technique. The visualization patterns are well-settled. The gaps are in three specific contracts: level-triggered reconciliation from Kubernetes, supervision-tree semantics from Erlang/OTP, and CRDTs as the substrate for treating agents as multiplayer collaborators. Each is backed by decades of production infrastructure. Each has near-zero adoption in agent UX.

What Does Our Own Product Already Have?

Discord has been running the vocabulary for agent supervision in production for five years, and nobody in the agent-UX space has noticed 3.

Start with the thing a designer touches every day: Rich Presence. That status card on a Discord profile -- the one showing what game someone is playing, what song they are listening to, what IDE they have open -- is a state-projection UI 4. Every field is a snapshot of process state; nothing streams. The details field (128 characters) maps to "what the agent is doing." The state field maps to "where in the state machine." The party.size tuple -- a literal [current, max] -- renders "task 3/7" natively in the Discord client with zero custom UI. The timestamps fields cause Discord to auto-render an elapsed or remaining counter. The small-image overlay can encode health status using color. An engineer's Discord profile becomes a live operator dashboard, and the build is two days: fork any VSCode discord-rpc extension 5, replace "current file" with the current task.
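The field mapping above can be sketched as a plain payload builder. This is a minimal illustration, not a client: the field names (details, state, party.size, timestamps, assets.small_image) follow the Rich Presence activity shape described in the paragraph, while the function name agent_activity, the asset key health_ok, and the sample task are hypothetical. How the dictionary is sent to Discord, and the timestamp unit a given transport expects, are left out of scope.

```python
import time


def agent_activity(task_name, state, current, total, started_at):
    """Build a Rich Presence-style activity payload for one agent run."""
    return {
        "details": task_name[:128],                 # "what the agent is doing"
        "state": state[:128],                       # "where in the state machine"
        "party": {"size": [current, total]},        # literal [current, max] tuple
        "timestamps": {"start": int(started_at)},   # lets the client show elapsed time
        # Hypothetical asset key: a small overlay image used as a health glyph.
        "assets": {"small_image": "health_ok", "small_text": "healthy"},
    }


activity = agent_activity("Refactor auth module", "running tests", 3, 7, time.time())
```

Every value is a snapshot: re-sending the payload on each state transition is what makes the profile card a projection rather than a stream.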

Forum channels are the most parallelism-preserving container Discord ships, and the team in the room built them 6. Discord added forums in September 2022 after admitting that text channels flatten parallel conversations 7. Each post is a first-class thread with its own title and top message. Available tags -- the available_tags field on the channel, applied_tags on each post -- are a fixed, channel-scoped taxonomy. This is the same pattern as design tokens constraining agent output: a closed set of labels, enforced at the platform level. Tags map to the task-status enum. Thread archival via auto_archive_duration is the "archive, don't delete" pattern. Per-post subscription is per-task notification control. A forum channel as a task board is a weekend build on existing infrastructure.

Stage channels are the cleanest human-in-the-loop gate Discord ships 8. Three roles -- moderator, speaker, audience -- map onto the supervision hierarchy. The raise-hand queue is a shipped approval-gate UI. Voice channel tiles are already a multi-agent parallelism view: one tile per participant with fixed positions (no reflow), a voice-activity-detected ring that glows during transmission, and mute/deafen/camera icons as status glyphs 9. The design insight here is spatial stability -- tiles do not rearrange when someone joins or leaves. That positional invariance is load-bearing for supervision at scale, and Discord already enforces it.

The Embedded App SDK 10 enables shipping a web application inside an iframe inside a Discord voice channel, with shared state across all participants. The Listen Along pattern from the Spotify integration 11 -- a button on someone's profile that synchronizes your client to their playback position -- is the template for co-supervision: click, and your dashboard opens to your teammate's current run, scoped to their current task.

The prior-art scorecard across nine surveyed "agent in Discord" projects (MidJourney, various bridge bots, llmcord) tells the story of missed opportunity 3. Eight Discord primitives times nine projects yields 72 possible cells. Exactly 11 are populated, all in the same column (edit-in-place embed). No project uses Rich Presence. No project uses forum channels as the task substrate. No project uses Stage channels. No project uses voice tiles as a swim-lane. MidJourney's departure from Discord in August 2024 12 is evidence: they left precisely because nobody built the state-projection surface that Discord's own primitives would support. That exit is the hole these primitives are shaped to fill.

Clyde's auto-thread pattern, shipped between March and December 2023, is worth noting: Clyde's own team recognized that flattening AI replies into the main channel stream is the wrong default 13. The anti-flattening impulse is already in the product's history.

What Vocabulary Do Other Fields Use for This?

The question "how do you build a UI for supervising autonomous processes" has been answered independently by at least five infrastructure domains. The surprising finding is not that answers exist but that they are the same answer wearing different clothes.

Kubernetes calls it the spec/status split 14: you declare what you want (spec), the system continuously reconciles toward it, and the UI shows you the gap between desired and observed. ArgoCD adds a second axis, separating sync status ("does observed match desired?") from health status ("is the thing actually working?") 15. These axes are independent -- an application can be synced but degraded, or out-of-sync but healthy. Collapsing them into a single status badge, as most agent dashboards do, hides half the information.
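A minimal sketch of the two independent axes, assuming a task record that carries a declared spec, an observed status, and a boolean health check. The class and field names (TaskView, checks_passing) are hypothetical and are not taken from Kubernetes or ArgoCD.

```python
from dataclasses import dataclass
from enum import Enum


class Sync(Enum):
    SYNCED = "synced"
    OUT_OF_SYNC = "out_of_sync"


class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"


@dataclass
class TaskView:
    spec: dict            # desired state, declared by the human
    status: dict          # observed state, reported by the system
    checks_passing: bool  # hypothetical health probe result

    @property
    def sync(self) -> Sync:
        # Axis 1: does observed match desired?
        return Sync.SYNCED if self.spec == self.status else Sync.OUT_OF_SYNC

    @property
    def health(self) -> Health:
        # Axis 2: is the thing actually working? Computed independently of sync.
        return Health.HEALTHY if self.checks_passing else Health.DEGRADED


synced_but_degraded = TaskView({"replicas": 2}, {"replicas": 2}, checks_passing=False)
drifted_but_healthy = TaskView({"replicas": 2}, {"replicas": 1}, checks_passing=True)
```

A single badge would have to pick one of these answers for each record; two badges keep both.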

Terraform formalized the vocabulary of change as a first-class object 16. A plan is a reviewable change set, proposed but not executed. It is a data structure, not a side effect. Apply reifies a plan. Drift is the gap between declared state and actual state when something mutates out of band. The diff symbols -- + create, - destroy, ~ update in-place, -/+ replace -- are already a mini design system every engineer recognizes by muscle memory.
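To make "a plan is a data structure, not a side effect" concrete, here is a small sketch that computes a change set from two dictionaries without applying anything. It borrows only the +, -, and ~ symbols named above; the function name plan and the flat key/value model are assumptions, and the -/+ replace case is omitted for brevity.

```python
def plan(desired: dict, actual: dict) -> list:
    """Return a reviewable change set as (symbol, key) pairs; nothing is executed."""
    changes = []
    for key in sorted(desired.keys() | actual.keys()):
        if key not in actual:
            changes.append(("+", key))   # create
        elif key not in desired:
            changes.append(("-", key))   # destroy
        elif desired[key] != actual[key]:
            changes.append(("~", key))   # update in place
    return changes


desired = {"task-a": "done", "task-b": "running"}
actual = {"task-b": "queued", "task-c": "done"}
proposed = plan(desired, actual)
```

Because the result is ordinary data, it can be rendered, diffed, approved, or discarded before any apply step runs. An empty plan against unchanged inputs also doubles as a "no drift" signal.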

CQRS and event sourcing separate the write model from arbitrarily many read models 17. The write side records events. The read side is a materialized view optimized for a specific query. The payoff: one event log supports many simultaneous projections -- a task list, a timeline, a dependency graph, a swim-lane diagram. Each projection is a pure function of the log. They are all the same truth, all recomputable at any time.
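The "one log, many projections" claim can be sketched in a few lines. The event shape and the projection names (task_list, swim_lanes) are hypothetical; the point is only that each read model is a pure function of the same append-only list.

```python
# Append-only event log: the write side only ever appends.
events = [
    {"task": "t1", "type": "started", "agent": "a"},
    {"task": "t2", "type": "started", "agent": "b"},
    {"task": "t1", "type": "done", "agent": "a"},
]


def task_list(log):
    """Projection 1: latest known state per task."""
    state = {}
    for event in log:
        state[event["task"]] = event["type"]
    return state


def swim_lanes(log):
    """Projection 2: events grouped per agent, in log order."""
    lanes = {}
    for event in log:
        lanes.setdefault(event["agent"], []).append((event["task"], event["type"]))
    return lanes
```

Recomputing a projection over a prefix of the log, for example task_list(events[:2]), gives the state as it stood at that point, which is the same mechanism a timeline scrubber relies on.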

Component-tree inspectors from React DevTools to Flutter Inspector 18 treat the running application as a navigable, searchable, hover-to-highlight tree. They are pure read-only projections over live state.

The Erlang/OTP supervision tree 19 is the most battle-tested model in the industry for a hierarchical set of long-lived, fallible, restartable workers. A supervisor does nothing but start, monitor, and restart its children; a worker does work. Restart strategies are named policies picked from a menu -- one_for_one, one_for_all, rest_for_one -- rather than reimplemented for each failure scenario. Max restart intensity is the circuit-breaker primitive: "if a child crashes more than N times in T seconds, escalate." The observer GUI, shipped for 15 years, renders the live supervision tree as an interactive graph 20.
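The restart-strategy menu translates into a small lookup. This is a Python illustration of the three named policies, not Erlang's supervisor API; the function name children_to_restart and the agent names are hypothetical, and children are assumed to be listed in start order.

```python
def children_to_restart(strategy: str, children: list, failed: str) -> list:
    """Return the children a supervisor would restart, in start order."""
    index = children.index(failed)
    if strategy == "one_for_one":
        return [failed]            # only the crashed child
    if strategy == "one_for_all":
        return list(children)      # every child under this supervisor
    if strategy == "rest_for_one":
        return children[index:]    # the crashed child plus those started after it
    raise ValueError(f"unknown strategy: {strategy}")


agents = ["planner", "coder", "tester"]
```

The value of naming the policy is that a UI can show it as a label on the supervisor node instead of forcing the operator to infer it from behavior.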

What matters for the mixed team is not the implementation details of any one system but the convergence: every domain arrived at append-only event logs as the source of truth, pure read-only projections as the UI, a strict refusal to let the UI become a write-through store, and multiple simultaneous lenses over the same underlying state.

The design consequence is that picking one visualization for an agent dashboard is the wrong move. The correct move is picking two or three projections over the same state, because that is what every mature domain discovered independently.

What Does Branching Actually Look Like as an Interface?

Open Figma and create a component variant. Branch your file. Add a variant in the branch. Now think about what happened structurally: you forked a point in time, diverged, and eventually you will merge or discard. This is the same structure underneath every version-control system, every branching conversation UI, and every multi-response agent interaction. The question is how to make that tree visible and navigable rather than hidden behind arrow-nudge buttons.

The mature vocabulary comes from four lineages 21: git DAG renderers (forensic branching -- what happened?), reactive dataflow notebooks (dependency branching -- what depends on what?), the Loom family of multiverse writing interfaces (exploratory branching -- what could happen?), and design-tool branching (creative branching -- what looks better?). An agent operator needs all four at once.

Loom is the only interface that has ever taken LLM branching fully seriously 22. Built in 2020 because AI Dungeon's linear interface could not express navigating a probabilistic multiverse of continuations, Loom introduced vocabulary everyone else later reinvents: multiverse (the full tree), world tree (its visual rendering), hoist/unhoist (focus on a subtree, collapse everything outside it), ground truth trajectory (the chosen path rendered in black, unchosen siblings in grey). Its two-axis keyboard model -- sequential navigation (next/previous node) orthogonal to structural navigation (parent/child/sibling) -- is a move every tree UI should steal. The current message exists in two spaces simultaneously: the reading-order timeline and the parent-child tree.

LibreChat ships the most production-ready forking UI in any open-source LLM client 23. Its contribution is naming three discrete fork modes: Direct Path Only (just the linear visible messages), Include Related Branches (the path plus branches attached along the way), and Include All to/From Here (every message, visible or not). This names three distinct user intents under "fork" and lets the user choose without inventing new words.

The Claude Code community spec (issue #32631) 24 is the single most useful artifact found in the branching survey. It defines fork, branch, sibling, checkpoint, rewind, switch, and merge as distinct named operations with distinct semantics. The most novel idea: merge is explicitly not replaying tool calls. It is a compacted context summary injected as a system message. Structural three-way merge of prose does not work because two branches of a conversation are not textually recombinable 25. This is honest about a hard constraint that every other tool ignores.

Cursor's checkpoints 26 are instructive primarily as a failure case. Cursor zips the pre-change state into a local checkpoint whenever the agent modifies code, and "Restore Checkpoint" reverts to that snapshot. But it does not branch on restore -- it is destructive. Forum threads are full of users who lost work because restore-is-not-branch is exactly the failure mode that breaks first when "fork from past turn" is not first-class.
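The difference between a destructive restore and a branch-on-restore can be sketched with a parent-pointer tree. This is an illustration of the idea only and does not describe how Cursor or any other tool stores checkpoints; BranchingHistory and its methods are hypothetical names.

```python
class BranchingHistory:
    """Checkpoints form a tree; moving the head never deletes a checkpoint."""

    def __init__(self):
        self.parent = {}    # checkpoint id -> parent checkpoint id (None for root)
        self.head = None
        self._next_id = 0

    def checkpoint(self):
        """Record a new checkpoint as a child of the current head."""
        cid = self._next_id
        self._next_id += 1
        self.parent[cid] = self.head
        self.head = cid
        return cid

    def fork_from(self, cid):
        """Move the head to an earlier checkpoint; later work stays reachable."""
        if cid not in self.parent:
            raise KeyError(cid)
        self.head = cid

    def path(self, cid=None):
        """Return the root-to-node path for the head or a given checkpoint."""
        node = self.head if cid is None else cid
        out = []
        while node is not None:
            out.append(node)
            node = self.parent[node]
        return out[::-1]


history = BranchingHistory()
a = history.checkpoint()
b = history.checkpoint()
c = history.checkpoint()
history.fork_from(a)         # "restore" to a, without discarding b and c
d = history.checkpoint()     # new work lands on a sibling branch
```

After the fork, the old trajectory through b and c is still addressable, which is exactly what a restore-as-overwrite design loses.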

Reactive notebooks -- Marimo, Observable, Hex 27, 28 -- contribute the sharpest insight linking branching to parallelism. Marimo models a notebook as a directed acyclic graph where edges represent data dependencies. Hex 3.0 leverages the dependency graph to support parallel cell execution. The insight: once you have the DAG, parallel execution falls out of topological sort. Branching and parallelism are dual -- a branch you have not merged yet is a parallel subgraph. Marimo's "stale" primitive, where dependent cells are marked stale rather than auto-rerun, maps directly to agent QA: a QA report is stale when its target task has been re-run since the report was written.
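Both claims, that parallelism falls out of topological sort and that staleness is a downstream walk, can be sketched with the standard library's graphlib. The graph, the node names, and the helper names (parallel_waves, stale_after) are hypothetical; this is not Marimo's or Hex's implementation.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on.
deps = {
    "load": set(),
    "clean": {"load"},
    "plot": {"clean"},
    "stats": {"clean"},
}


def parallel_waves(graph):
    """Group nodes into waves; every node in a wave can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def stale_after(changed, graph):
    """Return every node downstream of `changed`; mark them stale, do not rerun."""
    stale, frontier = set(), [changed]
    while frontier:
        node = frontier.pop()
        for child, parents in graph.items():
            if node in parents and child not in stale:
                stale.add(child)
                frontier.append(child)
    return stale
```

In the agent reading, "plot" and "stats" are two subagents that can run side by side, and a re-run of "clean" flags both of their reports as stale.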

Eight failure modes emerge when branching becomes first-class 24, 22, 23, 26: the active-branch pointer must persist across refreshes; rewind and fork must be distinct operations with separate confirmation flows; search breaks when conversation structure is a DAG instead of a list; merge is fundamentally lossy for prose; garbage collection of abandoned branches is a political question; "the most recent message" stops being meaningful when latest is scoped to a branch; keyboard navigation needs two axes; branch names are the new file names.

What Would a Design-Systems Team Specifically Bring to This?

A backend engineer's take on agent supervision ships a read model. A design-systems engineer's take ships a vocabulary. The difference is the difference between "here are the facts about your agent" and "here is a component contract the agent cannot violate, a layout that survives non-determinism, and a presence grammar that tells you where the agent is looking."

Presence primitives are the most direct and most underused borrow in agent UX. Figma's multiplayer cursors 29, Liveblocks' useMyPresence/useOthers hooks 30, Linear's colored avatar rings, Notion's ghosted-avatar facepile for "recently present" 31 -- all implement the same grammar: identity (avatar plus color plus name), location (current file or task or node), and activity (short verb: reading, thinking, writing, blocked). The agent is just another peer in the multiplayer room. Figma's cursor chat -- an empty speech bubble that appears next to your cursor and whose message disappears after five seconds 32 -- is the right model for ephemeral agent narration. "The agent is thinking about task 4, says 'checking if the schema matches.'" No scrollback, no log panel; the narration lives at the cursor and evaporates. The agent becomes a collaborator with a presence, not a process in a monitoring panel.

Design tokens become the agent's legal output alphabet. The W3C Design Tokens Format Module reached its first stable version in October 2025 33, establishing a vendor-neutral JSON format with validated type constraints. Vercel's json-render framework 34 implements the exact move: developers define a catalog of permitted components and actions using Zod schemas, an LLM generates a JSON specification constrained to that catalog, and the framework renders the output progressively. This is the reference implementation for "design system as agent constraint." The shadcn/ui CSS-variable bus 35 -- where every component draws from the same CSS variables like --primary, --background, --foreground -- is the coordination layer. Radix Colors' twelve-step perceptually-uniform scales 36 are the correct palette substrate for status encoding: steps 9-10 for active agents, 3-4 for idle, 11-12 for text.
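The catalog-as-constraint idea can be sketched in plain Python. This does not reproduce json-render's API or Zod schemas; CATALOG, validate, and the component names are hypothetical, and the only thing shown is the closed-alphabet check applied to a generated specification tree.

```python
# Hypothetical catalog: component name -> set of permitted prop names.
CATALOG = {
    "Card": {"title"},
    "Badge": {"label", "tone"},
}


def validate(node: dict) -> list:
    """Return a list of violations; an empty list means the spec stays in the catalog."""
    errors = []
    kind = node.get("type")
    if kind not in CATALOG:
        errors.append(f"unknown component: {kind}")
    else:
        extra = set(node.get("props", {})) - CATALOG[kind]
        errors.extend(f"{kind}: unknown prop {name}" for name in sorted(extra))
    for child in node.get("children", []):
        errors.extend(validate(child))
    return errors


good_spec = {
    "type": "Card",
    "props": {"title": "Task 3"},
    "children": [{"type": "Badge", "props": {"label": "running", "tone": "active"}}],
}
bad_spec = {
    "type": "Card",
    "props": {"title": "Task 3", "glow": True},
    "children": [{"type": "Marquee", "props": {}}],
}
```

Rejecting a specification before rendering is what turns the design system from a style guide into a contract.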

Spatial memory is the primary UX lever for long-running multi-agent supervision, and no shipping agent UI uses it. Nielsen-Norman Group research demonstrates that the ability to recall the location of controls is essential for power users 37. This is why Linear is fast, why Raycast is fast, why k9s is fast: the layout does not move. The design-systems rule is that the chrome must be boring on purpose so the non-determinism has somewhere to land. The invariant-layout pattern -- sidebar 240-280px, top bar, content grid, right rail, nothing moves ever 38 -- is a copyable rule set. Dashboard reflow (when the dashboard rearranges its own layout in response to state changes) is the named anti-pattern.

What backend dashboards get wrong, per the surveyed design-engineering literature: color without a ramp (Grafana and Datadog ship raw hex 39), typography as decoration instead of hierarchy (Refactoring UI's canonical rebuttal: "relying too much on font size; try color or font weight instead" 40), motion without meaning (Material Design 3 names status as a first-class use case for motion 41), and confusing skeletons with spinners 42.

A new primitive proposed by this research: the reconciliation heartbeat -- a visible pulse indicating "the loop is alive, the world was observed at T minus 3 seconds, the diff is empty, all is well." A green heartbeat belongs to a whole category of signal, "absence of change is the signal," for which Kubernetes has shipped the UX and agent UIs have not. Discord's voice-activity-detected ring -- the glow around a user's avatar when they transmit -- is already this primitive, shipped as an avatar decoration.

Storybook's MCP server 43 and Component Manifests -- JSON objects describing the contents of a Storybook in a concise, structured way for AI agents -- are the existing machine-readable agent-task-catalog format. A task template is a story (with args, a play function, decorators). A running task is a story in the Canvas tab. The task catalog is the Storybook sidebar. Agent skill libraries compose via Storybook's composition mechanism. The design-systems engineer is the only cohort for whom this vocabulary is already machine-readable for LLMs in production.

The voices that define the intersection -- Karri Saarinen at Linear 44, Guillermo Rauch at Vercel 45, Rauno Freiberg 46, Emil Kowalski 47, Josh Comeau 48, Bret Victor 49 -- converge on one principle: the feedback loop between decision and consequence must be instantaneous and continuous. Chat interfaces break this loop. Projection interfaces restore it.

How Do Other Tools Visualize Parallel Work?

Every shipping workflow and agent orchestration tool has converged on a small set of visual shapes. Surveying LangGraph Studio, Temporal, Airflow, Dagster, Prefect, Kestra, Argo Workflows, Buildkite, GitHub Actions, Cursor 3, Devin, Replit Agent 4, CrewAI, AutoGen Studio, Inngest, LangSmith, and Langfuse reveals eight recurring shapes 50, 51, 52, 53, 54, 55, 56, 57, 58.

The DAG graph is the most universal: nodes plus edges, live-updated, nodes colored by state. Every orchestrator ships this. Airflow's Grid view 51 is the canonical at-a-glance health surface: a matrix where columns are runs, rows are tasks, and cells are colored by state. The Gantt chart is the temporal lens -- horizontal bars per task against a real time axis, shipped by Kestra, Prefect, and Temporal's Timeline view. The kanban board, shipped by Replit Agent 4 57 with columns Drafts/Active/Ready/Done, maps the state machine onto spatial columns. The trace tree from LangSmith and Langfuse 59 nests root spans with children, indented or tree-shaped. The swim lane from BPMN separates horizontal lanes per agent or owner. The tabs grid from Cursor 3's Agents Window 56 arranges up to eight parallel agent sessions as tiles -- side-by-side, grid, or stacked. The sidebar tree from Devin 58 nests parent and child sessions in a left navigation pane.

Each shape is a different projection of the same underlying state. The actionable guidance is to pick two or three lenses for a demo, not one.

Temporal's vocabulary is the most precise in the space 50. Workflow, Run, Event History, Activity, Pending Activity, Signal, Reset, Continue-as-new, Child Workflow. Its August 2024 redesign split Event History into two views: Compact (no clock time, just ordering) and Timeline (horizontal time axis, real duration). Same state, two lenses.

Airflow's task-instance state palette 51 has been stable for approximately six years and has been copied by every orchestrator since: queued (gray), running (lime), success (green), failed (red), up_for_retry (gold), up_for_reschedule (turquoise), upstream_failed (orange), skipped (pink), deferred (medium purple). Nine core colors, all pastel-saturated. The upstream_failed state -- when a parent fails, children propagate an explicit orange distinct from their own red -- is circuit-breaking as a color. Prefect adds the critical distinction between failed (code-level), crashed (infrastructure killed the run), paused (waiting for human approval), and cancelled (user stopped it) 53. Code failure and infrastructure failure must be different words.

LangGraph Studio 60 is the only production agent IDE that makes fork-from-past-step first-class. It persists a checkpoint at every node. Rewinding creates a new checkpoint that branches; the original execution history stays intact. This is the orchestration-flavored analog of the Claude Code fork/branch/checkpoint vocabulary from the branching survey.

Cursor 3's Agents Window 56, shipped April 2026, is the parallelism-preserving IDE that already exists. Agent tabs are arrangeable side-by-side, in a grid, or stacked. Up to eight agents run in parallel across isolated Git worktrees. The Best-of-N pattern -- select multiple models, each produces a solution in an isolated worktree, results appear side-by-side -- is the "multiple responses from a single prompt" feature, already shipped in a mass-market tool.

Devin's plan-review gate 58 is the cleanest human-in-the-loop primitive. Each session starts with Devin surfacing relevant files and a preliminary plan. The user edits the plan before autonomous work begins. This is Endsley's SA Level 3 -- projection of future state 61 -- applied to agent plans, and it is the simplest way to prevent the out-of-the-loop problem that degrades human intervention quality during long autonomous runs 62.

The gap nobody has claimed: no surveyed tool combines the Grid view (state over history) with the swim-lane view (cross-agent) simultaneously. No surveyed tool exposes rewind-plus-fork as a button on any cell in the Grid view. These are unclaimed intersections in territory where every individual shape is well-settled.

What Concepts Have Not Been Borrowed Yet?

The agent-UX community is reinventing vocabulary that control theory, distributed systems, local-first research, and human factors have been polishing for decades. The prior sections mined the surface. The deeper layer is that these are not just borrowable names but borrowable design contracts -- each carrying a theorem about what you get for free once you adopt the shape.

CRDTs are the biggest unclaimed borrow from a design perspective 63, 64, 65. Operational Transformation and CRDTs solved a problem agent UX is stumbling into without realizing it: how do multiple actors edit the same document concurrently, offline-tolerant, without a central lock, and converge to the same state? Replace "multiple users" with "multiple agents plus humans plus the filesystem watcher" and the problem is identical. Jazz's CoValues 66, Yjs shared types 64, and Automerge documents 63 are data structures whose mutations automatically propagate and merge, with eventual consistency by construction, full edit history signed per author, and offline-first sync.

The hidden payoff: CRDTs are naturally branching data structures. Every branching concern from the earlier section -- fork from turn N, sibling conversations, non-destructive rewind -- is trivially expressible as CRDT forks. Ink & Switch's Peritext 67, a CRDT for rich text with inline marks now implemented in Automerge, enables multiple agents to concurrently annotate the same prose without clobbering each other's marks. Nobody has shipped multi-agent collaborative writing on Peritext. Patchwork 68, also from Ink & Switch, pursues universal version control as a kernel service -- branches, edits, merges, comments, review -- applied across arbitrary apps. The agent analog: every agent-produced artifact is automatically versioned, branchable, mergeable, and commentable without the agent knowing.

OTP supervision trees are the biggest unclaimed borrow from an engineering perspective 19. Erlang's split between supervisors (who only start, monitor, and restart children) and workers (who do work) is a moral commitment. The restart strategy menu -- one_for_one, one_for_all, rest_for_one -- names failure-recovery policies rather than reimplementing them per scenario. Max restart intensity is the circuit breaker. "Isolation is the foundation. Supervision is the workflow" 69. Nobody in agent UX draws the multi-agent diagram as a supervision tree, despite it being the actual shape of a long-running agent orchestrator.
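Max restart intensity as a circuit breaker can be sketched as a sliding window over crash timestamps. This is a Python illustration of the "more than N crashes in T seconds, escalate" rule, not OTP's implementation; the class name RestartIntensity and the return labels are hypothetical.

```python
from collections import deque


class RestartIntensity:
    """Escalate when more than max_restarts crashes land inside period_s seconds."""

    def __init__(self, max_restarts: int, period_s: float):
        self.max_restarts = max_restarts
        self.period_s = period_s
        self._crashes = deque()

    def record_crash(self, now: float) -> str:
        self._crashes.append(now)
        # Drop crashes that have aged out of the window.
        while self._crashes and now - self._crashes[0] > self.period_s:
            self._crashes.popleft()
        if len(self._crashes) > self.max_restarts:
            return "escalate"   # give up locally and hand the failure to the parent
        return "restart"


breaker = RestartIntensity(max_restarts=2, period_s=10.0)
decisions = [breaker.record_crash(t) for t in (0.0, 1.0, 2.0)]
```

For an agent orchestrator, "escalate" is the moment the human gets paged; everything before it is handled by policy.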

Time-travel debugging has been a stable primitive set since Redux DevTools 70: an action log, a timeline scrubber, pin-to-substate, export/import state. LangGraph Studio ships time travel for agent orchestration 60, but no consumer agent UI does. Replay.io's product -- capture a session, send a URL, the recipient scrubs the event log 71 -- is the shape an agent debugging surface should take. Capture an agent run as a bundle of state files and cached LLM responses, send a URL to a teammate, they scrub the trajectory to find where it went wrong. This is Redux DevTools applied to agent runs. It is viscerally better than reading terminal logs.

The digital twin metaphor from IoT and industrial engineering 72 transfers cleanly. A digital twin is a virtual replica continuously updated with live sensor data, used to monitor, simulate, and predict. Applied to agents: the UI maintains a live, queryable, simulatable model of the agent's world in parallel with the agent itself. The user can scrub it (time travel), simulate it ("what if the agent picks option B" without affecting the real run), diff it versus the real system (drift detection), and project forward ("if this trajectory continues, task 5 will be blocked in three minutes"). That last capability -- forward projection -- is Endsley SA Level 3 61 made concrete, and nobody in agent UX has shipped it.

HCI supervision frameworks 61, 73, 74 extend to multi-agent work with unfinished edges. Team situation awareness from Endsley and Jones 75 decomposes SA into shared SA (what everyone needs) and individual SA (what each role needs), implying that different viewers should get different projections of the same event log. Sheridan's ten-level automation scale 73 applied per-task rather than per-run produces a visible badge: "this task is L8, you only hear about it at DONE; this task is L4, you approve before execution." Lee and See's trust calibration 74 applied per-agent produces a reliability-history widget: "agent A succeeded on 14 of 15 refactoring tasks, failed on 3 of 7 test-writing tasks." No agent UI has shipped per-agent reliability history as a first-class trust primitive.
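The reliability-history widget reduces to a grouped count over past outcomes. A minimal sketch, assuming outcomes are recorded as (agent, task kind, succeeded) tuples; the function name reliability and the sample data, which mirror the numbers in the example sentence, are illustrative only.

```python
from collections import defaultdict


def reliability(history):
    """Return {(agent, task_kind): (successes, attempts)} from recorded outcomes."""
    totals = defaultdict(lambda: [0, 0])
    for agent, kind, succeeded in history:
        totals[(agent, kind)][0] += int(succeeded)
        totals[(agent, kind)][1] += 1
    return {key: (wins, attempts) for key, (wins, attempts) in totals.items()}


# Illustrative history: 14 of 15 refactors succeeded, 3 of 7 test-writing tasks failed.
outcomes = (
    [("A", "refactor", True)] * 14
    + [("A", "refactor", False)]
    + [("A", "tests", True)] * 4
    + [("A", "tests", False)] * 3
)
scores = reliability(outcomes)
```

Keeping the counts per task kind, rather than one number per agent, is what lets the operator calibrate trust to the task at hand.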

The State-Projection Contract (The Engineering Foundation)

For the engineers in the room who want the architectural skeleton underneath the design vocabulary: here is the contract that makes everything above possible.

k9s 76 is the sharpest reference for what an agent supervision UI should do. It is a stateless, queryable, keyboard-driven projection over a declarative system that refuses to be a database. Its surface is small: command mode (: to switch resource views), search mode (/ scoped to current view), xray view (:xray for a hierarchy tree), a hotkey system (hotkeys.yaml for saved queries), and a help overlay (? for context-aware keybindings). The anti-pattern k9s deliberately avoids is owning state. Even edit and delete delegate to the API server. The UI is pure projection.

The Kubernetes controller contract matters because it carries a philosophical commitment most agent frameworks silently violate: the distinction between level-triggered and edge-triggered reconciliation 77. An edge-triggered system reacts to changes. A level-triggered system reacts to the current state of the world regardless of how it arrived there. Kubernetes is level-triggered by constitutional law. Every Slack bot and Discord bot is edge-triggered by default. The consequence: running the same reconciliation loop twice with the same input must produce the same result with no additional side effects. If the filesystem already shows "done," the reconciler must detect that and short-circuit. Every controller tutorial says this; every "agent loop" tutorial forgets it.
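The idempotence requirement can be sketched as a reconcile function that looks only at current state, never at the event that woke it up. The function name reconcile, the task names, and the in-place update standing in for a real side effect are all assumptions.

```python
def reconcile(desired: dict, observed: dict) -> list:
    """Level-triggered: derive actions from the current state, not from events."""
    actions = []
    for task, wanted in sorted(desired.items()):
        if observed.get(task) != wanted:
            actions.append(f"drive {task} -> {wanted}")
            observed[task] = wanted   # stand-in for the real side effect
    return actions


desired_state = {"task-1": "done", "task-2": "done"}
world = {"task-1": "done"}          # task-1 already finished out of band

first_pass = reconcile(desired_state, world)
second_pass = reconcile(desired_state, world)
```

The first pass short-circuits on task-1 because the world already shows it done, and the second pass does nothing at all. A missed or duplicated trigger therefore costs nothing, which is the property edge-triggered bots lack.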

ArgoCD's two independent status axes 15 deserve their own mention. Sync status: does the observed state match desired state? Health status: is the thing actually working? Collapsing these into a single column hides half the information. ArgoCD's aggregation rule -- the parent's health is the worst health among its children, evaluated by a defined priority ordering -- is the algorithm for rolling up task status to run status to project status. Define the enum once and apply a min() everywhere.
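The "define the enum once and apply a min() everywhere" rule can be sketched with an ordered enum. The states and their ordering below are hypothetical task-level values chosen for illustration, not ArgoCD's actual health priority list.

```python
from enum import IntEnum


class TaskHealth(IntEnum):
    # Hypothetical ordering: lower value means worse health.
    FAILED = 0
    BLOCKED = 1
    RUNNING = 2
    DONE = 3


def rollup(children):
    """A parent's health is the worst health among its children."""
    return min(children, default=TaskHealth.DONE)


run_a = rollup([TaskHealth.DONE, TaskHealth.RUNNING])
run_b = rollup([TaskHealth.DONE, TaskHealth.BLOCKED, TaskHealth.RUNNING])
project = rollup([run_a, run_b])    # the same rule applies at every level
```

Because the same function rolls tasks into runs and runs into projects, a single blocked task surfaces at the top without any per-level special cases.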

The CQRS framing 17 elevates all of this from naming conventions to a technical program. The write model is the command API -- verbs of intent like "start task" and "block task." The event store is the filesystem -- every write to a progress log is an append-only event. The read model is the UI, a materialized view optimized for query and never for write. Marten's live-versus-async projection split 78 is the implementation lever: live projections are computed at query time (always fresh), async projections are maintained as a background index (fast but eventually consistent). Start with live, graduate to async when cardinality demands it.
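The live-versus-async trade-off can be sketched without any framework. This is not Marten's API; live_projection and AsyncProjection are hypothetical names, and the event shape is assumed. The live version folds the whole log on every query, while the async version keeps a cursor and only folds what it has not yet seen.

```python
def live_projection(log):
    """Computed at query time: always fresh, cost grows with the log."""
    state = {}
    for event in log:
        state[event["task"]] = event["type"]
    return state


class AsyncProjection:
    """Maintained in the background: fast to read, stale until it catches up."""

    def __init__(self):
        self.state = {}
        self.cursor = 0   # index of the next unprocessed event

    def catch_up(self, log):
        for event in log[self.cursor:]:
            self.state[event["task"]] = event["type"]
        self.cursor = len(log)


log = [{"task": "t1", "type": "started"}, {"task": "t2", "type": "started"}]
index = AsyncProjection()
index.catch_up(log)
log.append({"task": "t1", "type": "done"})   # new event, index not yet updated
```

Until catch_up runs again, the async index still reports t1 as started while the live projection already reports it done. That lag is the eventual-consistency window a UI should make visible.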

Obsidian's graph view 79 is worth noting because it is a projection that already runs over the same filesystem an agent operator writes to. Its graph -- one node per file, one edge per wikilink -- is a materialized view over markdown. Its color groups map directly to task status encoding. The question of whether an existing tool can serve as the initial demo surface answers itself: the plumbing already exists.
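The "one node per file, one edge per wikilink" projection is small enough to sketch directly. This does not use or describe Obsidian's internals; the regular expression, the function name link_graph, and the in-memory dictionary standing in for a folder of markdown files are assumptions.

```python
import re

# Capture the link target, stopping at an alias (|), heading (#), or closing bracket.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def link_graph(files: dict) -> set:
    """Return edges (source, target) for every wikilink found in each file's text."""
    edges = set()
    for name, text in files.items():
        for target in WIKILINK.findall(text):
            edges.add((name, target.strip()))
    return edges


vault = {
    "plan": "See [[task-1]] and [[task-2|the second task]].",
    "task-1": "Blocks [[task-2#notes]].",
    "task-2": "No links here.",
}
graph = link_graph(vault)
```

The graph is recomputed from the files each time, so it can never disagree with them; that is the materialized-view property the paragraph describes.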

No surveyed agent UI ships a drift panel. No surveyed agent UI enforces the write-through refusal discipline. No surveyed agent UI displays two independent status axes. These are not novel inventions. They are decade-old patterns the agent-UX community has not yet adopted.

What Would This Look Like in Five Years?

Five endgame metaphors for agentic product development compete, and no single one wins 80. They win at different scales.

At the turn level (one prompt, many branches), the multiplayer-document metaphor wins. Branching is CRDT-native, and humans and agents are peers in one room.

At the task level (one task, multiple subagents), the spreadsheet or reactive-notebook metaphor wins. The dependency DAG is the shape of the work, and parallelism falls out of topological sort.

At the workflow level (many tasks, many agents), the real-time strategy game metaphor wins. It is natively multi-agent, natively parallel, natively spatial, natively about supervising autonomous entities. Screeps 81 already demonstrates that programmers will write JavaScript to supervise a persistent colony of autonomous units running 24/7.

At the kernel level (what crashes, what restarts, who owns what), OTP supervision trees are the correct shape.

At the 2030 endgame, Dynamicland's room-scale computational public space 82 is the right North Star -- the room is the computer, programs are physical objects, collaboration is default, spatial memory is load-bearing. But it is not a two-week demo.

The "no chat at all" endgame has at least seven concrete replacements for the prompt box 80.

The command palette. Cmd-K as a verb catalog: /fork, /reconcile, /pause, /rewind to turn N 83.

Direct manipulation of the state file. Edit workflow-state.json in your editor; the operator detects drift and reconciles.

Voice. Whisper plus intent routing plus the state ledger as execution surface; voice is the one input channel that does not serialize through fingers.

The file you are looking at as the prompt. Ink & Switch's Embark pattern 84: write a new task line in tasks.md, the agent notices drift, claims the task, reconciles; there is no "submit."

Ambient notification. Rich Presence plus tray icon plus a daily digest and nothing else; Jason Yuan's Dot/new.computer pitch 85 applied to operators.

Calendar as agent log. Every task-run is a calendar event, color-coded using Airflow's palette; time is already the primary axis of work.

Audio as ambient status. Peep, the Network Auralizer from USENIX 2000 86, mapped network events to birdsong; the agent version makes the reconciliation heartbeat a soft metronome tick, a task transition a chime, a circuit breaker opening a heavy door closing.

Three North Star renders for the 2030 demo:

The War Room. A hex-and-counter wargame map of the vault. Agents as counters, tasks as hexes, fog-of-war over unexplored parts of the codebase. Teammates on the other side of the table supervising their own theaters.

The Atelier. Ink & Switch malleable documents with Peritext/Automerge CRDT substrate, multiplayer cursors, no "agent panel" because the agent is in the document. The Geoffrey Litt 87 and Maggie Appleton 88 endgame where home-cooked software meets malleable software.

The Command Deck. The bridge of a ship. Subagents as NPCs carrying out orders. Commands by voice or touch. The one metaphor simultaneously spatial, multi-agent, voice-first, presence-first, and gamer-legible.

The operator's unfair advantage in any of these scenarios is the substrate choice. The vault is a filesystem of pure artifacts 1. That is the substrate every 2030-shaped surface wants. Malleable software wants it. Peritext wants it. Dynamicland's Realtalk wants it. Chat UIs actively fight against it by storing state in an opaque, non-addressable, non-queryable blob. The 2030 question is not "what replaces the prompt box" but how many projections over the same event log the supervisor wants today. The projections are cheap. The substrate is ready.
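To make "projections over the same event log" concrete, here is a minimal sketch in which one list of events feeds two read models: a status board (latest state per task) and a calendar (one span per task-run). The event shape, field names, and state names are assumptions made for illustration; the source does not specify a ledger format.

```python
# Hypothetical event log: one append-only list shared by every surface.
EVENTS = [
    {"t": 0, "agent": "a1", "task": "spec", "state": "running"},
    {"t": 5, "agent": "a2", "task": "ui", "state": "running"},
    {"t": 9, "agent": "a1", "task": "spec", "state": "done"},
    {"t": 12, "agent": "a2", "task": "ui", "state": "failed"},
]


def board(events):
    """Projection 1: latest known state per task (a status board)."""
    latest = {}
    for event in sorted(events, key=lambda e: e["t"]):
        latest[event["task"]] = event["state"]
    return latest


def calendar(events):
    """Projection 2: (task, start, end) spans, one entry per task-run."""
    started, spans = {}, []
    for event in sorted(events, key=lambda e: e["t"]):
        if event["state"] == "running":
            started[event["task"]] = event["t"]
        elif event["task"] in started:
            spans.append((event["task"], started.pop(event["task"]), event["t"]))
    return spans


print(board(EVENTS))     # {'spec': 'done', 'ui': 'failed'}
print(calendar(EVENTS))  # [('spec', 0, 9), ('ui', 5, 12)]
```

Each projection is a pure function of the log, so adding a third surface means writing another small function, not migrating state.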

What We Do Not Know Yet

The research does not resolve the following.

Merge semantics for prose remain unsettled. The Claude Code spec, Forky, and ContextBranch each name the problem -- summary injection, semantic three-way merge, inject -- but none ship a production merge algorithm for natural-language branches 24, 25, 89. Whether this is solvable, or whether merge of prose is fundamentally lossy at a level no algorithm can bridge, has direct implications for any branching agent UI.

Drift detection applied to agent-produced artifacts has no shipped implementation. Terraform and Flux ship drift detection for infrastructure state 16, 90, but no agent UI applies the pattern to agent outputs: "the human edited progress.md manually; the state ledger does not know; the drift is invisible." Whether the reconciliation heartbeat and drift panel are worth the build cost depends on how frequently agents and humans both touch the same artifacts, which is an empirical question.
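One way the pattern could be applied to agent outputs, sketched under stated assumptions: the state ledger records a content hash each time an agent writes an artifact, and a reconciliation pass compares those hashes with what is currently in the vault. The ledger format and the `detect_drift` helper are hypothetical, and file contents are passed in as strings so the example stays self-contained.

```python
import hashlib


def digest(text: str) -> str:
    """Content hash the ledger would record after an agent write."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_drift(ledger, artifacts):
    """Compare recorded hashes with current content.

    ledger:    path -> hash recorded when an agent last wrote the file
    artifacts: path -> current file content
    Returns path -> "modified" | "missing" | "untracked".
    """
    drift = {}
    for path, recorded in ledger.items():
        if path not in artifacts:
            drift[path] = "missing"
        elif digest(artifacts[path]) != recorded:
            drift[path] = "modified"
    for path in artifacts:
        if path not in ledger:
            drift[path] = "untracked"
    return drift


# The agent wrote an unchecked task; a human later ticked it by hand.
ledger = {"progress.md": digest("- [ ] build\n"), "plan.md": digest("v1")}
vault = {"progress.md": "- [x] build\n", "plan.md": "v1", "notes.md": "scratch"}

print(detect_drift(ledger, vault))
# {'progress.md': 'modified', 'notes.md': 'untracked'}
```

A drift panel would render this dictionary; how often it is non-empty in practice is the empirical question raised above.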

Trust calibration for multi-agent supervision lacks any shipped widget. Lee and See's framework 74 predicts that humans will both overtrust and undertrust in the absence of per-agent reliability data. No surveyed agent UI displays per-agent success rates or historical reliability as a first-class primitive. Whether a reliability-history dashboard actually improves trust calibration in practice is a testable HCI claim that has not been tested.
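A sketch of what a per-agent reliability primitive might compute, assuming the run history is available as (agent, succeeded) pairs. Alongside the raw success rate it reports the lower bound of a Wilson score interval, a standard statistical choice made here for illustration (it is not from the surveyed sources) so that an agent with two lucky runs does not outrank one with a long record.

```python
import math


def wilson_lower(successes: int, runs: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a success proportion."""
    if runs == 0:
        return 0.0
    p = successes / runs
    centre = p + z * z / (2 * runs)
    margin = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs))
    return (centre - margin) / (1 + z * z / runs)


def reliability(history):
    """history: iterable of (agent, succeeded) pairs from past task-runs."""
    totals = {}
    for agent, ok in history:
        wins, runs = totals.get(agent, (0, 0))
        totals[agent] = (wins + int(ok), runs + 1)
    return {
        agent: {"runs": runs, "rate": wins / runs, "lower": wilson_lower(wins, runs)}
        for agent, (wins, runs) in totals.items()
    }


# Hypothetical history: a1 is 2 for 2, a2 is 18 for 20.
history = [("a1", True)] * 2 + [("a2", True)] * 18 + [("a2", False)] * 2
stats = reliability(history)
for agent, row in stats.items():
    print(agent, row["runs"], round(row["rate"], 2), round(row["lower"], 2))
```

In this example the agent with the perfect but short record gets the lower conservative score, which is the kind of signal a supervisor would need to avoid overtrusting it.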

Whether spatial is a serious direction or an aesthetic gesture is unresolved. The research surfaced strong theoretical arguments for spatial memory 37, 82, 49 and strong practical arguments that spatial interfaces are expensive to build relative to list-based alternatives. The answer likely depends on agent count: spatial memory matters more as the number of supervised entities grows, but the breakpoint is unknown.

How to evaluate which projection surface to build first is not answered. The territory has been mapped but the map does not include a selection criterion. The candidate calibration variables are build cost (Rich Presence is two days; an Activity SDK dashboard is two weeks; a CRDT-backed Atelier is four weeks), audience legibility (calendar requires zero explanation; a supervision tree requires a one-minute preamble), and alignment with the team's existing vocabulary (Discord primitives require no translation for the target audience; OTP supervision vocabulary requires translation for anyone who has not written Erlang).

Sources


  1. Operator visualization framework grilling session. grill/2026-04-11-operator-visualization-framework.md. Vault-local reference establishing the "state, not stream" premise.

  2. Prior research on agentic interaction design systems. kampus/agentic-interaction-design-systems/research.md. HCI supervision frameworks, chat-as-failure-mode, role-native mediums, Endsley/Sheridan/Lee-See citations.

  3. Surveyed projects: ebibibi/claude-code-discord-bridge (https://github.com/ebibibi/claude-code-discord-bridge), thcapp/claude-discord-bridge (https://github.com/thcapp/claude-discord-bridge), DoBuDevel/discord-agent-bridge (https://github.com/DoBuDevel/discord-agent-bridge), llmcord (https://github.com/jakobdylanc/llmcord), OoriData/Discord-AI-Agent (https://github.com/OoriData/Discord-AI-Agent), and others.

  4. Rich Presence -- Discord Developer Portal. https://docs.discord.com/developers/platform/rich-presence ; Setting Rich Presence -- Discord Social SDK. https://docs.discord.com/developers/discord-social-sdk/development-guides/setting-rich-presence ; discordpp::Activity Class Reference. https://discord.com/developers/docs/social-sdk/classdiscordpp_1_1Activity.html

  5. Discord Presence VSCode extensions: LeonardSSH/vscord (https://marketplace.visualstudio.com/items?itemName=LeonardSSH.vscord), iCrawl/discord-vscode (https://github.com/iCrawl/discord-vscode).

  6. Forum Channels FAQ. https://support.discord.com/hc/en-us/articles/6208479917079-Forum-Channels-FAQ ; Forum Channels blog. https://discord.com/blog/forum-channels-space-for-organized-conversation

  7. TechCrunch: Discord adds Reddit-like Forum channels (Sept 2022). https://techcrunch.com/2022/09/14/discord-forum-channels/

  8. Stage Channels FAQ. https://support.discord.com/hc/en-us/articles/1500005513722-Stage-Channels-FAQ ; Running and Moderating Discord Stages best practices. https://discord.com/blog/running-moderating-discord-stages-best-practices

  9. Discord Game Overlay 101. https://support.discord.com/hc/en-us/articles/217659737-Game-Overlay-101 ; Mobile Voice Overlay (Android). https://support.discord.com/hc/en-us/articles/360042693171-Mobile-Voice-Overlay-Android

  10. Activities Overview. https://discord.com/developers/docs/activities/overview ; Discord Embedded App SDK (GitHub). https://github.com/discord/embedded-app-sdk ; Colyseus Discord Embedded SDK blog. https://colyseus.io/blog/discord-embedded-sdk/

  11. Listening Along with Spotify. https://support.discord.com/hc/en-us/articles/115003966072-Listening-Along-with-Spotify

  12. Midjourney is Leaving Discord (Medium). https://medium.com/@nicktheaiguru/midjourney-is-leaving-discord-9b2072c7a902

  13. Decrypt: Clyde's Last Call (Dec 2023). https://decrypt.co/206528/clydes-last-call-discords-ai-chatbot-being-shut-down-on-december-1 ; Clyde chatbot (Discord Wiki). https://discord.fandom.com/wiki/Clyde_(chatbot)

  14. Kubernetes Controllers (official). https://kubernetes.io/docs/concepts/architecture/controller/

  15. OneUptime: ArgoCD Sync Status vs Health Status. https://oneuptime.com/blog/post/2026-02-26-argocd-sync-status-vs-health-status/view ; ArgoCD Resource Health docs. https://argo-cd.readthedocs.io/en/stable/operator-manual/health/

  16. Terraform plan command reference. https://developer.hashicorp.com/terraform/cli/commands/plan ; Terraform Plan Made Simple. https://controlmonkey.io/resource/terraform-plan-made-simple/

  17. CQRS Pattern (Azure). https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs

  18. React DevTools. https://react.dev/learn/react-developer-tools ; Flutter Inspector. https://docs.flutter.dev/tools/devtools/inspector

  19. Ferd, The Zen of Erlang. https://ferd.ca/the-zen-of-erlang.html ; OTP Supervision Tree Patterns. https://medium.com/@kanishks772/the-supervision-tree-patterns-that-make-systems-bulletproof-356199f178bb

  20. Observer docs (erlang.org). https://www.erlang.org/doc/apps/observer/observer_ug.html ; observer_cli (GitHub). https://github.com/zhongwencool/observer_cli

  21. Loom: interface to the multiverse (Janus, generative.ink). https://generative.ink/posts/loom-interface-to-the-multiverse/

  22. socketteer/loom (GitHub). https://github.com/socketteer/loom ; cyborgism wiki: Loom. https://cyborgism.wiki/hypha/loom

  23. LibreChat fork feature docs. https://www.librechat.ai/docs/features/fork

  24. Claude Code issue #32631: Conversation Branching full spec. https://github.com/anthropics/claude-code/issues/32631

  25. Forky: git-style LLM history with semantic three-way merge. https://ishan.rs/posts/forky-git-style-llm-history

  26. Cursor checkpoints docs. https://cursor.com/docs/agent/chat/checkpoints ; Cursor restore-UI confusion forum thread. https://forum.cursor.com/t/ux-ui-confusion-on-restoring-checkpoints/67614

  27. Marimo dataflow docs. https://docs.marimo.io/guides/editor_features/dataflow/ ; Observable reactive dataflow. https://observablehq.com/@observablehq/reactive-dataflow

  28. Hex 2.0 Reactivity, Graphs, and a little bit of Magic. https://hex.tech/blog/hex-two-point-oh/ ; Hex 3.0 parallel execution changelog. https://learn.hex.tech/changelog/2023-10-05

  29. Figma: Multiplayer Editing in Figma. https://www.figma.com/blog/multiplayer-editing-in-figma/ ; Building Figma Multiplayer Cursors (Mark Skelton). https://mskelton.dev/blog/building-figma-multiplayer-cursors

  30. Liveblocks Presence guide. https://liveblocks.io/docs/guides/how-to-use-liveblocks-presence-with-react

  31. PartyKit Facepile review (Matt Webb). https://interconnected.org/more/2023/partykit/facepiles.html

  32. Figma cursor chat. https://help.figma.com/hc/en-us/articles/4403130802199-Use-cursor-chat-in-Figma-Design

  33. W3C Design Tokens Format Module 2025.10. https://www.designtokens.org/tr/drafts/format/ ; W3C Design Tokens Community Group. https://www.designtokens.org

  34. Vercel json-render (InfoQ). https://www.infoq.com/news/2026/03/vercel-json-render/ ; Vercel AI SDK 3 Generative UI. https://vercel.com/blog/ai-sdk-3-generative-ui

  35. shadcn/ui. https://ui.shadcn.com

  36. Radix Primitives. https://www.radix-ui.com/primitives

  37. NN/G: Spatial Memory: Why It Matters for UX Design. https://www.nngroup.com/articles/spatial-memory/ ; Scarr, Cockburn, Bateman: Understanding and Exploiting Spatial Memory (Canterbury). https://ir.canterbury.ac.nz/handle/10092/9326

  38. Art of Styleframe: Dashboard Design Patterns for Modern Web Apps 2026. https://artofstyleframe.com/blog/dashboard-design-patterns-web-apps/

  39. Datadog: Selecting the right colors for your graphs. https://docs.datadoghq.com/dashboards/guide/widget_colors/ ; Datadog: Understanding Duplicate Colors in the Consistent Palette. https://docs.datadoghq.com/dashboards/guide/consistent_color_palette/

  40. Wathan, A. & Schoger, S. Refactoring UI. https://refactoringui.com

  41. Material Design 3: Motion overview. https://m3.material.io/styles/motion/overview/how-it-works

  42. NN/G: Skeleton Screens 101. https://www.nngroup.com/articles/skeleton-screens/

  43. Storybook MCP server. https://storybook.js.org/docs/ai/mcp/overview ; Storybook Manifests. https://storybook.js.org/docs/ai/manifests ; Storybook MCP for React announcement. https://storybook.js.org/blog/storybook-mcp-for-react/

  44. Karri Saarinen: Why is quality so rare? (Linear, Config 2025). https://linear.app/now/why-is-quality-so-rare ; Design Is Search (AI Creators Media). https://en.ai-creators.tech/media/creative/design-search/

  45. Design Engineering at Vercel. https://vercel.com/blog/design-engineering-at-vercel

  46. Rauno Freiberg: Web Interface Guidelines. https://interfaces.rauno.me ; Craft / Novelty. https://rauno.me/craft/novelty ; Devouring Details. https://devouringdetails.com

  47. Emil Kowalski. https://emilkowal.ski ; sonner. https://sonner.emilkowal.ski ; vaul. https://vaul.emilkowal.ski ; Animations on the Web. https://animations.dev

  48. Josh Comeau: Springs and Bounces in Native CSS. https://www.joshwcomeau.com/animation/linear-timing-function/ ; An Interactive Guide to CSS Transitions. https://www.joshwcomeau.com/animation/css-transitions/

  49. Bret Victor. https://worrydream.com ; Inventing on Principle transcript. https://jamesclear.com/great-speeches/inventing-on-principle-by-bret-victor

  50. Temporal Web UI docs. https://docs.temporal.io/web-ui ; Temporal Events reference. https://docs.temporal.io/references/events ; Temporal Updated Event History Timeline View. https://temporal.io/change-log/updated-event-history-timeline-view-is-now-available

  51. Airflow UI Overview. https://airflow.apache.org/docs/apache-airflow/stable/ui.html ; Airflow utils/state source. https://airflow.apache.org/docs/apache-airflow/1.10.3/_modules/airflow/utils/state.html

  52. Dagster Asset and Run Visualization (DeepWiki). https://deepwiki.com/dagster-io/dagster/7.4-run-and-event-interfaces ; Dagster Column-level lineage. https://docs.dagster.io/guides/build/assets/metadata-and-tags/column-level-lineage

  53. Prefect v3 States concepts. https://docs.prefect.io/v3/concepts/states

  54. Kestra Platform overview. https://kestra.io/overview ; Kestra (DeepWiki). https://deepwiki.com/kestra-io/kestra

  55. Argo Workflows Suspending. https://argo-workflows.readthedocs.io/en/latest/walk-through/suspending/ ; Dynamic Fan-out/Fan-in in Argo Workflows. https://medium.com/@corvin/dynamic-fan-out-and-fan-in-in-argo-workflows-d731e144e2fd

  56. Cursor changelog 3.0. https://cursor.com/changelog/3-0 ; Cursor 3 agent-first interface (The Decoder). https://the-decoder.com/new-cursor-3-ditches-the-classic-ide-layout-for-an-agent-first-interface-built-around-parallel-ai-fleets/

  57. Replit Agent 4 landing. https://replit.com/agent4 ; Introducing Agent 4 (Replit blog). https://blog.replit.com/introducing-agent-4-built-for-creativity

  58. Cognition Devin 2.0 blog. https://cognition.ai/blog/devin-2 ; Devin release notes. https://docs.devin.ai/release-notes/overview ; Devin can now Schedule Devins. https://cognition.ai/blog/devin-can-now-schedule-devins

  59. LangSmith Observability. https://www.langchain.com/langsmith/observability ; Langfuse Observability overview. https://langfuse.com/docs/observability/overview ; Langfuse Sessions. https://langfuse.com/docs/observability/features/sessions

  60. LangGraph time-travel docs. https://docs.langchain.com/oss/python/langgraph/use-time-travel ; LangGraph Studio blog. https://blog.langchain.com/langgraph-studio-the-first-agent-ide/

  61. Endsley, M.R. (1995). Toward a theory of situation awareness in dynamic systems. Human Factors, 37(1), 32-64. ; Endsley, M.R. & Kiris, E.O. (1995). The out-of-the-loop performance problem and level of control in automation. Human Factors, 37(2), 381-394.

  62. Wickens, C.D. (2018). Automation stages & levels, 20 years after. Journal of Cognitive Engineering and Decision Making, 12(1), 35-41.

  63. Automerge (GitHub). https://github.com/automerge/automerge ; Automerge CRDTs concept docs. https://www.mintlify.com/automerge/automerge/concepts/crdts

  64. Yjs (GitHub). https://github.com/yjs/yjs

  65. Liveblocks Yjs sync engine. https://liveblocks.io/docs/ready-made-features/multiplayer/sync-engine/liveblocks-yjs

  66. Jazz.tools. https://jazz.tools/ ; Jazz CoValues concepts. https://jazz.tools/docs/react-native/core-concepts/covalues/overview

  67. Ink & Switch Dispatch 004: Peritext. https://www.inkandswitch.com/newsletter/dispatch-004/ ; Ink & Switch Peritext. https://www.inkandswitch.com/peritext/

  68. Towards universal version control with Patchwork (Geoffrey Litt). https://buttondown.com/geoffreylitt/archive/towards-universal-version-control-with-patchwork/

  69. Ferd, The Zen of Erlang. https://ferd.ca/the-zen-of-erlang.html

  70. Understand time-travel debugging in Redux. https://app.studyraid.com/en/read/12414/400817/time-travel-debugging-in-redux ; Redux DevTools tips and tricks (LogRocket). https://blog.logrocket.com/redux-devtools-tips-tricks-for-faster-debugging/

  71. Replay.io: The MCP time travel debugger. https://www.replay.io/ ; Introduction to time travel debugging (Replay.io blog). https://blog.replay.io/introduction-to-time-travel-debugging

  72. IBM: What Is a Digital Twin? https://www.ibm.com/think/topics/digital-twin ; Volvo: Digital twins -- the ultimate virtual proving ground. https://www.volvoautonomoussolutions.com/en-en/news-and-insights/insights/articles/2025/jun/digital-twins--the-ultimate-virtual-proving-ground.html

  73. Sheridan, T.B. & Verplank, W.L. (1978). Human and Computer Control of Undersea Teleoperators. MIT Man-Machine Systems Laboratory. ; Supervisory control (Wikipedia). https://en.wikipedia.org/wiki/Supervisory_control

  74. Lee, J.D. & See, K.A. (2004). Trust in automation: Designing for appropriate reliance. Human Factors, 46(1), 50-80. https://pubmed.ncbi.nlm.nih.gov/15151155/

  75. National Academies: Situation Awareness in Human-AI Teams. https://www.nationalacademies.org/read/26355/chapter/6 ; Team SA (Salmon et al., ResearchGate). https://www.researchgate.net/figure/Team-situation-awareness-Salmon-et-al-2008-adapted-from-Endsley-1995b_fig1_287390257

  76. k9s Hotkeys docs. https://k9scli.io/topics/hotkeys/ ; The Complete K9s Cheatsheet. https://ahmedjama.com/blog/2025/09/the-complete-k9s-cheatsheet/

  77. Level Triggering and Reconciliation in Kubernetes (HackerNoon). https://hackernoon.com/level-triggering-and-reconciliation-in-kubernetes-1f17fe30333d ; Reconciliation Loop (kubebuilder). https://deepwiki.com/kubernetes-sigs/kubebuilder/5.2-reconciliation-loop

  78. Live projections for read models (Kurrent). https://www.kurrent.io/blog/live-projections-for-read-models-with-event-sourcing-and-cqrs ; Marten CQRS/ES (Code Magazine). https://www.codemag.com/Article/2209071/Event-Sourcing-and-CQRS-with-Marten

  79. Obsidian Graph View Help. https://help.obsidian.md/plugins/graph

  80. Speculative analysis synthesized from: Screeps (https://store.steampowered.com/app/464350/Screeps_World/), Dynamicland (https://dynamicland.org/), Ink & Switch Malleable Software (https://www.inkandswitch.com/essay/malleable-software/), Spatial Interfaces (Pasquale D'Silva, https://medium.com/elepath-exports/spatial-interfaces-886bccc5d1e9), and Bret Victor's Seeing Spaces (http://worrydream.com/SeeingSpaces/).

  81. Screeps: World. https://store.steampowered.com/app/464350/Screeps_World/ ; bencbartlett/Overmind AI for Screeps. https://github.com/bencbartlett/Overmind

  82. Dynamicland. https://dynamicland.org/ ; Dynamicland FAQ 2024. https://dynamicland.org/2024/FAQ/ ; Dynamicland: Computational Public Space. https://dynamicland.org/2024/Computational_Public_Space/

  83. cmdk (Paco Coursey). https://cmdk.paco.me

  84. Ink & Switch Dispatch 001: On Embark and Lude. https://newsletter.inkandswitch.com/archive/dispatch-001-on-embark-and-lude/ ; Geoffrey Litt: Malleable software in the age of LLMs. https://www.geoffreylitt.com/2023/03/25/llm-end-user-programming.html

  85. Jason Yuan Design. https://jasonyuan.design/ ; Jason Yuan announcing Dot on LinkedIn. https://www.linkedin.com/posts/jasonyuandesign_announcing-our-first-product-dot-an-intelligent-activity-7125515579519631361--ofh

  86. Peep: The Network Auralizer -- USENIX LISA 2000. https://www.usenix.org/legacyurl/peep-network-auralizer-monitoring-your-network-sound

  87. Geoffrey Litt: Dynamic documents // LLMs + end-user programming. https://www.geoffreylitt.com/2022/11/23/dynamic-documents ; Codifying a ChatGPT workflow into a malleable GUI. https://www.geoffreylitt.com/2023/07/25/building-personal-tools-on-the-fly-with-llms

  88. Maggie Appleton: Home-Cooked Software and Barefoot Developers. https://maggieappleton.com/home-cooked-software

  89. ContextBranch: A Version Control Approach to Exploratory Programming (arXiv). https://arxiv.org/abs/2512.13914

  90. Terraform drift explained. https://encore.cloud/resources/terraform-drift ; Flux Kustomization docs. https://fluxcd.io/flux/components/kustomize/kustomizations/
