Parallelism-Preserving UI for Agentic Work — mixed-audience research survey

title	Parallelism-Preserving UI for Agentic Work
date	2026-04-11
type	research
status	draft
variant	mixed-audience

What This Document Is

A research survey for a mixed team -- five engineers, four designers -- exploring what it looks like to supervise autonomous agents doing product development work. The premise is settled ground from prior research ¹ ²: chat interfaces flatten parallel, branching, long-running agent work into serial transcripts, and the result is that humans lose the ability to supervise effectively. The question here is not whether chat fails but what replaces it, and specifically what replaces it for a team where roughly half the people think in spatial compositions and half think in data structures.

The core finding across roughly 160 surveyed sources is convergence. Five mature infrastructure domains -- Kubernetes controllers, workflow orchestrators, version control, reactive notebooks, Erlang/OTP supervision trees -- independently arrived at the same architectural shape for "supervise a declarative, long-running, parallel system." The vocabulary differs. The shape does not. And no shipping agent UI has adopted it yet.

The second finding is that the most underexplored territory is not visualization technique. The visualization patterns are well-settled. The gaps are in three specific contracts: level-triggered reconciliation from Kubernetes, supervision-tree semantics from Erlang/OTP, and CRDTs as the substrate for treating agents as multiplayer collaborators. Each is backed by decades of production infrastructure. Each has near-zero adoption in agent UX.

What Does Our Own Product Already Have?

Discord has been running the vocabulary for agent supervision in production for five years, and nobody in the agent-UX space has noticed ³.

Start with the thing a designer touches every day: Rich Presence. That status card on a Discord profile -- the one showing what game someone is playing, what song they are listening to, what IDE they have open -- is a state-projection UI ⁴. Every field is a snapshot of process state; nothing streams. The details field (128 characters) maps to "what the agent is doing." The state field maps to "where in the state machine." The party.size tuple -- a literal [current, max] -- renders "task 3/7" natively in the Discord client with zero custom UI. The timestamps fields cause Discord to auto-render an elapsed or remaining counter. The small-image overlay can encode health status using color. An engineer's Discord profile becomes a live operator dashboard, and the build is two days: fork any VSCode discord-rpc extension ⁵, replace "current file" with the current task.
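
The field mapping above can be sketched as a pure projection from an agent snapshot to an activity payload. This is a minimal sketch, not a client: `AgentStatus`, `to_activity`, the party id "run", and the asset keys "health-ok" and "health-degraded" are hypothetical names introduced for illustration, and applying the 128-character limit to the state field is an assumption. Only the payload field names (details, state, party.size, timestamps, small image) come from the text.

```python
from dataclasses import dataclass

MAX_FIELD_LEN = 128  # limit cited above for details; applied to state as an assumption


@dataclass
class AgentStatus:  # hypothetical snapshot record, not part of any Discord SDK
    doing: str        # what the agent is doing
    phase: str        # where in the state machine
    task_index: int   # 1-based index of the current task
    task_total: int
    started_at: int   # Unix seconds
    healthy: bool


def to_activity(status: AgentStatus) -> dict:
    """Project an agent snapshot onto Rich Presence activity fields."""
    return {
        "details": status.doing[:MAX_FIELD_LEN],
        "state": status.phase[:MAX_FIELD_LEN],
        # [current, max] is what renders as "3 of 7" style progress
        "party": {"id": "run", "size": [status.task_index, status.task_total]},
        # a start timestamp is what drives the elapsed counter
        "timestamps": {"start": status.started_at},
        # hypothetical asset keys; health is encoded by the overlay image
        "assets": {"small_image": "health-ok" if status.healthy else "health-degraded"},
    }
```

Because the function only reads the snapshot and returns a new dictionary, it can be re-run on every state change without keeping any state of its own.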

Forum channels are the most parallelism-preserving container Discord ships, and the team in the room built them ⁶. Discord added forums in September 2022 after admitting that text channels flatten parallel conversations ⁷. Each post is a first-class thread with its own title and top message. Available tags -- the available_tags field on the channel, applied_tags on each post -- are a fixed, channel-scoped taxonomy. This is the same pattern as design tokens constraining agent output: a closed set of labels, enforced at the platform level. Tags map to the task-status enum. Thread archival via auto_archive_duration is the "archive, don't delete" pattern. Per-post subscription is per-task notification control. A forum channel as a task board is a weekend build on existing infrastructure.

Stage channels are the cleanest human-in-the-loop gate Discord ships ⁸. Three roles -- moderator, speaker, audience -- map onto the supervision hierarchy. The raise-hand queue is a shipped approval-gate UI. Voice channel tiles are already a multi-agent parallelism view: one tile per participant with fixed positions (no reflow), a voice-activity-detected ring that glows during transmission, and mute/deafen/camera icons as status glyphs ⁹. The design insight here is spatial stability -- tiles do not rearrange when someone joins or leaves. That positional invariance is load-bearing for supervision at scale, and Discord already enforces it.

The Embedded App SDK ¹⁰ enables shipping a web application inside an iframe inside a Discord voice channel, with shared state across all participants. The Listen Along pattern from Discord's Spotify integration ¹¹ -- a button on someone's profile that synchronizes your client to their playback position -- is the template for co-supervision: click, and your dashboard opens to your teammate's current run, scoped to their current task.

The prior-art scorecard across nine surveyed "agent in Discord" projects (Midjourney, various bridge bots, llmcord) tells the story of missed opportunity ³. Eight Discord primitives times nine projects yields 72 possible cells. Exactly 11 are populated, clustered around a single primitive (edit-in-place embed). No project uses Rich Presence. No project uses forum channels as the task substrate. No project uses Stage channels. No project uses voice tiles as a swim-lane. Midjourney's departure from Discord in August 2024 ¹² is evidence: on this reading, they left because nobody built the state-projection surface that Discord's own primitives would support. That exit is the hole these primitives are shaped to fill.

Clyde's auto-thread pattern, shipped between March and December 2023, is worth noting because it shows that Clyde's own team recognized that flattening AI replies into the main channel stream is the wrong default ¹³. The anti-flattening impulse is already in the product's history.

What Vocabulary Do Other Fields Use for This?

The question "how do you build a UI for supervising autonomous processes" has been answered independently by at least five infrastructure domains. The surprising finding is not that answers exist but that they are the same answer wearing different clothes.

Kubernetes calls it the spec/status split ¹⁴: you declare what you want (spec), the system continuously reconciles toward it, and the UI shows you the gap between desired and observed. ArgoCD adds a second axis, separating sync status ("does observed match desired?") from health status ("is the thing actually working?") ¹⁵. These axes are independent -- an application can be synced but degraded, or out-of-sync but healthy. Collapsing them into a single status badge, as most agent dashboards do, hides half the information.
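
A minimal sketch of the two independent axes, assuming a hypothetical `checks_passing` field in the observed state and illustrative enum names (none of these identifiers come from Kubernetes or ArgoCD). The point is that the two functions answer different questions and are computed separately.

```python
from enum import Enum


class SyncStatus(Enum):
    SYNCED = "synced"
    OUT_OF_SYNC = "out_of_sync"


class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"


def sync_status(spec: dict, observed: dict) -> SyncStatus:
    """Does the observed state match every field declared in the spec?"""
    in_sync = all(observed.get(key) == value for key, value in spec.items())
    return SyncStatus.SYNCED if in_sync else SyncStatus.OUT_OF_SYNC


def health_status(observed: dict) -> HealthStatus:
    """Is the thing actually working? Uses a hypothetical checks_passing flag."""
    return HealthStatus.HEALTHY if observed.get("checks_passing") else HealthStatus.DEGRADED


# Synced but degraded: the declared fields match, yet the checks fail.
spec = {"replicas": 3}
observed = {"replicas": 3, "checks_passing": False}
badge = (sync_status(spec, observed), health_status(observed))
```

A single status badge would have to pick one of these two answers; keeping the tuple keeps both.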

Terraform formalized the vocabulary of change as a first-class object ¹⁶. A plan is a reviewable change set, proposed but not executed. It is a data structure, not a side effect. Apply reifies a plan. Drift is the gap between declared state and actual state when something mutates out of band. The diff symbols -- + create, - destroy, ~ update in-place, -/+ replace -- are already a mini design system every engineer recognizes by muscle memory.
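
To make "a plan is a data structure, not a side effect" concrete, here is a minimal sketch that computes a change set from desired and actual state and labels each entry with the diff symbols above. It is not Terraform's algorithm: the `plan` function, the resource names, and the `force_new` set (attributes whose change requires replacement) are hypothetical.

```python
def plan(desired: dict, actual: dict, force_new: frozenset = frozenset()) -> list:
    """Return a reviewable change set as (symbol, resource) pairs. Nothing is executed."""
    changes = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            changes.append(("+", name))       # create
        elif name not in desired:
            changes.append(("-", name))       # destroy
        elif desired[name] != actual[name]:
            attrs = desired[name].keys() | actual[name].keys()
            changed = {a for a in attrs if desired[name].get(a) != actual[name].get(a)}
            # replace if any changed attribute cannot be updated in place
            changes.append(("-/+" if changed & force_new else "~", name))
    return changes


desired = {"db": {"size": "large", "region": "eu"}, "cache": {"ttl": 60}, "queue": {"depth": 10}}
actual = {"db": {"size": "small", "region": "us"}, "cache": {"ttl": 30}, "legacy": {"x": 1}}
change_set = plan(desired, actual, force_new=frozenset({"region"}))
```

Running the same function with the declared state on one side and a freshly observed state on the other is one way to surface drift: a non-empty result when nothing was intentionally changed.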

CQRS and event sourcing separate the write model from arbitrarily many read models ¹⁷. The write side records events. The read side is a materialized view optimized for a specific query. The payoff: one event log supports many simultaneous projections -- a task list, a timeline, a dependency graph, a swim-lane diagram. Each projection is a pure function of the log. They are all the same truth, all recomputable at any time.
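
A minimal sketch of "each projection is a pure function of the log", assuming a hypothetical event shape with `seq`, `task`, and `type` keys. The three projections below read the same list and never write to it.

```python
from collections import Counter

# Hypothetical append-only event log
EVENTS = [
    {"seq": 1, "task": "t1", "type": "started"},
    {"seq": 2, "task": "t2", "type": "started"},
    {"seq": 3, "task": "t1", "type": "done"},
]


def task_list(events: list) -> dict:
    """Projection 1: latest status per task."""
    latest = {}
    for event in sorted(events, key=lambda e: e["seq"]):
        latest[event["task"]] = event["type"]
    return latest


def timeline(events: list) -> list:
    """Projection 2: ordered (seq, task, type) rows."""
    return [(e["seq"], e["task"], e["type"]) for e in sorted(events, key=lambda e: e["seq"])]


def status_counts(events: list) -> Counter:
    """Projection 3: how many tasks are in each status."""
    return Counter(task_list(events).values())
```

Because every projection is recomputable from a prefix of the log, calling `task_list(EVENTS[:2])` gives the task list as it stood after the second event, which is the same mechanism a timeline scrubber would rely on.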

Component-tree inspectors from React DevTools to Flutter Inspector ¹⁸ treat the running application as a navigable, searchable, hover-to-highlight tree. They are pure read-only projections over live state.

The Erlang/OTP supervision tree ¹⁹ is the most battle-tested model in the industry for a hierarchical set of long-lived, fallible, restartable workers. A supervisor does nothing but start, monitor, and restart its children; a worker does work. Restart strategies are named policies picked from a menu -- one_for_one, one_for_all, rest_for_one -- rather than reimplemented for each failure scenario. Max restart intensity is the circuit-breaker primitive: "if a child crashes more than N times in T seconds, escalate." The observer GUI, shipped for 15 years, renders the live supervision tree as an interactive graph ²⁰.
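
The restart-strategy menu and the restart-intensity rule can be sketched in a few lines. This is a Python illustration of the semantics, not OTP itself; `children_to_restart` and `exceeds_intensity` are hypothetical helper names, and children are assumed to be listed in start order.

```python
def children_to_restart(strategy: str, children: list, crashed: str) -> list:
    """Which children a supervisor restarts when one child crashes."""
    index = children.index(crashed)
    if strategy == "one_for_one":
        return [crashed]                 # only the crashed child
    if strategy == "one_for_all":
        return list(children)            # every child
    if strategy == "rest_for_one":
        return children[index:]          # the crashed child and those started after it
    raise ValueError(f"unknown strategy: {strategy}")


def exceeds_intensity(restart_times: list, now: float, max_restarts: int, period: float) -> bool:
    """True when more than max_restarts restarts fall within the last `period` seconds."""
    recent = [t for t in restart_times if now - t <= period]
    return len(recent) > max_restarts
```

When `exceeds_intensity` returns true, the supervisor stops retrying and the failure moves up a level, which is the escalation the quoted rule describes.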

What matters for the mixed team is not the implementation details of any one system but the convergence: every domain arrived at append-only event logs as the source of truth, pure read-only projections as the UI, a strict refusal to let the UI become a write-through store, and multiple simultaneous lenses over the same underlying state.

The design consequence is that picking one visualization for an agent dashboard is the wrong move. The correct move is picking two or three projections over the same state, because that is what every mature domain discovered independently.

What Does Branching Actually Look Like as an Interface?

Open Figma and create a component variant. Branch your file. Add a variant in the branch. Now think about what happened structurally: you forked a point in time, diverged, and eventually you will merge or discard. This is the same structure underneath every version-control system, every branching conversation UI, and every multi-response agent interaction. The question is how to make that tree visible and navigable rather than hidden behind arrow-nudge buttons.

The mature vocabulary comes from four lineages ²¹: git DAG renderers (forensic branching -- what happened?), reactive dataflow notebooks (dependency branching -- what depends on what?), the Loom family of multiverse writing interfaces (exploratory branching -- what could happen?), and design-tool branching (creative branching -- what looks better?). An agent operator needs all four at once.

Loom is the only interface that has ever taken LLM branching fully seriously ²². Built in 2020 because AI Dungeon's linear interface could not express navigating a probabilistic multiverse of continuations, Loom introduced vocabulary everyone else later reinvents: multiverse (the full tree), world tree (its visual rendering), hoist/unhoist (focus on a subtree, collapse everything outside it), ground truth trajectory (the chosen path rendered in black, unchosen siblings in grey). Its two-axis keyboard model -- sequential navigation (next/previous node) orthogonal to structural navigation (parent/child/sibling) -- is a move every tree UI should steal. The current message exists in two spaces simultaneously: the reading-order timeline and the parent-child tree.

LibreChat ships the most production-ready forking UI in any open-source LLM client ²³. Its contribution is naming three discrete fork modes: Direct Path Only (just the linear visible messages), Include Related Branches (the path plus branches attached along the way), and Include All to/From Here (every message, visible or not). This names three distinct user intents under "fork" and lets the user choose without inventing new words.

The Claude Code community spec (issue #32631) ²⁴ is the single most useful artifact found in the branching survey. It defines fork, branch, sibling, checkpoint, rewind, switch, and merge as distinct named operations with distinct semantics. The most novel idea: merge is explicitly not replaying tool calls. It is a compacted context summary injected as a system message. Structural three-way merge of prose does not work because two branches of a conversation are not textually recombinable ²⁵. This is honest about a hard constraint that every other tool ignores.

Cursor's checkpoints ²⁶ are instructive primarily as a failure case. Cursor zips the pre-change state into a local checkpoint whenever the agent modifies code, and "Restore Checkpoint" reverts to that snapshot. But restore does not branch; it is destructive. Forum threads are full of users who lost work this way. Restore-is-not-branch is exactly the failure mode that appears first when "fork from past turn" is not first-class.
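
For contrast, a minimal sketch of restore-as-branch, assuming a hypothetical in-memory `History` of checkpoints (this is not Cursor's or any other tool's storage format). Restoring only moves the head pointer, so work committed after the restored checkpoint survives as a sibling branch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Checkpoint:
    id: int
    parent: Optional[int]
    state: str


@dataclass
class History:
    nodes: dict = field(default_factory=dict)
    head: Optional[int] = None

    def commit(self, state: str) -> int:
        """Append a checkpoint as a child of the current head."""
        checkpoint_id = len(self.nodes)
        self.nodes[checkpoint_id] = Checkpoint(checkpoint_id, self.head, state)
        self.head = checkpoint_id
        return checkpoint_id

    def fork_from(self, checkpoint_id: int) -> None:
        """Non-destructive restore: move the head, delete nothing."""
        if checkpoint_id not in self.nodes:
            raise KeyError(checkpoint_id)
        self.head = checkpoint_id

    def children(self, checkpoint_id: int) -> list:
        return [c.id for c in self.nodes.values() if c.parent == checkpoint_id]


history = History()
first = history.commit("v1")
second = history.commit("v2")
history.fork_from(first)          # "restore" to v1
third = history.commit("v2-alt")  # new work becomes a sibling of v2
```

After the fork, `history.children(first)` lists both `second` and `third`, and the state stored under `second` is still readable.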

Reactive notebooks -- Marimo, Observable, Hex ²⁷²⁸ -- contribute the sharpest insight linking branching to parallelism. Marimo models a notebook as a directed acyclic graph where edges represent data dependencies. Hex 3.0 leverages the dependency graph to support parallel cell execution. The insight: once you have the DAG, parallel execution falls out of topological sort. Branching and parallelism are dual -- a branch you have not merged yet is a parallel subgraph. Marimo's "stale" primitive, where dependent cells are marked stale rather than auto-rerun, maps directly to agent QA: a QA report is stale when its target task has been re-run since the report was written.
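
Both claims, parallelism from topological sort and "stale" as a propagated mark, can be sketched with Python's standard-library `graphlib`. The cell names are hypothetical, and this is not how Marimo or Hex implement it; `deps` maps each node to the set of nodes it depends on.

```python
from graphlib import TopologicalSorter


def parallel_batches(deps: dict) -> list:
    """Group nodes into batches; every node in a batch can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all nodes whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


def stale_after(deps: dict, changed: str) -> set:
    """Every transitive dependent of `changed` is marked stale, not re-run."""
    stale = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for node, parents in deps.items():
            if current in parents and node not in stale:
                stale.add(node)
                frontier.append(node)
    return stale


deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
```

Here `b` and `c` land in the same batch because neither depends on the other; re-running `b` marks only `d` stale, which is the QA-report case described above.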

Eight failure modes emerge when branching becomes first-class ²⁴²²²³²⁶: the active-branch pointer must persist across refreshes; rewind and fork must be distinct operations with separate confirmation flows; search breaks when conversation structure is a DAG instead of a list; merge is fundamentally lossy for prose; garbage collection of abandoned branches is a political question; "the most recent message" stops being meaningful when latest is scoped to a branch; keyboard navigation needs two axes; branch names are the new file names.

What Would a Design-Systems Team Specifically Bring to This?

A backend engineer's take on agent supervision ships a read model. A design-systems engineer's take ships a vocabulary. The difference is the difference between "here are the facts about your agent" and "here is a component contract the agent cannot violate, a layout that survives non-determinism, and a presence grammar that tells you where the agent is looking."

Presence primitives are the most direct and most underused borrow in agent UX. Figma's multiplayer cursors ²⁹, Liveblocks' useMyPresence/useOthers hooks ³⁰, Linear's colored avatar rings, Notion's ghosted-avatar facepile for "recently present" ³¹ -- all implement the same grammar: identity (avatar plus color plus name), location (current file or task or node), and activity (short verb: reading, thinking, writing, blocked). The agent is just another peer in the multiplayer room. Figma's cursor chat -- an empty speech bubble that appears next to your cursor, with the message disappearing after five seconds ³² -- is the right model for ephemeral agent narration. "The agent is thinking about task 4, says 'checking if the schema matches.'" No scrollback, no log panel; the narration lives at the cursor and evaporates. The agent becomes a collaborator with a presence, not a process in a monitoring panel.

Design tokens become the agent's legal output alphabet. The W3C Design Tokens Format Module reached its first stable version in October 2025 ³³, establishing a vendor-neutral JSON format with validated type constraints. Vercel's json-render framework ³⁴ implements the exact move: developers define a catalog of permitted components and actions using Zod schemas, an LLM generates a JSON specification constrained to that catalog, and the framework renders the output progressively. This is the reference implementation for "design system as agent constraint." The shadcn/ui CSS-variable bus ³⁵ -- where every component draws from the same CSS variables like --primary, --background, --foreground -- is the coordination layer. Radix Colors' twelve-step perceptually-uniform scales ³⁶ are the correct palette substrate for status encoding: steps 9-10 for active agents, 3-4 for idle, 11-12 for text.
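
A minimal sketch of the catalog-as-constraint idea. json-render uses Zod schemas in TypeScript; this sketch swaps in a plain Python dictionary check to show the shape of the contract only. `CATALOG`, `validate`, and the component and prop names are hypothetical.

```python
# Hypothetical catalog: component -> prop -> allowed values (None means any value)
CATALOG = {
    "StatusBadge": {"status": {"idle", "active", "blocked"}},
    "TaskCard": {"title": None},
}


def validate(spec: dict) -> list:
    """Return a list of violations; an empty list means the spec is renderable."""
    component = spec.get("component")
    if component not in CATALOG:
        return [f"unknown component: {component!r}"]
    allowed = CATALOG[component]
    errors = []
    for prop, value in spec.get("props", {}).items():
        if prop not in allowed:
            errors.append(f"unknown prop: {prop}")
        elif allowed[prop] is not None and value not in allowed[prop]:
            errors.append(f"invalid value for {prop}: {value!r}")
    return errors
```

Anything the model emits outside the catalog is rejected before rendering, so the closed vocabulary is enforced by the pipeline rather than by prompt wording.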

Spatial memory is the primary UX lever for long-running multi-agent supervision, and no shipping agent UI uses it. Nielsen Norman Group research demonstrates that the ability to recall the location of controls is essential for power users ³⁷. This is why Linear is fast, why Raycast is fast, why k9s is fast: the layout does not move. The design-systems rule is that the chrome must be boring on purpose so the non-determinism has somewhere to land. The invariant-layout pattern -- sidebar 240-280px, top bar, content grid, right rail, nothing moves ever ³⁸ -- is a copyable rule set. Dashboard reflow (when the dashboard rearranges its own layout in response to state changes) is the named anti-pattern.

What backend dashboards get wrong, per the surveyed design-engineering literature: color without a ramp (Grafana and Datadog ship raw hex ³⁹), typography as decoration instead of hierarchy (Refactoring UI's canonical rebuttal: "relying too much on font size; try color or font weight instead" ⁴⁰), motion without meaning (Material Design 3 names status as a first-class use case for motion ⁴¹), and confusing skeletons with spinners ⁴².

A new primitive proposed by this research: the reconciliation heartbeat -- a visible pulse indicating "the loop is alive, the world was observed at T minus 3 seconds, the diff is empty, all is well." A green heartbeat belongs to a whole category of signals where the absence of change is the signal: Kubernetes has shipped the UX for it, and agent UIs have not. Discord's voice-activity-detected ring -- the glow around a user's avatar when they transmit -- is already this primitive, shipped as an avatar decoration.

Storybook's MCP server ⁴³ and Component Manifests -- JSON objects describing the contents of a Storybook in a concise, structured way for AI agents -- are the existing machine-readable agent-task-catalog format. A task template is a story (with args, a play function, decorators). A running task is a story in the Canvas tab. The task catalog is the Storybook sidebar. Agent skill libraries compose via Storybook's composition mechanism. The design-systems engineer is the only cohort for whom this vocabulary is already machine-readable for LLMs in production.

The voices that define the intersection -- Karri Saarinen at Linear ⁴⁴, Guillermo Rauch at Vercel ⁴⁵, Rauno Freiberg ⁴⁶, Emil Kowalski ⁴⁷, Josh Comeau ⁴⁸, Bret Victor ⁴⁹ -- converge on one principle: the feedback loop between decision and consequence must be instantaneous and continuous. Chat interfaces break this loop. Projection interfaces restore it.

How Do Other Tools Visualize Parallel Work?

Every shipping workflow and agent orchestration tool has converged on a small set of visual shapes. Surveying LangGraph Studio, Temporal, Airflow, Dagster, Prefect, Kestra, Argo Workflows, Buildkite, GitHub Actions, Cursor 3, Devin, Replit Agent 4, CrewAI, AutoGen Studio, Inngest, LangSmith, and Langfuse reveals eight recurring shapes ⁵⁰⁵¹⁵²⁵³⁵⁴⁵⁵⁵⁶⁵⁷⁵⁸.

The DAG graph is the most universal: nodes plus edges, live-updated, nodes colored by state. Every orchestrator ships this. Airflow's Grid view ⁵¹ is the canonical at-a-glance health surface: a matrix where columns are runs, rows are tasks, and cells are colored by state. The Gantt chart is the temporal lens -- horizontal bars per task against a real time axis, shipped by Kestra, Prefect, and Temporal's Timeline view. The kanban board, shipped by Replit Agent 4 ⁵⁷ with columns Drafts/Active/Ready/Done, maps the state machine onto spatial columns. The trace tree from LangSmith and Langfuse ⁵⁹ nests root spans with children, indented or tree-shaped. The swim lane from BPMN separates horizontal lanes per agent or owner. The tabs grid from Cursor 3's Agents Window ⁵⁶ arranges up to eight parallel agent sessions as tiles -- side-by-side, grid, or stacked. The sidebar tree from Devin ⁵⁸ nests parent and child sessions in a left navigation pane.

Each shape is a different projection of the same underlying state. The actionable guidance is to pick two or three lenses for a demo, not one.

Temporal's vocabulary is the most precise in the space ⁵⁰. Workflow, Run, Event History, Activity, Pending Activity, Signal, Reset, Continue-as-new, Child Workflow. Its August 2024 redesign split Event History into two views: Compact (no clock time, just ordering) and Timeline (horizontal time axis, real duration). Same state, two lenses.

Airflow's task-instance state palette ⁵¹ has been stable for approximately six years and has been copied by every orchestrator since: queued (gray), running (lime), success (green), failed (red), up_for_retry (gold), up_for_reschedule (turquoise), upstream_failed (orange), skipped (pink), deferred (mediumpurple). Nine core colors, all pastel-saturated. The upstream_failed state -- when a parent fails, children propagate an explicit orange distinct from their own red -- is circuit-breaking as a color. Prefect adds the critical distinction between failed (code-level), crashed (infrastructure killed the run), paused (waiting for human approval), and cancelled (user stopped it) ⁵³. Code failure and infrastructure failure must be different words.
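
A minimal sketch of upstream_failed propagation over a task graph, using the palette named above. It is a simplification and not Airflow's scheduler: every task that is neither failed nor downstream of a failure is assumed to succeed, and `task_states` and the task names are hypothetical. `deps` maps each task to the set of tasks it depends on.

```python
from graphlib import TopologicalSorter

PALETTE = {
    "queued": "gray", "running": "lime", "success": "green", "failed": "red",
    "up_for_retry": "gold", "up_for_reschedule": "turquoise",
    "upstream_failed": "orange", "skipped": "pink",
    "deferred": "mediumpurple",  # Airflow's default color for this state
}


def task_states(deps: dict, failed: set) -> dict:
    """Assign a state per task; failure upstream blocks everything downstream."""
    states = {}
    # static_order() yields every task after all of its dependencies
    for task in TopologicalSorter(deps).static_order():
        if task in failed:
            states[task] = "failed"
        elif any(states[p] in ("failed", "upstream_failed") for p in deps.get(task, ())):
            states[task] = "upstream_failed"
        else:
            states[task] = "success"  # simplifying assumption
    return states


deps = {"b": {"a"}, "c": {"b"}, "d": set()}
states = task_states(deps, failed={"a"})
colors = {task: PALETTE[state] for task, state in states.items()}
```

Only `a` renders red; `b` and `c` render orange, so the grid shows where the failure originated and how far it blocked the graph.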

LangGraph Studio ⁶⁰ is the only production agent IDE that makes fork-from-past-step first-class. It persists a checkpoint at every node. Rewinding creates a new checkpoint that branches; the original execution history stays intact. This is the orchestration-flavored analog of the Claude Code fork/branch/checkpoint vocabulary from the branching survey.

Cursor 3's Agents Window ⁵⁶, shipped April 2026, is the parallelism-preserving IDE that already exists. Agent tabs are arrangeable side-by-side, in a grid, or stacked. Up to eight agents run in parallel across isolated Git worktrees. The Best-of-N pattern -- select multiple models, each produces a solution in an isolated worktree, results appear side-by-side -- is the "multiple responses from a single prompt" feature, already shipped in a mass-market tool.

Devin's plan-review gate ⁵⁸ is the cleanest human-in-the-loop primitive. Each session starts with Devin surfacing relevant files and a preliminary plan. The user edits the plan before autonomous work begins. This is Endsley's SA Level 3 -- projection of future state ⁶¹ -- applied to agent plans, and it is the simplest way to prevent the out-of-the-loop problem that degrades human intervention quality during long autonomous runs ⁶².

The gap nobody has claimed: no surveyed tool combines the Grid view (state over history) with the swim-lane view (cross-agent) simultaneously. No surveyed tool exposes rewind-plus-fork as a button on any cell in the Grid view. These are unclaimed intersections in territory where every individual shape is well-settled.

What Concepts Have Not Been Borrowed Yet?

The agent-UX community is reinventing vocabulary that control theory, distributed systems, local-first research, and human factors have been polishing for decades. The prior sections mined the surface. The deeper layer is that these are not just borrowable names but borrowable design contracts -- each carrying a theorem about what you get for free once you adopt the shape.

CRDTs are the biggest unclaimed borrow from a design perspective ⁶³⁶⁴⁶⁵. Operational Transformation and CRDTs solved a problem agent UX is stumbling into without realizing it: how do multiple actors edit the same document concurrently, offline-tolerant, without a central lock, and converge to the same state? Replace "multiple users" with "multiple agents plus humans plus the filesystem watcher" and the problem is identical. Jazz's CoValues ⁶⁶, Yjs shared types ⁶⁴, and Automerge documents ⁶³ are data structures whose mutations automatically propagate and merge, with eventual consistency by construction, full edit history signed per author, and offline-first sync.
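
"Eventual consistency by construction" is easiest to see in the smallest CRDT, a grow-only counter. This is a minimal illustration of the merge property only; it is not how Jazz, Yjs, or Automerge are implemented, and the actor names are hypothetical. Each actor increments only its own slot, and merge takes the per-actor maximum.

```python
def increment(counter: dict, actor: str, amount: int = 1) -> dict:
    """Return a new counter with the actor's own slot advanced."""
    updated = dict(counter)
    updated[actor] = updated.get(actor, 0) + amount
    return updated


def merge(left: dict, right: dict) -> dict:
    """Per-actor maximum: commutative, associative, and idempotent."""
    return {actor: max(left.get(actor, 0), right.get(actor, 0))
            for actor in left.keys() | right.keys()}


def value(counter: dict) -> int:
    return sum(counter.values())


# Two replicas diverge while offline, then sync in either order.
replica_a = increment(increment({}, "agent-1", 2), "human")
replica_b = increment(increment({}, "agent-1"), "agent-2", 4)
merged = merge(replica_a, replica_b)
```

Because merge order does not matter and re-merging changes nothing, replicas that have seen the same updates hold the same state without a central lock.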

The hidden payoff: CRDTs are naturally branching data structures. Every branching concern from the earlier section -- fork from turn N, sibling conversations, non-destructive rewind -- is trivially expressible as CRDT forks. Ink & Switch's Peritext ⁶⁷, a CRDT for rich text with inline marks now implemented in Automerge, enables multiple agents to concurrently annotate the same prose without clobbering each other's marks. Nobody has shipped multi-agent collaborative writing on Peritext. Patchwork ⁶⁸, also from Ink & Switch, pursues universal version control as a kernel service -- branches, edits, merges, comments, review -- applied across arbitrary apps. The agent analog: every agent-produced artifact is automatically versioned, branchable, mergeable, and commentable without the agent knowing.

OTP supervision trees are the biggest unclaimed borrow from an engineering perspective ¹⁹. Erlang's split between supervisors (who only start, monitor, and restart children) and workers (who do work) is a moral commitment. The restart strategy menu -- one_for_one, one_for_all, rest_for_one -- names failure-recovery policies rather than reimplementing them per scenario. Max restart intensity is the circuit breaker. "Isolation is the foundation. Supervision is the workflow" ⁶⁹. Nobody in agent UX draws the multi-agent diagram as a supervision tree, despite it being the actual shape of a long-running agent orchestrator.

Time-travel debugging has been a stable primitive set since Redux DevTools ⁷⁰: an action log, a timeline scrubber, pin-to-substate, export/import state. LangGraph Studio ships time travel for agent orchestration ⁶⁰, but no consumer agent UI does. Replay.io's product -- capture a session, send a URL, the recipient scrubs the event log ⁷¹ -- is the shape an agent debugging surface should take. Capture an agent run as a bundle of state files and cached LLM responses, send a URL to a teammate, they scrub the trajectory to find where it went wrong. This is Redux DevTools applied to agent runs. It is viscerally better than reading terminal logs.

The digital twin metaphor from IoT and industrial engineering ⁷² transfers cleanly. A digital twin is a virtual replica continuously updated with live sensor data, used to monitor, simulate, and predict. Applied to agents: the UI maintains a live, queryable, simulatable model of the agent's world in parallel with the agent itself. The user can scrub it (time travel), simulate it ("what if the agent picks option B" without affecting the real run), diff it versus the real system (drift detection), and project forward ("if this trajectory continues, task 5 will be blocked in three minutes"). That last capability -- forward projection -- is Endsley SA Level 3 ⁶¹ made concrete, and nobody in agent UX has shipped it.

HCI supervision frameworks ⁶¹⁷³⁷⁴ extend to multi-agent work with unfinished edges. Team situation awareness from Endsley and Jones ⁷⁵ decomposes SA into shared SA (what everyone needs) and individual SA (what each role needs), implying that different viewers should get different projections of the same event log. Sheridan's ten-level automation scale ⁷³ applied per-task rather than per-run produces a visible badge: "this task is L8, you only hear about it at DONE; this task is L4, you approve before execution." Lee and See's trust calibration ⁷⁴ applied per-agent produces a reliability-history widget: "agent A succeeded on 14 of 15 refactoring tasks, failed on 3 of 7 test-writing tasks." No agent UI has shipped per-agent reliability history as a first-class trust primitive.
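
The reliability-history widget reduces to a tally over run outcomes. A minimal sketch, assuming a hypothetical history of `(agent, task_kind, succeeded)` tuples; the sample data reproduces the figures in the example above.

```python
from collections import defaultdict


def reliability(history: list) -> dict:
    """Map (agent, task_kind) to (successes, attempts)."""
    tally = defaultdict(lambda: [0, 0])
    for agent, kind, succeeded in history:
        tally[(agent, kind)][0] += int(succeeded)
        tally[(agent, kind)][1] += 1
    return {key: (wins, total) for key, (wins, total) in tally.items()}


# 14 of 15 refactoring tasks succeeded; 3 of 7 test-writing tasks failed.
history = (
    [("A", "refactoring", True)] * 14 + [("A", "refactoring", False)]
    + [("A", "test-writing", False)] * 3 + [("A", "test-writing", True)] * 4
)
summary = reliability(history)
```

Keeping the tally per task kind rather than per agent is the design choice that matters: a single aggregate would hide that the same agent is reliable at one kind of work and unreliable at another.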

The State-Projection Contract (The Engineering Foundation)

For the engineers in the room who want the architectural skeleton underneath the design vocabulary: here is the contract that makes everything above possible.

k9s ⁷⁶ is the sharpest reference for what an agent supervision UI should do. It is a stateless, queryable, keyboard-driven projection over a declarative system that refuses to be a database. Its surface is small: command mode (: to switch resource views), search mode (/ scoped to current view), xray view (:xray for a hierarchy tree), hotkey system (hotkeys.yaml for saved queries), and help overlay (? for context-aware keybindings). The anti-pattern k9s deliberately avoids is owning state. Even edit and delete delegate to the API server. The UI is pure projection.

The Kubernetes controller contract matters because it carries a philosophical commitment most agent frameworks silently violate: the distinction between level-triggered and edge-triggered reconciliation ⁷⁷. An edge-triggered system reacts to changes. A level-triggered system reacts to the current state of the world regardless of how it arrived there. Kubernetes is level-triggered by constitutional law. Every Slack bot and Discord bot is edge-triggered by default. The consequence: running the same reconciliation loop twice with the same input must produce the same result with no additional side effects. If the filesystem already shows "done," the reconciler must detect that and short-circuit. Every controller tutorial says this; every "agent loop" tutorial forgets it.
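
A minimal sketch of a level-triggered loop, assuming hypothetical `reconcile` and `apply_actions` helpers and a task-to-status mapping as the state. The reconciler never looks at what event woke it up; it compares desired and observed state and emits only the actions still needed.

```python
def reconcile(desired: dict, observed: dict) -> list:
    """Derive actions from current state alone, not from the triggering event."""
    actions = []
    for task, wanted in sorted(desired.items()):
        if observed.get(task) != wanted:
            actions.append(("set", task, wanted))
    return actions


def apply_actions(observed: dict, actions: list) -> dict:
    """Return the observed state after the actions have taken effect."""
    updated = dict(observed)
    for _verb, task, wanted in actions:
        updated[task] = wanted
    return updated


desired = {"build": "done", "test": "done"}
observed = {"build": "done", "test": "running"}

first_pass = reconcile(desired, observed)
observed = apply_actions(observed, first_pass)
second_pass = reconcile(desired, observed)  # nothing left to do
```

The second pass returning an empty list is the short-circuit described above: if the world already matches, the loop does nothing, however many times it runs.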

ArgoCD's two independent status axes ¹⁵ deserve their own mention. Sync status: does the observed state match desired state? Health status: is the thing actually working? Collapsing these into a single column hides half the information. ArgoCD's aggregation rule -- the parent's health is the worst health among its children, evaluated by a defined priority ordering -- is the algorithm for rolling up task status to run status to project status. Define the enum once, order it from worst to best, and apply min() everywhere.
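
A minimal sketch of the worst-of-children rollup. The enum members and their ordering here are hypothetical and are not ArgoCD's actual priority list; the only requirement is that the enum is ordered worst-first so that `min()` selects the worst health.

```python
from enum import IntEnum


class Health(IntEnum):
    # Hypothetical ordering, worst first, so min() picks the worst
    DEGRADED = 0
    PROGRESSING = 1
    HEALTHY = 2


def rollup(node: dict) -> Health:
    """A node's health is the worst of its own health and its children's."""
    own = node.get("health", Health.HEALTHY)
    return min([own] + [rollup(child) for child in node.get("children", [])])


project = {
    "children": [
        {"children": [{"health": Health.HEALTHY}, {"health": Health.PROGRESSING}]},  # run 1
        {"children": [{"health": Health.HEALTHY}, {"health": Health.DEGRADED}]},     # run 2
    ]
}
```

The same function serves task-to-run and run-to-project rollups, since each level is just a node with children.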

The CQRS framing ¹⁷ elevates all of this from naming conventions to a technical program. The write model is the command API -- verbs of intent like "start task" and "block task." The event store is the filesystem -- every write to a progress log is an append-only event. The read model is the UI, a materialized view optimized for query and never for write. Marten's live-versus-async projection split ⁷⁸ is the implementation lever: live projections are computed at query time (always fresh), async projections are maintained as a background index (fast but eventually consistent). Start with live, graduate to async when cardinality demands it.

Obsidian's graph view ⁷⁹ is worth noting because it is a projection that already runs over the same filesystem an agent operator writes to. Its graph -- one node per file, one edge per wikilink -- is a materialized view over markdown. Its color groups map directly to task status encoding. The question of whether an existing tool can serve as the initial demo surface answers itself: the plumbing already exists.

No surveyed agent UI ships a drift panel. No surveyed agent UI enforces the write-through refusal discipline. No surveyed agent UI displays two independent status axes. These are not novel inventions. They are decade-old patterns the agent-UX community has not yet adopted.

What Would This Look Like in Five Years?

Five endgame metaphors for agentic product development compete, and no single one wins ⁸⁰. They win at different scales.

At the turn level (one prompt, many branches), the multiplayer-document metaphor wins. Branching is CRDT-native, and humans and agents are peers in one room.

At the task level (one task, multiple subagents), the spreadsheet or reactive-notebook metaphor wins. The dependency DAG is the shape of the work, and parallelism falls out of topological sort.

At the workflow level (many tasks, many agents), the real-time strategy game metaphor wins. It is natively multi-agent, natively parallel, natively spatial, natively about supervising autonomous entities. Screeps ⁸¹ already demonstrates that programmers will write JavaScript to supervise a persistent colony of autonomous units running 24/7.

At the kernel level (what crashes, what restarts, who owns what), OTP supervision trees are the correct shape.

At the 2030 endgame, Dynamicland's room-scale computational public space ⁸² is the right North Star -- the room is the computer, programs are physical objects, collaboration is default, spatial memory is load-bearing. But it is not a two-week demo.

The "no chat at all" endgame has at least seven concrete replacements for the prompt box ⁸⁰.

The command palette. Cmd-K as a verb catalog: /fork, /reconcile, /pause, /rewind to turn N ⁸³.

Direct manipulation of the state file. Edit workflow-state.json in your editor; the operator detects drift and reconciles.

Voice. Whisper plus intent routing plus the state ledger as execution surface; voice is the one input channel that does not serialize through fingers.

The file you are looking at as the prompt. Ink & Switch's Embark pattern ⁸⁴: write a new task line in tasks.md, the agent notices drift, claims the task, reconciles; there is no "submit."

Ambient notification. Rich Presence plus tray icon plus a daily digest and nothing else; Jason Yuan's Dot/new.computer pitch ⁸⁵ applied to operators.

Calendar as agent log. Every task-run is a calendar event, color-coded using Airflow's palette; time is already the primary axis of work.

Audio as ambient status. Peep, the Network Auralizer from USENIX 2000 ⁸⁶, mapped network events to birdsong; the agent version makes the reconciliation heartbeat a soft metronome tick, a task transition a chime, a circuit breaker opening a heavy door closing.

Three North Star renders for the 2030 demo:

The War Room. A hex-and-counter wargame map of the vault. Agents as counters, tasks as hexes, fog-of-war over unexplored parts of the codebase. Teammates on the other side of the table supervising their own theaters.

The Atelier. Ink & Switch malleable documents with Peritext/Automerge CRDT substrate, multiplayer cursors, no "agent panel" because the agent is in the document. The Geoffrey Litt ⁸⁷ and Maggie Appleton ⁸⁸ endgame where home-cooked software meets malleable software.

The Command Deck. The bridge of a ship. Subagents as NPCs carrying out orders. Commands by voice or touch. The one metaphor simultaneously spatial, multi-agent, voice-first, presence-first, and gamer-legible.

The operator's unfair advantage in any of these scenarios is the substrate choice. The vault is a filesystem of pure artifacts ¹. That is the substrate every 2030-shaped surface wants. Malleable software wants it. Peritext wants it. Dynamicland's Realtalk wants it. Chat UIs actively fight against it by storing state in an opaque, non-addressable, non-queryable blob. The 2030 question is not "what replaces the prompt box" but how many projections over the same event log the supervisor wants today. The projections are cheap. The substrate is ready.
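A small sketch of why projections are cheap, assuming the event log is an append-only list of task state changes. The event fields (`turn`, `agent`, `task`, `state`) and both projection functions are hypothetical.

```python
from collections import Counter

# Hypothetical append-only event log.
events = [
    {"turn": 1, "agent": "a1", "task": "spec", "state": "running"},
    {"turn": 2, "agent": "a2", "task": "api", "state": "running"},
    {"turn": 3, "agent": "a1", "task": "spec", "state": "done"},
    {"turn": 4, "agent": "a1", "task": "ui", "state": "running"},
    {"turn": 5, "agent": "a2", "task": "api", "state": "failed"},
]


def status_board(log: list[dict]) -> dict[str, str]:
    """Projection 1: the latest state of every task."""
    board = {}
    for event in log:
        board[event["task"]] = event["state"]
    return board


def agent_load(log: list[dict]) -> Counter:
    """Projection 2: how many tasks each agent is running right now."""
    latest = {}
    for event in log:
        latest[event["task"]] = event
    return Counter(e["agent"] for e in latest.values() if e["state"] == "running")


print(status_board(events))
print(agent_load(events))
```

Both views are pure functions of the same log, so adding a third surface (a calendar, a map, a status card) means adding another fold, not another source of truth.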

What We Do Not Know Yet

The research does not resolve the following.

Merge semantics for prose remain unsettled. The Claude Code spec, Forky, and ContextBranch each name the problem in their own terms (summary injection, semantic three-way merge, inject), but none ships a production merge algorithm for natural-language branches ²⁴ ²⁵ ⁸⁹. Whether this is solvable, or whether merging prose is fundamentally lossy at a level no algorithm can bridge, has direct implications for any branching agent UI.

Drift detection applied to agent-produced artifacts has no shipped implementation. Terraform and Flux ship drift detection for infrastructure state ¹⁶ ⁹⁰, but no agent UI applies the pattern to agent outputs: "the human edited progress.md manually; the state ledger does not know; the drift is invisible." Whether the reconciliation heartbeat and drift panel are worth the build cost depends on how frequently agents and humans both touch the same artifacts, which is an empirical question.
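One way the missing drift check could look, sketched under the assumption that the state ledger records a content digest for each artifact when an agent last wrote it. The ledger shape, the status labels, and the in-memory `artifacts` dict (standing in for files on disk) are hypothetical.

```python
import hashlib


def digest(text: str) -> str:
    """Content digest recorded in the ledger whenever an agent writes a file."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_drift(ledger: dict[str, str], artifacts: dict[str, str]) -> dict[str, str]:
    """Compare recorded digests against current contents.

    Returns {path: "modified" | "missing" | "untracked"} for drifted paths only.
    """
    drift = {}
    for path, recorded in ledger.items():
        if path not in artifacts:
            drift[path] = "missing"
        elif digest(artifacts[path]) != recorded:
            drift[path] = "modified"
    for path in artifacts:
        if path not in ledger:
            drift[path] = "untracked"
    return drift


# What the agent wrote, as the ledger remembers it.
ledger = {
    "progress.md": digest("step 1 done\n"),
    "plan.md": digest("plan v1\n"),
}
# What is actually there now: a human edited progress.md and added notes.md.
artifacts = {
    "progress.md": "step 1 done\nstep 2 done (edited by hand)\n",
    "plan.md": "plan v1\n",
    "notes.md": "scratch\n",
}
drift = detect_drift(ledger, artifacts)
print(drift)
```

The output of `detect_drift` is exactly what a drift panel would render; how often it is non-empty in practice is the empirical question the paragraph above raises.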

Trust calibration for multi-agent supervision lacks any shipped widget. Lee and See's framework ⁷⁴ predicts that humans will both overtrust and undertrust in the absence of per-agent reliability data. No surveyed agent UI displays per-agent success rates or historical reliability as a first-class primitive. Whether a reliability-history dashboard actually improves trust calibration in practice is a testable HCI claim that has not been tested.
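A sketch of the data behind such a widget, assuming run history is available as (agent, succeeded) pairs. The history below is invented for illustration and the `reliability` helper is hypothetical; it is not drawn from any surveyed tool.

```python
from collections import defaultdict


def reliability(runs: list[tuple[str, bool]]) -> dict[str, tuple[int, int, float]]:
    """Return {agent: (successes, total_runs, success_rate)} from a run history."""
    tally = defaultdict(lambda: [0, 0])  # agent -> [successes, total]
    for agent, succeeded in runs:
        tally[agent][1] += 1
        if succeeded:
            tally[agent][0] += 1
    return {agent: (ok, n, ok / n) for agent, (ok, n) in tally.items()}


# Invented run history for two hypothetical agents.
history = [
    ("researcher", True),
    ("researcher", True),
    ("researcher", False),
    ("researcher", True),
    ("writer", True),
]
report = reliability(history)
print(report)
```

The run count is returned alongside the rate on purpose: a single successful run should not display as perfect reliability, which is one of the overtrust failure modes the widget would need to avoid.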

Whether spatial interfaces are a serious direction or an aesthetic gesture is unresolved. The research surfaced strong theoretical arguments for spatial memory ³⁷ ⁸² ⁴⁹ and strong practical arguments that spatial interfaces are expensive to build relative to list-based alternatives. The answer likely depends on agent count: spatial memory matters more as the number of supervised entities grows, but the breakpoint is unknown.

How to evaluate which projection surface to build first is not answered. The territory has been mapped, but the map does not include a selection criterion. The calibration variables are build cost (Rich Presence is two days; an Activity SDK dashboard is two weeks; a CRDT-backed Atelier is four weeks), audience legibility (a calendar requires zero explanation; a supervision tree requires a one-minute preamble), and alignment with the team's existing vocabulary (Discord primitives require no translation for the target audience; OTP supervision vocabulary requires translation for anyone who has not written Erlang).

Sources

1. Operator visualization framework grilling session. grill/2026-04-11-operator-visualization-framework.md. Vault-local reference establishing the "state, not stream" premise.
2. Prior research on agentic interaction design systems. kampus/agentic-interaction-design-systems/research.md. HCI supervision frameworks, chat-as-failure-mode, role-native mediums, Endsley/Sheridan/Lee-See citations.
3. Surveyed projects: ebibibi/claude-code-discord-bridge (https://github.com/ebibibi/claude-code-discord-bridge), thcapp/claude-discord-bridge (https://github.com/thcapp/claude-discord-bridge), DoBuDevel/discord-agent-bridge (https://github.com/DoBuDevel/discord-agent-bridge), llmcord (https://github.com/jakobdylanc/llmcord), OoriData/Discord-AI-Agent (https://github.com/OoriData/Discord-AI-Agent), and others.
4. Rich Presence -- Discord Developer Portal. https://docs.discord.com/developers/platform/rich-presence ; Setting Rich Presence -- Discord Social SDK. https://docs.discord.com/developers/discord-social-sdk/development-guides/setting-rich-presence ; discordpp::Activity Class Reference. https://discord.com/developers/docs/social-sdk/classdiscordpp_1_1Activity.html
5. Discord Presence VSCode extensions: LeonardSSH/vscord (https://marketplace.visualstudio.com/items?itemName=LeonardSSH.vscord), iCrawl/discord-vscode (https://github.com/iCrawl/discord-vscode).
6. Forum Channels FAQ. https://support.discord.com/hc/en-us/articles/6208479917079-Forum-Channels-FAQ ; Forum Channels blog. https://discord.com/blog/forum-channels-space-for-organized-conversation
7. TechCrunch: Discord adds Reddit-like Forum channels (Sept 2022). https://techcrunch.com/2022/09/14/discord-forum-channels/
8. Stage Channels FAQ. https://support.discord.com/hc/en-us/articles/1500005513722-Stage-Channels-FAQ ; Running and Moderating Discord Stages best practices. https://discord.com/blog/running-moderating-discord-stages-best-practices
9. Discord Game Overlay 101. https://support.discord.com/hc/en-us/articles/217659737-Game-Overlay-101 ; Mobile Voice Overlay (Android). https://support.discord.com/hc/en-us/articles/360042693171-Mobile-Voice-Overlay-Android
10. Activities Overview. https://discord.com/developers/docs/activities/overview ; Discord Embedded App SDK (GitHub). https://github.com/discord/embedded-app-sdk ; Colyseus Discord Embedded SDK blog. https://colyseus.io/blog/discord-embedded-sdk/
11. Listening Along with Spotify. https://support.discord.com/hc/en-us/articles/115003966072-Listening-Along-with-Spotify
12. Midjourney is Leaving Discord (Medium). https://medium.com/@nicktheaiguru/midjourney-is-leaving-discord-9b2072c7a902
13. Decrypt: Clyde's Last Call (Dec 2023). https://decrypt.co/206528/clydes-last-call-discords-ai-chatbot-being-shut-down-on-december-1 ; Clyde chatbot (Discord Wiki). https://discord.fandom.com/wiki/Clyde_(chatbot)
14. Kubernetes Controllers (official). https://kubernetes.io/docs/concepts/architecture/controller/
15. OneUptime: ArgoCD Sync Status vs Health Status. https://oneuptime.com/blog/post/2026-02-26-argocd-sync-status-vs-health-status/view ; ArgoCD Resource Health docs. https://argo-cd.readthedocs.io/en/stable/operator-manual/health/
16. Terraform plan command reference. https://developer.hashicorp.com/terraform/cli/commands/plan ; Terraform Plan Made Simple. https://controlmonkey.io/resource/terraform-plan-made-simple/
17. CQRS Pattern (Azure). https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs
18. React DevTools. https://react.dev/learn/react-developer-tools ; Flutter Inspector. https://docs.flutter.dev/tools/devtools/inspector
19. Ferd, The Zen of Erlang. https://ferd.ca/the-zen-of-erlang.html ; OTP Supervision Tree Patterns. https://medium.com/@kanishks772/the-supervision-tree-patterns-that-make-systems-bulletproof-356199f178bb
20. Observer docs (erlang.org). https://www.erlang.org/doc/apps/observer/observer_ug.html ; observer_cli (GitHub). https://github.com/zhongwencool/observer_cli
21. Loom: interface to the multiverse (Janus, generative.ink). https://generative.ink/posts/loom-interface-to-the-multiverse/
22. socketteer/loom (GitHub). https://github.com/socketteer/loom ; cyborgism wiki: Loom. https://cyborgism.wiki/hypha/loom
23. LibreChat fork feature docs. https://www.librechat.ai/docs/features/fork
24. Claude Code issue #32631: Conversation Branching full spec. https://github.com/anthropics/claude-code/issues/32631
25. Forky: git-style LLM history with semantic three-way merge. https://ishan.rs/posts/forky-git-style-llm-history
26. Cursor checkpoints docs. https://cursor.com/docs/agent/chat/checkpoints ; Cursor restore-UI confusion forum thread. https://forum.cursor.com/t/ux-ui-confusion-on-restoring-checkpoints/67614
27. Marimo dataflow docs. https://docs.marimo.io/guides/editor_features/dataflow/ ; Observable reactive dataflow. https://observablehq.com/@observablehq/reactive-dataflow
28. Hex 2.0 Reactivity, Graphs, and a little bit of Magic. https://hex.tech/blog/hex-two-point-oh/ ; Hex 3.0 parallel execution changelog. https://learn.hex.tech/changelog/2023-10-05
29. Figma: Multiplayer Editing in Figma. https://www.figma.com/blog/multiplayer-editing-in-figma/ ; Building Figma Multiplayer Cursors (Mark Skelton). https://mskelton.dev/blog/building-figma-multiplayer-cursors
30. Liveblocks Presence guide. https://liveblocks.io/docs/guides/how-to-use-liveblocks-presence-with-react
31. PartyKit Facepile review (Matt Webb). https://interconnected.org/more/2023/partykit/facepiles.html
32. Figma cursor chat. https://help.figma.com/hc/en-us/articles/4403130802199-Use-cursor-chat-in-Figma-Design
33. W3C Design Tokens Format Module 2025.10. https://www.designtokens.org/tr/drafts/format/ ; W3C Design Tokens Community Group. https://www.designtokens.org
34. Vercel json-render (InfoQ). https://www.infoq.com/news/2026/03/vercel-json-render/ ; Vercel AI SDK 3 Generative UI. https://vercel.com/blog/ai-sdk-3-generative-ui
35. shadcn/ui. https://ui.shadcn.com
36. Radix Primitives. https://www.radix-ui.com/primitives
37. NN/G: Spatial Memory: Why It Matters for UX Design. https://www.nngroup.com/articles/spatial-memory/ ; Scarr, Cockburn, Bateman: Understanding and Exploiting Spatial Memory (Canterbury). https://ir.canterbury.ac.nz/handle/10092/9326
38. Art of Styleframe: Dashboard Design Patterns for Modern Web Apps 2026. https://artofstyleframe.com/blog/dashboard-design-patterns-web-apps/
39. Datadog: Selecting the right colors for your graphs. https://docs.datadoghq.com/dashboards/guide/widget_colors/ ; Datadog: Understanding Duplicate Colors in the Consistent Palette. https://docs.datadoghq.com/dashboards/guide/consistent_color_palette/
40. Wathan, A. & Schoger, S. Refactoring UI. https://refactoringui.com
41. Material Design 3: Motion overview. https://m3.material.io/styles/motion/overview/how-it-works
42. NN/G: Skeleton Screens 101. https://www.nngroup.com/articles/skeleton-screens/
43. Storybook MCP server. https://storybook.js.org/docs/ai/mcp/overview ; Storybook Manifests. https://storybook.js.org/docs/ai/manifests ; Storybook MCP for React announcement. https://storybook.js.org/blog/storybook-mcp-for-react/
44. Karri Saarinen: Why is quality so rare? (Linear, Config 2025). https://linear.app/now/why-is-quality-so-rare ; Design Is Search (AI Creators Media). https://en.ai-creators.tech/media/creative/design-search/
45. Design Engineering at Vercel. https://vercel.com/blog/design-engineering-at-vercel
46. Rauno Freiberg: Web Interface Guidelines. https://interfaces.rauno.me ; Craft / Novelty. https://rauno.me/craft/novelty ; Devouring Details. https://devouringdetails.com
47. Emil Kowalski. https://emilkowal.ski ; sonner. https://sonner.emilkowal.ski ; vaul. https://vaul.emilkowal.ski ; Animations on the Web. https://animations.dev
48. Josh Comeau: Springs and Bounces in Native CSS. https://www.joshwcomeau.com/animation/linear-timing-function/ ; An Interactive Guide to CSS Transitions. https://www.joshwcomeau.com/animation/css-transitions/
49. Bret Victor. https://worrydream.com ; Inventing on Principle transcript. https://jamesclear.com/great-speeches/inventing-on-principle-by-bret-victor
50. Temporal Web UI docs. https://docs.temporal.io/web-ui ; Temporal Events reference. https://docs.temporal.io/references/events ; Temporal Updated Event History Timeline View. https://temporal.io/change-log/updated-event-history-timeline-view-is-now-available
51. Airflow UI Overview. https://airflow.apache.org/docs/apache-airflow/stable/ui.html ; Airflow utils/state source. https://airflow.apache.org/docs/apache-airflow/1.10.3/_modules/airflow/utils/state.html
52. Dagster Asset and Run Visualization (DeepWiki). https://deepwiki.com/dagster-io/dagster/7.4-run-and-event-interfaces ; Dagster Column-level lineage. https://docs.dagster.io/guides/build/assets/metadata-and-tags/column-level-lineage
53. Prefect v3 States concepts. https://docs.prefect.io/v3/concepts/states
54. Kestra Platform overview. https://kestra.io/overview ; Kestra (DeepWiki). https://deepwiki.com/kestra-io/kestra
55. Argo Workflows Suspending. https://argo-workflows.readthedocs.io/en/latest/walk-through/suspending/ ; Dynamic Fan-out/Fan-in in Argo Workflows. https://medium.com/@corvin/dynamic-fan-out-and-fan-in-in-argo-workflows-d731e144e2fd
56. Cursor changelog 3.0. https://cursor.com/changelog/3-0 ; Cursor 3 agent-first interface (The Decoder). https://the-decoder.com/new-cursor-3-ditches-the-classic-ide-layout-for-an-agent-first-interface-built-around-parallel-ai-fleets/
57. Replit Agent 4 landing. https://replit.com/agent4 ; Introducing Agent 4 (Replit blog). https://blog.replit.com/introducing-agent-4-built-for-creativity
58. Cognition Devin 2.0 blog. https://cognition.ai/blog/devin-2 ; Devin release notes. https://docs.devin.ai/release-notes/overview ; Devin can now Schedule Devins. https://cognition.ai/blog/devin-can-now-schedule-devins
59. LangSmith Observability. https://www.langchain.com/langsmith/observability ; Langfuse Observability overview. https://langfuse.com/docs/observability/overview ; Langfuse Sessions. https://langfuse.com/docs/observability/features/sessions
60. LangGraph time-travel docs. https://docs.langchain.com/oss/python/langgraph/use-time-travel ; LangGraph Studio blog. https://blog.langchain.com/langgraph-studio-the-first-agent-ide/
61. Endsley, M.R. (1995). Toward a theory of situation awareness in dynamic systems. Human Factors, 37(1), 32-64. ; Endsley, M.R. & Kiris, E.O. (1995). The out-of-the-loop performance problem and level of control in automation. Human Factors, 37(2), 381-394.
62. Wickens, C.D. (2018). Automation stages & levels, 20 years after. Journal of Cognitive Engineering and Decision Making, 12(1), 35-41.
63. Automerge (GitHub). https://github.com/automerge/automerge ; Automerge CRDTs concept docs. https://www.mintlify.com/automerge/automerge/concepts/crdts
64. Yjs (GitHub). https://github.com/yjs/yjs
65. Liveblocks Yjs sync engine. https://liveblocks.io/docs/ready-made-features/multiplayer/sync-engine/liveblocks-yjs
66. Jazz.tools. https://jazz.tools/ ; Jazz CoValues concepts. https://jazz.tools/docs/react-native/core-concepts/covalues/overview
67. Ink & Switch Dispatch 004: Peritext. https://www.inkandswitch.com/newsletter/dispatch-004/ ; Ink & Switch Peritext. https://www.inkandswitch.com/peritext/
68. Towards universal version control with Patchwork (Geoffrey Litt). https://buttondown.com/geoffreylitt/archive/towards-universal-version-control-with-patchwork/
69. Ferd, The Zen of Erlang. https://ferd.ca/the-zen-of-erlang.html
70. Understand time-travel debugging in Redux. https://app.studyraid.com/en/read/12414/400817/time-travel-debugging-in-redux ; Redux DevTools tips and tricks (LogRocket). https://blog.logrocket.com/redux-devtools-tips-tricks-for-faster-debugging/
71. Replay.io: The MCP time travel debugger. https://www.replay.io/ ; Introduction to time travel debugging (Replay.io blog). https://blog.replay.io/introduction-to-time-travel-debugging
72. IBM: What Is a Digital Twin? https://www.ibm.com/think/topics/digital-twin ; Volvo: Digital twins -- the ultimate virtual proving ground. https://www.volvoautonomoussolutions.com/en-en/news-and-insights/insights/articles/2025/jun/digital-twins--the-ultimate-virtual-proving-ground.html
73. Sheridan, T.B. & Verplank, W.L. (1978). Human and Computer Control of Undersea Teleoperators. MIT Man-Machine Systems Laboratory. ; Supervisory control (Wikipedia). https://en.wikipedia.org/wiki/Supervisory_control
74. Lee, J.D. & See, K.A. (2004). Trust in automation: Designing for appropriate reliance. Human Factors, 46(1), 50-80. https://pubmed.ncbi.nlm.nih.gov/15151155/
75. National Academies: Situation Awareness in Human-AI Teams. https://www.nationalacademies.org/read/26355/chapter/6 ; Team SA (Salmon et al., ResearchGate). https://www.researchgate.net/figure/Team-situation-awareness-Salmon-et-al-2008-adapted-from-Endsley-1995b_fig1_287390257
76. k9s Hotkeys docs. https://k9scli.io/topics/hotkeys/ ; The Complete K9s Cheatsheet. https://ahmedjama.com/blog/2025/09/the-complete-k9s-cheatsheet/
77. Level Triggering and Reconciliation in Kubernetes (HackerNoon). https://hackernoon.com/level-triggering-and-reconciliation-in-kubernetes-1f17fe30333d ; Reconciliation Loop (kubebuilder). https://deepwiki.com/kubernetes-sigs/kubebuilder/5.2-reconciliation-loop
78. Live projections for read models (Kurrent). https://www.kurrent.io/blog/live-projections-for-read-models-with-event-sourcing-and-cqrs ; Marten CQRS/ES (Code Magazine). https://www.codemag.com/Article/2209071/Event-Sourcing-and-CQRS-with-Marten
79. Obsidian Graph View Help. https://help.obsidian.md/plugins/graph
80. Speculative analysis synthesized from: Screeps (https://store.steampowered.com/app/464350/Screeps_World/), Dynamicland (https://dynamicland.org/), Ink & Switch Malleable Software (https://www.inkandswitch.com/essay/malleable-software/), Spatial Interfaces (Pasquale D'Silva, https://medium.com/elepath-exports/spatial-interfaces-886bccc5d1e9), and Bret Victor's Seeing Spaces (http://worrydream.com/SeeingSpaces/).
81. Screeps: World. https://store.steampowered.com/app/464350/Screeps_World/ ; bencbartlett/Overmind AI for Screeps. https://github.com/bencbartlett/Overmind
82. Dynamicland. https://dynamicland.org/ ; Dynamicland FAQ 2024. https://dynamicland.org/2024/FAQ/ ; Dynamicland: Computational Public Space. https://dynamicland.org/2024/Computational_Public_Space/
83. cmdk (Paco Coursey). https://cmdk.paco.me
84. Ink & Switch Dispatch 001: On Embark and Lude. https://newsletter.inkandswitch.com/archive/dispatch-001-on-embark-and-lude/ ; Geoffrey Litt: Malleable software in the age of LLMs. https://www.geoffreylitt.com/2023/03/25/llm-end-user-programming.html
85. Jason Yuan Design. https://jasonyuan.design/ ; Jason Yuan announcing Dot on LinkedIn. https://www.linkedin.com/posts/jasonyuandesign_announcing-our-first-product-dot-an-intelligent-activity-7125515579519631361--ofh
86. Peep: The Network Auralizer -- USENIX LISA 2000. https://www.usenix.org/legacyurl/peep-network-auralizer-monitoring-your-network-sound
87. Geoffrey Litt: Dynamic documents // LLMs + end-user programming. https://www.geoffreylitt.com/2022/11/23/dynamic-documents ; Codifying a ChatGPT workflow into a malleable GUI. https://www.geoffreylitt.com/2023/07/25/building-personal-tools-on-the-fly-with-llms
88. Maggie Appleton: Home-Cooked Software and Barefoot Developers. https://maggieappleton.com/home-cooked-software
89. ContextBranch: A Version Control Approach to Exploratory Programming (arXiv). https://arxiv.org/abs/2512.13914
90. Terraform drift explained. https://encore.cloud/resources/terraform-drift ; Flux Kustomization docs. https://fluxcd.io/flux/components/kustomize/kustomizations/
