@odysseus0
Last active July 1, 2025 03:28
Async Agent Architecture - Design and Exploration

Non-blocking Tasks for Claude Code - V1 Design

Overview

This design makes Claude Code more responsive by allowing the main agent to continue working while background tasks run. Currently, when Claude spawns multiple long-running tasks, everything freezes - the agent can't continue working on other things, and you can't interact with it. With this change, Claude can process results as they arrive, continue working on independent tasks, and remain available for user interaction.

Background: Understanding Claude Code

To understand the problem and solution, you need to know how Claude Code works today.

Two Key Systems

Claude Code elegantly solves the problem of complex coding tasks through two systems that work together:

  1. Todos - A task tracking system that helps Claude organize work. Think of it as Claude's smart to-do list. Todos have states that track progress: pending (not started) → in_progress (working on it) → completed (done).

  2. Task Tool - Allows Claude to spawn "sub-agents" - independent AI agents that handle complex, long-running work. Think of it as Claude delegating work to specialized assistants.

Why Tasks Exist

When you ask Claude to "analyze all error handling in the codebase," this might require:

  • Reading hundreds of files
  • Following complex code paths
  • Identifying patterns across modules
  • Synthesizing findings

Doing this in the main conversation would:

  • Clutter the context with intermediate steps
  • Make responses unwieldy
  • Use up the context window

Instead, Claude spawns a task (sub-agent) that:

  • Works in its own isolated context
  • Has full tool access (read files, search, etc.)
  • Explores extensively without affecting the main conversation
  • Returns only a clean summary of findings

This thoughtful architecture keeps the main conversation focused while enabling deep exploration - a key strength of Claude Code. Tasks typically take 1-2+ minutes to complete.

The Problem

What Happens Now

When Claude spawns multiple tasks, everything freezes:

User: Please analyze our database schema, research migration best practices, 
      and find performance bottlenecks.

Claude: I'll work on those 3 tasks...
        Task(description="Analyze current database schema")
        Task(description="Research migration best practices")
        Task(description="Find performance bottlenecks via query analysis")
[... 3+ minutes of complete silence ...]
Claude: Here are all the results:
        1. Database schema: 47 tables with foreign key relationships...
        2. Migration approach: Use versioned migrations with rollback...
        3. Bottlenecks: Missing indexes on orders.customer_id...

Why This Is Frustrating

  1. Agent is stuck - Claude can't work on anything else, even unrelated tasks
  2. Wasted early results - If one task finishes in 30 seconds, Claude still waits for all tasks
  3. No interaction - Can't ask questions or provide guidance while waiting
  4. Inefficient workflow - Claude can't use early results to inform other work or adjust approach

Technical Root Cause

Task is just another tool in Claude's toolkit - but while most tools complete in seconds, tasks take minutes. The current runtime treats all tools the same:

// Current behavior - wait for ALL tools to complete
async function processToolCalls(toolCalls: ToolCall[]) {
  return Promise.all(
    toolCalls.map(toolCall => executeTool(toolCall))
  )
  // Promise.all waits for EVERY tool to finish before returning ANY results
}

This makes sense for quick tools but creates the blocking problem for tasks:

  • Multiple tasks DO run in parallel (good!)
  • But Claude waits for ALL to complete (bad!)
  • If 3 tasks take [30s, 60s, 120s], you wait 120s for everything
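The contrast can be sketched with simulated tools. This is an illustrative model, not Claude Code's actual runtime: ToolCall, executeTool, and the timings are stand-ins.

```typescript
// Simulated tool: a named operation that takes a fixed amount of time.
type ToolCall = { name: string; delayMs: number }

// Simulated execution: resolves with a result after the tool's duration.
function executeTool(call: ToolCall): Promise<string> {
  return new Promise(resolve =>
    setTimeout(() => resolve(`${call.name} done`), call.delayMs)
  )
}

// Current behavior: Promise.all yields nothing until the slowest finishes.
async function processAllAtOnce(calls: ToolCall[]): Promise<string[]> {
  return Promise.all(calls.map(executeTool))
}

// Non-blocking alternative: surface each result the moment it settles,
// while still awaiting overall completion.
async function processAsTheyArrive(
  calls: ToolCall[],
  onResult: (result: string) => void
): Promise<void> {
  await Promise.all(calls.map(call => executeTool(call).then(onResult)))
}
```

With durations of [30s, 60s, 120s], processAllAtOnce delivers everything at 120s, while processAsTheyArrive invokes the callback at 30s, 60s, and 120s.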

Solution

Make the Task tool non-blocking. When a task is launched, it immediately creates a todo and returns, allowing Claude to continue working while the task runs in the background.

New Experience

User: Please analyze our database schema, research migration best practices,
      and find performance bottlenecks.

Claude: I'll work on those 3 areas.
        Task(description="Analyze current database schema", priority="high")
        Task(description="Research migration best practices", priority="medium")
        Task(description="Find performance bottlenecks via query analysis", priority="high")
        
        These tasks have implicit dependencies - bottleneck analysis needs 
        schema context. While they run, let me outline our migration approach...

[30 seconds in]
Claude: The migration best practices task just completed. I can see the 
        recommended pattern is versioned migrations. Let me document this 
        approach while waiting for the schema analysis...

[45 seconds in]
Claude: Perfect! The schema analysis is done. We have 47 tables with complex
        relationships. This helps me understand what the bottleneck analysis
        will reveal. Let me start mapping table dependencies...

[90 seconds in]
Claude: The performance analysis found the issue - missing indexes on foreign
        keys in the orders table. Now I can create a comprehensive migration plan
        that addresses both the schema changes and performance improvements...

Total: 90 seconds of productive work vs 3+ minutes of frozen waiting.

Design Details

What Changes

  1. Task tool becomes non-blocking - Returns immediately after creating a todo
  2. Todos get one new state - resolved (task complete, results available)
  3. Task tool gets priority field - Two reasons: (1) Tasks create todos, and todos require a priority field - it needs to come from somewhere. (2) When multiple tasks resolve, priority helps Claude decide which results to review first

Data Model Updates

// Todo structure - gets one new state and field
interface Todo {
  content: string
  status: "pending" | "in_progress" | "resolved" | "completed"  // NEW: resolved
  priority: "high" | "medium" | "low"
  id: string
  result?: string  // NEW: holds task results when resolved
}

// Task tool - gets optional priority
interface TaskToolCall {
  tool: "Task"
  args: {
    description: string  // Brief summary shown in UI
    prompt: string      // Detailed instructions for sub-agent
    priority?: "high" | "medium" | "low"  // NEW: (1) Task creates todo,
                                         // todo needs priority field
                                         // (2) When multiple tasks resolve,
                                         // guides review order
  }
}

How It Works

  1. Claude spawns a task → Creates todo with status in_progress, returns immediately
  2. Task runs in background → Sub-agent works independently
  3. Claude continues → Can spawn more tasks, work on other things, or chat with user
  4. Task completes → Todo automatically updates to resolved with results
  5. Claude processes → During regular todo checks, sees resolved tasks and incorporates when ready

State Transitions

  • Manual todos: pending → in_progress → completed (unchanged)
  • Task todos: in_progress → resolved → completed

The resolved state means "task finished, results available for incorporation."
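These transitions can be expressed as a small lookup table. The names below mirror the data model above, but the validation helper itself is a hypothetical sketch, not part of Claude Code:

```typescript
type TodoStatus = "pending" | "in_progress" | "resolved" | "completed"

// Allowed next states for each status.
const allowedTransitions: Record<TodoStatus, TodoStatus[]> = {
  pending: ["in_progress"],               // manual todos start here
  in_progress: ["completed", "resolved"], // "resolved" only for task todos
  resolved: ["completed"],                // results incorporated
  completed: []                           // terminal
}

function canTransition(from: TodoStatus, to: TodoStatus): boolean {
  return allowedTransitions[from].includes(to)
}
```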

Implementation

Current Task Behavior

// Blocking - waits for task to complete
async function handleTaskTool(toolCall) {
  const result = await executeTask(toolCall)
  return result
}

New Non-blocking Behavior

async function handleTaskTool(toolCall) {
  // Create todo immediately
  const todo = {
    id: generateId(),
    content: toolCall.args.description,
    status: "in_progress",
    priority: toolCall.args.priority || "medium"
    // result stays unset until the task resolves (it is optional on Todo)
  }
  addTodo(todo)
  
  // Execute in background - no await
  executeTask(toolCall).then(result => {
    updateTodo(todo.id, {
      status: "resolved",
      result: result
    })
  }).catch(error => {
    updateTodo(todo.id, {
      status: "resolved",
      result: `Error: ${error.message}`
    })
  })
  
  // Return immediately
  return {
    todoId: todo.id,
    message: "Task started asynchronously"
  }
}

Required Changes

  1. Runtime - Modify Task tool handler to be non-blocking
  2. Todo tools - Return new resolved state and result field
  3. Model prompting - Teach Claude to check for resolved tasks
  4. Testing - Ensure background updates work correctly
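Point 3 implies the model needs a way to see resolved work in priority order. A minimal sketch of that check, reusing the Todo shape from the data model (the sorting helper and its name are assumptions, not existing Claude Code code):

```typescript
interface Todo {
  content: string
  status: "pending" | "in_progress" | "resolved" | "completed"
  priority: "high" | "medium" | "low"
  id: string
  result?: string
}

// Rank used to order resolved tasks for review.
const priorityRank = { high: 0, medium: 1, low: 2 } as const

// During a regular todo check, return resolved todos highest-priority first.
function resolvedForReview(todos: Todo[]): Todo[] {
  return todos
    .filter(t => t.status === "resolved")
    .sort((a, b) => priorityRank[a.priority] - priorityRank[b.priority])
}
```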

Benefits

  • Progressive results - Use outputs as they arrive instead of waiting
  • Continued work - Claude can work on other things while tasks run
  • Better coordination - Early results can inform other work
  • User interaction - Conversation continues during long operations
  • Smarter workflows - Claude can adapt based on intermediate results

Out of Scope

  • Making other tools non-blocking (they complete quickly)
  • Task cancellation
  • Task dependencies
  • Priority interrupts

Summary

This design solves a real user pain point with minimal changes. By making tasks non-blocking and tracking them through todos, we transform dead waiting time into productive work. The conversation stays dynamic, Claude remains responsive, and work gets done faster through intelligent parallelism.

The implementation is straightforward: change one tool's behavior and add one todo state. The impact is significant: a fundamentally better user experience.

We believe this focused approach balances implementation simplicity with meaningful user value.

Async Agents: Design Journey

The Problem

When you spawn three research tasks in Claude Code, each taking ~2 minutes, you wait in silence for all to complete. If one finishes in 30 seconds, you can't use those results until the slowest task finishes - everything arrives at once at the end.

This blocks both the agent and user from making progress with completed work.

Early Solutions (Failed)

Async Runtime

First attempt: transplant async/await patterns from traditional programming. Tool calls would return futures. The agent would yield at suspension points.

Why it failed: LLMs generate operations through natural language, not predetermined code. You can't statically analyze what will happen. The "safety net" suspension point at message completion violated model agency principles.

Polling with Futures

Michael suggested Python futures with polling. The agent would create futures and check them with timeouts.

Why it failed: Polling fights the nature of async. Calling promise.wait() with a timeout would still block the runtime while waiting.

Preemptive Scheduling

Treat it like an OS scheduler - runtime interrupts the agent and injects completed results.

Why it failed: You can't pause an LLM mid-inference. We'd have to kill and restart with updated context. The complexity wasn't justified.

Finding the Right Mental Model

Brain in a Vat

The agent isn't a process or scheduler. Better analogy:

  • LLM = brain
  • Tool use = sensory/motor interface
  • Runtime = connection to reality

We're designing information flow, not making the brain "async."

Fetch-Decode-Execute

The runtime operates like a simple CPU cycle:

  • Fetch: Get tool calls from model
  • Decode: Determine tool type
  • Execute: Run tool and wait (blocking)

For async, only execute changes:

  • Execute: Start operation, create todo, return immediately

This clarified we're just branching at execution, not redesigning the runtime.
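The single branch point can be sketched as a toy Execute step. Everything here is illustrative - the type shapes, isBackgroundTool, and the injected runSync/startBackground callbacks are assumptions standing in for the real runtime:

```typescript
type ToolCall = { tool: string; args: Record<string, unknown> }
type ToolResult = { todoId?: string; output: string }

// Only Task represents background work in this design.
function isBackgroundTool(call: ToolCall): boolean {
  return call.tool === "Task"
}

// Execute step: the only part of the fetch-decode-execute cycle that changes.
async function executeStep(
  call: ToolCall,
  runSync: (c: ToolCall) => Promise<string>,
  startBackground: (c: ToolCall) => string // starts work, returns a todo id
): Promise<ToolResult> {
  if (isBackgroundTool(call)) {
    // Async branch: start the operation, create a todo, return immediately.
    const todoId = startBackground(call)
    return { todoId, output: "Task started asynchronously" }
  }
  // Sync branch: run the tool and wait, exactly as before.
  return { output: await runSync(call) }
}
```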

The Todo Primitive

While discussing suspension points, the insight emerged: Claude Code already has async primitives - todos.

Todos are:

  • Natural task boundaries
  • Already used for task switching
  • Already visible to the model
  • Already a work queue

Initial Design

Add two new todo states:

  • async_running - Operation in progress
  • ready_to_incorporate - Results available

Plus a result field for outputs.

Simplification

"Why distinguish between sync and async in-progress work?"

Work is work. Whether actively processing or running in background, both are in_progress.

Final design: ONE new state:

  • resolved - Task complete, results available

From General to Specific

The design then evolved toward adding an async: true flag to any tool call.

Then: "Do we need async Read? Async Grep?"

The Task tool already represents background work - it spawns sub-agents for long operations. Task was blocking when it shouldn't be.

Final approach:

  • No new flags or parameters
  • Make Task tool non-blocking by default
  • Other tools remain synchronous

Since we're already modifying Task, we also add an optional priority field to control the importance of background work.

Final Design

  1. Task tool becomes non-blocking (returns immediately after creating todo)
  2. Todos get one new state: resolved (with result field)
  3. Task tool accepts optional priority parameter

The change fixes the mismatch between Task's concept (background work) and behavior (blocking).

The V2 Detour

Unification Attempt

After implementing V1, considered unifying tasks and todos as "units of work."

Proposed unified model:

interface Todo {
  content: string
  status: "pending" | "in_progress" | "resolved" | "completed"
  priority: "high" | "medium" | "low"
  result?: string
  instructions?: string  // When present, work is actionable
}

Implementation Issues

When a todo with instructions moves to in_progress, what happens?

Attempted solutions:

  1. Runtime prompts: "Execute locally or delegate?" - Runtime shouldn't make decisions
  2. Execution intent field: execution: "local" | "delegate" - Recreating Task with extra steps
  3. Special states: pending_delegation - States shouldn't carry execution semantics

Core problem: The model needs to communicate intent, not just state.

Return to V1

Task/Todo separation represents different intentions:

  • Task tool: "I'm delegating this work"
  • Todo tool: "I'm tracking this work"

The separation carries semantic information. Like "send email" vs "draft email" - both involve writing, but they express different intents.

Design Evolution

  1. Complex async runtime → Too foreign to LLM nature
  2. Polling with futures → Fighting async's grain
  3. Preemptive scheduling → Solving non-existent problems
  4. Async flag on all tools → Right idea, too broad
  5. Two new todo states → Good, but one redundant
  6. Task-only + resolved state → Correct

The progression was from "what can we add?" to "what's already there?"

Lessons

  1. Traditional patterns can mislead - LLM systems need LLM-native solutions
  2. Question distinctions - Each one eliminated makes the system cleaner
  3. Semantic alignment matters - When behavior matches concept, design works
  4. Enhance existing primitives - Better than creating new ones
  5. Trust model intelligence - Complex runtime mechanisms often unnecessary

Deferred Ideas

  • Hard interrupts: For when one task invalidates another
  • Dependency graphs: Explicit task relationships
  • Dedicated scheduler: Supervisor agent managing work
  • RL for granularity: Training models to chunk work naturally

Each adds complexity. Starting simple reveals what's actually needed versus imagined requirements.

The journey showed that clear intent beats theoretical elegance, and semantic differences deserve syntactic differences. Sometimes apparent redundancy carries important meaning.
