Beads Issue Tracker: Comprehensive Design Analysis

Document Version: 1.0
Date: November 8, 2025
Purpose: Deep analysis of the beads issue tracker architecture, design decisions, and tradeoffs to inform alternative implementations.


Table of Contents

  1. Executive Summary
  2. Core Problem & Design Philosophy
  3. Architecture Overview
  4. Data Model
  5. ID Generation Strategy
  6. Storage Layer
  7. Synchronization Architecture
  8. Daemon & RPC System
  9. Git Integration
  10. Dependency System
  11. CLI Design
  12. Integration Patterns
  13. Advanced Features
  14. Design Tradeoffs
  15. Implementation Considerations

Executive Summary

Beads is a dependency-aware issue tracker designed specifically for AI coding agents. Its core innovation is making a distributed SQLite database feel like a centralized system by using git as the synchronization layer.

Key Design Pillars

  1. Git as Database Sync - JSONL files in git act as the "wire protocol" between local SQLite caches
  2. Offline-First - Hash-based IDs eliminate coordination requirements for concurrent issue creation
  3. Dependency-Aware - First-class support for blocking relationships and ready work detection
  4. Agent-Optimized - JSON output, programmatic APIs, automatic work discovery
  5. Zero Configuration - Auto-sync, auto-start daemon, auto-import on pull

The "Magic Trick"

┌─────────────────┐         ┌─────────────────┐         ┌─────────────────┐
│  Machine A      │         │  Git Remote     │         │  Machine B      │
│                 │         │                 │         │                 │
│  SQLite (cache) │◄─sync──►│  JSONL (truth)  │◄─sync──►│  SQLite (cache) │
│  .beads/*.db    │         │  .beads/*.jsonl │         │  .beads/*.db    │
│  (gitignored)   │         │  (git-tracked)  │         │  (gitignored)   │
└─────────────────┘         └─────────────────┘         └─────────────────┘

Users interact with fast local SQLite, but all changes automatically flow through git to appear on other machines. It feels centralized but is actually distributed.


Core Problem & Design Philosophy

The Problem Space

AI coding agents need to:

  • Track work items across multiple sessions (agent "memory")
  • Handle complex nested work (epics → features → tasks)
  • Discover new work during execution without forgetting it
  • Coordinate across multiple agents/branches/machines
  • Work offline with eventual consistency
  • Avoid context space pollution (no giant markdown files)

Traditional solutions fail:

  • Markdown TODOs: No structure, no dependencies, no queryability, context pollution
  • GitHub Issues: Requires network, complex API, not agent-friendly
  • Jira: Heavy, enterprise-focused, requires central server
  • Linear: Cloud-only, not designed for agents

Design Philosophy

Core Principle: "Feels centralized, actually distributed"

  • Simplicity over features - Does one thing well (issue tracking + dependencies)
  • Git-native - Leverages existing git workflows, no new infrastructure
  • Local-first - Fast queries (SQLite), no network latency
  • Eventual consistency - Accept git-style merge conflicts as tradeoff
  • Agent-first - JSON everywhere, clear error messages, discoverable commands
  • Extensible foundation - SQLite database can be extended by applications

Non-Goals:

  • Real-time collaboration (use the optional Agent Mail add-on for that)
  • Advanced project management (no sprints, burndown charts, etc.)
  • Multi-tenancy (one database per project/repo)
  • ACL/permissions (relies on git access control)

Architecture Overview

Three-Layer Design

┌────────────────────────────────────────────────────────┐
│                     CLI Layer                          │
│  (cmd/bd/)                                             │
│  - Cobra-based commands (create, list, update, etc.)  │
│  - JSON/human output formatting                        │
│  - Auto-sync orchestration                             │
│  - Git integration (hooks, merge drivers)              │
└────────────────────────────────────────────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────┐
│                     RPC Layer                          │
│  (internal/rpc/)                                       │
│  - Per-workspace daemon process                        │
│  - Unix domain sockets (Windows named pipes)           │
│  - Event-driven or polling mode                        │
│  - Background auto-sync (export → commit → push)       │
└────────────────────────────────────────────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────┐
│                   Storage Layer                        │
│  (internal/storage/)                                   │
│  - Interface-based abstraction                         │
│  - SQLite implementation (primary)                     │
│  - Memory backend (testing)                            │
│  - Extensible via UnderlyingDB()                       │
└────────────────────────────────────────────────────────┘

Per-Workspace Daemon Model (LSP-Style)

Beads uses a Language Server Protocol-inspired architecture:

MCP Server (one instance, optional)
    ↓
Per-Project Daemons (one per workspace)
    ↓
SQLite Databases (complete isolation)

Benefits:

  • Complete database isolation (no cross-project pollution)
  • Simpler mental model (one project = one daemon = one database)
  • Follows proven LSP architecture
  • Auto-starts on first command (no manual management)

Each daemon:

  • Lives at .beads/bd.sock (Unix) or .beads/bd.pipe (Windows)
  • Handles auto-sync with 500ms-5s debouncing
  • Watches for file changes (event-driven mode) or polls
  • Auto-restarts on version mismatch

Data Model

Core Entities

// Issue - The fundamental work item
type Issue struct {
    ID                 string    // Hash-based (bd-a3f2dd) or hierarchical (bd-a3f2dd.1)
    ContentHash        string    // SHA256 of canonical content (collision detection)
    Title              string    // Required, max 500 chars
    Description        string    // Markdown, arbitrary length
    Design             string    // Design notes (optional)
    AcceptanceCriteria string    // Acceptance criteria (optional)
    Notes              string    // Freeform notes
    Status             Status    // open, in_progress, blocked, closed
    Priority           int       // 0-4 (0=critical, 4=backlog)
    IssueType          IssueType // bug, feature, task, epic, chore
    Assignee           string    // Optional assignee
    EstimatedMinutes   *int      // Optional time estimate
    CreatedAt          time.Time
    UpdatedAt          time.Time
    ClosedAt           *time.Time
    ExternalRef        *string   // Link to external systems (gh-123, jira-ABC)
    CompactionLevel    int       // Memory decay level (0=uncompacted)
    CompactedAt        *time.Time
    OriginalSize       int       // Size before compaction
    SourceRepo         string    // Multi-repo support
    
    // Export-only fields (populated on read):
    Labels       []string
    Dependencies []*Dependency
    Comments     []*Comment
}

// Dependency - Relationship between issues
type Dependency struct {
    IssueID     string         // Issue that depends
    DependsOnID string         // Issue depended upon
    Type        DependencyType // blocks, related, parent-child, discovered-from
    CreatedAt   time.Time
    CreatedBy   string
}

// Four dependency types:
// - blocks: Hard blocker (affects ready work detection)
// - related: Soft relationship (no blocking)
// - parent-child: Hierarchical (epic → tasks)
// - discovered-from: Work discovered during execution (tracks context)

Status Invariants

Enforced at database level:

CHECK ((status = 'closed') = (closed_at IS NOT NULL))
  • Closed issues MUST have closed_at timestamp
  • Non-closed issues CANNOT have closed_at timestamp
  • Prevents inconsistent state

Content Hash for Collision Detection

Purpose: Detect when two issues with the same ID have different content (a collision or an update)

Fields included:

  • title, description, design, acceptance_criteria, notes
  • status, priority, issue_type, assignee
  • external_ref (important: linkage to external systems is semantic)

Excludes: ID, timestamps, compaction metadata

Usage:

hash := issue.ComputeContentHash()
// Later during import:
if existingIssue.ContentHash != importedIssue.ContentHash {
    // Update operation (same ID, different content)
}
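
A minimal sketch of how such a hash could be computed in Go, over exactly the fields listed above (the field order and the NUL separator are illustrative assumptions, not necessarily the actual beads canonicalization):

package beads

import (
    "crypto/sha256"
    "encoding/hex"
    "strconv"
    "strings"
)

// contentHash sketches a canonical hash over the semantic fields listed
// above. Field order and the NUL separator are illustrative assumptions;
// IDs, timestamps, and compaction metadata are deliberately excluded.
func contentHash(title, description, design, acceptance, notes,
    status string, priority int, issueType, assignee, externalRef string) string {
    parts := []string{
        title, description, design, acceptance, notes,
        status, strconv.Itoa(priority), issueType, assignee, externalRef,
    }
    sum := sha256.Sum256([]byte(strings.Join(parts, "\x00")))
    return hex.EncodeToString(sum[:])
}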

ID Generation Strategy

Evolution: Sequential → Hash-Based

Problem with Sequential IDs (v1.x):

Branch A: Creates bd-10, bd-11, bd-12
Branch B: Creates bd-10, bd-11, bd-12 (COLLISION!)
Merge: Requires complex collision resolution (~2,100 LOC)

Solution: Hash-Based IDs (v2.0+):

Branch A: Creates bd-a3f2dd, bd-b8e1c9, bd-f14c3a
Branch B: Creates bd-7d2e9f, bd-c5a4b2, bd-e9f7d1
Merge: Clean, no collisions!

Hash ID Format

Top-Level IDs:

Format: {prefix}-{6-8-char-hex}
Examples: bd-a3f2dd (6 chars, 97% of cases)
          bd-a3f2dda (7 chars, rare collision ~3%)
          bd-a3f2dda8 (8 chars, very rare)

Generation algorithm:

func GenerateHashID(prefix, title, description string, created time.Time, workspaceID string) string {
    h := sha256.New()
    h.Write([]byte(title))
    h.Write([]byte(description))
    h.Write([]byte(created.Format(time.RFC3339Nano))) // Nanosecond precision
    h.Write([]byte(workspaceID))                      // Prevents cross-workspace collisions
    hash := hex.EncodeToString(h.Sum(nil))
    return fmt.Sprintf("%s-%s", prefix, hash[:6])     // Start with 6 chars
}

Progressive collision handling:

  1. Try 6 characters
  2. If INSERT fails (UNIQUE constraint), try 7 characters from same hash
  3. If still fails, try 8 characters
  4. Result: ~97% of IDs are short (6 chars), edge cases get slightly longer
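
The progressive lengthening above could look roughly like the sketch below (the column list follows the simplified MVP schema later in this document, and real code would inspect the SQLite error code rather than matching the error message):

package beads

import (
    "database/sql"
    "fmt"
    "strings"
)

// insertWithProgressiveID tries the 6-char suffix first, then takes 7 and 8
// characters from the same hash when the INSERT hits a UNIQUE violation.
// The string match on the error is a crude stand-in for real error inspection.
func insertWithProgressiveID(db *sql.DB, prefix, fullHash, title string) (string, error) {
    for _, n := range []int{6, 7, 8} {
        id := fmt.Sprintf("%s-%s", prefix, fullHash[:n])
        _, err := db.Exec(
            `INSERT INTO issues (id, title, status, priority, created_at, updated_at)
             VALUES (?, ?, 'open', 2, datetime('now'), datetime('now'))`, id, title)
        if err == nil {
            return id, nil
        }
        if !strings.Contains(err.Error(), "UNIQUE") {
            return "", err
        }
    }
    return "", fmt.Errorf("hash prefix %s still collides at 8 characters", fullHash[:8])
}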

Hierarchical Child IDs

Problem: Hash IDs are less human-friendly than sequential (bd-1, bd-2)

Solution: Sequential children within hash-based parent namespace

bd-a3f2dd       [epic] Auth System (hash-based parent)
bd-a3f2dd.1     [task] Design login UI (sequential child)
bd-a3f2dd.2     [task] Backend validation
bd-a3f2dd.3     [epic] Password Reset (child epic)
bd-a3f2dd.3.1   [task] Email templates (grandchild)
bd-a3f2dd.3.2   [task] Reset flow tests

Benefits:

  • Parent hash ensures unique namespace (no collision coordination)
  • Children are human-friendly sequential numbers
  • Up to 3 levels of nesting (prevents over-decomposition)
  • Natural work breakdown structure

Database support:

CREATE TABLE child_counters (
    parent_id TEXT PRIMARY KEY,
    last_child INTEGER NOT NULL DEFAULT 0,
    FOREIGN KEY (parent_id) REFERENCES issues(id) ON DELETE CASCADE
);
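
A sketch of how a child ID might be allocated against this table, run inside the transaction that creates the child issue (the upsert form is an assumption for illustration):

package beads

import (
    "database/sql"
    "fmt"
)

// nextChildID increments the per-parent counter and returns a hierarchical
// ID such as bd-a3f2dd.3. Sketch only; error handling and retry policy are
// simplified.
func nextChildID(tx *sql.Tx, parentID string) (string, error) {
    _, err := tx.Exec(`INSERT INTO child_counters (parent_id, last_child) VALUES (?, 1)
                       ON CONFLICT(parent_id) DO UPDATE SET last_child = last_child + 1`, parentID)
    if err != nil {
        return "", err
    }
    var n int
    if err := tx.QueryRow(`SELECT last_child FROM child_counters WHERE parent_id = ?`, parentID).Scan(&n); err != nil {
        return "", err
    }
    return fmt.Sprintf("%s.%d", parentID, n), nil
}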

ID Tradeoffs

Aspect           | Sequential (Old)                | Hash-Based (New)
Collision risk   | HIGH (offline work)             | NONE (top-level)
ID length        | 5-8 chars                       | 9-11 chars (avg ~9)
Predictability   | Predictable (bd-1, bd-2)        | Unpredictable
Offline-first    | ❌ Requires coordination        | ✅ Fully offline
Merge conflicts  | ❌ Same ID, different content   | ✅ Different IDs
Human-friendly   | ✅ Easy to remember             | ⚠️ Harder (mitigated by hierarchical children)
Code complexity  | ~2,100 LOC collision resolution | <100 LOC
Birthday paradox | N/A                             | 1% collision at 1,000 issues (6 chars)

Design Decision: Accept slightly longer IDs to eliminate distributed coordination complexity.


Storage Layer

Interface-Based Design

// storage.Storage - The abstraction
type Storage interface {
    // Issues
    CreateIssue(ctx context.Context, issue *Issue, actor string) error
    GetIssue(ctx context.Context, id string) (*Issue, error)
    UpdateIssue(ctx context.Context, id string, updates map[string]interface{}, actor string) error
    CloseIssue(ctx context.Context, id string, reason string, actor string) error
    SearchIssues(ctx context.Context, query string, filter IssueFilter) ([]*Issue, error)
    
    // Dependencies
    AddDependency(ctx context.Context, dep *Dependency, actor string) error
    GetDependencies(ctx context.Context, issueID string) ([]*Issue, error)
    DetectCycles(ctx context.Context) ([][]*Issue, error)
    
    // Ready Work (dependency-aware)
    GetReadyWork(ctx context.Context, filter WorkFilter) ([]*Issue, error)
    GetBlockedIssues(ctx context.Context) ([]*BlockedIssue, error)
    
    // Extensibility
    UnderlyingDB() *sql.DB  // Direct database access for extensions
}

SQLite Implementation

Why SQLite?

  • Zero configuration - No server, embedded
  • Fast - 100s of µs for queries, local disk I/O
  • Portable - Single file, cross-platform
  • Transactional - ACID guarantees
  • Extensible - Applications can add tables via UnderlyingDB()
  • Well-understood - Mature, stable, widely deployed
  • Single-writer limitation - acceptable, since the daemon serializes writes anyway

Schema highlights:

-- Closed status invariant
CHECK ((status = 'closed') = (closed_at IS NOT NULL))

-- Foreign key cascades
FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE

-- Recursive CTE for transitive blocking
WITH RECURSIVE blocked_transitively AS (...)

-- Views for common queries
CREATE VIEW ready_issues AS SELECT ... WHERE NOT EXISTS (blocked)

Dirty Tracking for Incremental Export

Problem: Exporting all issues on every change is slow (1,000 issues ≈ 950ms)

Solution: Track dirty issues, export only changed ones

CREATE TABLE dirty_issues (
    issue_id TEXT PRIMARY KEY,
    marked_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Mark dirty on any change
INSERT INTO dirty_issues (issue_id) VALUES (?) ON CONFLICT DO NOTHING;

-- Export only dirty issues
SELECT * FROM issues WHERE id IN (SELECT issue_id FROM dirty_issues);

-- Clear after export
DELETE FROM dirty_issues WHERE issue_id IN (...);

Further optimization: Export hash tracking (bd-164)

CREATE TABLE export_hashes (
    issue_id TEXT PRIMARY KEY,
    content_hash TEXT NOT NULL,  -- Last exported content hash
    exported_at DATETIME NOT NULL
);

-- Only export if content changed
SELECT * FROM issues WHERE id IN (
    SELECT d.issue_id FROM dirty_issues d
    JOIN issues i ON d.issue_id = i.id
    LEFT JOIN export_hashes e ON i.id = e.issue_id
    WHERE e.content_hash IS NULL OR e.content_hash != i.content_hash
);

Result: Timestamp-only updates don't trigger re-export


Synchronization Architecture

The Distributed Database Pattern

┌─────────────────────────────────────────────────────┐
│                   Write Path                        │
│  CLI → SQLite → JSONL export → git commit → push   │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│                   Read Path                         │
│  git pull → JSONL import → SQLite → CLI             │
└─────────────────────────────────────────────────────┘

Auto-Sync Flow

After CRUD operations:

1. User: bd create "Fix bug" -p 1
2. CLI: Write to SQLite
3. CLI: Mark issue as dirty
4. Daemon: Detects dirty issues (event-driven or polling)
5. Daemon: Wait 500ms-5s (debounce window for batching)
6. Daemon: Export dirty issues to .beads/issues.jsonl
7. Daemon: git add + commit + push (if enabled)
8. Daemon: Clear dirty flags for exported issues

After git pull:

1. User: git pull
2. Git hook (post-merge): Trigger import
   OR Daemon: Detects .beads/issues.jsonl mtime change
3. Daemon/CLI: Check JSONL mtime vs last import
4. Daemon/CLI: Import JSONL to SQLite
5. Daemon/CLI: Update metadata table with import timestamp
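
A sketch of the mtime check in step 3, assuming a simple key/value metadata table with a last_import entry (the key name and table shape are assumptions):

package beads

import (
    "database/sql"
    "os"
    "time"
)

// needsImport reports whether the JSONL file changed since the last import,
// comparing its mtime against a timestamp stored in the metadata table.
func needsImport(db *sql.DB, jsonlPath string) (bool, error) {
    info, err := os.Stat(jsonlPath)
    if err != nil {
        return false, err
    }
    var last string
    err = db.QueryRow(`SELECT value FROM metadata WHERE key = 'last_import'`).Scan(&last)
    if err == sql.ErrNoRows {
        return true, nil // never imported before
    }
    if err != nil {
        return false, err
    }
    lastImport, err := time.Parse(time.RFC3339Nano, last)
    if err != nil {
        return false, err
    }
    return info.ModTime().After(lastImport), nil
}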

Debouncing Strategy

Problem: Rapid changes cause export spam

bd create "Issue 1"  → Export after 5s
bd create "Issue 2"  → Export after 5s
bd create "Issue 3"  → Export after 5s
Result: 3 separate exports, 3 git commits (bad!)

Solution: Debouncing with batch window

bd create "Issue 1"  → Start 5s timer
bd create "Issue 2"  → Reset timer to 5s
bd create "Issue 3"  → Reset timer to 5s
... 5 seconds of silence ...
→ Export all 3 issues in one batch, one commit (good!)

Implementation:

type Debouncer struct {
    timer *time.Timer
    delay time.Duration  // 5 seconds
}

func (d *Debouncer) Trigger(fn func()) {
    if d.timer != nil {
        d.timer.Stop()
    }
    d.timer = time.AfterFunc(d.delay, fn)
}

Event-Driven vs Polling Modes

Polling Mode (default, stable):

Every 5 seconds:
  - Check if dirty issues exist → Export
  - Check JSONL mtime → Import if newer
  
CPU: ~2-3% (continuous polling)
Latency: ~5000ms worst case

Event-Driven Mode (experimental, v0.16+):

Platform-native file watching:
  - Linux: inotify
  - macOS: FSEvents
  - Windows: ReadDirectoryChangesW
  
Trigger on:
  - .beads/issues.jsonl modification (git pull)
  - .git/refs/heads updates (git operations)
  - RPC mutations (bd create/update/close)
  
CPU: ~0.5% (idle, event-driven)
Latency: <500ms

Tradeoff: Event-driven is faster but requires native filesystem support (fails on NFS, SMB)
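
A sketch of the event-driven mode using the cross-platform fsnotify library as a stand-in for the per-platform watchers listed above (beads may wire this differently):

package beads

import (
    "log"
    "path/filepath"

    "github.com/fsnotify/fsnotify"
)

// watchJSONL fires onChange whenever .beads/issues.jsonl is created or
// written (e.g. after a git pull). The debouncer from the previous section
// would typically wrap onChange.
func watchJSONL(beadsDir string, onChange func()) error {
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        return err
    }
    // Watch the directory rather than the file: git replaces files on pull,
    // which would otherwise invalidate a file-level watch.
    if err := watcher.Add(beadsDir); err != nil {
        watcher.Close()
        return err
    }
    go func() {
        defer watcher.Close()
        for {
            select {
            case ev, ok := <-watcher.Events:
                if !ok {
                    return
                }
                if filepath.Base(ev.Name) == "issues.jsonl" && ev.Op&(fsnotify.Write|fsnotify.Create) != 0 {
                    onChange()
                }
            case err, ok := <-watcher.Errors:
                if !ok {
                    return
                }
                log.Println("watch error:", err)
            }
        }
    }()
    return nil
}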

JSONL Format

Why JSONL (not JSON array)?

  • Streamable - Can process line-by-line
  • Append-friendly - Add issues without parsing entire file
  • Merge-friendly - Git merges work line-by-line
  • Diffable - Git diffs show per-issue changes
  • Resilient - Corrupted line doesn't break entire file
  • Caveat: not parseable as one JSON document - each line must be parsed individually (JSONL-specific)

Format:

{"id":"bd-a3f2dd","title":"Fix auth bug","status":"open","priority":1,...}
{"id":"bd-b8e1c9","title":"Add feature","status":"closed","priority":2,...}

Each line:

  • Self-contained issue record
  • Includes embedded dependencies, labels, comments (denormalized)
  • Sorted by ID for consistent diffs
  • RFC3339 timestamps

Intelligent Merge Driver

Problem: Git line-based merging fails for JSONL

Base:   {"id":"bd-123","title":"Fix bug","priority":1}
Ours:   {"id":"bd-123","title":"Fix bug","priority":0}
Theirs: {"id":"bd-123","title":"Fix auth bug","priority":1}
Result: Conflict (line-based merge sees entire line changed)

Solution: Field-level 3-way merging (beads-merge algorithm)

Base:   {"priority":1, "title":"Fix bug"}
Ours:   {"priority":0, "title":"Fix bug"}        (changed priority)
Theirs: {"priority":1, "title":"Fix auth bug"}   (changed title)
Result: {"priority":0, "title":"Fix auth bug"}   (both changes merged!)

Auto-configured during bd init:

git config merge.beads.driver "bd merge %A %O %L %R"
git config merge.beads.name "bd JSONL merge driver"
echo ".beads/beads.jsonl merge=beads" >> .gitattributes

Merge rules:

  • Timestamps → max value
  • Dependencies/labels → union (combine both)
  • Status/priority → 3-way merge (detect conflicts)
  • Conflict markers only for true semantic conflicts
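
A sketch of these merge rules applied per field (illustrative only, not the actual beads-merge code): union for labels, maximum for timestamps, and a classic 3-way merge for scalar fields that flags true semantic conflicts.

package beads

import "time"

// mergeScalar applies classic 3-way merge to one field: keep whichever side
// changed relative to base; if both changed differently, report a conflict.
func mergeScalar(base, ours, theirs string) (result string, conflict bool) {
    switch {
    case ours == theirs:
        return ours, false
    case ours == base:
        return theirs, false
    case theirs == base:
        return ours, false
    default:
        return ours, true // true semantic conflict, surface to the user
    }
}

// mergeLabels unions both sides' labels, preserving first-seen order.
func mergeLabels(ours, theirs []string) []string {
    seen := map[string]bool{}
    var out []string
    for _, l := range append(ours, theirs...) {
        if !seen[l] {
            seen[l] = true
            out = append(out, l)
        }
    }
    return out
}

// mergeTimestamp keeps the most recent value.
func mergeTimestamp(ours, theirs time.Time) time.Time {
    if theirs.After(ours) {
        return theirs
    }
    return ours
}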

Daemon & RPC System

Why a Daemon?

Without daemon (direct mode):

bd create "Fix bug" -p 1
  ↓ Open SQLite connection
  ↓ INSERT issue
  ↓ Close connection
  ↓ Manual: bd export
  ↓ Manual: git add + commit + push

With daemon:

bd create "Fix bug" -p 1
  ↓ RPC call to daemon
  ↓ Daemon INSERT issue
  ↓ Daemon auto-exports after 5s
  ↓ Daemon auto-commits & pushes
  ↓ All automatic!

Daemon benefits:

  • Automatic sync - No manual export/import/commit
  • Batching - Multiple operations debounced into one export
  • Background work - Sync happens asynchronously
  • Connection pooling - Persistent SQLite connection
  • Event watching - React to git pulls immediately

RPC Protocol

Transport: Unix domain sockets (.beads/bd.sock) or Windows named pipes (.beads/bd.pipe)

Protocol: JSON over length-prefixed messages

[4-byte length prefix][JSON payload]

Request:

{
  "operation": "create",
  "args": {"title": "Fix bug", "priority": 1},
  "actor": "alice",
  "client_version": "0.21.0",
  "expected_db": "/path/to/.beads/beads.db"
}

Response:

{
  "success": true,
  "data": {"id": "bd-a3f2dd", "title": "Fix bug", ...}
}
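
A client-side sketch of this framing (the 4-byte big-endian prefix and the use of net.Dial with a Unix socket are assumptions; Windows would use a named pipe instead):

package beads

import (
    "encoding/binary"
    "encoding/json"
    "io"
    "net"
)

// call writes one length-prefixed JSON request on the daemon socket and
// reads one length-prefixed JSON response.
func call(socketPath string, req, resp interface{}) error {
    conn, err := net.Dial("unix", socketPath)
    if err != nil {
        return err
    }
    defer conn.Close()

    payload, err := json.Marshal(req)
    if err != nil {
        return err
    }
    var length [4]byte
    binary.BigEndian.PutUint32(length[:], uint32(len(payload)))
    if _, err := conn.Write(append(length[:], payload...)); err != nil {
        return err
    }

    if _, err := io.ReadFull(conn, length[:]); err != nil {
        return err
    }
    buf := make([]byte, binary.BigEndian.Uint32(length[:]))
    if _, err := io.ReadFull(conn, buf); err != nil {
        return err
    }
    return json.Unmarshal(buf, resp)
}

The daemon side mirrors this: read the prefix, read that many bytes, decode the JSON, and dispatch on the "operation" field.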

Daemon Lifecycle

Auto-start (default):

bd ready
  ↓ Check for .beads/bd.sock
  ↓ If not exists, spawn daemon
  ↓ Wait for socket (exponential backoff)
  ↓ Connect and send request
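
A sketch of the connect-or-spawn step with exponential backoff (the retry count and delays are illustrative assumptions; the spawn reuses the bd daemon start command from the CLI tree):

package beads

import (
    "net"
    "os/exec"
    "time"
)

// connectOrStart connects to the daemon socket, spawning the daemon on first
// use and retrying with exponential backoff while it comes up.
func connectOrStart(socketPath string) (net.Conn, error) {
    if conn, err := net.Dial("unix", socketPath); err == nil {
        return conn, nil // daemon already running
    }
    if err := exec.Command("bd", "daemon", "start").Start(); err != nil {
        return nil, err
    }
    delay := 50 * time.Millisecond
    var lastErr error
    for i := 0; i < 8; i++ {
        time.Sleep(delay)
        conn, err := net.Dial("unix", socketPath)
        if err == nil {
            return conn, nil
        }
        lastErr = err
        delay *= 2
    }
    return nil, lastErr
}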

Version checking:

1. Client sends client_version in request
2. Daemon compares to own version
3. If mismatch:
   - Daemon logs warning
   - Daemon shuts down gracefully
   - Client auto-starts new daemon with correct version

Graceful shutdown:

SIGTERM/SIGINT:
  ↓ Flush dirty issues immediately
  ↓ Export to JSONL
  ↓ Commit (if auto-commit enabled)
  ↓ Close SQLite connection
  ↓ Remove socket file
  ↓ Exit

Direct Mode (--no-daemon)

When to use:

  • Git worktrees (daemon doesn't know which branch)
  • CI/CD (deterministic execution)
  • Testing (isolated runs)
  • Debugging (simpler callstack)

Tradeoff: No auto-sync, must manually call bd sync


Git Integration

Git Hooks

Automatic installation during bd init:

pre-commit:

#!/bin/bash
bd export --flush  # Bypass debounce, export immediately
git add .beads/issues.jsonl

post-merge:

#!/bin/bash
bd import -i .beads/issues.jsonl  # Import after pull/merge

post-checkout:

#!/bin/bash
bd import -i .beads/issues.jsonl  # Import after branch switch

pre-push:

#!/bin/bash
bd export --flush  # Ensure JSONL is fresh before push

Protected Branches

Problem: GitHub/GitLab protected branches block direct commits to main

Solution: Commit to separate branch (beads-metadata)

bd init --branch beads-metadata

Workflow:

1. bd create "Fix bug" -p 1
2. Daemon exports to .beads/issues.jsonl
3. Daemon: git checkout beads-metadata
4. Daemon: git add .beads/issues.jsonl
5. Daemon: git commit -m "Update issues"
6. Daemon: git push origin beads-metadata
7. User creates PR: beads-metadata → main
8. After PR merge: bd import sees updated JSONL

Git Worktrees (⚠️ Limitation)

Problem:

repo/
  .beads/beads.db          (shared by all worktrees!)
  worktree-1/ (branch A)
  worktree-2/ (branch B)

All worktrees share same .beads/ directory → Daemon doesn't know which branch to commit to!

Solution: Use --no-daemon flag in worktrees

export BEADS_NO_DAEMON=1
bd create "Fix bug" -p 1
bd sync  # Manual sync

Dependency System

Four Dependency Types

1. blocks (Hard Blocker):

bd-123 blocks bd-456
→ bd-456 cannot start until bd-123 is closed
→ bd-456 excluded from ready work

2. related (Soft Link):

bd-123 related bd-456
→ Informational only, no blocking
→ Useful for cross-references

3. parent-child (Hierarchical):

bd-a3f2dd (epic) parent-child bd-a3f2dd.1 (task)
→ Task is part of epic
→ Epic closure checks if all children closed
→ Blocking propagates: blocked epic → blocked children

4. discovered-from (Context Tracking):

bd-999 discovered-from bd-123
→ bd-999 was discovered while working on bd-123
→ Automatically inherits bd-123's source_repo
→ Preserves work context

Ready Work Detection

Algorithm:

WITH RECURSIVE
  -- Step 1: Find directly blocked issues
  blocked_directly AS (
    SELECT DISTINCT d.issue_id
    FROM dependencies d
    JOIN issues blocker ON d.depends_on_id = blocker.id
    WHERE d.type = 'blocks'
      AND blocker.status IN ('open', 'in_progress', 'blocked')
  ),
  -- Step 2: Propagate blocking through parent-child hierarchy
  blocked_transitively AS (
    SELECT issue_id FROM blocked_directly
    UNION ALL
    SELECT d.issue_id
    FROM blocked_transitively bt
    JOIN dependencies d ON d.depends_on_id = bt.issue_id
    WHERE d.type = 'parent-child'
  )
-- Step 3: Return unblocked issues
SELECT * FROM issues
WHERE status = 'open'
  AND id NOT IN (SELECT issue_id FROM blocked_transitively)

Key insight: Blocking propagates through parent-child relationships

Epic (blocked) → All child tasks are blocked
Task (blocked) → Subtasks are blocked

Cycle Detection

Algorithm: Recursive CTE with depth limit

WITH RECURSIVE dependency_paths AS (
  -- Base case: all edges
  SELECT issue_id, depends_on_id, 1 as depth,
         issue_id || '->' || depends_on_id as path
  FROM dependencies
  
  UNION ALL
  
  -- Recursive case: extend paths
  SELECT dp.issue_id, d.depends_on_id, dp.depth + 1,
         dp.path || '->' || d.depends_on_id
  FROM dependency_paths dp
  JOIN dependencies d ON dp.depends_on_id = d.issue_id
  WHERE dp.depth < 50  -- Prevent infinite recursion
)
-- Detect cycles: path returns to starting node
SELECT * FROM dependency_paths WHERE issue_id = depends_on_id

Important: Cycles are allowed but detected (not prevented). Design decision: trust users, detect issues.


CLI Design

Command Structure (Cobra)

bd
├── init           Initialize database
├── create         Create issue
├── update         Update issue fields
├── close          Close issue(s)
├── list           List issues with filters
├── show           Show issue details
├── ready          Show ready work
├── stale          Show stale issues
├── stats          Statistics
├── dep            Dependency management
│   ├── add        Add dependency
│   ├── remove     Remove dependency
│   ├── tree       Show dependency tree
│   └── cycles     Detect cycles
├── label          Label management
├── comment        Comment management
├── sync           Manual sync (export/import/commit/push)
├── export         Export to JSONL
├── import         Import from JSONL
├── migrate        Database migrations
├── daemon         Daemon management
│   ├── start      Start daemon
│   ├── stop       Stop daemon
│   └── status     Daemon status
├── daemons        Multi-daemon management
│   ├── list       List all daemons
│   ├── health     Health check
│   ├── logs       View logs
│   └── killall    Stop all daemons
└── ...

JSON-First Design

Every command supports --json:

bd create "Fix bug" -p 1 --json
# {"id":"bd-a3f2dd","title":"Fix bug","priority":1,...}

bd list --status open --json
# [{"id":"bd-a3f2dd",...}, {"id":"bd-b8e1c9",...}]

bd ready --json
# [{"id":"bd-f14c3a",...}]

Benefits for AI agents:

  • ✅ Parseable output (no regex needed)
  • ✅ Complete data (all fields)
  • ✅ Consistent schema
  • ✅ Pipe-friendly (bd ready --json | jq '.[0].id')

Human-Friendly Output

Without --json:

bd show bd-a3f2dd

bd-a3f2dd [bug] Fix authentication
  Status: in_progress
  Priority: P1 (high)
  Created: 2025-11-08 10:30:00
  Updated: 2025-11-08 14:45:00
  Assignee: alice
  
  Description:
  Users cannot log in after recent deployment...
  
  Dependencies (2):
    → Blocks: bd-b8e1c9 (Deploy hotfix)
    → Related: bd-f14c3a (Audit logging)
  
  Labels: backend, auth, urgent

Color coding:

  • Red: P0 (critical)
  • Yellow: P1 (high)
  • Default: P2 (medium)
  • Dim: P3-P4 (low/backlog)

Integration Patterns

MCP Server (Model Context Protocol)

Architecture:

Claude Desktop (or other MCP client)
    ↓
beads-mcp (Python package)
    ↓
Per-project daemons (.beads/bd.sock)
    ↓
SQLite databases (isolated per project)

Benefits over CLI:

  • ✅ Native function calls (not shell commands)
  • ✅ Automatic workspace detection
  • ✅ Better error handling
  • ✅ Type-safe parameters

Installation:

pip install beads-mcp

# Add to MCP config (e.g., Claude Desktop)
{
  "beads": {
    "command": "beads-mcp",
    "args": []
  }
}

Usage:

# AI agent can call:
mcp__beads__create(title="Fix bug", priority=1)
mcp__beads__ready()
mcp__beads__update(issue_id="bd-42", status="in_progress")

Agent Mail (Optional Real-Time Coordination)

Problem: Git sync has 2-5s latency → Two agents might grab same issue

Solution: Optional HTTP-based reservation system

Agent A: bd update bd-123 --status in_progress
  ↓ POST /api/reservations (Agent Mail server)
  ↓ Reserve bd-123 for Agent A (5ms)
  ✓ Success

Agent B: bd update bd-123 --status in_progress
  ↓ POST /api/reservations
  ✗ 409 Conflict: "bd-123 reserved by Agent A"

Benefits:

  • 20-50x latency reduction (<100ms vs 2-5s)
  • Collision prevention
  • Lightweight (<50MB memory)

Tradeoffs:

  • ❌ Requires external server (Python daemon)
  • ❌ Network dependency (graceful degradation if server down)
  • ✅ Git remains source of truth (Agent Mail is ephemeral coordination only)

Configuration:

export BEADS_AGENT_MAIL_URL=http://127.0.0.1:8765
export BEADS_AGENT_NAME=assistant-alpha
export BEADS_PROJECT_ID=my-project
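
A sketch of the reservation call with graceful degradation; the endpoint and the 409 behavior come from the flow above, while the request body fields are assumptions:

package beads

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
    "time"
)

// tryReserve attempts to reserve an issue via Agent Mail before starting work.
// If the server is unreachable, it degrades gracefully and lets git-level
// sync resolve any race.
func tryReserve(issueID string) (reserved bool, err error) {
    base := os.Getenv("BEADS_AGENT_MAIL_URL")
    if base == "" {
        return true, nil // coordination disabled, proceed
    }
    body, _ := json.Marshal(map[string]string{ // field names are illustrative assumptions
        "issue_id": issueID,
        "agent":    os.Getenv("BEADS_AGENT_NAME"),
        "project":  os.Getenv("BEADS_PROJECT_ID"),
    })
    client := &http.Client{Timeout: 2 * time.Second}
    resp, err := client.Post(base+"/api/reservations", "application/json", bytes.NewReader(body))
    if err != nil {
        return true, nil // server down: degrade gracefully, git remains source of truth
    }
    defer resp.Body.Close()
    switch resp.StatusCode {
    case http.StatusConflict: // 409: another agent holds the reservation
        return false, nil
    case http.StatusOK, http.StatusCreated:
        return true, nil
    default:
        return false, fmt.Errorf("unexpected status %d from Agent Mail", resp.StatusCode)
    }
}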

Extensibility via UnderlyingDB()

Pattern: Applications add their own tables to beads database

import "github.com/steveyegge/beads"

store, err := beads.NewSQLiteStorage(dbPath)
db := store.UnderlyingDB()  // Direct *sql.DB access

// Create application-specific tables
db.Exec(`
  CREATE TABLE myapp_executions (
    id INTEGER PRIMARY KEY,
    issue_id TEXT NOT NULL,
    status TEXT,
    FOREIGN KEY (issue_id) REFERENCES issues(id) ON DELETE CASCADE
  )
`)

// Join across layers
db.Query(`
  SELECT i.id, i.title, e.status
  FROM issues i
  JOIN myapp_executions e ON i.id = e.issue_id
  WHERE i.status = 'in_progress'
`)

Example: VibeCoder uses beads for issue tracking + custom tables for execution state, checkpoints, logs


Advanced Features

Memory Decay (Compaction)

Problem: Old closed issues accumulate, pollute context

Solution: Semantic compaction (AI-driven summarization)

bd compact --analyze --json  # Get candidates (closed 30+ days)
# Agent reads full content, generates summary with LLM
bd compact --apply --id bd-42 --summary summary.txt  # Persist

What compaction does:

  • Snapshot original content (for restoration)
  • Replace description/design/notes with AI summary
  • Mark compaction level (1, 2, ...)
  • Track original size vs compressed size

Result: Closed issues become 1-2 sentence summaries, freeing context space
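
In SQL terms, the apply step might look like the sketch below; the compaction_snapshots table and exact column names are assumptions layered on the Issue fields listed earlier.

package beads

import "database/sql"

// applyCompaction snapshots the original text, replaces it with the summary,
// and bumps the compaction level. The compaction_snapshots table is an
// assumption for illustration.
func applyCompaction(db *sql.DB, issueID, summary string) error {
    tx, err := db.Begin()
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // 1. Snapshot original content so the issue can be restored later.
    if _, err := tx.Exec(`INSERT INTO compaction_snapshots (issue_id, description, design, notes, taken_at)
                          SELECT id, description, design, notes, datetime('now') FROM issues WHERE id = ?`, issueID); err != nil {
        return err
    }
    // 2. Replace long-form text with the summary and record compaction metadata.
    if _, err := tx.Exec(`UPDATE issues
                          SET original_size = length(description) + length(design) + length(notes),
                              description = ?, design = '', notes = '',
                              compaction_level = compaction_level + 1,
                              compacted_at = datetime('now')
                          WHERE id = ?`, summary, issueID); err != nil {
        return err
    }
    return tx.Commit()
}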

Duplicate Detection & Merging

Automatic detection:

bd duplicates
# Groups issues by content hash
# Suggests merge operations

Merge operation:

bd merge bd-42 bd-43 --into bd-41
# Closes bd-42, bd-43 with "Merged into bd-41"
# Migrates all dependencies to bd-41
# Updates text references across all issues

Multi-Repo Support

Problem: Large projects have multiple repositories (monorepo, microservices)

Solution: source_repo field + JSONL hydration

Planning repo (.beads/issues.jsonl):
  bd-1: source_repo="api"
  bd-2: source_repo="frontend"
  bd-3: source_repo="shared"

API repo (.beads/):
  Import bd-1 only (filter by source_repo="api")

Frontend repo (.beads/):
  Import bd-2 only (filter by source_repo="frontend")

Benefits:

  • ✅ Single source of truth (planning repo)
  • ✅ Per-repo databases (isolated context)
  • ✅ Cross-repo dependencies visible
  • ✅ Selective hydration (no pollution)
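
A sketch of selective hydration during import: stream the planning repo's JSONL and keep only lines whose source_repo matches the current repo (the JSON field name is assumed to mirror the Go field):

package beads

import (
    "bufio"
    "encoding/json"
    "os"
)

// filterBySourceRepo streams a JSONL file and returns only the issues whose
// source_repo matches repo, as raw JSON ready for import.
func filterBySourceRepo(path, repo string) ([]json.RawMessage, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()

    var out []json.RawMessage
    scanner := bufio.NewScanner(f)
    scanner.Buffer(make([]byte, 0, 64*1024), 10*1024*1024) // issues can have long descriptions
    for scanner.Scan() {
        line := scanner.Bytes()
        var probe struct {
            SourceRepo string `json:"source_repo"`
        }
        if err := json.Unmarshal(line, &probe); err != nil {
            continue // skip a corrupted line rather than failing the whole import
        }
        if probe.SourceRepo == repo {
            out = append(out, json.RawMessage(append([]byte(nil), line...)))
        }
    }
    return out, scanner.Err()
}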

Design Tradeoffs

1. Git as Sync Layer

Pro:

  • ✅ Leverages existing infrastructure (everyone has git)
  • ✅ Free conflict resolution (git merge)
  • ✅ Free hosting (GitHub, GitLab)
  • ✅ Full history/audit trail (git log)
  • ✅ Offline-first (git is designed for this)

Con:

  • ❌ 2-5 second latency (git push/pull)
  • ❌ Merge conflicts possible (mitigated by hash IDs + intelligent merge driver)
  • ❌ Not real-time (mitigated by optional Agent Mail)
  • ❌ Requires git literacy (but target users are developers)

Alternative considered: Operational Transform (like Google Docs)

  • Would provide real-time sync
  • But adds enormous complexity (operational transforms, CRDTs, vector clocks)
  • Overkill for issue tracker (eventual consistency is fine)

2. SQLite as Database

Pro:

  • ✅ Zero configuration (embedded, no server)
  • ✅ Fast local queries (<1ms for simple queries)
  • ✅ Transactional (ACID)
  • ✅ Portable (single file)
  • ✅ Extensible (UnderlyingDB() pattern)

Con:

  • ❌ Single writer (but daemon serializes anyway)
  • ❌ Not web-accessible (but daemon provides RPC)
  • ❌ File corruption risk (mitigated by git backup)

Alternative considered: PostgreSQL

  • Would provide multi-writer, network access
  • But requires server setup (violates "zero configuration")
  • Overhead not justified for single-user/small-team use case

3. Hash-Based IDs

Pro:

  • ✅ Eliminates distributed coordination (offline-first)
  • ✅ Merge-friendly (no ID collisions)
  • ✅ Removed ~2,100 LOC of collision resolution code

Con:

  • ❌ Less human-friendly (bd-a3f2dd vs bd-1)
  • ❌ Slightly longer (9-11 chars vs 5-8 chars)
  • ❌ Birthday paradox (1% collision at 1,000 issues with 6 chars)

Mitigation:

  • Hierarchical children provide friendly sequential IDs (bd-a3f2dd.1, .2, .3)
  • Progressive length scaling (6→7→8 chars on collision)
  • Collision detection with clear error messages

Alternative considered: UUIDs

  • Would eliminate collisions entirely (128-bit space)
  • But 36 characters is too long (bd-550e8400-e29b-41d4-a716-446655440000)
  • Hash approach is middle ground (collision-resistant + readable)

4. Daemon vs Direct Mode

Pro (daemon):

  • ✅ Automatic sync (no manual export/import)
  • ✅ Batching (efficient multi-operation commits)
  • ✅ Background work (non-blocking)

Con (daemon):

  • ❌ Complexity (process management, sockets, version checking)
  • ❌ Doesn't work with git worktrees
  • ❌ Debugging harder (async operations)

Design decision: Daemon is default (better UX), but --no-daemon available for edge cases

5. JSONL vs Database Replication

Pro (JSONL):

  • ✅ Human-readable (can inspect in text editor)
  • ✅ Git-friendly (line-based diffs)
  • ✅ Simple (no replication protocol)
  • ✅ Debuggable (can manually edit)

Con (JSONL):

  • ❌ Denormalized (dependencies embedded in each issue)
  • ❌ Not a standard format (requires JSONL parser)
  • ❌ Import overhead (parse entire file)

Alternative considered: SQLite replication (e.g., LiteFS)

  • Would be more efficient (binary protocol)
  • But requires custom infrastructure (defeats "use git" goal)
  • JSONL is "good enough" (1000 issues = ~950ms import)

6. Four Dependency Types

Pro:

  • ✅ Expressive (covers common patterns)
  • ✅ "discovered-from" is unique innovation (context tracking)
  • ✅ Hierarchical parent-child enables work breakdown

Con:

  • ❌ Complexity (users must understand four types)
  • ❌ "blocks" vs "parent-child" distinction subtle

Design decision: Accept learning curve for expressiveness (agents handle complexity well)

7. Eventual Consistency

Pro:

  • ✅ Enables offline work (no server required)
  • ✅ Simple mental model (git-like)
  • ✅ Scales to distributed teams

Con:

  • ❌ Potential for conflicts (same issue edited on two branches)
  • ❌ No real-time coordination (mitigated by Agent Mail)

Design decision: Eventual consistency is acceptable for issue tracking (not mission-critical)


Implementation Considerations

If Building a Beads-Like System

Core Decisions to Make:

  1. Sync Layer Choice

    • Git (like beads): Familiar, free, offline-first, 2-5s latency
    • Custom server: Real-time, more complex, requires hosting
    • Hybrid: Git + optional real-time layer (like beads + Agent Mail)
  2. ID Strategy

    • Hash-based: Offline-first, collision-resistant, less readable
    • Sequential: Human-friendly, requires coordination, collision-prone
    • UUIDs: No collisions, but very long (36 chars)
    • Hybrid: Hash parents + sequential children (like beads)
  3. Database Choice

    • SQLite: Zero config, single-file, embedded, single-writer
    • PostgreSQL: Multi-writer, network access, requires server
    • In-memory + file backup: Fast, simple, no SQL
  4. Dependency Model

    • Graph-based: Flexible, supports complex relationships, cycle detection needed
    • Tree-only: Simpler, no cycles, less expressive
    • Flat: No dependencies, simple, limited use cases
  5. CLI vs Library

    • CLI (like beads): Easy for agents, human-friendly, subprocess overhead
    • Library: Faster, type-safe, language-specific
    • Both: Library + CLI wrapper (best of both)
  6. Sync Automation

    • Manual (user calls sync): Simple, explicit, requires discipline
    • Daemon (like beads): Automatic, complex, better UX
    • Git hooks: Triggered automatically, simple, no background process

Minimum Viable Implementation:

1. Data model: Issue (ID, title, status, priority, created_at)
2. Storage: SQLite with single table
3. Sync: Manual export to JSONL + git commit
4. ID: Sequential (accept collisions for MVP)
5. CLI: create, list, update, close, sync
6. Dependencies: None (add later)

Growth Path:

MVP → Daemon → Dependencies → Hash IDs → Intelligent merge → Agent Mail

What to Copy from Beads:

  • ✅ JSONL format (git-friendly, human-readable)
  • ✅ Dirty tracking (incremental export)
  • ✅ Content hash (collision detection)
  • ✅ JSON output everywhere (agent-friendly)
  • ✅ Auto-sync pattern (better UX)

What to Reconsider:

  • ❓ Hash IDs (only needed if distributed work is common)
  • ❓ Daemon (adds complexity, only needed for auto-sync)
  • ❓ Four dependency types (start with just "blocks")
  • ❓ Compaction (only needed for long-lived projects)

Simplifications:

  • Drop multi-repo support (unless needed)
  • Drop Agent Mail (can add later if coordination is issue)
  • Drop intelligent merge driver (let users resolve conflicts manually)
  • Drop hierarchical children (use flat hash IDs)

Tech Stack Alternatives:

  • Python: SQLite + click CLI + GitPython
  • TypeScript: better-sqlite3 + Commander.js + simple-git
  • Rust: rusqlite + clap + git2-rs
  • Go: Like beads (modernc.org/sqlite + cobra + git commands)

Database Schema Essentials:

CREATE TABLE issues (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    description TEXT,
    status TEXT NOT NULL,  -- open, in_progress, closed
    priority INTEGER NOT NULL,
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL
);

CREATE TABLE dependencies (
    from_id TEXT NOT NULL,
    to_id TEXT NOT NULL,
    type TEXT NOT NULL,  -- blocks, related
    PRIMARY KEY (from_id, to_id),
    FOREIGN KEY (from_id) REFERENCES issues(id),
    FOREIGN KEY (to_id) REFERENCES issues(id)
);

CREATE TABLE dirty_issues (
    issue_id TEXT PRIMARY KEY,
    FOREIGN KEY (issue_id) REFERENCES issues(id)
);

JSONL Export:

import json
import sqlite3

def export_to_jsonl(db_path, output_path):
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows accessible by column name
    dirty_ids = [row["issue_id"] for row in conn.execute("SELECT issue_id FROM dirty_issues")]
    
    # Append-only for simplicity: on import, the last line for an ID wins
    with open(output_path, 'a') as f:
        for issue_id in dirty_ids:
            issue = conn.execute("SELECT * FROM issues WHERE id = ?", (issue_id,)).fetchone()
            f.write(json.dumps(dict(issue)) + '\n')
    
    conn.execute("DELETE FROM dirty_issues")
    conn.commit()
    conn.close()

Git Integration:

import subprocess

def sync():
    export_to_jsonl('.beads/beads.db', '.beads/issues.jsonl')
    subprocess.run(['git', 'add', '.beads/issues.jsonl'], check=True)
    # Only commit/push if the export actually changed the file
    staged = subprocess.run(['git', 'diff', '--cached', '--quiet', '.beads/issues.jsonl'])
    if staged.returncode != 0:
        subprocess.run(['git', 'commit', '-m', 'Update issues'], check=True)
        subprocess.run(['git', 'push'], check=True)

Conclusion

What Makes Beads Unique

  1. Git-native distributed database - Feels centralized, actually distributed
  2. Hash-based IDs - Offline-first without coordination overhead
  3. Dependency-aware ready work - Automatic detection of unblocked issues
  4. Agent-optimized - JSON everywhere, discovered-from links, auto-sync
  5. Zero configuration - Works out of the box, auto-starts daemon, auto-syncs

Core Innovation

Making SQLite feel like a shared database using git as the replication layer.

This is the "magic trick" that makes beads work:

  • Fast local queries (SQLite)
  • Shared state across machines (git)
  • Automatic synchronization (daemon + JSONL export/import)
  • Familiar workflow (git push/pull)

When to Use Beads Architecture

Good fit:

  • ✅ Small-to-medium teams (1-20 people)
  • ✅ AI coding agents (primary use case)
  • ✅ Offline-first workflows (airplane coding)
  • ✅ Git-centric teams (already using git for everything)
  • ✅ Dependency-aware task tracking (blockers matter)

Poor fit:

  • ❌ Large teams (>50 people, too many merge conflicts)
  • ❌ Real-time collaboration (use operational transform instead)
  • ❌ Non-technical users (requires git knowledge)
  • ❌ Web-based access (SQLite not web-accessible)

Lessons Learned from Beads

  1. Simple beats complex - JSONL + git is simpler than custom replication
  2. Leverage existing tools - git, SQLite, Unix sockets (don't reinvent)
  3. Offline-first is powerful - Hash IDs enable true offline work
  4. Daemon pattern is valuable - Auto-sync dramatically improves UX
  5. Eventual consistency is OK - For issue tracking, merge conflicts are rare
  6. Extensibility matters - UnderlyingDB() pattern allows app-specific tables
  7. Agent-first design - JSON output, clear errors, discoverable commands

Final Thoughts

Beads demonstrates that distributed systems don't require complex protocols. By choosing git as the sync layer and accepting eventual consistency, beads achieves the benefits of a centralized database (shared state, queryability) with the benefits of distributed systems (offline work, no single point of failure).

The key insight: Use the right tool for each layer

  • SQLite for fast local queries
  • Git for distributed synchronization
  • JSONL as the interchange format
  • Daemon for automatic orchestration

This layered architecture is the blueprint for building similar systems in other domains.


End of Document

This analysis is intended to inform alternative implementations. When building your own system, carefully evaluate which patterns to adopt based on your specific requirements and constraints.
