Spec-Driven Software (SDS)

A paradigm where specifications are the product and code is a build artifact.

The Thought Experiment

Can I build an enterprise-scale software system, say Uber, in 5 minutes? Spend 4 minutes planning and writing specifications. In the last minute, run a swarm of 5,000 Claude Code instances to code it up.

This sounds absurd until you realize that AI can already generate working code from clear instructions. The bottleneck was never code generation — it was specification. Give 5,000 agents a vague brief and you get chaos. Give them a precise, modular, testable spec and you get a working system.

SDS is the format that makes this possible. It is three things:

A system manifest — a planning LLM decomposes the full system into bounded modules with explicit contracts.
Module specs — each module is described in a terse, testable language that fully defines its entities, APIs, logic, and invariants.
A build pipeline — agent swarms generate code from specs, test it against derived assertions, and deploy.

The human (or planning LLM) does the thinking. The agent swarm does the typing.

Architecture Overview

┌──────────────────────────────────────────────────────┐
│                   PLANNING PHASE                     │
│                   (4 minutes)                        │
│                                                      │
│  Product Vision / Requirements                       │
│         │                                            │
│         ▼                                            │
│  ┌─────────────────┐                                 │
│  │  Planning LLM   │  (expensive, high-reasoning)    │
│  │                 │                                 │
│  │  - Decomposes   │                                 │
│  │    system into  │                                 │
│  │    modules      │                                 │
│  │  - Defines      │                                 │
│  │    ownership    │                                 │
│  │  - Sets         │                                 │
│  │    contracts    │                                 │
│  │  - Writes       │                                 │
│  │    module specs │                                 │
│  └────────┬────────┘                                 │
│           │                                          │
│           ▼                                          │
│  ┌──────────────────────────────────────────┐        │
│  │         System Manifest                  │        │
│  │  ┌──────┐ ┌──────┐ ┌──────┐ ┌────────┐   │        │
│  │  │Rider │ │Driver│ │Match │ │  ...N  │   │        │
│  │  │ Spec │ │ Spec │ │ Spec │ │  Specs │   │        │
│  │  └──────┘ └──────┘ └──────┘ └────────┘   │        │
│  │  + Cross-module contracts                │        │
│  │  + System-level invariants               │        │
│  └──────────────────┬───────────────────────┘        │
└─────────────────────┼────────────────────────────────┘
                      │
┌─────────────────────┼────────────────────────────────┐
│                BUILD PHASE                           │
│                (1 minute)                            │
│                     │                                │
│    ┌────────────────┼────────────────────┐           │
│    │                │                    │           │
│    ▼                ▼                    ▼           │
│ ┌──────┐       ┌──────┐            ┌──────┐          │
│ │Swarm │       │Swarm │    ...     │Swarm │          │
│ │ x200 │       │ x200 │            │ x200 │          │
│ │agents│       │agents│            │agents│          │
│ └──┬───┘       └──┬───┘            └──┬───┘          │
│    │              │                   │              │
│    ▼              ▼                   ▼              │
│ ┌──────┐       ┌──────┐            ┌──────┐          │
│ │Rider │       │Driver│            │Module│          │
│ │ Code │       │ Code │            │N Code│          │
│ └──────┘       └──────┘            └──────┘          │
│    │              │                   │              │
│    └──────────────┼───────────────────┘              │
│                   ▼                                  │
│          ┌─────────────────┐                         │
│          │   Test & Verify │                         │
│          │   - assertions  │                         │
│          │   - invariants  │                         │
│          │   - contracts   │                         │
│          └────────┬────────┘                         │
│                   │                                  │
│              ALL GREEN → DEPLOY                      │
└──────────────────────────────────────────────────────┘

Part I — System Manifest

The manifest is the top-level document. It is produced by a planning LLM (or a human architect) and describes the full system as a set of bounded modules with explicit contracts. No module spec can be written until the manifest exists.

1.1 System Declaration

system uber v1.0 {
  description: "ride-hailing platform"
  modules: 7
  agents_per_module: ~200
}

1.2 Module Definitions

Each module declares what it owns, what it exposes, and what it consumes. This is the single most important design decision in the system — it determines what can be built independently.

module rider {
  owns: [Rider, RideRequest, Rating]
  exposes: [RideRequest.created, RideRequest.cancelled]
  consumes: [driver/DriverAssigned, trip/TripCompleted]
  boundary: rider-service
}

module driver {
  owns: [Driver, Vehicle, Availability, Location]
  exposes: [DriverAssigned, DriverLocation]
  consumes: [rider/RideRequest.created, matching/MatchResult, trip/TripCompleted]
  boundary: driver-service
}

module matching {
  owns: [MatchScore, MatchConfig]
  exposes: [MatchResult]
  consumes: [rider/RideRequest.created, driver/DriverLocation]
  boundary: matching-service
  ~ latency critical < 500ms
}

module trip {
  owns: [Trip, Route, Waypoint]
  exposes: [TripStarted, TripCompleted, TripCancelled]
  consumes: [matching/MatchResult]
  boundary: trip-service
}

module pricing {
  owns: [Fare, Surge, Promo]
  exposes: [FareEstimate, FareFinalized]
  consumes: [trip/TripCompleted, driver/DriverLocation]
  boundary: pricing-service
}

module payments {
  owns: [PaymentMethod, Charge, Payout, Wallet]
  exposes: [ChargeCompleted, PayoutSent]
  consumes: [pricing/FareFinalized]
  boundary: payments-service
}

module notifications {
  owns: [Template, Channel, Preference]
  consumes: [rider/*, driver/*, trip/*, payments/*]
  boundary: notification-service
}

Design rules:

Single ownership. An entity belongs to exactly one module. No other module can write to it.
Expose only events and read-only views. Modules communicate through events, not shared databases.
Consume explicitly. Every dependency is declared. No hidden coupling.
Boundary = deployment unit. Each module becomes its own service, repo, and agent swarm.
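
The ownership and consumption rules above are mechanical enough to check before any module spec is written. A minimal Python sketch, assuming the manifest has already been parsed into dictionaries. The `MODULES` layout and the `validate` function are hypothetical, not part of SDS. It covers three of the modules above, with the consumes lists trimmed to references that resolve among them.

```python
# Hypothetical parsed form of three modules from the manifest above.
# The consumes lists are trimmed to references among these three modules.
MODULES = {
    "rider": {
        "owns": ["Rider", "RideRequest", "Rating"],
        "exposes": ["RideRequest.created", "RideRequest.cancelled"],
        "consumes": ["driver/DriverAssigned"],
    },
    "driver": {
        "owns": ["Driver", "Vehicle", "Availability", "Location"],
        "exposes": ["DriverAssigned", "DriverLocation"],
        "consumes": ["rider/RideRequest.created"],
    },
    "matching": {
        "owns": ["MatchScore", "MatchConfig"],
        "exposes": ["MatchResult"],
        "consumes": ["rider/RideRequest.created", "driver/DriverLocation"],
    },
}


def validate(modules):
    """Return a list of rule violations; an empty list means the manifest passes."""
    problems = []

    # Single ownership: an entity belongs to exactly one module.
    owners = {}
    for name, mod in modules.items():
        for entity in mod["owns"]:
            owners.setdefault(entity, []).append(name)
    for entity, names in owners.items():
        if len(names) > 1:
            problems.append(f"{entity} is owned by {', '.join(names)}")

    # Consume explicitly: every reference must name an event its source exposes.
    for name, mod in modules.items():
        for ref in mod["consumes"]:
            source, _, event = ref.partition("/")
            if source not in modules:
                problems.append(f"{name} consumes from unknown module {source}")
            elif event != "*" and event not in modules[source]["exposes"]:
                problems.append(f"{name} consumes {ref}, which {source} does not expose")
    return problems


print(validate(MODULES))  # an empty list: the trimmed manifest passes both checks
```

A check like this could run in the PARSE stage, so a bad boundary is rejected before any agent is started.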

1.3 Cross-Module Contracts

Contracts define the promises between modules — latency, data guarantees, failure behavior. Each contract is testable.

contract rider <-> matching {
  RideRequest.created must produce MatchResult within 30s
  ! no match found => notify rider, retry with expanded radius
}

contract matching <-> driver {
  MatchResult references only drivers where Availability = online
  DriverAssigned must occur within 10s of MatchResult
}

contract trip <-> pricing {
  TripCompleted must produce FareFinalized within 5s
  FareFinalized.amount >= Fare.minimum @always
}

contract pricing <-> payments {
  FareFinalized must produce exactly one Charge
  ! Charge.failed => retry 3x, then notify rider + support
}
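
A timing clause such as "RideRequest.created must produce MatchResult within 30s" can be tested against a recorded event log. A minimal Python sketch, assuming every event carries a shared request id and a timestamp. The log format, the ids, and `contract_violations` are hypothetical; SDS does not define them.

```python
def contract_violations(events, trigger, response, within_s):
    """Return ids whose `trigger` was not followed by `response` within `within_s` seconds."""
    # Earliest response time seen for each request id.
    first_response = {}
    for name, request_id, ts in events:
        if name == response and ts < first_response.get(request_id, float("inf")):
            first_response[request_id] = ts

    late = []
    for name, request_id, ts in events:
        if name == trigger:
            answered = first_response.get(request_id)
            if answered is None or not (0 <= answered - ts <= within_s):
                late.append(request_id)
    return late


# Hypothetical event log: (event name, request id, seconds since start).
LOG = [
    ("RideRequest.created", "r1", 0.0),
    ("MatchResult", "r1", 12.5),
    ("RideRequest.created", "r2", 5.0),
    ("MatchResult", "r2", 40.0),
    ("RideRequest.created", "r3", 7.0),
]

# r2 is matched after 35s and r3 is never matched, so both are reported.
print(contract_violations(LOG, "RideRequest.created", "MatchResult", 30))
```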

1.4 System-Level Invariants

Rules that no single module can enforce alone. The build pipeline tests these across module boundaries.

system_invariant {
  every RideRequest reaches terminal state [completed, cancelled] within 2h
  every FareFinalized produces exactly one Charge
  Rider.rating = avg(Rating where rider=R) @eventually_consistent
  no Charge without corresponding FareFinalized
  Driver.balance = sum(Payout where driver=D) - sum(Charge.platform_fee)
}
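
The two Charge rules above reduce to counting. A minimal Python sketch over a snapshot of records, assuming each Charge stores the id of the FareFinalized it settles. That link field and `check_charges` are assumptions for illustration.

```python
from collections import Counter


def check_charges(fare_ids, charge_fare_ids):
    """Check the two Charge invariants over a snapshot of records.

    fare_ids: ids of all FareFinalized events.
    charge_fare_ids: for each Charge, the id of the FareFinalized it settles.
    """
    charges_per_fare = Counter(charge_fare_ids)
    fares = set(fare_ids)
    # every FareFinalized produces exactly one Charge
    not_exactly_one = sorted(f for f in fares if charges_per_fare[f] != 1)
    # no Charge without corresponding FareFinalized
    orphan_charges = sorted(f for f in charges_per_fare if f not in fares)
    return not_exactly_one, orphan_charges


# f2 was charged twice, f3 never, and one charge points at an unknown fare f9.
print(check_charges(["f1", "f2", "f3"], ["f1", "f2", "f2", "f9"]))
```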

1.5 Why the Planning LLM Matters

The planning LLM is the most critical step. It must:

Decompose correctly. Bad module boundaries create tight coupling that defeats the entire paradigm. The planning LLM must understand domain-driven design, bounded contexts, and data ownership.
Minimize cross-module contracts. Every contract is a coordination cost. Fewer contracts = more parallelism = faster build.
Right-size modules. Too big and a single agent swarm can't build it in time. Too small and contract overhead dominates.
Anticipate evolution. Modules will change independently. The boundaries must accommodate growth without restructuring.

This is why it should be the most capable (and expensive) model available. The planning phase is where intelligence matters most. The coding phase is brute-force parallelism.

Part II — Module Spec Language

Each module gets its own spec file, written in the SDS language. This is what the agent swarm reads to generate code.

2.1 Entities

entity Product {
  id: uuid @immutable
  name: string 1..200
  price: decimal 0.01..999999.99
  stock: int 0..*
  status: [draft, active, archived]
  created: timestamp @immutable @auto
  updated: timestamp @auto
}

Types: uuid, string, int, decimal, bool, timestamp, {} (object)

Ranges: 0..150 (inclusive), 0..* (unbounded), 1..200 (length or value)

Enums: [draft, active, archived] — closed set.

Optional: Suffix with ? — e.g. detail: string?

References: product: -> Product — foreign key to another entity.
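
Range annotations map directly onto validation code. A minimal Python sketch of how a generated validator might read them, assuming ranges are inclusive and that a range on a string constrains its length, as listed above. `parse_range` and `in_range` are hypothetical helper names.

```python
from decimal import Decimal


def parse_range(text):
    """Parse 'lo..hi' into (lo, hi); hi is None when the upper bound is '*'."""
    lo, hi = text.split("..")
    return Decimal(lo), (None if hi == "*" else Decimal(hi))


def in_range(value, text):
    """True when value lies in the inclusive range; strings are measured by length."""
    lo, hi = parse_range(text)
    n = Decimal(len(value)) if isinstance(value, str) else Decimal(str(value))
    return lo <= n and (hi is None or n <= hi)


print(in_range("Widget", "1..200"))                   # name: string 1..200
print(in_range(Decimal("0.00"), "0.01..999999.99"))   # price below the minimum
print(in_range(10**9, "0..*"))                        # stock: int 0..*
```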

2.2 Decorators (`@`)

Behavioral metadata on fields or entities.

Decorator	Meaning
`@immutable`	Cannot change after creation
`@unique`	Must be unique across all records
`@auto`	System-generated
`@computed`	Derived from other fields/invariants
`@snapshot(Source.field)`	Captures value at point in time

2.3 Soft Constraints (`~`)

Hints the AI should optimize for. Not hard failures.

~ latency < 200ms
~ cache: 60s
~ paginate: 20
~ abandon after 72h inactivity

2.4 Assertions (`!`)

Hard conditions. Each becomes an automated test.

! email exists => 409
! qty > product.stock => stock_conflict
! not found => 404

Pattern: ! <condition> => <error_code | error_ref> <optional "message">; <optional rollback>
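
Each assertion yields at least two tests: one where the condition holds and the error must appear, and one at the boundary where it must not. A minimal Python sketch for `! qty > product.stock => stock_conflict`, with the 409 status taken from the error definition in section 2.7. `SpecError`, `add_to_cart`, and the test name are hypothetical stand-ins for generated code.

```python
class SpecError(Exception):
    """Carries the error ref and HTTP status named in the spec."""

    def __init__(self, ref, status):
        super().__init__(ref)
        self.ref, self.status = ref, status


def add_to_cart(stock, qty):
    # ! qty > product.stock => stock_conflict
    if qty > stock:
        raise SpecError("stock_conflict", 409)
    return qty


def test_stock_conflict():
    # Negative test: the condition holds, so the named error must be raised.
    try:
        add_to_cart(stock=2, qty=3)
    except SpecError as err:
        assert (err.ref, err.status) == ("stock_conflict", 409)
    else:
        raise AssertionError("expected stock_conflict")
    # Boundary test: qty equals stock, so the condition is false and the call succeeds.
    assert add_to_cart(stock=2, qty=2) == 2


test_stock_conflict()
```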

2.5 Edges

edge Cart -> CartItem {1:many}
edge Order -> Payment {1:1}

Cardinalities: {1:1}, {1:many}, {many:many}

2.6 State Machines

machine Order.status {
  pending -> paid -> shipped -> delivered
  pending -> cancelled
  paid -> cancelled -> refunded
}

Any transition not listed is illegal. Both transition logic and rejection tests are generated.
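
A minimal Python sketch of how the machine above could be expanded into legal transitions and rejection cases, assuming a chain `a -> b -> c` is shorthand for the edges `a -> b` and `b -> c`. `expand` and `illegal` are hypothetical names.

```python
def expand(chains):
    """Turn 'a -> b -> c' chains into a set of (from, to) transitions."""
    edges = set()
    for chain in chains:
        states = [s.strip() for s in chain.split("->")]
        edges.update(zip(states, states[1:]))
    return edges


def illegal(edges):
    """Every ordered pair of distinct states that is not a listed transition."""
    states = {s for edge in edges for s in edge}
    return {(a, b) for a in states for b in states if a != b and (a, b) not in edges}


ORDER = expand([
    "pending -> paid -> shipped -> delivered",
    "pending -> cancelled",
    "paid -> cancelled -> refunded",
])

print(len(ORDER), "legal transitions,", len(illegal(ORDER)), "rejection tests")
```

Under this reading, `cancelled -> refunded` is a single edge, so an order cancelled while still pending can also reach `refunded`. A spec author may want to confirm that this is intended.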

2.7 Errors

First-class, reusable, testable.

error stock_conflict: 409 {
  message: "insufficient stock"
  detail: "{product.name}: requested {qty}, available {stock}"
}

2.8 Events

Side effects as part of the contract.

event order.paid {
  ref: Order.id
  payload: {total, user, method}
  >> notify(user, email)
}

2.9 Actions

Reusable compositions of steps.

action reserve_stock(item) {
  >> decrement item.product.stock by item.qty
  ! stock < 0 => stock_conflict; rollback
  ~ emit stock.low if threshold
}
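
A minimal Python sketch of the decrement, check, and rollback sequence, assuming stock lives in an in-memory dict rather than a database transaction. `StockConflict`, the product names, and the list-of-pairs input are hypothetical.

```python
class StockConflict(Exception):
    """Stands in for the stock_conflict error."""


def reserve_stock(stock, items):
    """Decrement stock for each (product, qty) pair; undo every decrement on conflict."""
    applied = []
    try:
        for product, qty in items:
            stock[product] -= qty          # >> decrement item.product.stock by item.qty
            applied.append((product, qty))
            if stock[product] < 0:         # ! stock < 0 => stock_conflict; rollback
                raise StockConflict(product)
    except StockConflict:
        for product, qty in applied:
            stock[product] += qty          # rollback: restore what was taken
        raise


stock = {"mug": 5, "pen": 1}
try:
    reserve_stock(stock, [("mug", 2), ("pen", 3)])
except StockConflict as err:
    print("conflict on", err, "and stock is back to", stock)
```

Generated code would more likely rely on a database transaction for the rollback. The manual undo here only makes the required behavior visible.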

2.10 APIs

api POST /checkout {
  in: {method: Payment.method}
  out: Order
  ! cart empty => cart_empty
  ! any item.qty > item.product.stock => stock_conflict
  >> snapshot prices to OrderLines
  >> reserve_stock(each item)
  >> create Payment(pending)
  >> create Order(pending)
  >> charge payment
  ! payment fails => payment_failed; rollback stock
  >> set Payment(captured), Order(paid)
  >> set Cart(checked_out)
  >> emit order.paid
}

2.11 Invariants

System-wide rules. Become property tests.

invariant {
  Order.total = sum(OrderLine.qty * OrderLine.unit_price)
  stock >= 0 @always
  Cart per user: max 1 where status=open
}
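
A property test checks an invariant after random sequences of operations rather than on hand-picked cases. A minimal Python sketch for the `Order.total` rule using only the standard library. The `Order` class stands in for generated code, and `run_property` and its parameters are hypothetical.

```python
import random
from decimal import Decimal


class Order:
    """Hypothetical generated code: keeps a running total as lines are added."""

    def __init__(self):
        self.lines = []
        self.total = Decimal("0")

    def add_line(self, qty, unit_price):
        self.lines.append((qty, unit_price))
        self.total += qty * unit_price


def invariant_holds(order):
    # Order.total = sum(OrderLine.qty * OrderLine.unit_price)
    return order.total == sum((q * p for q, p in order.lines), Decimal("0"))


def run_property(trials=200, seed=0):
    """Apply random sequences of add_line and check the invariant after each step."""
    rng = random.Random(seed)
    for _ in range(trials):
        order = Order()
        for _ in range(rng.randint(0, 10)):
            qty = rng.randint(1, 100)
            unit_price = Decimal(rng.randint(1, 99999999)) / 100
            order.add_line(qty, unit_price)
            assert invariant_holds(order), order.lines
    return trials


print(run_property())  # number of random orders checked
```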

2.12 Auth

auth {
  /products/** : public(GET), admin(POST, PATCH, DELETE)
  /cart/** : owner
  /orders/** : owner
}

2.13 Rate Limits

limit {
  POST /checkout : 5/min per user
  GET /products : 100/min per ip
}
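
The spec states the limit but not the algorithm. A minimal Python sketch of one way to read `5/min per user`, assuming a sliding 60-second window keyed by user id. The `RateLimit` class is hypothetical, and a fixed window or token bucket would also satisfy the text.

```python
from collections import defaultdict, deque


class RateLimit:
    """Allow at most `limit` calls per key within any `window_s`-second window."""

    def __init__(self, limit, window_s=60.0):
        self.limit, self.window_s = limit, window_s
        self.hits = defaultdict(deque)

    def allow(self, key, now):
        recent = self.hits[key]
        # Drop calls that have aged out of the window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


checkout = RateLimit(limit=5)  # POST /checkout : 5/min per user
results = [checkout.allow("user-1", t) for t in (0, 1, 2, 3, 4, 5)]
print(results)  # the sixth call inside the same minute is refused
```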

2.14 Observability

observe {
  log: all api calls, all errors, all state transitions
  metric: api.latency, checkout.conversion, stock.level
  trace: checkout flow end-to-end
}

2.15 Runtime

runtime {
  lang: python | go | node
  db: postgres
  cache: redis
  queue: rabbitmq
  auth: jwt
  deploy: container
}

Part III — Schema Evolution

Code is ephemeral. Schema is migrated. Data is sacred.

3.1 Migration Policy

migrate {
  mode: safe            -- never drops columns with data
  rename: explicit      -- renames must be declared
  backfill: required    -- new non-nullable fields need a default
}

3.2 Evolve Blocks

evolve Product {
  +category: string default "general"     -- new field, backfilled
  ~price: decimal 0.01..9999999.99        -- widened range, safe
  -legacy_sku                             -- drop only if empty, else fail
}

Symbols: + add, ~ alter, - remove. Inside an evolve block, ~ means alter; it is not the soft-constraint marker from section 2.3.

The pipeline refuses to deploy if a schema diff exists with no corresponding evolve block.
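
A minimal Python sketch of that deploy gate, assuming the live schema and the spec are each reduced to a field-to-type mapping. `missing_evolve` is hypothetical, and so is the `legacy_sku` type, which the spec above does not state.

```python
def missing_evolve(live, spec, evolve):
    """Return schema changes that no evolve entry declares, using + ~ - prefixes."""
    changes = set()
    for field in spec.keys() - live.keys():
        changes.add("+" + field)            # field added in the spec
    for field in live.keys() - spec.keys():
        changes.add("-" + field)            # field removed from the spec
    for field in spec.keys() & live.keys():
        if spec[field] != live[field]:
            changes.add("~" + field)        # field type or range altered
    return sorted(changes - set(evolve))


live = {"name": "string 1..200", "price": "decimal 0.01..999999.99", "legacy_sku": "string"}
spec = {"name": "string 1..200", "price": "decimal 0.01..9999999.99", "category": "string"}

print(missing_evolve(live, spec, ["+category", "~price", "-legacy_sku"]))  # fully declared
print(missing_evolve(live, spec, ["+category", "~price"]))                 # drop not declared
```

An empty result lets the deploy proceed. Anything else names the change that still needs an evolve entry.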

3.3 The Rule

Layer	Lifecycle
Code	Ephemeral. Regenerated every deploy.
Schema	Derived from spec. Migrated via diffs and evolve blocks.
Data	Sacred. Never touched except through declared backfills.

Part IV — Build Pipeline

4.1 Pipeline Stages

1. PARSE          Validate spec syntax, resolve imports.
2. DIFF           Compare spec against live schema. Require evolve blocks for changes.
3. FAN OUT        Distribute module specs to agent swarms (parallel).
4. GENERATE       Each swarm generates code: handlers, queries, migrations, infra.
5. TEST           Derive tests from !, invariants, machines, contracts. Run all.
6. VERIFY         Cross-module contract verification.
7. DEPLOY         Green = deploy. Red = agents retry with error context.

4.2 Agent Roles Within a Swarm

Agent	Reads	Generates
DB Agent	Entities, edges, evolve blocks	Migrations, queries, schema
API Agent	APIs, actions, auth, limits	Route handlers, middleware, validation
Event Agent	Events, consume declarations	Pub/sub wiring, handlers, notifications
Test Agent	`!` assertions, invariants, machines	Unit, integration, and property tests
Deploy Agent	Runtime block	Dockerfiles, infra-as-code, CI config

4.3 Inter-Agent Contract

Agents don't communicate directly. The spec is the shared contract. Entity definitions become the agreed-upon table names, field types, and query interfaces. All agents read the same spec and generate compatible code by construction.

4.4 Failure and Retry

on test failure {
  >> feed error context + failing test + spec back to generating agent
  >> agent regenerates affected code
  >> re-run tests
  ~ max retries: 3
  ! still failing after retries => halt deploy, alert human
}
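
A minimal Python sketch of this loop, assuming "max retries: 3" means one initial attempt plus up to three regenerations. `build_with_retries` and the two callables are hypothetical stand-ins for the generating agent and the test runner.

```python
def build_with_retries(generate, run_tests, max_retries=3):
    """Generate, test, and regenerate with error context; raise when retries run out."""
    context = None
    failures = []
    for attempt in range(1 + max_retries):
        code = generate(context)        # context is None on the first attempt
        failures = run_tests(code)
        if not failures:
            return code, attempt        # attempt = number of retries that were needed
        context = failures              # fed back to the generating agent
    raise RuntimeError(f"halt deploy, alert human: {failures}")


# Fake agent whose third build passes; it records the context it was given each time.
seen_contexts = []


def fake_generate(context):
    seen_contexts.append(context)
    return len(seen_contexts)           # build number stands in for generated code


def fake_tests(build):
    return [] if build >= 3 else [f"test failed on build {build}"]


print(build_with_retries(fake_generate, fake_tests))  # third build passes after two retries
```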

Part V — Composition at Scale

5.1 The Full Hierarchy

System Manifest              (1 file — the planning LLM's output)
  ├── Module definitions     (boundaries, ownership)
  ├── Cross-module contracts (promises between modules)
  ├── System invariants      (global rules)
  │
  └── Module Specs           (1 per module — agent swarm input)
        ├── Entities, edges, machines
        ├── APIs, actions, errors, events
        ├── Module-level invariants
        ├── Auth, limits, observability
        ├── Runtime config
        └── Evolve blocks

5.2 Scale Math

A system like Uber might decompose into ~25 modules. Each module spec is ~200-400 lines. The system manifest is ~200 lines. Total specification: ~5,000-10,000 lines (25 × 200 to 25 × 400, plus the manifest).

The equivalent codebase today: millions of lines across hundreds of repos.

At build time, 5,000 agents distributed across 25 modules (~200 per module) generate the full implementation in parallel. Each agent handles a slice — one generates the database layer, another the API routes, another the tests, another the Dockerfile. They don't coordinate with each other. They all read the same spec.

5.3 The 5-Minute Build

Phase	Time	Who
Requirements → Manifest	~2 min	Planning LLM (expensive, high-reasoning)
Manifest → Module Specs	~2 min	Planning LLM or spec-writing agents
Module Specs → Code	~30 sec	5,000 agent swarm (parallel)
Test & Verify	~20 sec	Test agents (parallel per module)
Deploy	~10 sec	Deploy agents (parallel per module)

The thinking is slow and expensive. The coding is fast and cheap. This is the correct allocation of intelligence.
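
The budget can be checked with plain arithmetic. A short Python sketch using only the figures stated in this section; the dictionary keys are informal labels.

```python
# Time budget from the table above, in seconds.
phases = {
    "requirements -> manifest": 120,
    "manifest -> module specs": 120,
    "module specs -> code": 30,
    "test & verify": 20,
    "deploy": 10,
}

planning = phases["requirements -> manifest"] + phases["manifest -> module specs"]
build = sum(phases.values()) - planning
agents_per_module = 5000 // 25  # 5,000 agents spread over ~25 modules

print(planning, build, planning + build, agents_per_module)
```

This gives 240 seconds of planning, 60 seconds of build, 300 seconds in total, and 200 agents per module.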

Part VI — Full Example: SimpleShop

-- =============================================
-- SYSTEM MANIFEST
-- =============================================

system simpleshop v0.1 {
  description: "minimal e-commerce system"
  modules: 1
}

module shop {
  owns: [Product, Cart, CartItem, Order, OrderLine, Payment]
  exposes: [order.paid, order.shipped, order.cancelled, stock.low]
  boundary: shop-service
}

-- =============================================
-- MODULE SPEC: shop
-- =============================================

spec shop v0.1 {
  description: "product catalog, cart, checkout, orders"
  agents: [db, api, events, tests, deploy]
}

runtime {
  lang: node
  db: postgres
  cache: redis
  queue: rabbitmq
  auth: jwt
  deploy: container
}

-- ENTITIES --

entity Product {
  id: uuid @immutable
  name: string 1..200
  price: decimal 0.01..999999.99
  stock: int 0..*
  status: [draft, active, archived]
  created: timestamp @immutable @auto
  updated: timestamp @auto
}

entity Cart {
  id: uuid @immutable
  user: uuid
  status: [open, checked_out, abandoned]
  created: timestamp @immutable @auto
  ~ abandon after 72h inactivity
}

entity CartItem {
  id: uuid @immutable
  product: -> Product
  qty: int 1..100
}

entity Order {
  id: uuid @immutable
  user: uuid
  total: decimal @computed
  status: [pending, paid, shipped, delivered, cancelled, refunded]
  created: timestamp @immutable @auto
}

entity OrderLine {
  id: uuid @immutable
  product: -> Product
  qty: int 1..*
  unit_price: decimal @snapshot(Product.price)
}

entity Payment {
  id: uuid @immutable
  amount: decimal 0.01..*
  method: [card, wallet]
  status: [pending, captured, failed, refunded]
  created: timestamp @immutable @auto
}

-- EDGES --

edge Cart -> CartItem {1:many}
edge Order -> OrderLine {1:many}
edge Order -> Payment {1:1}

-- STATE MACHINES --

machine Order.status {
  pending -> paid -> shipped -> delivered
  pending -> cancelled
  paid -> cancelled -> refunded
}

machine Payment.status {
  pending -> captured
  pending -> failed
  captured -> refunded
}

-- ERRORS --

error stock_conflict: 409 {
  message: "insufficient stock"
  detail: "{product.name}: requested {qty}, available {stock}"
}

error cart_empty: 400 {
  message: "cart is empty"
}

error payment_failed: 402 {
  message: "payment could not be captured"
  detail: "{provider.reason}"
}

error not_owner: 403 {
  message: "access denied"
}

-- EVENTS --

event order.paid {
  ref: Order.id
  payload: {total, user, method}
  >> notify(user, email)
}

event order.shipped {
  ref: Order.id
  payload: {tracking}
  >> notify(user, email + sms)
}

event order.cancelled {
  ref: Order.id
  payload: {reason, refunded: bool}
  >> notify(user, email)
  >> if refunded: notify(finance, webhook)
}

event stock.low {
  ref: Product.id
  ~ trigger when stock < 10
  >> notify(admin, email)
}

-- ACTIONS --

action reserve_stock(item) {
  >> decrement item.product.stock by item.qty
  ! stock < 0 => stock_conflict; rollback
  ~ emit stock.low if threshold
}

-- APIs --

api GET /products {
  out: [Product] ?status=active
  ~ paginate: 20
  ~ cache: 30s
}

api GET /products/{id} {
  out: Product
  ! not found => 404
}

api POST /cart/items {
  in: {product: uuid, qty: int}
  out: Cart
  ! product.status != active => 400 "product unavailable"
  ! qty > product.stock => stock_conflict
  ! cart.status != open => 400 "cart not open"
}

api DELETE /cart/items/{id} {
  ! not found => 404
  ! cart.status != open => 400
}

api PATCH /cart/items/{id} {
  in: {qty: int}
  ! qty > product.stock => stock_conflict
  ! qty < 1 => 400
}

api POST /checkout {
  in: {method: Payment.method}
  out: Order
  ! cart empty => cart_empty
  ! any item.qty > item.product.stock => stock_conflict
  >> snapshot prices to OrderLines
  >> reserve_stock(each item)
  >> create Payment(pending)
  >> create Order(pending)
  >> charge payment
  ! payment fails => payment_failed; rollback stock
  >> set Payment(captured), Order(paid)
  >> set Cart(checked_out)
  >> emit order.paid
}

api GET /orders {
  out: [Order] @owner
  ~ paginate: 10
}

api GET /orders/{id} {
  out: Order + [OrderLine] + Payment
  ! not found => 404
  ! not owner => not_owner
}

api POST /orders/{id}/cancel {
  ! status not in [pending, paid] => 400 "cannot cancel"
  >> if paid: refund payment
  >> restore stock per line
  >> set Order(cancelled)
  >> emit order.cancelled
}

-- INVARIANTS --

invariant {
  Order.total = sum(OrderLine.qty * OrderLine.unit_price)
  OrderLine.unit_price = Product.price @at_checkout
  stock >= 0 @always
  Cart per user: max 1 where status=open
  Payment.refund => Order.status:cancelled
  Order.status:cancelled => stock.restore
  Order.status:paid => event order.paid emitted
}

-- AUTH --

auth {
  /products/** : public(GET), admin(POST, PATCH, DELETE)
  /cart/** : owner
  /orders/** : owner
  /orders/{id}/cancel : owner
}

-- LIMITS --

limit {
  POST /cart/items : 30/min per user
  POST /checkout : 5/min per user
  GET /products : 100/min per ip
}

-- OBSERVABILITY --

observe {
  log: all api calls, all errors, all state transitions
  metric: api.latency, checkout.conversion, stock.level
  trace: checkout flow end-to-end
}

-- MIGRATION --

migrate {
  mode: safe
  rename: explicit
  backfill: required
}

-- SEED --

seed {
  User {email: "admin@shop.com", role: admin}
  Product.status default: draft
}

Part VII — Open Questions

Formal grammar. Should SDS have a BNF/PEG grammar for deterministic parsing?
Agent protocol. How do agents report partial failures? What's the retry contract beyond max 3?
Escape hatches. When the spec can't express something, how do you drop to raw code?
Determinism. Two builds from the same spec may produce different code. Is that acceptable if tests pass?
Planning LLM feedback loop. If agents consistently fail on a module, should the planning LLM restructure the manifest?
Spec testing. Can you test the spec itself for internal consistency before any code is generated?
Cost model. What's the token cost of a full system build? How does it compare to engineer-months?

Summary

What	Role
Planning LLM	The architect. Decomposes, modularizes, sets contracts.
System Manifest	The constitution. Defines modules, boundaries, promises.
Module Specs	The laws. Define entities, APIs, logic, invariants per module.
Agent Swarm	The labor. Generates code from specs in parallel.
Test Pipeline	The judge. Verifies code against spec-derived tests.
Code	The artifact. Ephemeral, regenerated, never read by humans.

SDS — because the best code is code you never have to read.

arvindrajnaidu/SDS.md