Can I build an enterprise-scale software system — say Uber — in under 5 minutes? Spend 4 minutes planning and writing specifications. In the last minute, run a swarm of 5,000 Claude Code instances to code it up.
This sounds absurd until you realize that AI can already generate working code from clear instructions. The bottleneck was never code generation — it was specification. Give 5,000 agents a vague brief and you get chaos. Give them a precise, modular, testable spec and you get a working system.
SDS is the format that makes this possible. It is three things:
- A system manifest — a planning LLM decomposes the full system into bounded modules with explicit contracts.
- Module specs — each module is described in a terse, testable language that fully defines its entities, APIs, logic, and invariants.
- A build pipeline — agent swarms generate code from specs, test it against derived assertions, and deploy.
The human (or planning LLM) does the thinking. The agent swarm does the typing.
PLANNING PHASE (4 minutes)

  Product Vision / Requirements
                 │
                 ▼
  ┌─────────────────────────────┐
  │ Planning LLM                │
  │ (expensive, high-reasoning) │
  │                             │
  │ - Decomposes system into    │
  │   modules                   │
  │ - Defines ownership         │
  │ - Sets contracts            │
  │ - Writes module specs       │
  └──────────────┬──────────────┘
                 │
                 ▼
  ┌─────────────────────────────────────────────┐
  │ System Manifest                             │
  │ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
  │ │ Rider  │ │ Driver │ │ Match  │ │ ...N   │ │
  │ │ Spec   │ │ Spec   │ │ Spec   │ │ Specs  │ │
  │ └────────┘ └────────┘ └────────┘ └────────┘ │
  │ + Cross-module contracts                    │
  │ + System-level invariants                   │
  └──────────────┬──────────────────────────────┘
                 │
═════════════════╪═══════════════════════════════
BUILD PHASE (1 minute)
                 │
      ┌──────────┼──────────────┐
      ▼          ▼              ▼
  ┌────────┐ ┌────────┐     ┌────────┐
  │ Swarm  │ │ Swarm  │ ... │ Swarm  │
  │ x200   │ │ x200   │     │ x200   │
  │ agents │ │ agents │     │ agents │
  └───┬────┘ └───┬────┘     └───┬────┘
      │          │              │
      ▼          ▼              ▼
  ┌────────┐ ┌────────┐     ┌────────┐
  │ Rider  │ │ Driver │ ... │ Module │
  │ Code   │ │ Code   │     │ N Code │
  └───┬────┘ └───┬────┘     └───┬────┘
      │          │              │
      └──────────┼──────────────┘
                 ▼
        ┌─────────────────┐
        │ Test & Verify   │
        │ - assertions    │
        │ - invariants    │
        │ - contracts     │
        └────────┬────────┘
                 │
                 ▼
        ALL GREEN → DEPLOY
The manifest is the top-level document. It is produced by a planning LLM (or a human architect) and describes the full system as a set of bounded modules with explicit contracts. No module spec can be written until the manifest exists.
system uber v1.0 {
description: "ride-hailing platform"
modules: 7
agents_per_module: ~200
}
Each module declares what it owns, what it exposes, and what it consumes. This is the single most important design decision in the system — it determines what can be built independently.
module rider {
owns: [Rider, RideRequest, Rating]
exposes: [RideRequest.created, RideRequest.cancelled]
consumes: [driver/DriverAssigned, trip/TripCompleted]
boundary: rider-service
}
module driver {
owns: [Driver, Vehicle, Availability, Location]
exposes: [DriverAssigned, DriverLocation]
consumes: [rider/RideRequest.created, matching/MatchResult, trip/TripCompleted]
boundary: driver-service
}
module matching {
owns: [MatchScore, MatchConfig]
exposes: [MatchResult]
consumes: [rider/RideRequest.created, driver/DriverLocation]
boundary: matching-service
~ latency critical < 500ms
}
module trip {
owns: [Trip, Route, Waypoint]
exposes: [TripStarted, TripCompleted, TripCancelled]
consumes: [matching/MatchResult]
boundary: trip-service
}
module pricing {
owns: [Fare, Surge, Promo]
exposes: [FareEstimate, FareFinalized]
consumes: [trip/TripCompleted, driver/DriverLocation]
boundary: pricing-service
}
module payments {
owns: [PaymentMethod, Charge, Payout, Wallet]
exposes: [ChargeCompleted, PayoutSent]
consumes: [pricing/FareFinalized]
boundary: payments-service
}
module notifications {
owns: [Template, Channel, Preference]
consumes: [rider/*, driver/*, trip/*, payments/*]
boundary: notification-service
}
Design rules:
- Single ownership. An entity belongs to exactly one module. No other module can write to it.
- Expose only events and read-only views. Modules communicate through events, not shared databases.
- Consume explicitly. Every dependency is declared. No hidden coupling.
- Boundary = deployment unit. Each module becomes its own service, repo, and agent swarm.
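These design rules are mechanical enough to check before any code exists. Below is a minimal Python sketch. It assumes a hypothetical dict encoding of three module blocks from the manifest (trip, pricing, payments). Both the encoding and the `check_manifest` helper are illustrative and are not part of SDS.

```python
from collections import defaultdict

# Hypothetical encoding of three module blocks from the manifest
# (trip, pricing, payments); the dict layout is an assumption, not SDS syntax.
MANIFEST = {
    "trip": {
        "owns": ["Trip", "Route", "Waypoint"],
        "exposes": ["TripStarted", "TripCompleted", "TripCancelled"],
        "consumes": ["matching/MatchResult"],
    },
    "pricing": {
        "owns": ["Fare", "Surge", "Promo"],
        "exposes": ["FareEstimate", "FareFinalized"],
        "consumes": ["trip/TripCompleted", "driver/DriverLocation"],
    },
    "payments": {
        "owns": ["PaymentMethod", "Charge", "Payout", "Wallet"],
        "exposes": ["ChargeCompleted", "PayoutSent"],
        "consumes": ["pricing/FareFinalized"],
    },
}


def check_manifest(modules):
    """Return design-rule violations; an empty list means the slice is consistent."""
    errors = []
    # Single ownership: every entity appears in exactly one `owns` list.
    owners = defaultdict(list)
    for name, mod in modules.items():
        for entity in mod["owns"]:
            owners[entity].append(name)
    for entity, mods in sorted(owners.items()):
        if len(mods) > 1:
            errors.append(f"{entity} owned by {', '.join(mods)}")
    # Explicit consumption: `source/Event` must be exposed by `source`.
    # Modules outside this slice, and wildcards, are skipped.
    for name, mod in modules.items():
        for ref in mod["consumes"]:
            source, _, event = ref.partition("/")
            if source not in modules or event == "*":
                continue
            if event not in modules[source]["exposes"]:
                errors.append(f"{name} consumes {ref}, which {source} does not expose")
    return errors


print(check_manifest(MANIFEST))  # []
```

Each of the following changes would produce one violation: adding `Charge` to pricing's `owns` list, or having payments consume a misspelled `pricing/FareFinal`.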
Contracts define the promises between modules — latency, data guarantees, failure behavior. Each contract is testable.
contract rider <-> matching {
RideRequest.created must produce MatchResult within 30s
! no match found => notify rider, retry with expanded radius
}
contract matching <-> driver {
MatchResult references only drivers where Availability = online
DriverAssigned must occur within 10s of MatchResult
}
contract trip <-> pricing {
TripCompleted must produce FareFinalized within 5s
FareFinalized.amount >= Fare.minimum @always
}
contract pricing <-> payments {
FareFinalized must produce exactly one Charge
! Charge.failed => retry 3x, then notify rider + support
}
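Because each contract is testable, a latency promise such as `TripCompleted must produce FareFinalized within 5s` can be checked against a recorded event log. The sketch below assumes that events carry a correlation id. That is a hypothetical detail: the manifest does not say how events are correlated. `check_latency_contract` is an illustrative helper.

```python
from datetime import datetime, timedelta


def check_latency_contract(log, trigger, response, within):
    """Return correlation ids whose `trigger` event had no `response` within `within`.

    `log` holds (event_name, correlation_id, timestamp) tuples. How events are
    correlated is an assumption here (a shared id such as a trip id).
    """
    responses = [(cid, ts) for name, cid, ts in log if name == response]
    violations = []
    for name, cid, ts in log:
        if name != trigger:
            continue
        answered = any(rcid == cid and ts <= rts <= ts + within for rcid, rts in responses)
        if not answered:
            violations.append(cid)
    return violations


# contract trip <-> pricing: TripCompleted must produce FareFinalized within 5s
t0 = datetime(2024, 1, 1, 12, 0, 0)
log = [
    ("TripCompleted", "trip-1", t0),
    ("FareFinalized", "trip-1", t0 + timedelta(seconds=2)),
    ("TripCompleted", "trip-2", t0),
    ("FareFinalized", "trip-2", t0 + timedelta(seconds=9)),  # too late
    ("TripCompleted", "trip-3", t0),                         # never priced
]
print(check_latency_contract(log, "TripCompleted", "FareFinalized", timedelta(seconds=5)))
# ['trip-2', 'trip-3']
```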
System invariants are rules that no single module can enforce alone. The build pipeline tests them across module boundaries.
system_invariant {
every RideRequest reaches terminal state [completed, cancelled] within 2h
every FareFinalized produces exactly one Charge
Rider.rating = avg(Rating where rider=R) @eventually_consistent
no Charge without corresponding FareFinalized
Driver.balance = sum(Payout where driver=D) - sum(Charge.platform_fee where driver=D)
}
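The second and fourth invariants (exactly one Charge per FareFinalized, and no orphan Charges) can be checked together over a snapshot of records. The sketch below assumes that each Charge stores the id of the FareFinalized it came from. That back-reference is hypothetical, since the manifest does not name such a field.

```python
from collections import Counter


def check_charge_invariants(fare_ids, charge_fare_refs):
    """Check two system invariants over a snapshot of records.

    fare_ids: ids of FareFinalized events.
    charge_fare_refs: for each Charge, the FareFinalized id it was created from
    (a hypothetical back-reference; the manifest does not name this field).
    """
    charges_per_fare = Counter(charge_fare_refs)
    fares = set(fare_ids)
    errors = []
    # every FareFinalized produces exactly one Charge
    for fare in sorted(fares):
        if charges_per_fare[fare] != 1:
            errors.append(f"FareFinalized {fare} has {charges_per_fare[fare]} charges")
    # no Charge without corresponding FareFinalized
    for fare in sorted(set(charges_per_fare) - fares):
        errors.append(f"Charge references unknown FareFinalized {fare}")
    return errors


print(check_charge_invariants(["f1", "f2", "f3"], ["f1", "f2", "f2", "f9"]))
# ['FareFinalized f2 has 2 charges', 'FareFinalized f3 has 0 charges',
#  'Charge references unknown FareFinalized f9']
```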
The planning LLM is the most critical step. It must:
- Decompose correctly. Bad module boundaries create tight coupling that defeats the entire paradigm. The planning LLM must understand domain-driven design, bounded contexts, and data ownership.
- Minimize cross-module contracts. Every contract is a coordination cost. Fewer contracts = more parallelism = faster build.
- Right-size modules. Too big and a single agent swarm can't build it in time. Too small and contract overhead dominates.
- Anticipate evolution. Modules will change independently. The boundaries must accommodate growth without restructuring.
This is why the planning LLM should be the most capable (and expensive) model available. The planning phase is where intelligence matters most; the coding phase is brute-force parallelism.
Each module gets its own spec file, written in the SDS language. This is what the agent swarm reads to generate code.
entity Product {
id: uuid @immutable
name: string 1..200
price: decimal 0.01..999999.99
stock: int 0..*
status: [draft, active, archived]
created: timestamp @immutable @auto
updated: timestamp @auto
}
Types: uuid, string, int, decimal, bool, timestamp, {} (object)
Ranges: 0..150 (inclusive), 0..* (unbounded), 1..200 (length or value)
Enums: [draft, active, archived] — closed set.
Optional: Suffix with ? — e.g. detail: string?
References: product: -> Product — foreign key to another entity.
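To make the type, range, and enum notation concrete, here is a Python sketch of how a generated validator might enforce `entity Product`. The dict encoding and the `validate` function are assumptions for illustration. They are not a defined SDS runtime.

```python
from decimal import Decimal

# Hypothetical encoding of `entity Product`. Ranges bound the length for
# strings and the value otherwise; None stands for an unbounded `*`.
PRODUCT = {
    "name": ("string", 1, 200),
    "price": ("decimal", Decimal("0.01"), Decimal("999999.99")),
    "stock": ("int", 0, None),
    "status": ("enum", ["draft", "active", "archived"]),
}
PY_TYPES = {"string": str, "decimal": Decimal, "int": int}


def validate(record, schema):
    """Return a list of constraint violations for one record."""
    errors = []
    for field, rule in schema.items():
        value = record.get(field)
        if value is None:  # no `?` suffix, so every field here is required
            errors.append(f"{field}: required")
            continue
        kind = rule[0]
        if kind == "enum":  # closed set
            if value not in rule[1]:
                errors.append(f"{field}: {value!r} not in {rule[1]}")
            continue
        if not isinstance(value, PY_TYPES[kind]):
            errors.append(f"{field}: expected {kind}")
            continue
        lo, hi = rule[1], rule[2]
        measured = len(value) if kind == "string" else value
        if measured < lo or (hi is not None and measured > hi):
            errors.append(f"{field}: {measured} outside {lo}..{'*' if hi is None else hi}")
    return errors


good = {"name": "Mug", "price": Decimal("12.50"), "stock": 3, "status": "active"}
bad = {"name": "", "price": Decimal("0"), "stock": -1, "status": "sold"}
print(validate(good, PRODUCT))       # []
print(len(validate(bad, PRODUCT)))   # 4
```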
Decorators attach behavioral metadata to fields or entities.
| Decorator | Meaning |
|---|---|
| @immutable | Cannot change after creation |
| @unique | Must be unique across all records |
| @auto | System-generated |
| @computed | Derived from other fields/invariants |
| @snapshot(Source.field) | Captures the source value at a point in time |
Hints (lines starting with ~) are targets the AI should optimize for. They are not hard failures.
~ latency < 200ms
~ cache: 60s
~ paginate: 20
~ abandon after 72h inactivity
Assertions (lines starting with !) are hard conditions. Each becomes an automated test.
! email exists => 409
! qty > product.stock => stock_conflict
! not found => 404
Pattern: ! <condition> => <error_code | error_ref>; <optional rollback>
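Here is a sketch of how assertion lines might compile into ordered guards, with one test case per assertion. The rule encoding and `first_violation` are hypothetical. Evaluating rules in order with the not-found check first is an ordering assumption, not something the pattern specifies.

```python
# Hypothetical compiled form of two assertion lines:
#   ! not found => 404
#   ! qty > product.stock => stock_conflict
# Each rule is (predicate, error). Rules run in order and the first failure
# wins, so the not-found check is placed first (an ordering assumption).
RULES = [
    (lambda ctx: ctx.get("product") is None, 404),
    (lambda ctx: ctx["qty"] > ctx["product"]["stock"], "stock_conflict"),
]


def first_violation(rules, ctx):
    """Return the error of the first failing assertion, or None if all pass."""
    for predicate, error in rules:
        if predicate(ctx):
            return error
    return None


# One generated test per assertion: a context that should trip it,
# plus a happy path that trips none.
cases = [
    ({"product": None, "qty": 1}, 404),
    ({"product": {"stock": 2}, "qty": 5}, "stock_conflict"),
    ({"product": {"stock": 9}, "qty": 5}, None),
]
for ctx, expected in cases:
    assert first_violation(RULES, ctx) == expected
```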
edge Cart -> CartItem {1:many}
edge Order -> Payment {1:1}
Edges declare relationships between entities. Cardinalities: {1:1}, {1:many}, {many:many}
machine Order.status {
pending -> paid -> shipped -> delivered
pending -> cancelled
paid -> cancelled -> refunded
}
Any transition not listed is illegal. Both transition logic and rejection tests are generated.
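The sketch below shows the transition logic and rejection tests that a generator could derive from `machine Order.status`. It assumes that each chain expands into one edge per arrow. `parse_machine`, `transition`, and `rejection_cases` are illustrative names.

```python
import itertools


def parse_machine(lines):
    """Expand chains like `a -> b -> c` into an adjacency map of legal transitions."""
    machine = {}
    for line in lines:
        states = [s.strip() for s in line.split("->")]
        for src, dst in zip(states, states[1:]):
            machine.setdefault(src, set()).add(dst)
            machine.setdefault(dst, set())
    return machine


ORDER_STATUS = parse_machine([
    "pending -> paid -> shipped -> delivered",
    "pending -> cancelled",
    "paid -> cancelled -> refunded",
])


def transition(machine, current, target):
    """Apply a transition or reject it; anything not listed is illegal."""
    if target not in machine[current]:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target


def rejection_cases(machine):
    """Every (from, to) pair that is not listed, including self-loops."""
    return [(a, b) for a, b in itertools.product(machine, repeat=2) if b not in machine[a]]


print(transition(ORDER_STATUS, "paid", "shipped"))  # shipped
print(len(rejection_cases(ORDER_STATUS)))           # 30
```

The machine has six states and six listed edges. That leaves 30 of the 36 ordered pairs (self-loops included) as rejection tests.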
Errors are first-class, reusable, and testable.
error stock_conflict: 409 {
message: "insufficient stock"
detail: "{product.name}: requested {qty}, available {stock}"
}
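This is one plausible runtime form of `error stock_conflict`. The status and message are fixed, and the detail template is filled from request context. The sketch assumes Python's `str.format` is used for templating, which resolves `{product.name}` by attribute lookup. `render_error` is hypothetical.

```python
from types import SimpleNamespace

# Hypothetical runtime form of `error stock_conflict: 409 {...}`.
STOCK_CONFLICT = {
    "status": 409,
    "message": "insufficient stock",
    "detail": "{product.name}: requested {qty}, available {stock}",
}


def render_error(error, **context):
    """Build (HTTP status, JSON body); str.format resolves {product.name} via getattr."""
    body = {"message": error["message"]}
    if "detail" in error:
        body["detail"] = error["detail"].format(**context)
    return error["status"], body


status, body = render_error(
    STOCK_CONFLICT, product=SimpleNamespace(name="Mug"), qty=5, stock=2
)
print(status, body["detail"])  # 409 Mug: requested 5, available 2
```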
Events declare side effects as part of the contract.
event order.paid {
ref: Order.id
payload: {total, user, method}
>> notify(user, email)
}
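The sketch below shows how `>> notify(user, email)` on `event order.paid` could be wired up as a subscriber. The in-process `EventBus` and `notify` are hypothetical stand-ins for the queue and notification service named in the runtime block.

```python
from collections import defaultdict


class EventBus:
    """In-process stand-in for the queue; a real build would use the runtime's broker."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def on(self, name, handler):
        self.handlers[name].append(handler)

    def emit(self, name, payload):
        for handler in self.handlers[name]:
            handler(payload)


sent = []


def notify(recipient, channel):
    """Hypothetical notification helper; records instead of sending."""
    sent.append((recipient, channel))


bus = EventBus()
# `>> notify(user, email)` becomes a subscriber on order.paid
bus.on("order.paid", lambda payload: notify(payload["user"], "email"))
bus.emit("order.paid", {"ref": "order-1", "total": "42.00", "user": "u-7", "method": "card"})
print(sent)  # [('u-7', 'email')]
```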
Actions are reusable compositions of steps.
action reserve_stock(item) {
>> decrement item.product.stock by item.qty
! stock < 0 => stock_conflict; rollback
~ emit stock.low if threshold
}
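Here is a sketch of `reserve_stock` as an all-or-nothing operation. It assumes a hypothetical in-memory stock map and reuses the stock.low threshold of 10 from the full example later in this document. A generated implementation would more likely rely on a database transaction than on manual compensation.

```python
LOW_STOCK_THRESHOLD = 10  # from `event stock.low { ~ trigger when stock < 10 }`


class StockConflict(Exception):
    """Stands in for the spec's `stock_conflict` error (409)."""


def reserve_stock(stock, items):
    """All-or-nothing decrement of `stock` (a hypothetical product_id -> units dict).

    Returns the product ids that dropped below the low-stock threshold, i.e.
    the candidates for `emit stock.low`.
    """
    applied = []
    try:
        for product_id, qty in items:
            stock[product_id] -= qty  # >> decrement item.product.stock by item.qty
            applied.append((product_id, qty))
            if stock[product_id] < 0:  # ! stock < 0 => stock_conflict; rollback
                raise StockConflict(product_id)
    except StockConflict:
        for product_id, qty in applied:  # undo every decrement made so far
            stock[product_id] += qty
        raise
    return sorted({p for p, _ in items if stock[p] < LOW_STOCK_THRESHOLD})


stock = {"mug": 12, "tee": 1}
print(reserve_stock(stock, [("mug", 3)]))  # ['mug'] (12 - 3 = 9 < 10)
```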
api POST /checkout {
in: {method: Payment.method}
out: Order
! cart empty => cart_empty
! any item.qty > item.product.stock => stock_conflict
>> snapshot prices to OrderLines
>> reserve_stock(each item)
>> create Payment(pending)
>> create Order(pending)
>> charge payment
! payment fails => payment_failed; rollback stock
>> set Payment(captured), Order(paid)
>> set Cart(checked_out)
>> emit order.paid
}
Invariants are rules that must hold across the whole module. Each becomes a property test.
invariant {
Order.total = sum(OrderLine.qty * OrderLine.unit_price)
stock >= 0 @always
Cart per user: max 1 where status=open
}
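A property test generates many random orders and checks the invariant on each one. The sketch below does this for `Order.total = sum(OrderLine.qty * OrderLine.unit_price)` using only the standard library; a property-testing library such as Hypothesis would be a common alternative. `compute_total` stands in for hypothetical generated code.

```python
import random
from decimal import Decimal


def random_lines(rng):
    """Random OrderLines as (qty, unit_price) pairs; prices are exact Decimals."""
    return [
        (rng.randint(1, 5), Decimal(rng.randint(1, 99999)) / 100)
        for _ in range(rng.randint(1, 4))
    ]


def total_invariant_holds(compute_total, trials=500, seed=0):
    """Property check for Order.total = sum(OrderLine.qty * OrderLine.unit_price).

    `compute_total` stands in for the generated code that fills Order.total.
    Returns (True, None) or (False, counterexample).
    """
    rng = random.Random(seed)
    for _ in range(trials):
        lines = random_lines(rng)
        expected = sum((qty * price for qty, price in lines), Decimal("0"))
        if compute_total(lines) != expected:
            return False, lines
    return True, None


def correct(lines):
    return sum((qty * price for qty, price in lines), Decimal("0"))


def buggy(lines):  # forgets to multiply by qty
    return sum((price for _, price in lines), Decimal("0"))


print(total_invariant_holds(correct)[0], total_invariant_holds(buggy)[0])  # True False
```

Decimal arithmetic keeps the comparison exact. Floats could make a correct implementation fail on rounding noise.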
auth {
/products/** : public(GET), admin(POST, PATCH, DELETE)
/cart/** : owner
/orders/** : owner
}
limit {
POST /checkout : 5/min per user
GET /products : 100/min per ip
}
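The spec does not say which rate-limiting algorithm the generated middleware uses. The sketch below implements `POST /checkout : 5/min per user` as a fixed-window counter, which is one common choice. The class and its injectable clock are assumptions.

```python
import time


class FixedWindowLimiter:
    """`N/min per key` as a fixed one-minute window counter.

    Fixed windows are one possible algorithm (sliding windows or token buckets
    are common alternatives); the spec does not say which one is generated.
    """

    def __init__(self, limit, window_seconds=60, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counts = {}  # key -> (window index, requests allowed in that window)

    def allow(self, key):
        current = int(self.clock() // self.window)
        window, count = self.counts.get(key, (current, 0))
        if window != current:  # a new minute started: reset the counter
            window, count = current, 0
        allowed = count < self.limit
        self.counts[key] = (window, count + 1 if allowed else count)
        return allowed


# POST /checkout : 5/min per user, driven by a fake clock for the example
now = [0.0]
checkout_limit = FixedWindowLimiter(5, clock=lambda: now[0])
print([checkout_limit.allow("user-1") for _ in range(6)])
# [True, True, True, True, True, False]
```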
observe {
log: all api calls, all errors, all state transitions
metric: api.latency, checkout.conversion, stock.level
trace: checkout flow end-to-end
}
runtime {
lang: python | go | node
db: postgres
cache: redis
queue: rabbitmq
auth: jwt
deploy: container
}
Code is ephemeral. Schema is migrated. Data is sacred.
migrate {
mode: safe -- never drops columns with data
rename: explicit -- renames must be declared
backfill: required -- new non-nullable fields need a default
}
evolve Product {
+category: string default "general" -- new field, backfilled
~price: decimal 0.01..9999999.99 -- widened range, safe
-legacy_sku -- drop only if empty, else fail
}
Symbols: + add, ~ alter, - remove.
The pipeline refuses to deploy if a schema diff exists with no corresponding evolve block.
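Here is a sketch of this deploy gate for a single entity. It compares column names only. The function, its arguments, and the `non_empty` set (which stands in for a query against live data) are all hypothetical.

```python
def deploy_gate(spec_fields, live_columns, evolve_ops, non_empty):
    """Return reasons to refuse a deploy for one entity (empty list = OK).

    spec_fields / live_columns: column names from the spec and the live schema.
    evolve_ops: entries such as "+category", "~price", "-legacy_sku".
    non_empty: hypothetical set of live columns that still hold data.
    Only additions and removals are diffed; type changes (~) are not checked here.
    """
    added = {op[1:] for op in evolve_ops if op.startswith("+")}
    removed = {op[1:] for op in evolve_ops if op.startswith("-")}
    problems = []
    for field in sorted(spec_fields - live_columns):
        if field not in added:
            problems.append(f"new field {field} has no evolve block")
    for column in sorted(live_columns - spec_fields):
        if column not in removed:
            problems.append(f"dropped column {column} has no evolve block")
        elif column in non_empty:  # -legacy_sku: drop only if empty, else fail
            problems.append(f"cannot drop {column}: it still holds data")
    return problems


live = {"id", "name", "price", "stock", "status", "created", "updated", "legacy_sku"}
spec = (live - {"legacy_sku"}) | {"category"}
ops = {"+category", "~price", "-legacy_sku"}
print(deploy_gate(spec, live, ops, non_empty=set()))           # []
print(deploy_gate(spec, live, ops, non_empty={"legacy_sku"}))  # ['cannot drop legacy_sku: it still holds data']
```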
| Layer | Lifecycle |
|---|---|
| Code | Ephemeral. Regenerated every deploy. |
| Schema | Derived from spec. Migrated via diffs and evolve blocks. |
| Data | Sacred. Never touched except through declared backfills. |
1. PARSE Validate spec syntax, resolve imports.
2. DIFF Compare spec against live schema. Require evolve blocks for changes.
3. FAN OUT Distribute module specs to agent swarms (parallel).
4. GENERATE Each swarm generates code: handlers, queries, migrations, infra.
5. TEST Derive tests from !, invariants, machines, contracts. Run all.
6. VERIFY Cross-module contract verification.
7. DEPLOY Green = deploy. Red = agents retry with error context.
| Agent | Reads | Generates |
|---|---|---|
| DB Agent | Entities, edges, evolve blocks | Migrations, queries, schema |
| API Agent | APIs, actions, auth, limits | Route handlers, middleware, validation |
| Event Agent | Events, consume declarations | Pub/sub wiring, handlers, notifications |
| Test Agent | ! assertions, invariants, machines | Unit, integration, and property tests |
| Deploy Agent | Runtime block | Dockerfiles, infra-as-code, CI config |
Agents don't communicate directly. The spec is the shared contract. Entity definitions become the agreed-upon table names, field types, and query interfaces. All agents read the same spec and generate compatible code by construction.
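The sketch below illustrates compatibility by construction. Two agents that derive names from the same entity definition agree on table and column names without talking to each other. The SDS-to-Postgres type mapping and the lowercase table naming are hypothetical, and enum and range constraints are left out for brevity.

```python
# Hypothetical SDS-to-Postgres type mapping; real generated code may differ.
PG_TYPES = {
    "uuid": "UUID",
    "string": "TEXT",
    "int": "INTEGER",
    "decimal": "NUMERIC",
    "bool": "BOOLEAN",
    "timestamp": "TIMESTAMPTZ",
}

# Fields of `entity Product` (enum and range constraints omitted).
PRODUCT_ENTITY = {
    "name": "Product",
    "fields": {
        "id": "uuid",
        "name": "string",
        "price": "decimal",
        "stock": "int",
        "created": "timestamp",
        "updated": "timestamp",
    },
}


def table_name(entity):
    return entity["name"].lower()  # naming convention assumed for the example


def db_agent(entity):
    """What a DB agent might emit: DDL derived from the entity."""
    cols = ",\n  ".join(
        f"{field} {PG_TYPES[kind]}{' PRIMARY KEY' if field == 'id' else ''}"
        for field, kind in entity["fields"].items()
    )
    return f"CREATE TABLE {table_name(entity)} (\n  {cols}\n);"


def api_agent(entity):
    """What an API agent might emit: a lookup query derived from the same entity."""
    cols = ", ".join(entity["fields"])
    return f"SELECT {cols} FROM {table_name(entity)} WHERE id = $1"


print(db_agent(PRODUCT_ENTITY))
print(api_agent(PRODUCT_ENTITY))
```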
on test failure {
>> feed error context + failing test + spec back to generating agent
>> agent regenerates affected code
>> re-run tests
~ max retries: 3
! still failing after retries => halt deploy, alert human
}
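Here is the retry loop above as a Python sketch. It reads `max retries: 3` as one initial attempt plus three retries, which is an interpretation. `generate`, `run_tests`, and `alert_human` are hypothetical hooks.

```python
MAX_RETRIES = 3  # `~ max retries: 3`


class DeployHalted(Exception):
    """Raised when a module is still red after all retries."""


def build_module(spec, generate, run_tests, alert_human):
    """Generate -> test -> regenerate with feedback, up to MAX_RETRIES retries.

    generate(spec, feedback) -> code, run_tests(code) -> list of failures and
    alert_human(spec, failures) are hypothetical hooks for the agent, the test
    pipeline and the alerting system.
    """
    feedback = None
    for attempt in range(1 + MAX_RETRIES):  # first try plus the retries
        code = generate(spec, feedback)
        failures = run_tests(code)
        if not failures:
            return code
        # feed error context + failing tests + spec back to the generating agent
        feedback = {"attempt": attempt, "failures": failures, "spec": spec}
    alert_human(spec, failures)
    raise DeployHalted(f"{spec}: still failing after {MAX_RETRIES} retries")


calls = []


def fake_generate(spec, feedback):
    calls.append(feedback)
    return len(calls)  # "code" is just the attempt number here


def passes_from_third(code):
    return [] if code >= 3 else ["test_checkout_stock_conflict failed"]


print(build_module("shop", fake_generate, passes_from_third, print))  # 3
```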
System Manifest (1 file — the planning LLM's output)
├── Module definitions (boundaries, ownership)
├── Cross-module contracts (promises between modules)
├── System invariants (global rules)
│
└── Module Specs (1 per module — agent swarm input)
├── Entities, edges, machines
├── APIs, actions, errors, events
├── Module-level invariants
├── Auth, limits, observability
├── Runtime config
└── Evolve blocks
A system like Uber might decompose into ~25 modules. Each module spec is ~200-400 lines and the system manifest is ~200 lines, for a total specification of roughly 5,000-10,000 lines.
The equivalent codebase today: millions of lines across hundreds of repos.
At build time, 5,000 agents distributed across 25 modules (~200 per module) generate the full implementation in parallel. Each agent handles a slice — one generates the database layer, another the API routes, another the tests, another the Dockerfile. They don't coordinate with each other. They all read the same spec.
| Phase | Time | Who |
|---|---|---|
| Requirements → Manifest | ~2 min | Planning LLM (expensive, high-reasoning) |
| Manifest → Module Specs | ~2 min | Planning LLM or spec-writing agents |
| Module Specs → Code | ~30 sec | 5,000 agent swarm (parallel) |
| Test & Verify | ~20 sec | Test agents (parallel per module) |
| Deploy | ~10 sec | Deploy agents (parallel per module) |
The thinking is slow and expensive. The coding is fast and cheap. This is the correct allocation of intelligence.
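As a quick check, the table's approximate phase times add up to the five-minute budget from the opening, and the swarm size matches the per-module split:

```latex
\[
\underbrace{120\,\mathrm{s}}_{\text{manifest}}
+ \underbrace{120\,\mathrm{s}}_{\text{specs}}
+ \underbrace{30\,\mathrm{s}}_{\text{code}}
+ \underbrace{20\,\mathrm{s}}_{\text{test}}
+ \underbrace{10\,\mathrm{s}}_{\text{deploy}}
= 300\,\mathrm{s} = 5\,\mathrm{min},
\qquad
\frac{5000\ \text{agents}}{25\ \text{modules}} = 200\ \text{agents per module}
\]
```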
-- =============================================
-- SYSTEM MANIFEST
-- =============================================
system simpleshop v0.1 {
description: "minimal e-commerce system"
modules: 1
}
module shop {
owns: [Product, Cart, CartItem, Order, OrderLine, Payment]
exposes: [order.paid, order.shipped, order.cancelled, stock.low]
boundary: shop-service
}
-- =============================================
-- MODULE SPEC: shop
-- =============================================
spec shop v0.1 {
description: "product catalog, cart, checkout, orders"
agents: [db, api, events, tests, deploy]
}
runtime {
lang: node
db: postgres
cache: redis
queue: rabbitmq
auth: jwt
deploy: container
}
-- ENTITIES --
entity Product {
id: uuid @immutable
name: string 1..200
price: decimal 0.01..999999.99
stock: int 0..*
status: [draft, active, archived]
created: timestamp @immutable @auto
updated: timestamp @auto
}
entity Cart {
id: uuid @immutable
user: uuid
status: [open, checked_out, abandoned]
created: timestamp @immutable @auto
~ abandon after 72h inactivity
}
entity CartItem {
id: uuid @immutable
product: -> Product
qty: int 1..100
}
entity Order {
id: uuid @immutable
user: uuid
total: decimal @computed
status: [pending, paid, shipped, delivered, cancelled, refunded]
created: timestamp @immutable @auto
}
entity OrderLine {
id: uuid @immutable
product: -> Product
qty: int 1..*
unit_price: decimal @snapshot(Product.price)
}
entity Payment {
id: uuid @immutable
amount: decimal 0.01..*
method: [card, wallet]
status: [pending, captured, failed, refunded]
created: timestamp @immutable @auto
}
-- EDGES --
edge Cart -> CartItem {1:many}
edge Order -> OrderLine {1:many}
edge Order -> Payment {1:1}
-- STATE MACHINES --
machine Order.status {
pending -> paid -> shipped -> delivered
pending -> cancelled
paid -> cancelled -> refunded
}
machine Payment.status {
pending -> captured
pending -> failed
captured -> refunded
}
-- ERRORS --
error stock_conflict: 409 {
message: "insufficient stock"
detail: "{product.name}: requested {qty}, available {stock}"
}
error cart_empty: 400 {
message: "cart is empty"
}
error payment_failed: 402 {
message: "payment could not be captured"
detail: "{provider.reason}"
}
error not_owner: 403 {
message: "access denied"
}
-- EVENTS --
event order.paid {
ref: Order.id
payload: {total, user, method}
>> notify(user, email)
}
event order.shipped {
ref: Order.id
payload: {tracking}
>> notify(user, email + sms)
}
event order.cancelled {
ref: Order.id
payload: {reason, refunded: bool}
>> notify(user, email)
>> if refunded: notify(finance, webhook)
}
event stock.low {
ref: Product.id
~ trigger when stock < 10
>> notify(admin, email)
}
-- ACTIONS --
action reserve_stock(item) {
>> decrement item.product.stock by item.qty
! stock < 0 => stock_conflict; rollback
~ emit stock.low if threshold
}
-- APIs --
api GET /products {
out: [Product] ?status=active
~ paginate: 20
~ cache: 30s
}
api GET /products/{id} {
out: Product
! not found => 404
}
api POST /cart/items {
in: {product: uuid, qty: int}
out: Cart
! product.status != active => 400 "product unavailable"
! qty > product.stock => stock_conflict
! cart.status != open => 400 "cart not open"
}
api DELETE /cart/items/{id} {
! not found => 404
! cart.status != open => 400
}
api PATCH /cart/items/{id} {
in: {qty: int}
! qty > product.stock => stock_conflict
! qty < 1 => 400
}
api POST /checkout {
in: {method: Payment.method}
out: Order
! cart empty => cart_empty
! any item.qty > item.product.stock => stock_conflict
>> snapshot prices to OrderLines
>> reserve_stock(each item)
>> create Payment(pending)
>> create Order(pending)
>> charge payment
! payment fails => payment_failed; rollback stock
>> set Payment(captured), Order(paid)
>> set Cart(checked_out)
>> emit order.paid
}
api GET /orders {
out: [Order] @owner
~ paginate: 10
}
api GET /orders/{id} {
out: Order + [OrderLine] + Payment
! not owner => not_owner
! not found => 404
}
api POST /orders/{id}/cancel {
! status not in [pending, paid] => 400 "cannot cancel"
>> if paid: refund payment
>> restore stock per line
>> set Order(cancelled)
>> emit order.cancelled
}
-- INVARIANTS --
invariant {
Order.total = sum(OrderLine.qty * OrderLine.unit_price)
OrderLine.unit_price = Product.price @at_checkout
stock >= 0 @always
Cart per user: max 1 where status=open
Payment.refund => Order.status:cancelled
Order.status:cancelled => stock.restore
Order.status:paid => event order.paid emitted
}
-- AUTH --
auth {
/products/** : public(GET), admin(POST, PATCH, DELETE)
/cart/** : owner
/orders/** : owner
/orders/{id}/cancel : owner
}
-- LIMITS --
limit {
POST /cart/items : 30/min per user
POST /checkout : 5/min per user
GET /products : 100/min per ip
}
-- OBSERVABILITY --
observe {
log: all api calls, all errors, all state transitions
metric: api.latency, checkout.conversion, stock.level
trace: checkout flow end-to-end
}
-- MIGRATION --
migrate {
mode: safe
rename: explicit
backfill: required
}
-- SEED --
seed {
User {email: "admin@shop.com", role: admin}
Product.status default: draft
}
- Formal grammar. Should SDS have a BNF/PEG grammar for deterministic parsing?
- Agent protocol. How do agents report partial failures? What's the retry contract beyond max 3?
- Escape hatches. When the spec can't express something, how do you drop to raw code?
- Determinism. Two builds from the same spec may produce different code. Is that acceptable if tests pass?
- Planning LLM feedback loop. If agents consistently fail on a module, should the planning LLM restructure the manifest?
- Spec testing. Can you test the spec itself for internal consistency before any code is generated?
- Cost model. What's the token cost of a full system build? How does it compare to engineer-months?
| What | Role |
|---|---|
| Planning LLM | The architect. Decomposes, modularizes, sets contracts. |
| System Manifest | The constitution. Defines modules, boundaries, promises. |
| Module Specs | The laws. Define entities, APIs, logic, invariants per module. |
| Agent Swarm | The labor. Generates code from specs in parallel. |
| Test Pipeline | The judge. Verifies code against spec-derived tests. |
| Code | The artifact. Ephemeral, regenerated, never read by humans. |
SDS — because the best code is code you never have to read.