# Natural-Language Metaprogramming

At its core, what a coding agent lets developers do is metaprogramming in natural language: using natural language to write programs that generate code.

## Core concepts

### The analogy with traditional metaprogramming

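| Traditional metaprogramming | NL metaprogramming |
|---|---|
| Macro / code generator | Prompt |
| AST | Codebase (every file the agent can read and write) |
| The type system guarantees the expansion is correct | Deterministic tools verify what the agent produces |
| Single language (a Rust macro can only emit Rust) | Cross-language: the same prompt can produce TS, OCaml, SQL |
| Deterministic (same input, same output) | Non-deterministic, hence the need for a verification pipeline |
| Runs at compile time | Runs at development time and can be parallelized |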

### Where the 100x comes from: agents operating deterministic tools

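When an agent writes code directly, the output is non-deterministic and needs heavy review. When an agent operates deterministic tools, the output is deterministic: 100x throughput, with review cost approaching zero.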

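```mermaid
flowchart LR
    NL["NL prompt<br>(non-deterministic)"]
    Params["Tool-call parameters<br>(deterministic boundary)"]
    Exec["Tool execution<br>(deterministic)"]
    Output["Deterministic output"]

    NL -->|"the agent's non-determinism<br>is compressed into this step"| Params --> Exec --> Output
```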

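The agent's non-determinism exists only in the step that decides what to call. Once that decision is made, everything downstream is deterministic. Whether the call was correct is something the tools can check on their own (type errors, lint errors, test failures).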
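The following is a minimal sketch of that boundary. It assumes the agent's only output is a JSON-encoded tool call. The `ToolCall` type, the two tools, and `parseToolCall`/`execute` are hypothetical. The sketch shows two things:

- A malformed call is rejected mechanically before anything runs.
- A valid call always yields the same result for the same input.

```ts
// Hypothetical tool-call shape: the only thing the agent gets to decide.
type ToolCall =
  | { tool: "rename"; from: string; to: string }
  | { tool: "format"; indent: number };

// Deterministic boundary: anything that is not a well-formed call is rejected.
function parseToolCall(raw: string): ToolCall | Error {
  let v: unknown;
  try {
    v = JSON.parse(raw);
  } catch {
    return new Error("not JSON");
  }
  if (typeof v !== "object" || v === null) return new Error("not an object");
  const { tool, from, to, indent } = v as Record<string, unknown>;
  if (tool === "rename" && typeof from === "string" && typeof to === "string") {
    return { tool: "rename", from, to };
  }
  if (tool === "format" && typeof indent === "number" && Number.isInteger(indent) && indent >= 0) {
    return { tool: "format", indent };
  }
  return new Error("unknown or malformed tool call");
}

// Deterministic execution: the same call on the same source always gives the same output.
function execute(call: ToolCall, source: string): string {
  if (call.tool === "rename") {
    // Naive text substitution for brevity; a real tool would rename via the AST.
    return source.split(call.from).join(call.to);
  }
  // Re-indent every line by a fixed amount.
  return source
    .split("\n")
    .map((line) => " ".repeat(call.indent) + line.trim())
    .join("\n");
}
```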

> [!important] Key insight
> Your bottleneck is not the agent's capability but how many deterministic tools you give the agent to operate. Every additional deterministic tool extends the 100x into a new dimension.

### Division of labor between humans and agents

Humans do the deep, focused work: defining what "correct" means. Agents do the broad, parallel work: enforcing it at scale with deterministic tools.

```mermaid
flowchart LR
    subgraph Human["🧠 Human (deep, focused)"]
        H1["Define invariants"]
        H2["Choose the type architecture"]
        H3["Review properties"]
        H4["Set mutation score targets"]
    end
    subgraph Agent["🤖 Agent (broad, parallel)"]
        A1["Scan the codebase for violations"]
        A2["Add validation at every boundary"]
        A3["Generate PBT per function"]
        A4["Run mutation testing per module"]
    end

    H1 --> A1
    H2 --> A2
    H3 --> A3
    H4 --> A4
```

## Constraints as source code

### Natural language is the best constraint DSL

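Traditional DSLs exist because a compiler needs structured input it can parse. But your compiler is an LLM, and it already reads natural language. Adding a formal DSL only restricts what you can express.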

What you need is structured natural language: a format, but no grammar. Markdown plus frontmatter is enough:

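`constraints/no-float-money.md`:

```markdown
---
scope: src/billing/**, src/payment/**
enforce: ast-transform, pbt
---

All variables and function parameters that handle monetary amounts must use
the Decimal type, never number / float.
Multiplication and division must specify a rounding mode.
```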

`constraints/api-roundtrip.md`:

```markdown
---
scope: src/api/**
enforce: pbt
---

The request/response types of every public API must satisfy
parse(serialize(x)) === x.
```
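The following shows in a few lines why the first constraint exists: binary floating point cannot represent most decimal fractions exactly. The sketch makes two choices:

- It keeps amounts as integer cents in a `bigint`, standing in for a Decimal type.
- It makes the rounding mode an explicit argument, which is one way to satisfy the constraint's rounding-mode rule.

`Money`, `divRound`, `mulRatio`, and the rounding-mode names are illustrative. They are not any particular Decimal library's API.

```ts
// Binary floating point: 0.1 + 0.2 evaluates to 0.30000000000000004, not 0.3.
const floatSum = 0.1 + 0.2;

type RoundingMode = "halfUp" | "halfEven" | "down";

// An amount in integer cents, stored as bigint rather than a JS number.
interface Money {
  cents: bigint;
}

// n / d with an explicit rounding mode (d must be positive).
// "down" truncates toward zero; "halfUp" rounds ties away from zero;
// "halfEven" rounds ties to the nearest even result.
function divRound(n: bigint, d: bigint, mode: RoundingMode): bigint {
  if (d <= 0n) throw new RangeError("divisor must be positive");
  const q = n / d; // bigint division truncates toward zero
  const r = n % d; // remainder carries the sign of n
  if (r === 0n || mode === "down") return q;
  const step = n < 0n ? -1n : 1n;
  const twiceRem = 2n * (r < 0n ? -r : r);
  if (twiceRem > d) return q + step;
  if (twiceRem < d) return q;
  // Exactly halfway between q and q + step.
  if (mode === "halfUp") return q + step;
  return q % 2n === 0n ? q : q + step;
}

// Multiply by the ratio num/den, e.g. a 7.5% tax is 75n/1000n.
function mulRatio(m: Money, num: bigint, den: bigint, mode: RoundingMode): Money {
  return { cents: divRound(m.cents * num, den, mode) };
}
```

Take 1000 cents with a 0.25% fee (`25n/10000n`). The exact result is 2.5 cents, so the rounding mode decides the outcome: `halfUp` gives `3n`, while `halfEven` and `down` give `2n`.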

### Constraint compiler

The agent reads a constraint, decides which deterministic tool should enforce it, and produces a permanent artifact:

```mermaid
flowchart TD
    C["Human writes a constraint<br>(natural-language .md)"]
    Agent{"Coding agent<br>decides which tool to use"}
    AST["AST transform<br>across the whole codebase"]
    PBT["fast-check<br>roundtrip property test"]
    Val["Typia<br>boundary validation"]
    Mut["Stryker<br>mutation test"]

    C --> Agent
    Agent -->|"money must not use float"| AST
    Agent -->|"parse and serialize are inverses"| PBT
    Agent -->|"API input must be valid"| Val
    Agent -->|"tests actually test something"| Mut
```

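The two pieces have separate jobs:

- The constraint is transient: the agent reads it once. It describes intent.
- The artifact is permanent: it goes into CI and runs every time. It is the mechanized enforcement.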

### Constraints accumulate

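Keep a `constraints/` folder with one `.md` file per rule. The collection keeps growing, and the agent re-reads and enforces all of it every time. Version control, code review, and diffs all follow the standard git workflow.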
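The following is a minimal sketch of the reading side. It assumes every constraint file uses exactly the frontmatter format shown above, with `scope` and `enforce` as comma-separated values. `Constraint`, `parseConstraint`, and `loadConstraints` are hypothetical names. File contents are passed in as strings, so the sketch needs no filesystem access. A real setup would probably use a proper YAML parser.

```ts
interface Constraint {
  file: string;
  scope: string[];   // glob patterns, e.g. "src/api/**"
  enforce: string[]; // tool hints, e.g. "ast-transform", "pbt"
  body: string;      // the natural-language rule itself
}

// Split a file into `key: a, b` frontmatter lines and the Markdown body.
function parseConstraint(file: string, text: string): Constraint {
  const m = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)([\s\S]*)$/.exec(text);
  if (!m) throw new Error(`${file}: missing frontmatter`);
  const fields = new Map<string, string[]>();
  for (const line of m[1].split(/\r?\n/)) {
    const i = line.indexOf(":");
    if (i < 0) continue;
    const values = line
      .slice(i + 1)
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0);
    fields.set(line.slice(0, i).trim(), values);
  }
  return {
    file,
    scope: fields.get("scope") ?? [],
    enforce: fields.get("enforce") ?? [],
    body: m[2].trim(),
  };
}

// Re-read the whole folder on every run: one file per rule, in a stable order.
function loadConstraints(files: Record<string, string>): Constraint[] {
  return Object.keys(files)
    .sort()
    .map((f) => parseConstraint(f, files[f]));
}
```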

## The deterministic toolchain

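This pattern isn't tied to a particular language or domain. At its core are four layers of deterministic tools, and each ecosystem has its own options for each layer: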

| Layer | TypeScript | OCaml | Rust | Python |
|---|---|---|---|---|
| Lint | Biome / Oxlint | ocamlformat | clippy | ruff |
| Validation | Typia / ArkType / Zod | Gospel | ? | pydantic |
| PBT | fast-check | QCheck / Ortac | proptest | Hypothesis |
| Mutation | Stryker | ? | cargo-mutants | mutmut |

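The examples below use the TypeScript ecosystem, the most mature of the four, but the mindset applies to any stack. All of these tools run locally and need no extra LLM API key. The agent's role is to operate them, interpret their results, and run them in parallel.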

### Layer 1: Biome (static analysis + formatting)

A Rust-powered linter and formatter, 50-100x faster than ESLint.

- v2 ships its own type-inference engine (Biotype), so it doesn't need the TypeScript compiler
- The default rule set is correctness-oriented
- Website: https://biomejs.dev/

```sh
npm install --save-dev --save-exact @biomejs/biome
npx @biomejs/biome init
npx @biomejs/biome check --write .
```

Agent workflow: run `biome check` per workspace in parallel and apply the available auto-fixes for correctness rules. This layer gives the fastest payoff and takes only minutes.

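> [!tip] Biome vs Oxlint
> - Biome: ships its own type inference with no dependency on TypeScript; formatter and linter in one tool.
> - Oxlint: does type-aware linting through tsgo (the Go port of TypeScript that becomes TypeScript 7); 100% TS-compatible, but requires an extra Go binary.
>
> Start with Biome. Consider Oxlint only if you hit edge cases where Biotype's inference is inaccurate.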

### Layer 2: Runtime validation (keeping type guarantees at runtime)

TypeScript types are erased at runtime. API boundaries, user input, and external data therefore need runtime validation.
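The following illustrates the point. A type assertion on `JSON.parse` compiles fine but checks nothing, so a wrong payload flows straight through. The hand-written guard `isUser` is the kind of check a runtime validator supplies. The `User` shape is a made-up example.

```ts
interface User {
  name: string;
  age: number;
}

// Compiles, but `as User` is erased: nothing is checked at runtime.
const unchecked = JSON.parse('{"name":"Ada","age":"thirty-six"}') as User;
// At runtime this is "string", even though the static type says number.
const ageType = typeof unchecked.age;

// A runtime check at the boundary restores the guarantee.
function isUser(v: unknown): v is User {
  if (typeof v !== "object" || v === null) return false;
  const { name, age } = v as Record<string, unknown>;
  return typeof name === "string" && typeof age === "number";
}
```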

#### Typia vs ArkType vs Zod

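| | Typia | ArkType | Zod |
|---|---|---|---|
| Philosophy | The TS type is the schema; nothing is defined twice | Schemas in TS-like syntax, grounded in set theory | Standalone schema DSL with the broadest ecosystem |
| Performance | Extremely fast (AOT: validators expanded at compile time) | ~100x faster than Zod | Slowest; v4 is several times faster than v3 |
| Bundle | Near-zero runtime | Small | Medium |
| Agent-friendliness | ==Highest==: annotate existing types | High: syntax maps one-to-one to TS | Medium: requires a separate schema |
| Ecosystem | Nestia, OpenAPI | Standalone | tRPC, React Hook Form, Next.js |
| Website | https://typia.io/ | https://arktype.io/ | https://zod.dev/ |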

How to choose:

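- New project → Typia (the agent writes no new schema; it just annotates existing interfaces)
- Complex type inference → ArkType (its set-theoretic foundation is the most precise for unions, intersections, and recursive types)
- Already on the Zod ecosystem → keep Zod (migrating for performance isn't worth it)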

Typia example: the agent only needs to add tags to an existing interface:

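```ts
import typia from 'typia';

// Tags refine the plain TS types; no separate schema is written.
interface User {
  id: string & typia.tags.Format<'uuid'>;
  email: string & typia.tags.Format<'email'>;
  age: number & typia.tags.Type<'uint32'> & typia.tags.Minimum<0>;
  name: string & typia.tags.MinLength<1>;
}

// Requires Typia's compile-time transformer to be set up: the call below is
// replaced with a generated validator function during compilation.
const validate = typia.createValidate<User>();

const result = validate(JSON.parse('{"id":"not-a-uuid"}'));
if (!result.success) console.error(result.errors);
```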
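Here is a concrete picture of what compile-time expansion buys. Typia replaces the `createValidate<User>()` call with plain checking code specialized to `User`. The function below is a hand-written approximation of what such a validator checks for these tags. It is not Typia's generated output, and the UUID and email patterns are simplified assumptions.

```ts
interface User {
  id: string;
  email: string;
  age: number;
  name: string;
}

interface ValidationError {
  path: string;
  expected: string;
}

type Validation =
  | { success: true; data: User }
  | { success: false; errors: ValidationError[] };

// Simplified patterns (an assumption): real UUID and email checks are stricter.
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
const UINT32_MAX = 4294967295;

// One check per tagged property, collecting every failure instead of stopping at the first.
function validateUser(input: unknown): Validation {
  if (typeof input !== "object" || input === null) {
    return { success: false, errors: [{ path: "$input", expected: "User" }] };
  }
  const { id, email, age, name } = input as Record<string, unknown>;
  const errors: ValidationError[] = [];
  if (typeof id !== "string" || !UUID.test(id)) {
    errors.push({ path: "$input.id", expected: "string & Format<'uuid'>" });
  }
  if (typeof email !== "string" || !EMAIL.test(email)) {
    errors.push({ path: "$input.email", expected: "string & Format<'email'>" });
  }
  // Type<'uint32'>: an integer in [0, 2^32 - 1], which already implies Minimum<0>.
  if (typeof age !== "number" || !Number.isInteger(age) || age < 0 || age > UINT32_MAX) {
    errors.push({ path: "$input.age", expected: "number & Type<'uint32'>" });
  }
  if (typeof name !== "string" || name.length < 1) {
    errors.push({ path: "$input.name", expected: "string & MinLength<1>" });
  }
  return errors.length === 0 ? { success: true, data: input as User } : { success: false, errors };
}
```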

Agent workflow: parallel per API route / per boundary. Find the trust boundaries, add validation, then run the tests to confirm there is no regression.

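> [!warning] Getting past choice paralysis
> Start with Typia. The agent's bottleneck isn't writing schemas; it's finding every place that needs validation. Typia lets the agent focus on finding boundaries instead of translating types into schemas.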

### Layer 3: Property-based testing (inferring properties from types)

Traditional unit tests spell out concrete input/output pairs. PBT inverts this: you describe a property, and the framework generates many random inputs to check it.

Tool: fast-check (https://fast-check.dev/), the only mature PBT framework in the TS ecosystem.

Common property patterns:

| Pattern | Example |
|---|---|
| Roundtrip | `parse(serialize(x)) === x` |
| Idempotent | `format(format(x)) === format(x)` |
| Invariant | `sort(xs).length === xs.length` |
| Commutative | `merge(a, b) === merge(b, a)` |
| Model-based | Your implementation vs. a simple but correct reference implementation |
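The following is a stripped-down sketch of what a PBT framework does. It has three parts:

- a seeded random generator,
- a property checked over many generated inputs,
- a report of the first counterexample.

It omits shrinking and everything else fast-check actually provides. `forAll`, `arrayOf`, `int`, and `mulberry32` are illustrative names, not fast-check's API. The property checked is the Invariant row from the table.

```ts
// Seeded PRNG (mulberry32), so a failing run can be reproduced from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// A generator turns random numbers into a test input.
type Gen<T> = (rand: () => number) => T;

const int = (min: number, max: number): Gen<number> => (rand) =>
  min + Math.floor(rand() * (max - min + 1));

function arrayOf<T>(item: Gen<T>, maxLength: number): Gen<T[]> {
  return (rand) => {
    const length = Math.floor(rand() * (maxLength + 1));
    const out: T[] = [];
    for (let i = 0; i < length; i++) out.push(item(rand));
    return out;
  };
}

type Outcome<T> = { ok: true } | { ok: false; counterexample: T };

// Check the property on `runs` generated inputs and stop at the first failure.
// No shrinking: the counterexample is reported exactly as generated.
function forAll<T>(gen: Gen<T>, property: (x: T) => boolean, runs = 100, seed = 42): Outcome<T> {
  const rand = mulberry32(seed);
  for (let i = 0; i < runs; i++) {
    const x = gen(rand);
    if (!property(x)) return { ok: false, counterexample: x };
  }
  return { ok: true };
}

// Invariant: sorting preserves length.
// (Array.prototype.sort compares as strings by default, hence the comparator.)
const sortNumbers = (xs: number[]) => xs.slice().sort((a, b) => a - b);
// A deliberately buggy "sort" that silently drops duplicates.
const dedupSort = (xs: number[]) => Array.from(new Set(xs)).sort((a, b) => a - b);

const good = forAll(arrayOf(int(-100, 100), 20), (xs) => sortNumbers(xs).length === xs.length);
const bad = forAll(arrayOf(int(0, 9), 20), (xs) => dedupSort(xs).length === xs.length);
```

What this lacks is what a real framework adds: shrinking a failing input down to a minimal one, a large library of generators, and seed reporting for replay.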

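```ts
import fc from 'fast-check';
// Assumes Vitest; with Jest's globals, drop this import.
import { expect, it } from 'vitest';
// Hypothetical module under test: substitute your own parse/serialize pair.
import { parse, serialize } from './codec';

it('roundtrip: parse(serialize(x)) === x', () => {
  fc.assert(
    // fc.json() generates JSON strings; JSON.parse turns each one into a value.
    fc.property(fc.json(), (input) => {
      const value = JSON.parse(input);
      expect(parse(serialize(value))).toEqual(value);
    })
  );
});
```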

Agent workflow, run in parallel per function:

1. Read the type signature and JSDoc.
2. Infer which patterns apply.
3. Generate `fc.property(...)`.
4. Run it to confirm.

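> [!warning] A common agent pitfall
> Agents tend to write trivially true properties (for example, `x === x`). When reviewing, ask: if there were a bug, would this property actually fail? Pairing PBT with mutation testing catches this automatically.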
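The pitfall fits in a few lines. A property like `x === x` passes no matter how broken the code is. A property that actually constrains the output fails on a buggy implementation. `brokenSort` is a deliberately wrong, hypothetical function under test. To keep the sketch dependency-free, it loops over a fixed list of inputs instead of generating them.

```ts
// Deliberately buggy "sort": returns a copy of the input unchanged.
const brokenSort = (xs: number[]): number[] => xs.slice();

// Trivially true: holds for any function, so it can never catch the bug.
const trivial = (xs: number[]) => {
  const out = brokenSort(xs);
  return out === out;
};

// Meaningful: the output is ordered and is a permutation of the input.
const meaningful = (xs: number[]) => {
  const out = brokenSort(xs);
  const ordered = out.every((v, i) => i === 0 || out[i - 1] <= v);
  const key = (a: number[]) => a.slice().sort((p, q) => p - q).join(",");
  return ordered && key(out) === key(xs);
};

const inputs: number[][] = [[], [1], [3, 1, 2], [2, 2, 1], [-1, 5, 0]];
const trivialPasses = inputs.every(trivial);       // true: the bug goes unnoticed
const meaningfulPasses = inputs.every(meaningful); // false: [3, 1, 2] comes back unordered
```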

### Layer 4: Mutation testing (testing your tests)

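The tool makes small mutations to the code (`>` → `>=`, `+` → `-`, `true` → `false`). If the tests still pass, the mutant has survived, meaning the tests don't cover that piece of logic.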
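The following is a toy version of the idea, with mutants written by hand instead of generated. Each mutant is a variant of the original function with one operator or constant changed. A mutant survives when the whole test suite still passes against it. The function, its mutants, and both suites are made up for illustration. Stryker generates mutants from your actual source.

```ts
type Impl = (age: number) => boolean;

const original: Impl = (age) => age >= 18;

// Each mutant changes a single operator or constant.
const mutants: Record<string, Impl> = {
  ">= to >": (age) => age > 18,
  ">= to <": (age) => age < 18,
  "18 to 19": (age) => age >= 19,
  "always true": () => true,
};

type TestCase = [input: number, expected: boolean];

// A weak suite: nothing tests the boundary at 18.
const weakSuite: TestCase[] = [[30, true], [5, false]];
// A stronger suite adds the boundary values.
const strongSuite: TestCase[] = [...weakSuite, [18, true], [17, false]];

const passes = (impl: Impl, suite: TestCase[]) =>
  suite.every(([input, expected]) => impl(input) === expected);

// Names of the mutants the suite fails to kill.
function survivors(suite: TestCase[]): string[] {
  if (!passes(original, suite)) throw new Error("suite fails on the original code");
  return Object.keys(mutants).filter((name) => passes(mutants[name], suite));
}
```

`survivors(weakSuite)` returns `[">= to >", "18 to 19"]`: both mutants only misbehave near 18, which the weak suite never checks. `survivors(strongSuite)` returns `[]`.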

Tool: Stryker Mutator (https://stryker-mutator.io/)

```sh
npm install --save-dev @stryker-mutator/core
npx stryker init
npx stryker run
```

Agent workflow, run in parallel per module:

1. Run Stryker.
2. Parse the JSON report.
3. Write tests targeting the surviving mutants.
4. Rerun to confirm they are killed.

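> [!tip] Mutation score
> Mutation score = killed mutants / total mutants. Below 60% suggests weak tests. The goal is not 100%: some mutants are equivalent (the mutation doesn't change observable behavior), so no test can kill them.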
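The "parse the JSON report" step in the agent workflow mostly comes down to counting mutant statuses. This sketch rests on two assumptions:

- The report is shaped like Stryker's JSON report: `files` maps to `mutants`, each with a `status` field.
- The score is detected (killed or timed out) divided by detected plus undetected (survived or no coverage). This is assumed to match Stryker's own definition.

Check the report schema of your Stryker version before relying on these field names.

```ts
// Statuses as they are assumed to appear in Stryker's JSON report.
type MutantStatus =
  | "Killed"
  | "Survived"
  | "NoCoverage"
  | "Timeout"
  | "CompileError"
  | "RuntimeError"
  | "Ignored"
  | "Pending";

// Only the fields this sketch reads; the real report carries many more.
interface Report {
  files: Record<string, { mutants: { id: string; status: MutantStatus }[] }>;
}

function mutationScore(report: Report): { score: number; undetected: string[] } {
  let detected = 0;
  const undetected: string[] = [];
  for (const file of Object.keys(report.files)) {
    for (const m of report.files[file].mutants) {
      if (m.status === "Killed" || m.status === "Timeout") {
        detected++;
      } else if (m.status === "Survived" || m.status === "NoCoverage") {
        // These are the mutants the agent should write new tests against.
        undetected.push(`${file}#${m.id}`);
      }
      // CompileError, RuntimeError, Ignored and Pending do not count toward the score.
    }
  }
  const valid = detected + undetected.length;
  // NaN when there is nothing to score.
  return { score: valid === 0 ? NaN : (100 * detected) / valid, undetected };
}
```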

### Closing the loop: PBT + mutation testing

Layers 3 and 4 form a positive feedback loop:

```mermaid
flowchart TD
    Write["Agent writes property tests<br>(fast-check)"]
    Run["Agent runs mutation testing<br>(Stryker)"]
    Dead["All mutants killed<br>test quality is sufficient ✅"]
    Alive["Some mutants survive"]
    Fix["Analyze why<br>strengthen the properties"]

    Write --> Run
    Run -->|all killed| Dead
    Run -->|survivors| Alive --> Fix --> Write
```

This loop runs without human involvement until the final review.
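The loop in the diagram above can be written out as control flow. `writeProperties`, `runMutation`, and `strengthen` stand in for agent and tool calls and are entirely hypothetical. The one design choice worth copying is the iteration cap. Equivalent mutants can never be killed, so an uncapped loop might never terminate.

```ts
interface MutationRun {
  survivors: string[];
}

// Hypothetical hooks for the agent and tool steps.
interface LoopSteps {
  writeProperties: () => void;
  runMutation: () => MutationRun;
  strengthen: (survivors: string[]) => void;
}

// Returns the survivors left when the loop stops (empty means every mutant was killed).
function closeTheLoop(steps: LoopSteps, maxRounds = 5): { rounds: number; survivors: string[] } {
  steps.writeProperties();
  let run = steps.runMutation();
  let rounds = 1;
  while (run.survivors.length > 0 && rounds < maxRounds) {
    steps.strengthen(run.survivors);
    run = steps.runMutation();
    rounds++;
  }
  // Anything still alive here goes to human review (possibly equivalent mutants).
  return { rounds, survivors: run.survivors };
}
```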

### Full pipeline overview

```mermaid
flowchart TD
    subgraph L1["Layer 1: Biome"]
        L1D["Agent runs lint + format<br>auto-fixes correctness rules"]
    end
    subgraph L2["Layer 2: Runtime validation"]
        L2D["Agent scans trust boundaries<br>adds Typia / ArkType / Zod per route in parallel"]
    end
    subgraph L3["Layer 3: Property-based testing"]
        L3D["Agent reads type signatures<br>generates fast-check property tests per function in parallel"]
    end
    subgraph L4["Layer 4: Mutation testing"]
        L4D["Agent verifies test quality with Stryker"]
    end

    L1 --> L2 --> L3 --> L4
    L4 -->|"surviving mutants"| L3
```

Each layer can be adopted independently. Recommended order: Biome → Validation → PBT → Mutation testing.

Every layer's artifacts go into CI and are enforced permanently. Human constraints become mechanized verification that no longer depends on manual review.

## Further thoughts

### This framing answers "Should I learn Coq/Lean 4?"

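You don't need to write proofs yourself. What you need is to design a verification pipeline so that the agent's non-deterministic output can be verified mechanically. Stryker + fast-check + Biome + Typia are your "type checker".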

### Cross-language spec compilation

From one NL spec, agents "compile" in parallel to implementations in several languages plus conformance tests:

```mermaid
flowchart TD
    Spec["NL spec<br>(constraint .md)"]
    A["Agent A<br>generates an OCaml implementation"]
    B["Agent B<br>generates a TypeScript implementation"]
    C["Agent C<br>generates fast-check tests<br>verifying both behave identically"]

    Spec --> A & B & C
    A & B --> C
```
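Agent C's job amounts to differential testing: run both implementations on the same generated inputs and report the first disagreement. To stay dependency-free, both sides of this sketch are TypeScript. In the setup above, one side would be the OCaml build called through some bridge, such as a CLI. That bridge is an assumption left out here. The slug spec, `slugA`, `slugB`, `randomStrings`, and `differential` are all hypothetical.

```ts
// Spec: lowercase; each run of non [a-z0-9] characters becomes one "-"; no leading/trailing "-".
// Implementation A: regex-based.
function slugA(s: string): string {
  return s
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Implementation B: written independently from the same spec, as a character loop.
function slugB(s: string): string {
  const lower = s.toLowerCase();
  let out = "";
  let pendingDash = false;
  for (let i = 0; i < lower.length; i++) {
    const ch = lower[i];
    if (/[a-z0-9]/.test(ch)) {
      if (pendingDash && out.length > 0) out += "-";
      out += ch;
      pendingDash = false;
    } else {
      pendingDash = true;
    }
  }
  return out;
}

// Deterministic inputs over an alphabet chosen to hit the edge cases:
// upper case, repeated separators, separators at either end.
function randomStrings(count: number, seed = 1): string[] {
  const alphabet = "aZ9 -_!";
  let state = seed >>> 0;
  // 32-bit LCG; use the high bits because an LCG's low bits cycle quickly.
  const next = (n: number) => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return (state >>> 16) % n;
  };
  const out: string[] = [];
  for (let i = 0; i < count; i++) {
    const length = next(12);
    let s = "";
    for (let j = 0; j < length; j++) s += alphabet[next(alphabet.length)];
    out.push(s);
  }
  return out;
}

// Report the first input on which the two implementations disagree, or null.
function differential<I, O>(a: (x: I) => O, b: (x: I) => O, inputs: I[]): { input: I; a: O; b: O } | null {
  for (const x of inputs) {
    const ra = a(x);
    const rb = b(x);
    if (ra !== rb) return { input: x, a: ra, b: rb };
  }
  return null;
}
```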

### Invariant broadcasting

Traditional lint rules can only express syntactic patterns. NL constraints can express semantic invariants:

- "Every function that touches monetary amounts must use Decimal, never float"
- "Every API handler's error response must include a correlation ID"

One agent scans the codebase for violations; another batch of agents fixes them in parallel.
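The following is a deliberately crude sketch of the "scan for violations" half for the money invariant. A line-based regex flags parameters whose names look monetary but are typed `number`. The name list and the `Violation` shape are assumptions. A real scanner would walk the AST. Deciding which code "touches money" is exactly the semantic judgment the agent is there to make.

```ts
interface Violation {
  file: string;
  line: number; // 1-based
  text: string;
}

// Assumed naming heuristic for "monetary" identifiers typed as number.
const MONEY_PARAM = /\b(amount|price|total|balance|fee|cost)\w*\s*:\s*number\b/i;

function findFloatMoney(files: Record<string, string>): Violation[] {
  const violations: Violation[] = [];
  for (const file of Object.keys(files).sort()) {
    files[file].split(/\r?\n/).forEach((text, i) => {
      if (MONEY_PARAM.test(text)) violations.push({ file, line: i + 1, text: text.trim() });
    });
  }
  return violations;
}
```

Each violation's file, line, and text is a self-contained work item, which is what lets a batch of agents fix them in parallel.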


caasi/natural-language-metaprogramming.md
