RXB Binary Format — Implementation Plan

Context

RX is a right-to-left text encoding for JSON-shaped data. We want a binary variant (RXB) that is smaller and faster by:

Replacing ASCII tag characters with integer tags packed into LEB128 varints
Using base-128 varints instead of base-64 encoded numbers
Adding a hexstring type for lowercase hex data (hashes, UUIDs)

Format Design

Combined Tag+Varint Encoding

Every node ends with a right-to-left LEB128 varint that packs the tag into its low 4 bits:

Rightmost byte (read first): [continue:1][value:3][tag:4]
Subsequent bytes (leftward): [continue:1][value:7]
Last byte (leftmost of varint): [0xxxxxxx] — continue=0

The continue bit means "more bytes to the left." Reading right-to-left:

value  = (byte0 >> 4) & 0x07        // 3 bits from first byte
value |= (byte1 & 0x7F) << 3        // 7 bits
value |= (byte2 & 0x7F) << 10       // 7 bits
...
tag    = byte0 & 0x0F
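
The bit arithmetic above can be sketched as a small right-to-left reader. This is only an illustration under assumptions: the name `readTagVarint`, the `TagVarint` shape, and the convention of passing the index of the rightmost byte are hypothetical, not part of the plan. It accumulates with multiplication rather than `<<` so values beyond 32 bits stay exact.

```typescript
// Hypothetical result shape for one decoded tag+varint.
interface TagVarint {
  tag: number;   // low 4 bits of the rightmost byte
  value: number; // 3 bits from the rightmost byte, then 7 bits per byte leftward
  start: number; // index of the leftmost byte of the varint
}

// Decode the combined tag+varint whose rightmost byte sits at index `end`.
function readTagVarint(buf: Uint8Array, end: number): TagVarint {
  if (end < 0 || end >= buf.length) throw new RangeError("end is out of bounds");
  let pos = end;
  const first = buf[pos];
  const tag = first & 0x0f;
  let value = (first >> 4) & 0x07;
  let scale = 8; // 2^3: weight of the next 7-bit group
  let more = (first & 0x80) !== 0;
  while (more) {
    pos--;
    if (pos < 0) throw new RangeError("truncated tag+varint");
    const byte = buf[pos];
    value += (byte & 0x7f) * scale; // multiply instead of shifting past 32 bits
    scale *= 128;
    more = (byte & 0x80) !== 0;
  }
  return { tag, value, start: pos };
}

// A two-byte varint: tag 0x3, value 8.
console.log(readTagVarint(new Uint8Array([0x01, 0x83]), 1));
// { tag: 3, value: 8, start: 0 }
```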

Value ranges per byte count:

| Bytes | Value range | Bits |
|---|---|---|
| 1 | 0–7 | 3 |
| 2 | 0–1,023 | 10 |
| 3 | 0–131,071 | 17 |
| 4 | 0–16,777,215 | 24 |
| 5+ | up to 2^53 | 31+ |
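
The byte counts in this table follow from 3 value bits in the first byte plus 7 in each further byte. A minimal sketch of the matching size and write routines, assuming hypothetical names `sizeofTagVarint` and `writeTagVarint` and a fresh `Uint8Array` as output (the real encoder may write into a shared buffer instead):

```typescript
// Number of bytes needed for a combined tag+varint carrying `value`.
function sizeofTagVarint(value: number): number {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("value must be a non-negative safe integer");
  }
  let size = 1;                      // first byte holds 3 value bits
  let rest = Math.floor(value / 8);  // bits that did not fit
  while (rest > 0) {
    size++;
    rest = Math.floor(rest / 128);   // 7 more bits per extra byte
  }
  return size;
}

// Serialize tag+value. The rightmost byte (read first) is the last array element.
function writeTagVarint(tag: number, value: number): Uint8Array {
  const size = sizeofTagVarint(value);
  const out = new Uint8Array(size);
  let rest = Math.floor(value / 8);
  out[size - 1] = (rest > 0 ? 0x80 : 0) | ((value % 8) << 4) | (tag & 0x0f);
  for (let i = size - 2; i >= 0; i--) {
    const group = rest % 128;
    rest = Math.floor(rest / 128);
    out[i] = (rest > 0 ? 0x80 : 0) | group; // continue bit = more bytes to the left
  }
  return out;
}

console.log(sizeofTagVarint(7), sizeofTagVarint(8), sizeofTagVarint(1024)); // 1 2 3
console.log(Array.from(writeTagVarint(0x3, 8))); // [ 1, 131 ]  i.e. [0x01, 0x83]
```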

Comparison with rx text format (b64 + separate tag char):

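| Value | rx bytes | rxb bytes | Savings |
|---|---|---|---|
| 0 | 1 (tag only) | 1 | 0 |
| 1-7 | 2 (tag+1 b64) | 1 | 1 |
| 8-63 | 2 | 2 | 0 |
| 64-1023 | 3 (tag+2 b64) | 2 | 1 |
| 1024-4095 | 3 | 3 | 0 |
| 4096-16383 | 4 (tag+3 b64) | 3 | 1 |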

Biggest win: values 1-7 (very common for small string lengths, small containers) drop from 2 bytes to 1.

Tag Assignments (4-bit, 0x0-0xF)

| Tag | Name | Layout | Varint meaning |
|---|---|---|---|
| 0x0 | int | `[tag+varint]` | zigzag(value) |
| 0x1 | decimal | `[base_int_node][tag+varint]` | zigzag(exponent) |
| 0x2 | string | `[utf8 body][tag+varint]` | byte_length |
| 0x3 | hexstring | `[packed bytes][tag+varint]` | hex_char_count |
| 0x4 | ref | `[tag+varint]` | code (0=null, 1=true, 2=false, 3=undef, 4=inf, 5=ninf, 6=nan, 7+=external) |
| 0x5 | list | `[children reversed][tag+varint]` | content_byte_size |
| 0x6 | map | `[kv reversed][idx?][schema?][tag+varint]` | content_byte_size |
| 0x7 | pointer | `[tag+varint]` | backward delta |
| 0x8 | chain | `[segments][tag+varint]` | content_byte_size |
| 0x9 | index | `[binary entries][tag+varint]` | packed: `(count<<3)\|(width-1)` |
| 0xA-0xF | reserved | - | future use |
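
As an illustration of how the table might translate into code, here is a sketch of the tag constants together with a zigzag mapping for the int and decimal tags. Only `TAG_INT` and `TAG_INDEX` are names the plan itself mentions; the other constant names are guesses, and `zigzag`/`unzigzag` are hypothetical stand-ins for the `toZigZag`/`fromZigZag` helpers in rx.ts, whose actual implementation is not shown here.

```typescript
// 4-bit tag values from the table. Names other than TAG_INT and TAG_INDEX are assumed.
const TAG_INT = 0x0;
const TAG_DECIMAL = 0x1;
const TAG_STRING = 0x2;
const TAG_HEXSTRING = 0x3;
const TAG_REF = 0x4;
const TAG_LIST = 0x5;
const TAG_MAP = 0x6;
const TAG_POINTER = 0x7;
const TAG_CHAIN = 0x8;
const TAG_INDEX = 0x9;

// Zigzag maps signed integers to unsigned ones: 0, -1, 1, -2, 2 ... become 0, 1, 2, 3, 4 ...
// Arithmetic form (no bit shifts) so it works beyond 32 bits.
function zigzag(n: number): number {
  return n < 0 ? -2 * n - 1 : 2 * n;
}

function unzigzag(z: number): number {
  return z % 2 === 0 ? z / 2 : -(z + 1) / 2;
}

// Integers whose zigzag form is 0..7 fit in a single-byte int node.
const oneByteInts = [-4, -3, -2, -1, 0, 1, 2, 3].map(
  (n) => (zigzag(n) << 4) | TAG_INT,
);
console.log(oneByteInts.map((b) => b.toString(16)));
// [ '70', '50', '30', '10', '0', '20', '40', '60' ]
```

With this mapping, ints from -4 to 3 have zigzag values 0 to 7 and therefore fit in the single-byte form of the combined tag+varint.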

Hexstring Encoding

Detect: string is non-empty, all chars in [0-9a-f], length >= 4
Pack: 2 hex chars per byte, high nibble first. Odd length: leading byte has high nibble = 0
Decode: convert packed bytes to hex, take last hex_char_count chars
Example: "deadbeef" (8 chars) → 4 bytes [0xDE,0xAD,0xBE,0xEF] + tag+varint (2 bytes, because the value 8 does not fit in the 3 value bits of the first byte)

With the combined encoding, tag=0x3 and value=8: Byte0 = 0x80 | ((8 & 0x7) << 4) | 0x3 = 0x83 (continue bit set, low 3 value bits are 0), Byte1 = 8 >> 3 = 0x01. The varint is therefore [0x01][0x83], and the whole node is 6 bytes: [0xDE,0xAD,0xBE,0xEF,0x01,0x83].
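
The detect/pack/decode rules for hexstrings could look roughly like the following. The helper names `isHexString`, `hexEncode`, and `hexDecode` come from the planned rxb.ts section list, but their signatures here are assumptions, and the sketch covers only the packed body, not the trailing tag+varint.

```typescript
// Detect: lowercase hex only, at least 4 characters.
function isHexString(s: string): boolean {
  return s.length >= 4 && /^[0-9a-f]+$/.test(s);
}

// Pack two hex chars per byte, high nibble first.
// Odd length: pad on the left so the leading byte has high nibble 0.
function hexEncode(s: string): Uint8Array {
  const padded = s.length % 2 === 1 ? "0" + s : s;
  const out = new Uint8Array(padded.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(padded.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// Decode: expand every byte to two hex chars, then keep the last hexCharCount chars
// (this drops the padding nibble of odd-length strings).
function hexDecode(bytes: Uint8Array, hexCharCount: number): string {
  let hex = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    hex += (b < 16 ? "0" : "") + b.toString(16);
  }
  return hex.slice(hex.length - hexCharCount);
}

console.log(Array.from(hexEncode("deadbeef"))); // [ 222, 173, 190, 239 ]
console.log(hexDecode(hexEncode("abcde"), 5));  // abcde
```

Storing the character count rather than the byte count in the varint is what lets the decoder distinguish "abcde" from "0abcde" after packing.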

Index Entries

Fixed-width binary big-endian integers (1-8 bytes per entry). Packed varint = (count<<3)|(width-1).
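
A rough sketch of the packed index varint and of reading one fixed-width entry. The function names are hypothetical, and what each entry value means is left to the format spec. Note that 8-byte entries can exceed what a JavaScript `number` represents exactly, so this sketch is only exact for values below 2^53.

```typescript
// Pack entry count and entry width (1..8 bytes) into one varint value.
function packIndexHeader(count: number, width: number): number {
  if (!Number.isInteger(width) || width < 1 || width > 8) {
    throw new RangeError("width must be 1..8");
  }
  return count * 8 + (width - 1); // same as (count << 3) | (width - 1), without 32-bit limits
}

function unpackIndexHeader(packed: number): { count: number; width: number } {
  return { count: Math.floor(packed / 8), width: (packed % 8) + 1 };
}

// Read one big-endian entry of `width` bytes starting at `offset`.
function readIndexEntry(buf: Uint8Array, offset: number, width: number): number {
  let value = 0;
  for (let i = 0; i < width; i++) {
    value = value * 256 + buf[offset + i]; // most significant byte first
  }
  return value;
}

console.log(packIndexHeader(5, 2));                            // 41
console.log(unpackIndexHeader(41));                            // { count: 5, width: 2 }
console.log(readIndexEntry(new Uint8Array([0x01, 0x02]), 0, 2)); // 258
```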

External Refs

Ref codes 0-6 are builtins. Codes 7+ map to external ref names. Encoder/decoder sort ref keys alphabetically for deterministic index assignment.
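
One way the deterministic code assignment might work is sketched below. The function name `assignRefCodes` and the `Record<string, unknown>` input shape are assumptions. The sketch uses the default `Array.prototype.sort`, which orders by UTF-16 code units; whether "alphabetically" should instead mean UTF-8 byte order (as with `utf8Sort`) is left open by the plan.

```typescript
// Codes 0..6 are reserved for the builtins (null, true, false, undef, inf, ninf, nan).
const FIRST_EXTERNAL_REF_CODE = 7;

// Assign codes 7, 8, 9, ... to external ref names in sorted order, so encoder and
// decoder agree on the mapping as long as they are given the same set of names.
function assignRefCodes(refs: Record<string, unknown>): Map<string, number> {
  const names = Object.keys(refs).sort();
  const codes = new Map<string, number>();
  names.forEach((name, i) => codes.set(name, FIRST_EXTERNAL_REF_CODE + i));
  return codes;
}

const codes = assignRefCodes({ zeta: 1, alpha: 2, mid: 3 });
console.log(Array.from(codes.entries()));
// [ [ 'alpha', 7 ], [ 'mid', 8 ], [ 'zeta', 9 ] ]
```

Because the codes depend on the full sorted key set, adding or removing an external ref shifts the codes of every name that sorts after it, so both sides need the identical ref table.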

Files to Create/Modify

New: `rxb.ts`

Parallel implementation to rx.ts for the binary format.

Imports from rx.ts:

toZigZag, fromZigZag — zigzag encoding
splitNumber — number decomposition for decimals
utf8Sort — UTF-8 byte-order comparison
makeKey — identity keys for pointer dedup
INDEX_THRESHOLD, STRING_CHAIN_THRESHOLD, STRING_CHAIN_DELIMITER, DEDUP_COMPLEXITY_LIMIT

Sections:

Combined tag+varint read/write/sizeof
Tag constants (TAG_INT=0x0 through TAG_INDEX=0x9)
Ref code constants (REF_NULL=0 through REF_NAN=6)
Hexstring helpers (isHexString, hexEncode, hexDecode)
Cursor + peekTag + read() — scan right-to-left past continue bytes, extract tag+value
String handling (readStr, resolveStr, strCompare, strEquals, strHasPrefix)
Container access (seekChild, collectChildren, findKey, findByPrefix)
Proxy-based open() / decode() API
inspect() API returning ASTNode
encode() — same structure as rx.ts encoder but with combined tag+varint, hexstrings, integer ref codes, binary index entries

New: `rxb.test.ts`

Mirrors rx.test.ts:

Tag+varint encode/decode roundtrips
Primitive roundtrips (int, float, string, hexstring, builtins)
Container roundtrips (arrays, objects, nested)
Pointer dedup, chains, schemas
Hexstring-specific (UUID, SHA-256, odd-length)
Cross-check: rxb.decode(rxb.encode(x)) matches rx.decode(rx.encode(x))

New: `docs/rxb-format.md`

Format spec mirroring docs/rx-format.md.

Modify: `package.json`

Add rxb.ts to build:esm
Add ./rxb subpath export
Add CJS build for rxb

Verification

bun test — all rxb tests pass
Encode sample JSON with both rx and rxb, verify rxb is smaller
Roundtrip: decode(encode(value)) matches original for all types
Hexstring: verify "deadbeef01234567" encodes as ~half the bytes vs regular string
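
For the hexstring check, the expected node sizes can be worked out ahead of time from the layout rules. The sketch below uses hypothetical helper names and assumes each value is encoded as a single standalone node (no pointer dedup or chains involved).

```typescript
// Bytes used by the trailing tag+varint: 3 value bits in the first byte, 7 per extra byte.
function varintBytes(value: number): number {
  let size = 1;
  for (let rest = Math.floor(value / 8); rest > 0; rest = Math.floor(rest / 128)) size++;
  return size;
}

// string node: UTF-8 body + varint(byte_length)
function stringNodeSize(byteLength: number): number {
  return byteLength + varintBytes(byteLength);
}

// hexstring node: packed body (2 chars per byte) + varint(hex_char_count)
function hexstringNodeSize(hexCharCount: number): number {
  return Math.ceil(hexCharCount / 2) + varintBytes(hexCharCount);
}

const sample = "deadbeef01234567"; // 16 ASCII chars, so 16 UTF-8 bytes
console.log(stringNodeSize(sample.length));    // 18
console.log(hexstringNodeSize(sample.length)); // 10
```

Under these assumptions the hexstring form is 10 bytes against 18, so a test could assert an exact size rather than "about half".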
