RX is a right-to-left text encoding for JSON-shaped data. We want a binary variant (RXB) that is smaller and faster by:
- Replacing ASCII tag characters with integer tags packed into LEB128 varints
- Using base-128 varints instead of base-64 encoded numbers
- Adding a hexstring type for lowercase hex data (hashes, UUIDs)
Every node ends with a right-to-left LEB128 varint that packs the tag into its low 4 bits:
- Rightmost byte (read first): `[continue:1][value:3][tag:4]`
- Subsequent bytes (leftward): `[continue:1][value:7]`
- Last byte (leftmost of varint): `[0xxxxxxx]`, with continue=0
The continue bit means "more bytes to the left." Reading right-to-left:
value = (byte0 >> 4) & 0x07 // 3 bits from first byte
value |= (byte1 & 0x7F) << 3 // 7 bits
value |= (byte2 & 0x7F) << 10 // 7 bits
...
tag = byte0 & 0x0F
Value ranges per byte count:
| Bytes | Value range | Bits |
|---|---|---|
| 1 | 0–7 | 3 |
| 2 | 0–1,023 | 10 |
| 3 | 0–131,071 | 17 |
| 4 | 0–16,777,215 | 24 |
| 5+ | up to 2^53 | 31+ |
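The byte layout and value ranges above can be sketched in TypeScript as follows. This is a minimal illustration, not the planned `rxb.ts` code: the function names and signatures (`sizeofTagVarint`, `writeTagVarint`, `readTagVarint`) are assumptions, and arithmetic is used instead of bit shifts so values above 32 bits stay exact.

```typescript
/** Bytes needed for a combined tag+varint holding `value` (hypothetical helper). */
function sizeofTagVarint(value: number): number {
  let size = 1; // the first byte carries the tag plus 3 value bits
  let rest = Math.floor(value / 8);
  while (rest > 0) {
    size++; // each further byte carries 7 value bits
    rest = Math.floor(rest / 128);
  }
  return size;
}

/**
 * Writes tag+value so that the tag byte is the rightmost byte, at `end - 1`.
 * Returns the offset of the leftmost byte written.
 */
function writeTagVarint(buf: Uint8Array, end: number, tag: number, value: number): number {
  let pos = end - 1;
  let rest = Math.floor(value / 8);
  buf[pos] = ((value % 8) << 4) | (tag & 0x0f) | (rest > 0 ? 0x80 : 0);
  while (rest > 0) {
    pos--;
    const next = Math.floor(rest / 128);
    buf[pos] = (rest % 128) | (next > 0 ? 0x80 : 0);
    rest = next;
  }
  return pos;
}

/** Reads right-to-left starting at `end - 1`, following continue bits leftward. */
function readTagVarint(buf: Uint8Array, end: number): { tag: number; value: number; start: number } {
  let pos = end - 1;
  let byte = buf[pos];
  const tag = byte & 0x0f;
  let value = (byte >> 4) & 0x07; // 3 bits from the first byte
  let scale = 8;
  while (byte & 0x80) {
    pos--;
    byte = buf[pos];
    value += (byte & 0x7f) * scale; // 7 bits from each further byte
    scale *= 128;
  }
  return { tag, value, start: pos };
}
```

For example, writing tag 0x3 with value 8 into a 2-byte buffer produces `[0x01][0x83]`, and reading from the right end recovers the same tag and value.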
Comparison with rx text format (b64 + separate tag char):
| Value | rx bytes | rxb bytes | Savings |
|---|---|---|---|
| 0 | 1 (tag only) | 1 | 0 |
| 1-7 | 2 (tag+1 b64) | 1 | 1 |
| 8-63 | 2 | 2 | 0 |
| 64-1023 | 3 (tag+2 b64) | 2 | 1 |
| 1024-4095 | 3 | 3 | 0 |
| 4096-16383 | 4 (tag+3 b64) | 3 | 1 |
Biggest win: values 1-7 (very common for small string lengths, small containers) drop from 2 bytes to 1.
| Tag | Name | Layout | Varint meaning |
|---|---|---|---|
| 0x0 | int | [tag+varint] | zigzag(value) |
| 0x1 | decimal | [base_int_node][tag+varint] | zigzag(exponent) |
| 0x2 | string | [utf8 body][tag+varint] | byte_length |
| 0x3 | hexstring | [packed bytes][tag+varint] | hex_char_count |
| 0x4 | ref | [tag+varint] | code (0=null, 1=true, 2=false, 3=undef, 4=inf, 5=ninf, 6=nan, 7+=external) |
| 0x5 | list | [children reversed][tag+varint] | content_byte_size |
| 0x6 | map | [kv reversed][idx?][schema?][tag+varint] | content_byte_size |
| 0x7 | pointer | [tag+varint] | backward delta |
| 0x8 | chain | [segments][tag+varint] | content_byte_size |
| 0x9 | index | [binary entries][tag+varint] | packed: (count<<3)\|(width-1) |
| 0xA-0xF | reserved | - | future use |
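As a small illustration of the int row, the sketch below maps a signed integer to its zigzag value and, when that value fits in 3 bits, to the single node byte. The tag numbers come from the table; the `Tag` object, its member names, and the helper names are hypothetical, and the zigzag mapping shown is the standard one, assumed to match `toZigZag` in rx.ts.

```typescript
// Tag numbers copied from the table; the constant names are assumptions.
const Tag = {
  INT: 0x0, DECIMAL: 0x1, STRING: 0x2, HEXSTRING: 0x3, REF: 0x4,
  LIST: 0x5, MAP: 0x6, POINTER: 0x7, CHAIN: 0x8, INDEX: 0x9,
} as const;

// Standard zigzag mapping: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
// Arithmetic (not bit shifts) keeps this exact beyond 32 bits.
function zigzagEncode(n: number): number {
  return n >= 0 ? n * 2 : -n * 2 - 1;
}

function zigzagDecode(z: number): number {
  return z % 2 === 0 ? z / 2 : -(z + 1) / 2;
}

// Returns the single node byte for an int whose zigzag value fits in
// 3 bits, or undefined when a continue byte would be required.
function encodeSmallInt(n: number): number | undefined {
  const z = zigzagEncode(n);
  return z <= 7 ? (z << 4) | Tag.INT : undefined;
}
```

Under this mapping, integers from -4 to 3 fit in one byte; 4 has zigzag value 8 and needs a second byte.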
- Detect: string has length >= 4 and all chars are in `[0-9a-f]`
- Pack: 2 hex chars per byte, high nibble first. Odd length: leading byte has high nibble = 0
- Decode: convert packed bytes to hex, take the last `hex_char_count` chars
- Example: `"deadbeef"` (8 chars) → 4 bytes `[0xDE,0xAD,0xBE,0xEF]` + a 2-byte tag+varint. With tag=0x3 and value=8, the value exceeds 7, so a continue byte is needed: Byte0 = ((8 & 7) << 4) | 0x3 | 0x80 = 0x83 and Byte1 = 8 >> 3 = 0x01, stored as `[0x01][0x83]`.
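The detect/pack/decode rules above could look roughly like this. The helper names match the ones listed for the implementation, but the signatures and bodies here are assumptions made for illustration.

```typescript
// Detect: length >= 4 and only lowercase hex characters.
function isHexString(s: string): boolean {
  return s.length >= 4 && /^[0-9a-f]+$/.test(s);
}

// Pack: two hex chars per byte, high nibble first. An odd-length input is
// left-padded with "0" so the leading byte has a zero high nibble.
function hexEncode(s: string): Uint8Array {
  const padded = s.length % 2 === 1 ? "0" + s : s;
  const out = new Uint8Array(padded.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(padded.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// Decode: expand to hex, then keep only the last `hexCharCount` chars,
// which drops the padding nibble of an odd-length string.
function hexDecode(bytes: Uint8Array, hexCharCount: number): string {
  let hex = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    hex += (b < 16 ? "0" : "") + b.toString(16);
  }
  return hex.slice(hex.length - hexCharCount);
}
```

With these helpers, `hexEncode("abcde")` packs to three bytes `[0x0A,0xBC,0xDE]`, and `hexDecode` with a count of 5 returns `"abcde"`.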
Index entries are fixed-width big-endian binary integers (1-8 bytes per entry). The index node's packed varint is (count<<3)|(width-1).
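A minimal sketch of the index header packing and of reading one fixed-width entry follows. All three function names are hypothetical, and multiplication is used in place of `<<` so large counts are not truncated to 32 bits.

```typescript
// Packs entry count and entry width (1-8 bytes) into the varint value.
function packIndexHeader(count: number, width: number): number {
  if (width < 1 || width > 8) throw new RangeError("width must be 1-8");
  return count * 8 + (width - 1); // same as (count << 3) | (width - 1)
}

function unpackIndexHeader(packed: number): { count: number; width: number } {
  return { count: Math.floor(packed / 8), width: (packed % 8) + 1 };
}

// Reads one big-endian entry of `width` bytes starting at `offset`.
// Results are exact only while the value stays below 2^53.
function readIndexEntry(buf: Uint8Array, offset: number, width: number): number {
  let value = 0;
  for (let i = 0; i < width; i++) {
    value = value * 256 + buf[offset + i];
  }
  return value;
}
```

For instance, 5 entries of 2 bytes each pack to 41, which still fits a 2-byte tag+varint.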
Ref codes 0-6 are builtins. Codes 7+ map to external ref names. Encoder/decoder sort ref keys alphabetically for deterministic index assignment.
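The deterministic assignment described above might be sketched as below. The function name is hypothetical, and "alphabetically" is interpreted here as JavaScript's default string sort (UTF-16 code unit order), which is an assumption; the real encoder may use a different comparison such as `utf8Sort`.

```typescript
// Builtin codes from the tag table: 0=null ... 6=nan.
const BUILTIN_REF_COUNT = 7;

// Sorts external ref names and assigns codes starting at 7, so encoder and
// decoder derive the same mapping regardless of input order.
function assignRefCodes(refNames: string[]): Map<string, number> {
  const sorted = [...refNames].sort(); // assumption: default sort order
  const codes = new Map<string, number>();
  sorted.forEach((name, i) => codes.set(name, BUILTIN_REF_COUNT + i));
  return codes;
}
```

Because both sides sort the same key set, `["zeta", "alpha"]` and `["alpha", "zeta"]` yield the same codes: `alpha` gets 7 and `zeta` gets 8.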
`rxb.ts` is a parallel implementation to rx.ts for the binary format.
Imports from rx.ts:
- `toZigZag`, `fromZigZag`: zigzag encoding
- `splitNumber`: number decomposition for decimals
- `utf8Sort`: UTF-8 byte-order comparison
- `makeKey`: identity keys for pointer dedup
- `INDEX_THRESHOLD`, `STRING_CHAIN_THRESHOLD`, `STRING_CHAIN_DELIMITER`, `DEDUP_COMPLEXITY_LIMIT`
Sections:
- Combined tag+varint read/write/sizeof
- Tag constants (TAG_INT=0x0 through TAG_INDEX=0x9)
- Ref code constants (REF_NULL=0 through REF_NAN=6)
- Hexstring helpers (`isHexString`, `hexEncode`, `hexDecode`)
- Cursor + `peekTag` + `read()`: scan right-to-left past continue bytes, extract tag+value
- String handling (`readStr`, `resolveStr`, `strCompare`, `strEquals`, `strHasPrefix`)
- Container access (`seekChild`, `collectChildren`, `findKey`, `findByPrefix`)
- Proxy-based `open()`/`decode()` API
- `inspect()` API returning ASTNode
- `encode()`: same structure as the rx.ts encoder but with combined tag+varint, hexstrings, integer ref codes, and binary index entries
Mirrors rx.test.ts:
- Tag+varint encode/decode roundtrips
- Primitive roundtrips (int, float, string, hexstring, builtins)
- Container roundtrips (arrays, objects, nested)
- Pointer dedup, chains, schemas
- Hexstring-specific (UUID, SHA-256, odd-length)
- Cross-check: `rxb.decode(rxb.encode(x))` matches `rx.decode(rx.encode(x))`
Format spec mirroring docs/rx-format.md.
- Add `rxb.ts` to `build:esm`
- Add `./rxb` subpath export
- Add CJS build for rxb
- `bun test`: all rxb tests pass
- Encode sample JSON with both rx and rxb, verify rxb is smaller
- Roundtrip: `decode(encode(value))` matches original for all types
- Hexstring: verify `"deadbeef01234567"` encodes as ~half the bytes vs regular string