@creationix
Created March 25, 2026 15:06
RXB Binary Format — Implementation Plan

Context

RX is a right-to-left text encoding for JSON-shaped data. We want a binary variant (RXB) that is smaller and faster by:

  • Replacing ASCII tag characters with integer tags packed into LEB128 varints
  • Using base-128 varints instead of base-64 encoded numbers
  • Adding a hexstring type for lowercase hex data (hashes, UUIDs)

Format Design

Combined Tag+Varint Encoding

Every node ends with a right-to-left LEB128 varint that packs the tag into its low 4 bits:

  • Rightmost byte (read first): [continue:1][value:3][tag:4]
  • Subsequent bytes (leftward): [continue:1][value:7]
  • Last byte (leftmost of varint): [0xxxxxxx] — continue=0

The continue bit means "more bytes to the left." Reading right-to-left:

value  = (byte0 >> 4) & 0x07        // 3 bits from first byte
value |= (byte1 & 0x7F) << 3        // 7 bits
value |= (byte2 & 0x7F) << 10       // 7 bits
...
tag    = byte0 & 0x0F
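
The bit layout above can be sketched in TypeScript roughly as follows. This is a minimal sketch rather than the planned rxb.ts code: the names `writeTagVarint` and `readTagVarint`, and the `{ tag, value, start }` return shape, are assumptions made for illustration. It uses multiplication and division instead of 32-bit shift operators so that values up to 2^53 - 1 survive intact.

```typescript
// Hypothetical helpers; the plan does not fix names or signatures for these.

/** Encode tag (4 bits) + value into bytes in storage order (leftmost byte first). */
function writeTagVarint(tag: number, value: number): Uint8Array {
  if (!Number.isInteger(tag) || tag < 0 || tag > 0x0f) {
    throw new RangeError("tag must fit in 4 bits");
  }
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("value must be a non-negative safe integer");
  }
  // Build bytes in read order (rightmost byte first), then reverse for storage.
  const readOrder: number[] = [];
  let rest = Math.floor(value / 8); // bits above the 3 carried by byte0
  readOrder.push((rest > 0 ? 0x80 : 0) | ((value % 8) << 4) | tag);
  while (rest > 0) {
    const next = Math.floor(rest / 128);
    readOrder.push((next > 0 ? 0x80 : 0) | (rest % 128));
    rest = next;
  }
  return Uint8Array.from(readOrder.reverse());
}

/** Decode the tag+varint whose rightmost byte sits at buf[end - 1]. */
function readTagVarint(
  buf: Uint8Array,
  end: number,
): { tag: number; value: number; start: number } {
  let pos = end - 1;
  if (pos < 0 || pos >= buf.length) throw new RangeError("end out of range");
  let byte = buf[pos];
  const tag = byte & 0x0f;
  let value = (byte >> 4) & 0x07; // 3 value bits from byte0
  let scale = 8; // weight of the next 7-bit group
  while (byte & 0x80) {
    pos--;
    if (pos < 0) throw new RangeError("truncated varint");
    byte = buf[pos];
    value += (byte & 0x7f) * scale;
    scale *= 128;
  }
  // `start` is the index of the leftmost varint byte; the node body ends there.
  return { tag, value, start: pos };
}
```

A reader would typically call `readTagVarint(buf, buf.length)` on the root node and then use `start` as the right edge of that node's body.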

Value ranges per byte count:

Bytes  Value range     Bits
1      0–7             3
2      0–1,023         10
3      0–131,071       17
4      0–16,777,215    24
5+     up to 2^53      31+
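
The byte counts in this table follow from 3 value bits in the first byte plus 7 in each additional byte. A sizeof helper along those lines might look like the sketch below; the name `sizeofTagVarint` is an assumption, not something the plan defines.

```typescript
// Hypothetical helper: number of bytes the combined tag+varint needs for `value`.
// The tag does not affect the size because it always occupies the low 4 bits of byte0.
function sizeofTagVarint(value: number): number {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("value must be a non-negative safe integer");
  }
  let bytes = 1; // byte0 carries the tag and the low 3 value bits
  let rest = Math.floor(value / 8);
  while (rest > 0) {
    bytes++; // each further byte carries 7 more value bits
    rest = Math.floor(rest / 128);
  }
  return bytes;
}
```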

Comparison with rx text format (b64 + separate tag char):

Value       rx bytes         rxb bytes  Savings
0           1 (tag only)     1          0
1-7         2 (tag+1 b64)    1          1
8-63        2 (tag+1 b64)    2          0
64-1023     3 (tag+2 b64)    2          1
1024-4095   3 (tag+2 b64)    3          0
4096-16383  4 (tag+3 b64)    3          1

Biggest win: values 1-7 (very common for small string lengths, small containers) drop from 2 bytes to 1.

Tag Assignments (4-bit, 0x0-0xF)

Tag      Name       Layout                                    Varint meaning
0x0      int        [tag+varint]                              zigzag(value)
0x1      decimal    [base_int_node][tag+varint]               zigzag(exponent)
0x2      string     [utf8 body][tag+varint]                   byte_length
0x3      hexstring  [packed bytes][tag+varint]                hex_char_count
0x4      ref        [tag+varint]                              code (0=null, 1=true, 2=false, 3=undef, 4=inf, 5=ninf, 6=nan, 7+=external)
0x5      list       [children reversed][tag+varint]           content_byte_size
0x6      map        [kv reversed][idx?][schema?][tag+varint]  content_byte_size
0x7      pointer    [tag+varint]                              backward delta
0x8      chain      [segments][tag+varint]                    content_byte_size
0x9      index      [binary entries][tag+varint]              packed: (count<<3)|(width-1)
0xA-0xF  reserved   -                                         future use
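
As a rough illustration, the tag and builtin ref assignments could be written as plain constants. Only `TAG_INT`, `TAG_INDEX`, `REF_NULL` and `REF_NAN` are named in this plan; every other identifier below is an assumed name. The zigzag helpers are stand-ins for the `toZigZag` / `fromZigZag` functions that rx.ts is said to export, whose actual signatures are not shown here.

```typescript
// Tag constants (4-bit). Names between TAG_INT and TAG_INDEX are assumptions.
const TAG_INT = 0x0;
const TAG_DECIMAL = 0x1;
const TAG_STRING = 0x2;
const TAG_HEXSTRING = 0x3;
const TAG_REF = 0x4;
const TAG_LIST = 0x5;
const TAG_MAP = 0x6;
const TAG_POINTER = 0x7;
const TAG_CHAIN = 0x8;
const TAG_INDEX = 0x9;

// Builtin ref codes. Names between REF_NULL and REF_NAN are assumptions.
const REF_NULL = 0;
const REF_TRUE = 1;
const REF_FALSE = 2;
const REF_UNDEF = 3;
const REF_INF = 4;
const REF_NINF = 5;
const REF_NAN = 6;

// Stand-in zigzag mapping: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
// Arithmetic form (no 32-bit shifts) so larger safe integers still work.
function zigzagEncode(n: number): number {
  return n >= 0 ? n * 2 : -n * 2 - 1;
}

function zigzagDecode(z: number): number {
  return z % 2 === 0 ? z / 2 : -(z + 1) / 2;
}
```

Under this mapping an int node for -1 would carry varint value 1 with tag 0x0, which still fits in a single byte.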

Hexstring Encoding

  • Detect: string is non-empty, all chars in [0-9a-f], length >= 4
  • Pack: 2 hex chars per byte, high nibble first. Odd length: leading byte has high nibble = 0
  • Decode: convert packed bytes to hex, take last hex_char_count chars
  • Example: "deadbeef" (8 chars) → 4 packed bytes [0xDE,0xAD,0xBE,0xEF], followed by a 2-byte tag+varint for tag=0x3, value=8

The value 8 does not fit in the 3 value bits of the first byte, so the varint needs a continuation byte. Byte0 = 0x80 | ((8 & 0x07) << 4) | 0x3 = 0x83, and Byte1 = 8 >> 3 = 0x01. Stored left to right, the tag+varint is [0x01][0x83], so the whole node is [0xDE,0xAD,0xBE,0xEF,0x01,0x83] (6 bytes).
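
The detect, pack and decode rules for hexstrings can be sketched as below. The helper names match the ones this plan lists for rxb.ts (`isHexString`, `hexEncode`, `hexDecode`), but their signatures are assumptions, and this is one possible reading of the rules rather than the final implementation.

```typescript
/** Detect: lowercase hex only, at least 4 characters. */
function isHexString(s: string): boolean {
  return s.length >= 4 && /^[0-9a-f]+$/.test(s);
}

/** Pack two hex chars per byte, high nibble first; odd lengths get a leading 0 nibble. */
function hexEncode(s: string): Uint8Array {
  const padded = s.length % 2 === 1 ? "0" + s : s;
  const out = new Uint8Array(padded.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(padded.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

/** Unpack to hex and keep only the last hexCharCount characters (drops the pad nibble). */
function hexDecode(bytes: Uint8Array, hexCharCount: number): string {
  let hex = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    hex += (b < 16 ? "0" : "") + b.toString(16);
  }
  return hex.slice(hex.length - hexCharCount);
}
```

The varint stores the character count rather than the byte count, which is what lets the decoder tell an odd-length string such as "abcde" apart from "0abcde".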

Index Entries

Fixed-width binary big-endian integers (1-8 bytes per entry). Packed varint = (count<<3)|(width-1).
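
Because width ranges over 1-8, width-1 fits in the low 3 bits and the count takes the rest. The sketch below shows one way to pack that header value and to read or write a single big-endian entry. All four function names are hypothetical. It also assumes entry values stay within JavaScript's safe integer range, which an 8-byte entry could exceed.

```typescript
/** Pack count and width (1-8) into the index node's varint value. */
function packIndexHeader(count: number, width: number): number {
  if (!Number.isInteger(width) || width < 1 || width > 8) {
    throw new RangeError("width must be 1-8");
  }
  return count * 8 + (width - 1); // same as (count << 3) | (width - 1), without 32-bit limits
}

/** Inverse of packIndexHeader. */
function unpackIndexHeader(packed: number): { count: number; width: number } {
  return { count: Math.floor(packed / 8), width: (packed % 8) + 1 };
}

/** Write one fixed-width big-endian entry at buf[offset .. offset + width). */
function writeIndexEntry(buf: Uint8Array, offset: number, width: number, value: number): void {
  let rest = value;
  for (let i = width - 1; i >= 0; i--) {
    buf[offset + i] = rest % 256; // least significant byte goes last
    rest = Math.floor(rest / 256);
  }
}

/** Read one fixed-width big-endian entry. */
function readIndexEntry(buf: Uint8Array, offset: number, width: number): number {
  let value = 0;
  for (let i = 0; i < width; i++) {
    value = value * 256 + buf[offset + i];
  }
  return value;
}
```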

External Refs

Ref codes 0-6 are builtins. Codes 7+ map to external ref names. Encoder/decoder sort ref keys alphabetically for deterministic index assignment.
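
Deterministic assignment means both sides can derive the same codes from the same set of ref names without transmitting a table. A minimal sketch, assuming the external refs arrive as a plain object and that the default JavaScript string sort is an acceptable stand-in for "alphabetically" (the plan does not say which comparator is used, and `assignRefCodes` is a hypothetical name):

```typescript
const BUILTIN_REF_COUNT = 7; // codes 0-6 are builtins

/** Map each external ref name to its code (7, 8, 9, ...) in sorted-name order. */
function assignRefCodes(refs: Record<string, unknown>): Map<string, number> {
  const names = Object.keys(refs).sort(); // assumption: default code-unit ordering
  const codes = new Map<string, number>();
  names.forEach((name, i) => codes.set(name, BUILTIN_REF_COUNT + i));
  return codes;
}

/** Decoder side: turn a ref code back into the external name, if any. */
function refNameForCode(refs: Record<string, unknown>, code: number): string | undefined {
  const names = Object.keys(refs).sort();
  return names[code - BUILTIN_REF_COUNT];
}
```

Since the codes depend only on the sorted key set, encoder and decoder must be given the same ref names, in any insertion order.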

Files to Create/Modify

New: rxb.ts

Parallel implementation to rx.ts for the binary format.

Imports from rx.ts:

  • toZigZag, fromZigZag — zigzag encoding
  • splitNumber — number decomposition for decimals
  • utf8Sort — UTF-8 byte-order comparison
  • makeKey — identity keys for pointer dedup
  • INDEX_THRESHOLD, STRING_CHAIN_THRESHOLD, STRING_CHAIN_DELIMITER, DEDUP_COMPLEXITY_LIMIT

Sections:

  1. Combined tag+varint read/write/sizeof
  2. Tag constants (TAG_INT=0x0 through TAG_INDEX=0x9)
  3. Ref code constants (REF_NULL=0 through REF_NAN=6)
  4. Hexstring helpers (isHexString, hexEncode, hexDecode)
  5. Cursor + peekTag + read() — scan right-to-left past continue bytes, extract tag+value
  6. String handling (readStr, resolveStr, strCompare, strEquals, strHasPrefix)
  7. Container access (seekChild, collectChildren, findKey, findByPrefix)
  8. Proxy-based open() / decode() API
  9. inspect() API returning ASTNode
  10. encode() — same structure as rx.ts encoder but with combined tag+varint, hexstrings, integer ref codes, binary index entries

New: rxb.test.ts

Mirrors rx.test.ts:

  • Tag+varint encode/decode roundtrips
  • Primitive roundtrips (int, float, string, hexstring, builtins)
  • Container roundtrips (arrays, objects, nested)
  • Pointer dedup, chains, schemas
  • Hexstring-specific (UUID, SHA-256, odd-length)
  • Cross-check: rxb.decode(rxb.encode(x)) matches rx.decode(rx.encode(x))

New: docs/rxb-format.md

Format spec mirroring docs/rx-format.md.

Modify: package.json

  • Add rxb.ts to build:esm
  • Add ./rxb subpath export
  • Add CJS build for rxb

Verification

  1. bun test — all rxb tests pass
  2. Encode sample JSON with both rx and rxb, verify rxb is smaller
  3. Roundtrip: decode(encode(value)) matches original for all types
  4. Hexstring: verify "deadbeef01234567" encodes in roughly half the bytes of a regular string (8 packed bytes + 2-byte tag+varint = 10 bytes, versus 16 + 2 = 18)