CLVM compression was designed for straightforward implementation within CLVM itself. To understand how it works, we'll first examine the standard deserialization algorithm implemented in CLVM. See also here https://github.com/Chia-Network/clvm_rs/blob/7e58298b85aed07672678e54b9f28571724814dd/cl/deserialize_w_backrefs.cl
A CLVM stack is a nil-terminated list of CLVM objects. Stack operations push and pop from the front:
- Push operation:
(c new_object old_stack)
- Pop operation:
(f stack)
returns topmost object - Get remaining stack:
(r old_stack)
Examples of stack states:
() # empty stack
(foo) # one-element stack with "foo" on top
(bar foo) # two-element stack with "bar" on top
Visual representation as trees:
0 # empty tree
🔲
| \
| \
foo 0 # one-element stack
🔲
| \
| \
foo 🔲
/ \
bar 0 # two-element stack
A cursor is analogous to a C FILE
pointer, consisting of:
- An atom representing the serialized CLVM stream
- An integer index tracking the current position
The core recursive function sexp_from_stream
takes two parameters:
- A CLVM stack
- A cursor
- Read the first byte from the cursor
- If byte ≠ 0xff:
- Deserialize as atom directly
- Push onto stack: (atom ...)
- Return updated stack and cursor
- If byte = 0xff:
- Recursively read LEFT object onto stack
- Recursively read RIGHT object onto stack
- Stack becomes:
(RIGHT LEFT ...)
- Pop RIGHT and LEFT
- Push
(LEFT . RIGHT)
- Final stack:
((LEFT . RIGHT) ...)
In either case, a single new clvm object is pushed onto the stack.
The main algorithm:
- Calls
sexp_from_stream
with empty stack and cursor at index 0 - Returns final stack and cursor
- Pop top element to get deserialized CLVM
Compression introduces the back-reference prefix (0xfe). When encountered:
- Deserialize following atom as path
- Navigate path down stack using standard CLVM conventions:
tree@0 = nil tree@1 = tree tree@2n = (f (tree@n)) tree@2n+1 = (r (tree@n))
- Copy referenced subtree to stack
While decompression is straightforward, compression is complex due to context-sensitive path references that change with stack updates.
See also https://gist.github.com/richardkiss/86f8ba4a04f4eb6b44234f014daa3d31