Skip to content

Instantly share code, notes, and snippets.

@richardkiss
Created December 24, 2024 20:14
Show Gist options
  • Save richardkiss/61ad9a63050b6fa64caea6ea82033a66 to your computer and use it in GitHub Desktop.
Save richardkiss/61ad9a63050b6fa64caea6ea82033a66 to your computer and use it in GitHub Desktop.
Detailed explanation of how clvm decompression works

Understanding CLVM Compression

Introduction

CLVM compression was designed for straightforward implementation within CLVM itself. To understand how it works, we'll first examine the standard deserialization algorithm implemented in CLVM. See also here https://github.com/Chia-Network/clvm_rs/blob/7e58298b85aed07672678e54b9f28571724814dd/cl/deserialize_w_backrefs.cl

Basic Concepts: CLVM Stack Operations

A CLVM stack is a nil-terminated list of CLVM objects. Stack operations push and pop from the front:

  • Push operation: (c new_object old_stack)
  • Pop operation: (f stack) returns topmost object
  • Get remaining stack: (r old_stack)

Examples of stack states:

()          # empty stack
(foo)       # one-element stack with "foo" on top
(bar foo)   # two-element stack with "bar" on top

Visual representation as trees:

0           # empty tree

🔲
|  \
|   \
foo  0      # one-element stack

🔲
|  \
|   \
foo  🔲
    /  \
   bar  0   # two-element stack

Cursor and Stream Processing

A cursor is analogous to a C FILE pointer, consisting of:

  • An atom representing the serialized CLVM stream
  • An integer index tracking the current position

Deserialization Algorithm

The core recursive function sexp_from_stream takes two parameters:

  1. A CLVM stack
  2. A cursor

Processing Steps of sexp_from_stream:

  1. Read the first byte from the cursor
  2. If byte ≠ 0xff:
    • Deserialize as atom directly
    • Push onto stack: (atom ...)
    • Return updated stack and cursor
  3. If byte = 0xff:
    • Recursively read LEFT object onto stack
    • Recursively read RIGHT object onto stack
    • Stack becomes: (RIGHT LEFT ...)
    • Pop RIGHT and LEFT
    • Push (LEFT . RIGHT)
    • Final stack: ((LEFT . RIGHT) ...)

In either case, a single new clvm object is pushed onto the stack.

The main algorithm:

  1. Calls sexp_from_stream with empty stack and cursor at index 0
  2. Returns final stack and cursor
  3. Pop top element to get deserialized CLVM

Compression Implementation

Compression introduces the back-reference prefix (0xfe). When encountered:

  1. Deserialize following atom as path
  2. Navigate path down stack using standard CLVM conventions:
    tree@0    = nil
    tree@1    = tree
    tree@2n   = (f (tree@n))
    tree@2n+1 = (r (tree@n))
    
  3. Copy referenced subtree to stack

While decompression is straightforward, compression is complex due to context-sensitive path references that change with stack updates.

@richardkiss
Copy link
Author

@richardkiss
Copy link
Author

If you consider the clvm node, there is likely a fairly straightforward way to write a function that shows how to "find the path" to a prior node from a later node just using the two nodes "final paths". This might lead to a compression algorithm that differs from the existing implementation, so might have different performance characteristics. Being able to quickly calculate how long decompression paths are based on final paths may turn out to be very useful.

Roughly, the "further away" two nodes are in the tree, the longer the path is. Generally, to get from one node to another, you go up to the least common ancestor, then down to the particular node. The total number of steps corresponds to how many bits are in the path of the stack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment