An HVM for a Generic FP Language

Intro

I want a good FP language that runs on an HVM. So I'm taking the HVM3 implementation and modifying it to have the features I want. Check out the HVM3 repo first for the details of how it works.

Terms

Terms are always 64-bits wide with a Tag in the lower 4 bits, giving 16 different possible tags.
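
As a rough sketch, assuming a Term is a plain 64-bit unsigned integer (term_tag is illustrative, and term_loc is a guess at the helper used in the snippets below):

#include <stdint.h>

typedef uint64_t Term;

#define TAG_MASK 0xF  /* the lower 4 bits hold the tag */

/* The 4-bit tag selects one of the 16 term kinds. */
static inline uint64_t term_tag(Term t) {
  return t & TAG_MASK;
}

/* The remaining 60 bits are the payload: typically a node location. */
static inline uint64_t term_loc(Term t) {
  return t >> 4;
}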

Numbers

There are two tags for number terms, I60 and F60, for 60-bit integers and floats.
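
Packing and unpacking an I60 could work like this, building on the sketch above (the I60 tag value is an assumption):

#define I60 0x1  /* assumed tag value, for illustration */

/* Pack a signed integer (which must fit in 60 bits) above the tag. */
static inline Term make_i60(int64_t n) {
  return ((uint64_t)n << 4) | I60;
}

/* An arithmetic right shift sign-extends the 60-bit payload back to 64 bits. */
static inline int64_t i60_val(Term t) {
  return (int64_t)t >> 4;
}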

Native Values

One thing I had to have was the ability to create native C data structures. I won't say much about them here, except that they're ref counted and garbage collected. They always appear on 8-byte boundaries, so by designating 0x0 and 0x8 as the tags VAL and VL1, pointers to these data structures are valid terms as-is. Native data structures can never point to HVM nodes. They can only point to other native structures or native numbers.

When a native term interacts with a DUP term, its ref count is incremented. When it interacts with an ERA term, its count is decremented, and if it reaches 0, the data structure is freed.
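
A sketch of those two interactions, assuming every native structure starts with a common header (the header layout and function names are my assumptions):

#include <stdatomic.h>

typedef struct NativeObj {
  atomic_int ref_count;
  void (*free_fn)(struct NativeObj *);  /* type-specific destructor */
} NativeObj;

/* DUP: the structure gains another owner. */
static void native_dup(NativeObj *obj) {
  atomic_fetch_add(&obj->ref_count, 1);
}

/* ERA: drop an owner; free the structure when the last one goes away. */
static void native_era(NativeObj *obj) {
  if (atomic_fetch_sub(&obj->ref_count, 1) == 1) {
    obj->free_fn(obj);
  }
}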

Refs

John Hughes said in "Why Functional Programming Matters"

"Typically the main function is defined in terms of other functions, which in turn are defined in terms of still more functions, until at the bottom level the functions are language primitives."

In my HVM, the language primitives are actually snippets of C code in functions with the following signature

typedef bool (*interactionFn)(Term ref, Term args);

An interactionFn pointer is always on at least an 8-byte boundary, so the lower 3 bits of its 64-bit address are unused. We take advantage of this by OR'ing a REF tag with the pointer address to point to a C function. The ref term is always the REF term itself, and the args term is an APP term. The + port points to the first parameter, and the - port points to another APP term for the next parameter. The - port of the APP term that points to the last parameter is where the result will be placed, and it typically holds a SUB term. (This will be very important later.)
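
Encoding and decoding such a REF term might look like this (the REF tag value is an assumption, and the mask clears only the 3 low bits the pointer leaves free):

#include <stdint.h>

#define REF 0x2  /* assumed tag value, for illustration */

/* Tag a C function pointer as a term. */
static inline Term make_fn_ref(interactionFn fn) {
  return (Term)(uintptr_t)fn | REF;
}

/* Recover the function pointer by clearing the tag bits. */
static inline interactionFn ref_fn(Term ref) {
  return (interactionFn)(uintptr_t)(ref & ~(uint64_t)0x7);
}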

The body of the interactionFn is C code that either constructs a subnet, typically of LAM nodes, to reduce the APP nodes of the parameters, or carries out some primitive operation on the parameters. In the latter case, though, the parameters all have to be either number terms or pointers to native data structures. Ensuring this is always the case before executing the C code is a big chunk of the work.

I created a function called strictArgs and a C struct to go with it.

#define MAX_ARGS 15
typedef struct {
  int count;
  Term args[MAX_ARGS + 2];
} NativeArgs;

Term strictArgs(Term ref, Term args, int expected, NativeArgs *argsStruct);

An interactionFn that implements some primitive functionality has the form

  NativeArgs arityArgs = {0, {}};
  args = strictArgs(ref, args, 1, &arityArgs);
  if (arityArgs.count == 1) {
    ...
    move(port(2, term_loc(args)), result);
  }
  return TRUE;

strictArgs starts consuming the APP nodes at args. As long as it finds VAL/VL1/I60/F60 terms in the + ports of the APP nodes, it copies them into the arityArgs struct. If the expected number of native parameters is found, the body of the code where the ... is gets executed, and whatever the result term is set to is moved to the - port of the last APP node.
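
A simplified sketch of that collection loop (peek and suspend are stand-ins for the real helpers; port and term_loc follow the snippet above, VAL and VL1 are the pointer tags described earlier, and F60's value is assumed):

#define VAL 0x0
#define VL1 0x8
#define F60 0x3  /* assumed tag value, for illustration */

Term strictArgs(Term ref, Term args, int expected, NativeArgs *out) {
  while (out->count < expected) {
    Term arg = peek(port(1, term_loc(args)));   /* + port: a parameter */
    uint64_t tag = term_tag(arg);
    if (tag == VAL || tag == VL1 || tag == I60 || tag == F60) {
      out->args[out->count++] = arg;            /* already strict: collect it */
      if (out->count < expected)
        args = peek(port(2, term_loc(args)));   /* - port: the next APP node */
    } else {
      /* VAR, LAZ, ...: park the call as described below, leaving out->count
         short so the caller skips the primitive body. */
      return suspend(ref, args, out);
    }
  }
  return args;  /* the last APP node; its - port receives the result */
}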

However, a parameter may be any positive term, and each kind gets special treatment. I'll only explain two: VAR and LAZ. I haven't worked out much more than that yet, but I'm pretty sure I know how it will work.

A VAR term points to the result of an APP/LAM or APP/REF reduction. If the VAR term points to a location that contains a native parameter term, that term gets put into arityArgs and everything continues. If it contains a SUB term, though, the other APP/(LAM,REF) reduction hasn't completed, so this C code can't be executed until it has and the parameter is ready. So strictArgs builds a chain of APP nodes from the parameters it has already collected in arityArgs, prepends it to the last APP node it reached, and then stores the ref term and the new APP term into a pair of nodes.

And here's the tricky part. It then replaces the SUB term the VAR pointed to with a new SUB term whose location points to the APP/REF pair. So in my HVM, a SUB term may point to a redex stored in a pair of nodes. Eventually, the other APP redex completes and its result is moved to the node that contains this SUB term. When the move function sees the location field is non-zero, it writes the new result value to the node, then pushes the APP/REF redex onto the redex stack. That redex eventually gets popped, and the interactionFn gets called again and finds the parameter it was waiting on.
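
A sketch of that move logic (the SUB tag value and the helpers swap, peek, and push_redex are my assumptions):

#define SUB 0x4  /* assumed tag value, for illustration */

void move(uint64_t loc, Term result) {
  Term old = swap(loc, result);          /* write the result in place */
  if (term_tag(old) == SUB && term_loc(old) != 0) {
    uint64_t pair = term_loc(old);       /* a parked APP/REF redex */
    Term ref  = peek(port(1, pair));
    Term args = peek(port(2, pair));
    push_redex(ref, args);               /* wake the waiting interactionFn */
  }
}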

Laziness

Which brings us to laziness. For my purposes, I need APP/REF redexes to not be reduced until their results are known to be needed. So when a call site is encoded as an APP/REF redex, the redex is stored in a pair of nodes, and a term pointing to that pair is tagged with LAZ and put in the - port of the APP. This appears to introduce a cycle in the network, but the redex isn't really in the network being reduced. This LAZ term can be handled like any other term: it can be duplicated, superposed with other terms, etc. It can even be erased by interacting with an ERA term without ever having been reduced.
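
A sketch of deferring a call site this way (alloc_pair, poke, and the LAZ tag value are assumptions):

#define LAZ 0x6  /* assumed tag value, for illustration */

/* Park an APP/REF redex in a fresh node pair and return a LAZ term that
   stands in for its eventual result; the caller puts that term in the
   APP's - port. */
Term make_lazy_call(Term ref, Term args) {
  uint64_t pair = alloc_pair();
  poke(port(1, pair), ref);
  poke(port(2, pair), args);
  return ((Term)pair << 4) | LAZ;
}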

If strictArgs comes across a LAZ term, it's handled much like a VAR term that points to a SUB. The redex pointed to by the LAZ term is retrieved and pushed onto the stack. An APP chain is built from arityArgs, and a SUB term that points to the new APP/REF redex is put in the location where the LAZ term was found. When that result becomes available, reduction continues.

Duplicating LAZ terms is complicated, but it essentially ensures that when any copy of the LAZ term gets resolved, copies of the result are sent towards all the different places it was copied to. However, the copying of the result value is itself lazy: the result is only copied as many times as needed.

Garbage Nodes

Nodes are always allocated in pairs, and terms are written to each node. When the term in a location is taken, it's replaced with a zero value to signify there's nothing there. If both nodes of a pair are zero, that pair can be reused.

A list of these free pairs is kept; allocations come from the head, and newly freed pairs are pushed onto the head.
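
A single-threaded sketch of that free list, treating each node as a 64-bit slot (node_at and fresh_pair are assumptions; per the next section, the real allocator has to tolerate multiple threads):

static uint64_t free_head = 0;   /* location of the first free pair; 0 = empty */

uint64_t alloc_pair(void) {
  if (free_head == 0)
    return fresh_pair();         /* bump-allocate a brand new pair */
  uint64_t pair = free_head;
  free_head = *node_at(pair);    /* the next-free link lives in the first node */
  return pair;
}

void free_pair(uint64_t pair) {
  *node_at(pair) = free_head;    /* thread the old head through the first node */
  free_head = pair;
}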

Multi-threaded

For efficiency, not every redex gets put onto the redex stack. ERA interactions and primitive OPs involving I60 and F60 values, for example, can be handled immediately without any further information, so they're never pushed. Other redexes will be pushed, because their reduction can happen in parallel. The only two points of thread contention are the node allocator and the redex stack, and keeping contention to a minimum is vital to good performance.
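
A sketch of that split (the ERA tag value and the helpers interact_now and push_redex are assumptions):

#define ERA 0x5                 /* assumed tag value, for illustration */
#define PAIR(a, b) (((a) << 4) | (b))

void dispatch(Term neg, Term pos) {
  switch (PAIR(term_tag(neg), term_tag(pos))) {
    case PAIR(ERA, I60):
    case PAIR(ERA, F60):        /* ...and similar cheap, self-contained cases */
      interact_now(neg, pos);   /* handled inline, never pushed */
      break;
    default:
      push_redex(neg, pos);     /* reduced in parallel by worker threads */
  }
}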

As mrsteed has discovered, memory ordering of atomic operations might introduce race conditions. I'll jump off that bridge when I get there.
