@kennypete
Last active October 26, 2024 10:07
Using libcall() in Vim

Vim’s libcall() builtin function

Vim’s libcall() is used to call a function in either a Windows .dll or Linux .so library.

Help?

Vim’s builtin.txt:

libcall({libname}, {funcname}, {argument})
		Call function {funcname} in the run-time library {libname}
		with single argument {argument}.
		This is useful to call functions in a library that you
		especially made to be used with Vim.  Since only one argument
		is possible, calling standard library functions is rather
		limited.
		The result is the String returned by the function.  If the
		function returns NULL, this will appear as an empty string ""
		to Vim.
		If the function returns a number, use libcallnr()!
		If {argument} is a number, it is passed to the function as an
		int; if {argument} is a string, it is passed as a
		null-terminated string.
		This function will fail in restricted-mode.

		libcall() allows you to write your own 'plug-in' extensions to
		Vim without having to recompile the program.  It is NOT a
		means to call system functions!  If you try to do so Vim will
		very probably crash.

		For Win32, the functions you write must be placed in a DLL
		and use the normal C calling convention (NOT Pascal which is
		used in Windows System DLLs).  The function must take exactly
		one parameter, either a character pointer or a long integer,
		and must return a character pointer or NULL.  The character
		pointer returned must point to memory that will remain valid
		after the function has returned (e.g. in static data in the
		DLL).  If it points to allocated memory, that memory will
		leak away.  Using a static buffer in the function should work,
		it's then freed when the DLL is unloaded.
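The requirements in the help text can be sketched in C as follows. This is a minimal illustration, not code from the Vim help: the function names greet and name_length are hypothetical. greet returns a pointer into a static buffer so the memory stays valid after the function returns, and name_length returns a number, so it would be called with libcallnr() rather than libcall().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical libcall() target: takes exactly one string argument and
   returns a pointer to a static buffer, which remains valid after return. */
const char *greet(const char *name) {
    static char buffer[128];
    if (name == NULL) {
        return NULL; /* Vim shows a NULL result as an empty string "". */
    }
    /* snprintf() truncates long input instead of overflowing the buffer. */
    snprintf(buffer, sizeof buffer, "Hello, %s!", name);
    return buffer;
}

/* Hypothetical libcallnr() target: same single argument, numeric result. */
int name_length(const char *name) {
    return name == NULL ? 0 : (int)strlen(name);
}
```

Assuming this were compiled to a library named greet.so (a made-up name), libcall('./greet.so', 'greet', 'Vim') would be expected to return Hello, Vim! and libcallnr('./greet.so', 'name_length', 'Vim') would be expected to return 3.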

Why use libcall()?

Some scenarios may be well-suited to using libcall(). One use case is a large dictionary. Although vim9script is compiled, Vim cannot pre-compile it ahead of time, so a dictionary or list used by a plugin has to be created again in each and every new Vim instance. That’s fine, usually, but what if you have a dictionary or list that is several, or even hundreds, of megabytes?

ℹ️
The Unicode character database, for example, is a few hundred megabytes.

Especially if you are writing it for your own setup, where the vagaries of operating systems, versions, etc., are of concern only to you, using a pre-compiled .dll or .so may have advantages.

An example (using a dictionary extract)

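As noted in the help, libcall() always returns a string (libcallnr() is its counterpart for functions that return a number). The help describes the single-argument interface as rather limited, but that will not matter in all cases. Keep in mind that, because the result is a string, even dictionaries of dictionaries are feasible: the library returns the text of a Vim dictionary literal, and eval() turns that returned string into a dictionary, which is what is demonstrated in the following example.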
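When the returned string is built at run time rather than stored as a fixed literal, it has to be valid Vim syntax before eval() can read it. The following is a minimal sketch of one way to do that, assuming the value is emitted as a Vim single-quoted string, in which an embedded single quote is written as two single quotes. The names vim_quote and describe are hypothetical and are not part of the gist's eg.c.

```c
#include <stdio.h>
#include <string.h>

/* Copy src into dst as a Vim single-quoted string, doubling any embedded
   single quote. Returns 0 on success, -1 if dst is too small. */
static int vim_quote(char *dst, size_t size, const char *src) {
    size_t n = 0;
    if (size < 3) return -1;                 /* room for '' and the NUL */
    dst[n++] = '\'';
    for (; *src; src++) {
        size_t need = (*src == '\'') ? 2 : 1;
        if (n + need + 2 > size) return -1;  /* +2: closing quote and NUL */
        if (*src == '\'') dst[n++] = '\'';   /* ' becomes '' */
        dst[n++] = *src;
    }
    dst[n++] = '\'';
    dst[n] = '\0';
    return 0;
}

/* Hypothetical libcall() target: returns the text of a Vim dictionary
   literal, e.g. {'text': 'abc', 'len': 3}, held in a static buffer. */
const char *describe(const char *text) {
    static char result[512];
    char quoted[256];
    if (text == NULL || vim_quote(quoted, sizeof quoted, text) != 0) {
        return "{}";                         /* same fallback as eg.c */
    }
    snprintf(result, sizeof result, "{'text': %s, 'len': %lu}",
             quoted, (unsigned long)strlen(text));
    return result;
}
```

In Vim, passing such a string to eval() should give a dictionary whose 'text' entry holds the original string. Since eval() evaluates whatever expression it is given, it is only appropriate for strings from a library you trust.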

Depending on how it’s written, the compiled .dll or .so should be very fast, though some factors may significantly impact performance. One such factor is using WSL and having the .so in a location on the Windows file system (/mnt/c/Users/…​, for example).

🔥
I am no C programmer! The following C code is potentially not great! (And it was created with some AI help.) That is not an issue for this example and, at any rate, the same concept has been tested on a 300 MB .dll and .so, and the string for any requested key was returned in under 0.1 seconds. So, there may well be room for improvement, but it does the job for this illustration.

gcc installation

Windows 64-bit

ℹ️
Skip this if a C compiler already exists on your Windows PC (though the scripts calling the compiler may need adjustment, perhaps).

MSYS2 with gcc is a relatively simple means of adding the gcc C compiler to a Windows 64-bit PC. Steps:

  1. Install MSYS2 from https://www.msys2.org/

  2. In MSYS2 UCRT64, install gcc with:

    pacman -S mingw-w64-ucrt-x86_64-gcc

  3. Validate that gcc has installed:

    gcc --version

    Something like this should be returned:

    gcc.exe (Rev3, Built by MSYS2 project) 14.1.0
    Copyright (C) 2024 Free Software Foundation, Inc.

Debian-based Linux

There are many sites explaining how to do this. Commonly, using sudo apt install build-essential is recommended. Search for that if you do not have gcc installed on your Debian-based Linux machine (or, also, WSL: Debian, Ubuntu, et al).

C code example

The following eg.c file creates a static array of key-value pairs where the key is the Unicode code point and the value is a tiny subset of the Unicode database (i.e., the XML version) content for each associated code point. Each value is a string, which subsequently may be turned into a Vim dictionary with eval(). As noted in the comments, this has been left as-is, aside from adding more comments and removal of over 155,000 pairs, for this demo.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdint.h>

// NB: The _real_ Unicode code points table has >155k entries.
//     It has been left as-is.
#define TABLE_SIZE 262144

typedef struct {
    const char *key;
    const char *value;
} KeyValue;

typedef struct HashNode {
    const char *key;
    const char *value;
    struct HashNode *next;
} HashNode;

static HashNode *hash_table[TABLE_SIZE] = {0};
static int initialized = 0;

// Static array of key-value pairs.
// The 'real' data is >155,000 key-value pairs.
static KeyValue dictionary[] = {
  {"00A0", "{'na': 'NO-BREAK SPACE', 'gc': 'Zs', 'bc': 'CS', 'dt': 'nb', 'dm': '0020'}"},
  {"00A1", "{'na': 'INVERTED EXCLAMATION MARK', 'gc': 'Po', 'bc': 'ON'}"},
  {"00A2", "{'na': 'CENT SIGN', 'gc': 'Sc', 'bc': 'ET'}"},
  {"00A3", "{'na': 'POUND SIGN', 'gc': 'Sc', 'bc': 'ET'}"},
  {"00A4", "{'na': 'CURRENCY SIGN', 'gc': 'Sc', 'bc': 'ET'}"},
  {"00A5", "{'na': 'YEN SIGN', 'gc': 'Sc', 'bc': 'ET'}"},
  {"00A6", "{'na': 'BROKEN BAR', 'gc': 'So', 'bc': 'ON'}"},
  {"00A7", "{'na': 'SECTION SIGN', 'gc': 'Po', 'bc': 'ON'}"},
  {"00A8", "{'na': 'DIAERESIS', 'gc': 'Sk', 'bc': 'ON', 'dt': 'com', 'dm': '0020 0308'}"},
  {"00A9", "{'na': 'COPYRIGHT SIGN', 'gc': 'So', 'bc': 'ON'}"},
  {"00AA", "{'na': 'FEMININE ORDINAL INDICATOR', 'dt': 'sup', 'dm': '0061'}"},
  {"00AB", "{'na': 'LEFT-POINTING DOUBLE ANGLE QUOTATION MARK', 'gc': 'Pi', 'bc': 'ON', 'bm': 'Y'}"},
  {"00AC", "{'na': 'NOT SIGN', 'gc': 'Sm', 'bc': 'ON'}"},
  {"00AD", "{'na': 'SOFT HYPHEN', 'gc': 'Cf', 'bc': 'BN'}"},
  {"00AE", "{'na': 'REGISTERED SIGN', 'gc': 'So', 'bc': 'ON'}"},
  {"00AF", "{'na': 'MACRON', 'gc': 'Sk', 'bc': 'ON', 'dt': 'com', 'dm': '0020 0304'}"},
  {NULL, NULL} // End marker
};


// FNV-1a hash function.
// Again, this has been left as produced by Claude AI for
// where there are ~155k key/value pairs, which have been omitted.
uint32_t hash(const char *key) {
    uint32_t h = 2166136261u;
    for (; *key; key++) {
        h ^= (unsigned char)*key; // Cast so bytes >= 0x80 hash the same whether char is signed or not.
        h *= 16777619;
    }
    return h % TABLE_SIZE;
}

void init_hash_table(void) {
    if (initialized) return;
    for (int i = 0; dictionary[i].key != NULL; i++) {
        uint32_t index = hash(dictionary[i].key);
        HashNode *new_node = malloc(sizeof(HashNode));
        if (new_node == NULL) break; // Out of memory: keep what was inserted so far.
        // Insert the new node at the head of the bucket's linked list.
        new_node->key = dictionary[i].key;
        new_node->value = dictionary[i].value;
        new_node->next = hash_table[index];
        hash_table[index] = new_node;
    }
    initialized = 1;
}

const char* get_value(const char *key) {
    if (!initialized) {
        init_hash_table();
    }
    uint32_t index = hash(key);
    HashNode *current = hash_table[index];
    while (current != NULL) {
        if (strcmp(current->key, key) == 0) {
            return current->value;
        }
        current = current->next;
    }
    return "{}";
}

If this is saved as eg.c, the command gcc -O3 -shared -o eg.dll eg.c (Windows) or gcc -O3 -fPIC -shared -o eg.so eg.c (Linux) should create the associated eg.dll or eg.so file.

💡
These commands are saved to eg_dll.sh and eg_so.sh in the libcall_Vim_builtin.7z file, below. The former should be run with MSYS2 UCRT64 Shell and the latter in Linux / WSL.
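As one possible improvement, and not the method used in eg.c: because the static array there is listed in key order, the lookup could be done with the C standard library's bsearch(), which needs neither the hash table nor malloc(). The sketch below assumes the array is kept sorted in strcmp() order. The names sorted_dictionary, compare_entry and get_value_sorted are hypothetical, and only three of the entries from eg.c are repeated.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *key;
    const char *value;
} KeyValue;

/* Stand-in table. ASSUMPTION: entries are sorted by key in strcmp() order,
   otherwise bsearch() will not find them reliably. */
static const KeyValue sorted_dictionary[] = {
    {"00A1", "{'na': 'INVERTED EXCLAMATION MARK', 'gc': 'Po', 'bc': 'ON'}"},
    {"00A2", "{'na': 'CENT SIGN', 'gc': 'Sc', 'bc': 'ET'}"},
    {"00A3", "{'na': 'POUND SIGN', 'gc': 'Sc', 'bc': 'ET'}"},
};

/* bsearch() passes the search key first and a table element second. */
static int compare_entry(const void *key, const void *entry) {
    return strcmp((const char *)key, ((const KeyValue *)entry)->key);
}

/* Hypothetical libcall() target: binary search over the sorted table. */
const char *get_value_sorted(const char *key) {
    const KeyValue *found;
    if (key == NULL) return "{}";
    found = (const KeyValue *)bsearch(
        key, sorted_dictionary,
        sizeof sorted_dictionary / sizeof sorted_dictionary[0],
        sizeof sorted_dictionary[0], compare_entry);
    /* Same fallback as eg.c: an empty dictionary literal for unknown keys. */
    return found ? found->value : "{}";
}
```

Whether this is faster than the hash table for the full data set has not been measured here. The trade-off is that each lookup takes a number of comparisons that grows with the logarithm of the table size, in exchange for having no table to build at start-up.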

How to use libcall() with eg.dll or eg.so

The following instructions show how to call libcall() directly with eg.dll (Windows) or eg.so (Linux).

Windows

Open Vim, enter command-line mode (with :), then enter the following, replacing {FULLPATH} with the full path of the directory containing the .dll:

call append('$', libcall('{FULLPATH}\eg', 'get_value', '00A1'))

ℹ️
In Windows, the .dll extension is omitted from {libname}, hence it is {FULLPATH}\eg, with no .dll extension. This is explained in builtin.txt.

Linux

Using Linux, open Vim, ensure your current working directory is where the .so is, then:

call append('$', libcall('./eg.so', 'get_value', '00A1'))

In either instance the following should be appended to the end of the active buffer:

{'na': 'INVERTED EXCLAMATION MARK', 'gc': 'Po', 'bc': 'ON'}

Demo files

The demo files eg_dll_test.vim and eg_so_test.vim may be used to see this in action. Their content appears at the end of this gist; they are not essential. They use eval() to take the returned string, turn it into a dictionary, and return the value for a specified key. They are also shown working in the animated .gif files, below.

7z

All the files related to this gist are in libcall_Vim_builtin.7z.

.gif demos

Demos in Windows (eg_dll_test.vim in Neovim 0.10.2) and in Debian WSL (eg_so_test.vim in Vim 9.1.90) are shown, below.

eg_dll_test.vim

" NB: THIS MUST BE SOURCED IN WINDOWS WITH
" :so eg_dll_test.vim, not plain :so, which will fail.
" It should append a comment line after each append() command:
" {'gc': 'Sc', 'na': 'CENT SIGN', 'bc': 'ET'} and
" CENT SIGN
let ROOT = substitute(expand('<script>:h'), '[\u005C]', '/', 'g')
let libname = ROOT .. '/eg'
call append(8, '" ' .. libcall(libname, "get_value", '00A2'))
call append(10, '" ' .. eval(libcall(libname, "get_value", '00A2'))['na'])

eg_so_test.vim

" NB: THIS MUST BE SOURCED IN LINUX WITH
" :so eg_so_test.vim, not plain :so, which will fail.
" It should append a comment line after each append() command:
" {'gc': 'Sc', 'na': 'CENT SIGN', 'bc': 'ET'} and
" CENT SIGN
let ROOT = substitute(expand('<script>:h'), '[\u005C]', '/', 'g')
let libname = ROOT .. '/eg.so'
call append(8, '" ' .. libcall(libname, "get_value", '00A2'))
call append(10, '" ' .. eval(libcall(libname, "get_value", '00A2'))['na'])

eg_dll.sh

#!/bin/sh
# Run with MSYS2 UCRT64 Shell and gcc (in the user's PATH variable in Windows)
gcc -O3 -shared -o eg.dll eg.c

eg_so.sh

#!/bin/sh
# Run with Linux gcc
gcc -O3 -fPIC -shared -o eg.so eg.c