I often use hash tables in my projects, some of which handle billions of small keys. The performance of hash tables, both in terms of speed and memory usage, is critical to these projects. I conducted two hash table benchmarks, in 2008 and 2018. While they were among the popular benchmarks of their time, they are now outdated and miss important modern implementations. This blog post serves as a follow-up, focusing specifically on C libraries.
The source code for this benchmark is available here. The following figures show the main results. Note that in all plots, lower values indicate better performance (if the image below is broken, click here for the original version):
In the figure above, the top row displays speed vs memory for the insertion-only task and the bottom row for the insertion-deletion mixed task. The figures in the right column highlight the high-performance libraries. The CPython curve for the insertion-deletion task is truncated for visualization purposes. Here are some key observations:
- A high-performance library, such as CC, khashl, M*LIB, STC or Verstable, can be a few times faster AND use a few times less memory than a low-performance library. Therefore, selecting a library carefully is essential.
- Good C libraries approach the performance of `boost::unordered_flat_map`, one of the fastest hash table implementations in C/C++. For additional insights, refer to Jackson Allan's excellent benchmark.
- High-performance C libraries all use macros extensively. khashp uses the same algorithm as khashl but relies on `void*` for genericity; it is slower due to the overhead of `void*` and the use of function pointers. If optimizing hash table performance is critical, it's advisable to embrace macros despite their complexity (a minimal sketch of the difference follows this list).
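To illustrate the difference, here is a minimal sketch; the names and signatures are my own invention, not the API of any library above. The macro expands into a lookup specialized for the key type, so the hash and comparison can be inlined, whereas the `void*` version has to go through runtime parameters and indirect calls:

```c
#include <stddef.h>
#include <stdint.h>

/* Macro-based genericity: expands to a type-specialized linear-probing lookup.
 * EMPTY is a reserved key value marking unused buckets; n_buckets is a power of 2. */
#define DEFINE_FIND(name, key_t, hash_fn, eq_fn, EMPTY) \
    static int name##_find(const key_t *keys, uint32_t n_buckets, key_t key) { \
        uint32_t mask = n_buckets - 1, i; \
        for (i = hash_fn(key) & mask; !eq_fn(keys[i], EMPTY); i = (i + 1) & mask) \
            if (eq_fn(keys[i], key)) return (int)i; /* found */ \
        return -1; /* reached an empty bucket: key absent */ \
    }

static inline uint32_t u32_hash(uint32_t x) { return x * 2654435769u; }
static inline int u32_eq(uint32_t a, uint32_t b) { return a == b; }
DEFINE_FIND(u32, uint32_t, u32_hash, u32_eq, UINT32_MAX)

/* void*-based genericity: the same probe, but key size, hashing and equality
 * are runtime parameters, so every comparison is an indirect call. */
typedef uint32_t (*hash_fp)(const void *key);
typedef int (*eq_fp)(const void *a, const void *b);

static int generic_find(const void *keys, size_t key_size, uint32_t n_buckets,
                        const void *key, const void *empty, hash_fp hash, eq_fp eq)
{
    uint32_t mask = n_buckets - 1, i;
    for (i = hash(key) & mask; !eq((const char *)keys + (size_t)i * key_size, empty); i = (i + 1) & mask)
        if (eq((const char *)keys + (size_t)i * key_size, key)) return (int)i;
    return -1;
}
```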
I recommend that C programmers implement their own hash tables as a valuable learning exercise. Achieving performance comparable to khashl requires <200 lines of code, and you can even surpass it by using a special key for empty buckets. For those who prefer an existing solution, consider exploring CC, khashl, M*LIB, STC, or Verstable. While they may not be the absolute best, their performance is competitive with most hash table implementations available in C, C++, or Rust.
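As a rough starting point, here is a minimal counting map along those lines: open addressing with linear probing, a power-of-two number of buckets, and one key value (`UINT32_MAX` here) reserved to mark empty buckets so that no separate occupancy metadata is needed. The names and details are mine, not taken from khashl or any other library above, and error handling is omitted:

```c
#include <stdint.h>
#include <stdlib.h>

#define EMPTY UINT32_MAX  /* reserved key: UINT32_MAX itself cannot be stored */

typedef struct { uint32_t *keys, *vals, n_buckets, count; } map32_t;

static uint32_t hash32(uint32_t x) { return x * 2654435769u; } /* Fibonacci hashing */

static void map32_init(map32_t *h, uint32_t n_buckets) { /* n_buckets: power of 2 */
    uint32_t i;
    h->n_buckets = n_buckets, h->count = 0;
    h->keys = (uint32_t*)malloc(n_buckets * sizeof(uint32_t));
    h->vals = (uint32_t*)malloc(n_buckets * sizeof(uint32_t));
    for (i = 0; i < n_buckets; ++i) h->keys[i] = EMPTY;
}

static uint32_t *map32_get(map32_t *h, uint32_t key); /* defined below */

static void map32_grow(map32_t *h) { /* double the table and rehash every key */
    map32_t b;
    uint32_t i;
    map32_init(&b, h->n_buckets * 2);
    for (i = 0; i < h->n_buckets; ++i)
        if (h->keys[i] != EMPTY) *map32_get(&b, h->keys[i]) = h->vals[i];
    free(h->keys), free(h->vals);
    *h = b;
}

/* Return a pointer to the value slot for `key`, inserting the key (with value 0)
 * if it is absent. */
static uint32_t *map32_get(map32_t *h, uint32_t key) {
    uint32_t mask, i;
    if ((uint64_t)(h->count + 1) * 4 > (uint64_t)h->n_buckets * 3) /* keep load <= 0.75 */
        map32_grow(h);
    mask = h->n_buckets - 1;
    for (i = hash32(key) & mask; h->keys[i] != EMPTY; i = (i + 1) & mask)
        if (h->keys[i] == key) return &h->vals[i];
    h->keys[i] = key, h->vals[i] = 0, ++h->count;
    return &h->vals[i];
}
```

With this, counting an input integer `x` is just `++*map32_get(&h, x);`. The `EMPTY` sentinel is the "special key" trick mentioned above: it encodes "this bucket is empty" in the key itself instead of in separate per-bucket metadata.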
Only generic libraries that support dynamic growth are considered. This excludes many hash table implementations that work with string keys only or require fixed capacity. The benchmark also includes two C++ implementations, `std::unordered_map` from GCC v10.3.0 and `boost::unordered_flat_map` from Boost v1.85.0, as reference points.
| Library   | Clone date | Generic        | Comments                         |
|-----------|------------|----------------|----------------------------------|
| CC        | 2025-03-17 | macro          |                                  |
| CTL       | 2025-03-17 | macro          |                                  |
| CPython   | 3.12       | CPython C APIs |                                  |
| dmap      | 2025-03-16 | void*          | Deletion is buggy                |
| hashtrie  | 2025-03-16 | intrusive      | Not a library; no deletion       |
| khashl    | 2025-01-06 | macro          | Implementation by me             |
| khashp    | 2025-01-06 | void*          | Adapted from khashl for void*    |
| M*LIB     | 2025-03-17 | macro          | The DICT_OA implementation only  |
| stb_ds    | 2025-03-16 | macro          | Part of the stb collection       |
| STC       | 2025-03-16 | macro          |                                  |
| Verstable | 2025-03-05 | macro          |                                  |
| uthash    | 2025-03-16 | macro          | Also intrusive                   |
Other notable implementations:
- pottery: hash map APIs are not documented
- The CTL repo lists many other container implementations. Testing them will take time as they all have different APIs.
Jackson Allan conducted an excellent benchmark on hash table performance. The benchmark here evaluates hash table libraries from a somewhat different angle. The udb3 repo explains the benchmark tasks:
There are two tasks. The first focuses on insertion only. We insert 80 million 32-bit integers, with duplicates, into the hash table and count the occurrence of each distinct key. There are 16.6 million entries left in the table in the end. The second task evaluates both insertions and deletions. The input is the same 80 million integers. We insert an integer into the hash table if it is not in the table already, or delete it if present. There are 9.2 million integers left in the table. For both tasks, we record CPU time and peak memory at 11 checkpoints and report the average CPU time per million inputs and the average memory per entry in the table.
and the rationale:
A hash table library can usually be made faster by lowering its default load factor threshold, but this comes at the cost of memory. Controlling the threshold would not give us a complete picture either, because the optimal threshold of a library depends on its implementation details, and even at the same load factor threshold, some libraries may use more auxiliary data than others and thus take more memory.
In this benchmark, we measure the speed and the memory at the same time. If library A is faster and uses less memory than B, A will be better no matter how we tune the load factor of B. If A is faster but uses more memory, we cannot draw a firm conclusion but at least we will see the speed-memory tradeoff of a library.
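To make the first point concrete, here is a back-of-the-envelope sketch; the numbers are hypothetical, not measured by this benchmark. An open-addressing table whose load factor never exceeds its threshold needs at least `bucket_size / threshold` bytes per stored entry, before counting any auxiliary data:

```c
#include <stdio.h>

int main(void) {
    /* Lower bound on memory per stored entry for an open-addressing table:
     * bucket_size / load_factor_threshold, ignoring auxiliary data (bitmaps,
     * metadata bytes, allocator overhead), which differs between libraries. */
    const double bucket_size = 8.0; /* e.g. a 4-byte key plus a 4-byte value */
    const double thresholds[] = { 0.5, 0.75, 0.9 };
    for (int i = 0; i < 3; ++i)
        printf("threshold %.2f -> at least %.1f bytes per entry\n",
               thresholds[i], bucket_size / thresholds[i]);
    return 0;
}
```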
Note that the relative performance between hash table libraries may vary with key/value types and access patterns. Nonetheless, the consensus of multiple hash table benchmarks seems to be that a fast library for one task tends to be among the faster group for other tasks as well.
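For concreteness, the inner loops of the two tasks look roughly like the sketch below. The `ht_*` functions and `next_input()` are hypothetical stand-ins: each benchmarked library spells these operations differently, and the actual input stream is described in the udb3 repo.

```c
#include <stdint.h>

/* Hypothetical stand-ins for whichever library is being benchmarked. */
uint32_t *ht_put(void *h, uint32_t key);       /* insert if absent; return pointer to value */
int       ht_has(const void *h, uint32_t key); /* non-zero if key is present */
void      ht_del(void *h, uint32_t key);       /* remove key */
uint32_t  next_input(void);                    /* next 32-bit integer from the input stream */

/* Task 1: insertion only -- count the occurrences of each distinct key. */
void task_insert_only(void *h, uint64_t n) {
    for (uint64_t i = 0; i < n; ++i)
        ++*ht_put(h, next_input());
}

/* Task 2: mixed insertions and deletions -- insert a key if absent,
 * delete it if present; keys seen an odd number of times remain at the end. */
void task_insert_delete(void *h, uint64_t n) {
    for (uint64_t i = 0; i < n; ++i) {
        uint32_t x = next_input();
        if (ht_has(h, x)) ht_del(h, x);
        else ht_put(h, x);
    }
}
```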