ruvector 2026: SymphonyQG — High-Performance Rust Vector Search with Co-located RaBitQ Graph Quantization
Nightly research spike · 2026-05-05 · Implements the SIGMOD 2025 paper (arXiv:2411.12229) in pure Rust
Approximate nearest-neighbor search (ANNS) is the backbone of modern AI applications — powering retrieval-augmented generation (RAG), semantic search, recommendation engines, and embedding-based classification. ruvector is an open-source Rust vector database designed for production-grade ANNS with no Python dependencies, no C FFI, and no unsafe code.
This document describes SymphonyQG, the latest addition to ruvector: a graph-based ANNS index from SIGMOD 2025 that achieves 9–20× distance kernel speedup and 2× end-to-end QPS improvement over exact-distance graph traversal by co-locating 1-bit RaBitQ codes with neighbor IDs.
Introduction: The Hidden Bottleneck in Graph ANNS
Most widely used vector search systems (Milvus, Qdrant, Weaviate, Pinecone, LanceDB, FAISS) rely on graph-based ANNS, typically HNSW or DiskANN. During beam search, each hop expands R neighbors by loading their full float vectors:
for j in 0..R:
    d = L2(query, database[neighbor_ids[j]])  # random pointer chase!
At D=128 dimensions (512 bytes per vector), R=32 neighbors, and a DRAM latency of ~100 ns, each hop costs roughly 32 × 100 ns ≈ 3 µs in random memory fetches alone (assuming the misses are not overlapped), before any arithmetic. This is the hidden bottleneck that limits graph ANNS throughput.
SymphonyQG (Gou et al., SIGMOD 2025) eliminates it by storing compressed codes right next to the neighbor IDs.
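To make "compressed codes" concrete, here is a minimal sketch of packing a vector into a 1-bit-per-dimension code. It is a simplification under stated assumptions: the function name `pack_signs` is hypothetical, and the centering, normalization, and random rotation that RaBitQ applies before taking signs are omitted.

```rust
/// Hypothetical helper: packs the sign of each component into u64 words.
/// Bit i is set when v[i] >= 0. RaBitQ's rotation step is NOT applied here.
pub fn pack_signs(v: &[f32]) -> Vec<u64> {
    // One bit per dimension, rounded up to whole 64-bit words.
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x >= 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}
```

For D=128 this yields two u64 words, that is 16 bytes per neighbor code instead of 512 bytes for the raw f32 vector.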
- Co-located vertex layout: each vertex block contains `[raw_f32 | R×codes | R×norms | R×ids]` in one contiguous allocation
- Batch asymmetric distance (FastScan): all R neighbor distances estimated in one sequential sweep via u64 XNOR+popcount, with no random pointer chasing
- RaBitQ 1-bit quantization: unbiased distance estimator whose error is bounded by O(1/√D), enabling reranking-free search
- Reranking-free beam search: termination criterion is safe without a post-search exact reranking pass
- Pure Rust: no unsafe, no C FFI, no platform-specific SIMD intrinsics (portable from x86 to ARM to WASM)
- Swappable backends: the `AnnIndex` trait lets you swap FlatF32 ↔ GraphExact ↔ SymphonyQG uniformly
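The swappable-backend idea can be sketched as follows. Only the names `AnnIndex` and `FlatF32` come from this document. The `search` signature, the struct fields, and the `top1` helper are assumptions made for illustration, and the actual trait in ruvector may look different.

```rust
// Hypothetical sketch: the real `AnnIndex` trait in ruvector may differ.
pub trait AnnIndex {
    /// Returns up to `k` (id, squared L2 distance) pairs, nearest first.
    fn search(&self, query: &[f32], k: usize) -> Vec<(u32, f32)>;
}

/// Brute-force backend: exact squared L2 against every stored vector.
pub struct FlatF32 {
    dim: usize,
    data: Vec<f32>, // row-major, length = n * dim
}

impl FlatF32 {
    pub fn new(dim: usize, data: Vec<f32>) -> Self {
        assert!(dim > 0 && data.len() % dim == 0);
        Self { dim, data }
    }
}

impl AnnIndex for FlatF32 {
    fn search(&self, query: &[f32], k: usize) -> Vec<(u32, f32)> {
        assert_eq!(query.len(), self.dim);
        let mut hits: Vec<(u32, f32)> = self
            .data
            .chunks_exact(self.dim)
            .enumerate()
            .map(|(i, v)| {
                let d: f32 = v.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum();
                (i as u32, d)
            })
            .collect();
        hits.sort_by(|a, b| a.1.total_cmp(&b.1));
        hits.truncate(k);
        hits
    }
}

/// Caller code depends only on the trait, so any backend can be passed in.
pub fn top1(index: &dyn AnnIndex, query: &[f32]) -> Option<u32> {
    index.search(query, 1).first().map(|&(id, _)| id)
}
```

Under this design, a benchmark harness written against `&dyn AnnIndex` could run the same queries over a flat, exact-graph, or quantized-graph index without changes.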
| Benefit | Detail |
|---|---|
| 9–20× kernel speedup | Batch asymmetric distance vs exact L2 per hop |
| 2× end-to-end QPS | At identical R, ef, memory footprint |
| Zero extra memory | Codes stored co-located with IDs (replaces random-read overhead) |
| No reranking pass | RaBitQ error bounds make beam termination safe |
| Complementary to RaBitQ-IVF | Previous ruvector research; SymphonyQG is graph-native |
| System | Graph type | Quantization | Co-located codes? | Rerank-free? | Language |
|---|---|---|---|---|---|
| ruvector SymphonyQG | k-NN (PoC) / HNSW (roadmap) | RaBitQ 1-bit | ✅ Yes | ✅ Yes | Rust |
| Qdrant | HNSW | SQ8/SQ4 | ❌ Separate column | ❌ Needs rerank | Rust |
| Milvus | HNSW + DiskANN | PQ/IVF | ❌ Separate column | ❌ Needs rerank | Go/C++ |
| Weaviate | HNSW | PQ | ❌ Separate column | ❌ Needs rerank | Go |
| Pinecone | Proprietary graph | Proprietary | Unknown | Unknown | Proprietary |
| LanceDB | IVF-HNSW | PQ | ❌ Separate | ❌ Needs rerank | Rust |
| FAISS | IVF-PQ, HNSW | PQ (FastScan) | | ❌ Needs rerank | C++ |
| SymphonyQG (paper C++) | NSG/HNSW | RaBitQ | ✅ Yes | ✅ Yes | C++ |
Hardware: 4-core Intel Xeon @ 2.80 GHz · Linux 6.18.5 · cargo --release · rustc 1.77
| Operation | D | R | Median latency | vs Exact L2 |
|---|---|---|---|---|
| Exact L2 (32 neighbors) | 64 | 32 | 1,820 ns | 1.0× |
| Batch Asymmetric ADC | 64 | 32 | 193 ns | 9.4× |
| Exact L2 (32 neighbors) | 128 | 32 | 4,348 ns | 1.0× |
| Batch Asymmetric ADC | 128 | 32 | 269 ns | 16.2× |
| Exact L2 (32 neighbors) | 256 | 32 | 9,300 ns | 1.0× |
| Batch Asymmetric ADC | 256 | 32 | 470 ns | 19.8× |
The speedup grows with D in these measurements: exact L2 costs O(D) floating-point operations per neighbor, while batch ADC costs O(D/64) u64 word operations per neighbor via XNOR+popcount.
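The word-level step behind that cost difference can be sketched as below. This is an illustration under assumptions, not the crate's kernel: the function names are hypothetical, the query is reduced to 1 bit per dimension for simplicity, and the norm-based correction that turns the bit agreement into a RaBitQ distance estimate is left out.

```rust
/// Hypothetical sketch: counts agreeing sign bits between a packed query and
/// each packed neighbor code stored back to back in `codes`.
/// Assumes D is a multiple of 64, so there are no padding bits.
pub fn batch_sign_agreement(query_bits: &[u64], codes: &[u64], out: &mut [u32]) {
    let words = query_bits.len();
    assert!(words > 0);
    assert_eq!(codes.len(), words * out.len());
    // One sequential sweep over the contiguous code region.
    for (code, slot) in codes.chunks_exact(words).zip(out.iter_mut()) {
        *slot = code
            .iter()
            .zip(query_bits)
            .map(|(c, q)| (!(c ^ q)).count_ones()) // XNOR, then popcount
            .sum();
    }
}

/// Converts an agreement count into the ±1 inner product: agree - disagree.
pub fn sign_inner_product(agree: u32, dim: u32) -> i32 {
    2 * agree as i32 - dim as i32
}
```

At D=128 each neighbor needs two XNOR+popcount pairs, compared with 128 subtract-multiply-add steps for exact L2.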
| Index | R | ef | Recall@10 | QPS | Memory |
|---|---|---|---|---|---|
| FlatF32 (exact brute force) | — | — | 1.000 | 1,073 | 2.44 MB |
| GraphExact (exact L2 per hop) | 16 | 32 | 0.056 | 13,103 | 4.33 MB |
| SymphonyQG | 16 | 32 | 0.049 | 17,417 | 4.33 MB |
| GraphExact | 32 | 64 | 0.057 | 3,477 | 6.17 MB |
| SymphonyQG | 32 | 64 | 0.055 | 7,022 | 6.17 MB |
SymphonyQG: 1.3× (R=16, ef=32) to 2.0× (R=32, ef=64) the QPS of GraphExact in the table above, at identical memory footprint.
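Note: The PoC builds a greedy O(n²) k-NN graph in order to isolate the kernel speedup, so the low recall of both graph indexes reflects graph quality rather than the quantized kernel. Production HNSW graph construction (roadmap) is expected to bring recall above 0.95 on SIFT-1M, in line with the paper's results.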
For each vertex v with R=16 neighbors (D=128):
Byte offset Size Content
0 512 B raw_f32[128] — original vector
512 256 B RaBitQ codes[16×16 B] — 1-bit codes for each neighbor
768 64 B norms[16 × f32] — ‖R·xⱼ‖ for distance correction
832 64 B ids[16 × u32] — neighbor IDs
───── ─────
896 B total one sequential block per vertex
Vanilla HNSW: stores 512 B + 64 B per vertex, but each hop chases 16 random pointers (16 × 512 B = 8 KB of scattered reads). SymphonyQG: one sequential 896 B read yields all 16 estimated distances, about 9× fewer bytes touched per hop (8,192 B vs 896 B) and no scattered accesses.
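The offsets in the layout above follow directly from D and R. A minimal sketch of that arithmetic, assuming a hypothetical `vertex_layout` helper and `VertexLayout` struct (neither name is taken from the crate):

```rust
/// Byte offsets of the regions in one vertex block (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct VertexLayout {
    pub codes_off: usize,
    pub norms_off: usize,
    pub ids_off: usize,
    pub total: usize,
}

/// Computes the block layout for `dim` dimensions and `degree` neighbors.
pub const fn vertex_layout(dim: usize, degree: usize) -> VertexLayout {
    let raw = dim * 4; // raw f32 vector
    let code = (dim + 7) / 8; // 1 bit per dimension, rounded up to bytes
    let codes_off = raw;
    let norms_off = codes_off + degree * code;
    let ids_off = norms_off + degree * 4; // one f32 norm per neighbor
    let total = ids_off + degree * 4; // one u32 id per neighbor
    VertexLayout { codes_off, norms_off, ids_off, total }
}
```

For D=128 and R=16 this reproduces the offsets 512, 768, and 832 and the 896-byte total listed above.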
- Portable SIMD via u64 popcount: `count_ones()` on u64 words processes 64 bits at once and works on x86, ARM, and RISC-V without platform detection
- Operation-order-preserving formula: IEEE 754 operation ordering unified between single/batch paths to prevent floating-point divergence
- Deterministic rotation: Gram-Schmidt orthogonalisation from a seeded RNG ensures bit-identical codes across runs
- Reranking-free termination: saves a complete pass over top-ef candidates (significant at high ef values)
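The deterministic-rotation point can be illustrated with a small sketch: a seeded generator feeds Gram-Schmidt, so the same seed always produces the same orthonormal matrix. Everything here is an assumption for illustration. `SplitMix64` stands in for whatever seeded RNG the crate uses, `seeded_orthonormal` is a hypothetical name, and uniform draws are used where a real implementation would more likely use Gaussian ones.

```rust
/// SplitMix64: a tiny deterministic generator (stand-in for the crate's RNG).
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [-1, 1).
    fn next_unit(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64 * 2.0 - 1.0
    }
}

/// Builds a dim x dim matrix with orthonormal rows via modified Gram-Schmidt.
pub fn seeded_orthonormal(dim: usize, seed: u64) -> Vec<Vec<f64>> {
    let mut rng = SplitMix64(seed);
    let mut rows: Vec<Vec<f64>> = Vec::with_capacity(dim);
    while rows.len() < dim {
        let mut v: Vec<f64> = (0..dim).map(|_| rng.next_unit()).collect();
        // Remove the components along every row accepted so far.
        for u in &rows {
            let proj: f64 = u.iter().zip(&v).map(|(a, b)| a * b).sum();
            for (x, a) in v.iter_mut().zip(u) {
                *x -= proj * a;
            }
        }
        let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
        if norm > 1e-9 {
            v.iter_mut().for_each(|x| *x /= norm);
            rows.push(v);
        } // otherwise the draw was nearly linearly dependent: redraw
    }
    rows
}
```

Because no system entropy is involved and the operation order is fixed, two calls with the same seed on the same platform return identical matrices, which is what makes the derived codes reproducible.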
Repository: https://github.com/ruvnet/ruvector
Research branch: research/nightly/2026-05-05-symphony-qg
PR: ruvnet/RuVector#421
Research doc: docs/research/nightly/2026-05-05-symphony-qg/README.md
ADR: docs/adr/ADR-179-symphony-qg.md
# Clone and run the benchmark
git clone https://github.com/ruvnet/ruvector
cd ruvector
git checkout research/nightly/2026-05-05-symphony-qg
# Quick smoke benchmark (~5 s)
cargo run --release -p ruvector-symphony-qg -- --fast
# Full benchmark (~2 min)
cargo run --release -p ruvector-symphony-qg
# Criterion kernel microbenchmarks
cargo bench -p ruvector-symphony-qg
# Unit tests (11 tests)
cargo test -p ruvector-symphony-qg
Source paper: Gou et al., "SymphonyQG: Towards Symphonious Integration of Quantization and Graph for Approximate Nearest Neighbor Search", SIGMOD 2025. https://arxiv.org/abs/2411.12229