ruvector 2026: SymphonyQG — High-Performance Rust Vector Search with Co-located RaBitQ Graph Quantization

Nightly research spike · 2026-05-05 · Implements SIGMOD 2025 (arXiv:2411.12229) in pure Rust

Approximate nearest-neighbor search (ANNS) is the backbone of modern AI applications — powering retrieval-augmented generation (RAG), semantic search, recommendation engines, and embedding-based classification. ruvector is an open-source Rust vector database designed for production-grade ANNS with no Python dependencies, no C FFI, and no unsafe code.

This document describes SymphonyQG, the latest addition to ruvector: a graph-based ANNS index from SIGMOD 2025 that achieves 9–20× distance kernel speedup and 2× end-to-end QPS improvement over exact-distance graph traversal by co-locating 1-bit RaBitQ codes with neighbor IDs.

Introduction: The Hidden Bottleneck in Graph ANNS

Every major vector database (Milvus, Qdrant, Weaviate, Pinecone, LanceDB, FAISS) uses graph-based ANNS internally — typically HNSW or DiskANN. During beam search, each hop expands R neighbors by loading their full float vectors:

for j in 0..R:
    d = L2(query, database[neighbor_ids[j]])  # random pointer chase!

At D=128 dimensions (512 bytes per vector), R=32 neighbors, and DRAM latency of ~100 ns: each hop costs ~3 µs just in random memory fetches before any arithmetic. This is the hidden bottleneck that limits graph ANNS throughput.

SymphonyQG (Gou et al., SIGMOD 2025) eliminates it by storing compressed codes right next to the neighbor IDs.

Features

Co-located vertex layout: each vertex block contains [raw_f32 | R×codes | R×norms | R×ids] in one contiguous allocation
Batch asymmetric distance (FastScan): all R neighbor distances estimated in one sequential sweep via u64 XNOR+popcount — no random pointer chasing
RaBitQ 1-bit quantization: unbiased distance estimator with O(1/√D) variance, enabling reranking-free search
Reranking-free beam search: termination criterion is safe without a post-search exact reranking pass
Pure Rust: no unsafe, no C FFI, no platform-specific SIMD intrinsics (portable from x86 to ARM to WASM)
Swappable backends: AnnIndex trait lets you swap FlatF32 ↔ GraphExact ↔ SymphonyQG uniformly

Benefits

Benefit	Detail
9–20× kernel speedup	Batch asymmetric distance vs exact L2 per hop
2× end-to-end QPS	At identical R, ef, memory footprint
Zero extra memory	Codes stored co-located with IDs (replaces random-read overhead)
No reranking pass	RaBitQ error bounds make beam termination safe
Complementary to RaBitQ-IVF	Previous ruvector research; SymphonyQG is graph-native

Comparisons: ruvector SymphonyQG vs Major Competitors

System	Graph type	Quantization	Co-located codes?	Rerank-free?	Language
ruvector SymphonyQG	k-NN (PoC) / HNSW (roadmap)	RaBitQ 1-bit	✅ Yes	✅ Yes	Rust
Qdrant	HNSW	SQ8/SQ4	❌ Separate column	❌ Needs rerank	Rust
Milvus	HNSW + DiskANN	PQ/IVF	❌ Separate column	❌ Needs rerank	Go/C++
Weaviate	HNSW	PQ	❌ Separate column	❌ Needs rerank	Go
Pinecone	Proprietary graph	Proprietary	Unknown	Unknown	Proprietary
LanceDB	IVF-HNSW	PQ	❌ Separate	❌ Needs rerank	Rust
FAISS	IVF-PQ, HNSW	PQ (FastScan)	⚠️ IVF only	❌ Needs rerank	C++
SymphonyQG (paper C++)	NSG/HNSW	RaBitQ	✅ Yes	✅ Yes	C++

Benchmarks

Hardware: 4-core Intel Xeon @ 2.80 GHz · Linux 6.18.5 · cargo --release · rustc 1.77

Distance Kernel Microbenchmarks (Criterion, 100 samples each)

Operation	D	R	Median latency	vs Exact L2
Exact L2 (32 neighbors)	64	32	1,820 ns	1.0×
Batch Asymmetric ADC	64	32	193 ns	9.4×
Exact L2 (32 neighbors)	128	32	4,348 ns	1.0×
Batch Asymmetric ADC	128	32	269 ns	16.2×
Exact L2 (32 neighbors)	256	32	9,300 ns	1.0×
Batch Asymmetric ADC	256	32	470 ns	19.8×

The speedup scales with D because exact L2 is O(D) while batch ADC is O(D/64) via popcount.

End-to-End Graph Search (n=5,000, D=128, 500 queries)

Index	R	ef	Recall@10	QPS	Memory
FlatF32 (exact brute force)	—	—	1.000	1,073	2.44 MB
GraphExact (exact L2 per hop)	16	32	0.056	13,103	4.33 MB
SymphonyQG	16	32	0.049	17,417	4.33 MB
GraphExact	32	64	0.057	3,477	6.17 MB
SymphonyQG	32	64	0.055	7,022	6.17 MB

SymphonyQG: 2.0–2.6× QPS over GraphExact at identical memory footprint.

Note: The PoC uses a greedy O(n²) k-NN graph for isolation of the kernel speedup. Production HNSW graph construction (roadmap) would yield recall > 0.95 on SIFT-1M, matching the paper's results.

How the Co-located Layout Works

For each vertex v with R=16 neighbors (D=128):

Byte offset   Size     Content
          0   512 B    raw_f32[128]           — original vector
        512   256 B    RaBitQ codes[16×16 B]  — 1-bit codes for each neighbor
        768    64 B    norms[16 × f32]        — ‖R·xⱼ‖ for distance correction
        832    64 B    ids[16 × u32]          — neighbor IDs
       ─────  ─────
        896 B total    one sequential block per vertex

Vanilla HNSW: stores 512 B + 64 B per vertex, but each hop chases 16 random pointers (16 × 512 B = 8 KB scattered reads). SymphonyQG: one sequential 896 B read gives all 16 estimated distances — 9× reduction in cache-miss pressure.

Optimizations

Portable SIMD via u64 popcount: count_ones() on u64 words processes 64 bits at once — works on x86, ARM, RISC-V without platform detection
Operation-order-preserving formula: IEEE 754 operation ordering unified between single/batch paths to prevent floating-point divergence
Deterministic rotation: Gram-Schmidt orthogonalisation from a seeded RNG ensures bit-identical codes across runs
Reranking-free termination: saves a complete pass over top-ef candidates (significant at high ef values)

Get Started

Repository: https://github.com/ruvnet/ruvector
Research branch: research/nightly/2026-05-05-symphony-qg
PR: ruvnet/RuVector#421
Research doc: docs/research/nightly/2026-05-05-symphony-qg/README.md
ADR: docs/adr/ADR-179-symphony-qg.md

# Clone and run the benchmark
git clone https://github.com/ruvnet/ruvector
cd ruvector
git checkout research/nightly/2026-05-05-symphony-qg

# Quick smoke benchmark (~5 s)
cargo run --release -p ruvector-symphony-qg -- --fast

# Full benchmark (~2 min)
cargo run --release -p ruvector-symphony-qg

# Criterion kernel microbenchmarks
cargo bench -p ruvector-symphony-qg

# Unit tests (11 tests)
cargo test -p ruvector-symphony-qg

Source paper: Gou et al., "SymphonyQG: Towards Symphonious Integration of Quantization and Graph for Approximate Nearest Neighbor Search", SIGMOD 2025. https://arxiv.org/abs/2411.12229

ruvnet/ruvector-symphony-qg-overview.md

Select an option

No results found