Skip to content

Instantly share code, notes, and snippets.

@ruvnet
Created May 5, 2026 07:39
Show Gist options
  • Select an option

  • Save ruvnet/36894aef7f285396b01d2468f6ce651b to your computer and use it in GitHub Desktop.

Select an option

Save ruvnet/36894aef7f285396b01d2468f6ce651b to your computer and use it in GitHub Desktop.
ruvector SymphonyQG 2026: Co-located RaBitQ Graph ANNS | 16x faster distance kernel | Rust vector search | SIGMOD 2025

ruvector 2026: SymphonyQG — High-Performance Rust Vector Search with Co-located RaBitQ Graph Quantization

Nightly research spike · 2026-05-05 · Implements SIGMOD 2025 (arXiv:2411.12229) in pure Rust

Approximate nearest-neighbor search (ANNS) is the backbone of modern AI applications — powering retrieval-augmented generation (RAG), semantic search, recommendation engines, and embedding-based classification. ruvector is an open-source Rust vector database designed for production-grade ANNS with no Python dependencies, no C FFI, and no unsafe code.

This document describes SymphonyQG, the latest addition to ruvector: a graph-based ANNS index from SIGMOD 2025 that achieves 9–20× distance kernel speedup and 2× end-to-end QPS improvement over exact-distance graph traversal by co-locating 1-bit RaBitQ codes with neighbor IDs.


Introduction: The Hidden Bottleneck in Graph ANNS

Every major vector database (Milvus, Qdrant, Weaviate, Pinecone, LanceDB, FAISS) uses graph-based ANNS internally — typically HNSW or DiskANN. During beam search, each hop expands R neighbors by loading their full float vectors:

for j in 0..R:
    d = L2(query, database[neighbor_ids[j]])  # random pointer chase!

At D=128 dimensions (512 bytes per vector), R=32 neighbors, and DRAM latency of ~100 ns: each hop costs ~3 µs just in random memory fetches before any arithmetic. This is the hidden bottleneck that limits graph ANNS throughput.

SymphonyQG (Gou et al., SIGMOD 2025) eliminates it by storing compressed codes right next to the neighbor IDs.


Features

  • Co-located vertex layout: each vertex block contains [raw_f32 | R×codes | R×norms | R×ids] in one contiguous allocation
  • Batch asymmetric distance (FastScan): all R neighbor distances estimated in one sequential sweep via u64 XNOR+popcount — no random pointer chasing
  • RaBitQ 1-bit quantization: unbiased distance estimator with O(1/√D) variance, enabling reranking-free search
  • Reranking-free beam search: termination criterion is safe without a post-search exact reranking pass
  • Pure Rust: no unsafe, no C FFI, no platform-specific SIMD intrinsics (portable from x86 to ARM to WASM)
  • Swappable backends: AnnIndex trait lets you swap FlatF32 ↔ GraphExact ↔ SymphonyQG uniformly

Benefits

Benefit Detail
9–20× kernel speedup Batch asymmetric distance vs exact L2 per hop
2× end-to-end QPS At identical R, ef, memory footprint
Zero extra memory Codes stored co-located with IDs (replaces random-read overhead)
No reranking pass RaBitQ error bounds make beam termination safe
Complementary to RaBitQ-IVF Previous ruvector research; SymphonyQG is graph-native

Comparisons: ruvector SymphonyQG vs Major Competitors

System Graph type Quantization Co-located codes? Rerank-free? Language
ruvector SymphonyQG k-NN (PoC) / HNSW (roadmap) RaBitQ 1-bit ✅ Yes ✅ Yes Rust
Qdrant HNSW SQ8/SQ4 ❌ Separate column ❌ Needs rerank Rust
Milvus HNSW + DiskANN PQ/IVF ❌ Separate column ❌ Needs rerank Go/C++
Weaviate HNSW PQ ❌ Separate column ❌ Needs rerank Go
Pinecone Proprietary graph Proprietary Unknown Unknown Proprietary
LanceDB IVF-HNSW PQ ❌ Separate ❌ Needs rerank Rust
FAISS IVF-PQ, HNSW PQ (FastScan) ⚠️ IVF only ❌ Needs rerank C++
SymphonyQG (paper C++) NSG/HNSW RaBitQ ✅ Yes ✅ Yes C++

Benchmarks

Hardware: 4-core Intel Xeon @ 2.80 GHz · Linux 6.18.5 · cargo --release · rustc 1.77

Distance Kernel Microbenchmarks (Criterion, 100 samples each)

Operation D R Median latency vs Exact L2
Exact L2 (32 neighbors) 64 32 1,820 ns 1.0×
Batch Asymmetric ADC 64 32 193 ns 9.4×
Exact L2 (32 neighbors) 128 32 4,348 ns 1.0×
Batch Asymmetric ADC 128 32 269 ns 16.2×
Exact L2 (32 neighbors) 256 32 9,300 ns 1.0×
Batch Asymmetric ADC 256 32 470 ns 19.8×

The speedup scales with D because exact L2 is O(D) while batch ADC is O(D/64) via popcount.

End-to-End Graph Search (n=5,000, D=128, 500 queries)

Index R ef Recall@10 QPS Memory
FlatF32 (exact brute force) 1.000 1,073 2.44 MB
GraphExact (exact L2 per hop) 16 32 0.056 13,103 4.33 MB
SymphonyQG 16 32 0.049 17,417 4.33 MB
GraphExact 32 64 0.057 3,477 6.17 MB
SymphonyQG 32 64 0.055 7,022 6.17 MB

SymphonyQG: 2.0–2.6× QPS over GraphExact at identical memory footprint.

Note: The PoC uses a greedy O(n²) k-NN graph for isolation of the kernel speedup. Production HNSW graph construction (roadmap) would yield recall > 0.95 on SIFT-1M, matching the paper's results.


How the Co-located Layout Works

For each vertex v with R=16 neighbors (D=128):

Byte offset   Size     Content
          0   512 B    raw_f32[128]           — original vector
        512   256 B    RaBitQ codes[16×16 B]  — 1-bit codes for each neighbor
        768    64 B    norms[16 × f32]        — ‖R·xⱼ‖ for distance correction
        832    64 B    ids[16 × u32]          — neighbor IDs
       ─────  ─────
        896 B total    one sequential block per vertex

Vanilla HNSW: stores 512 B + 64 B per vertex, but each hop chases 16 random pointers (16 × 512 B = 8 KB scattered reads). SymphonyQG: one sequential 896 B read gives all 16 estimated distances — 9× reduction in cache-miss pressure.


Optimizations

  • Portable SIMD via u64 popcount: count_ones() on u64 words processes 64 bits at once — works on x86, ARM, RISC-V without platform detection
  • Operation-order-preserving formula: IEEE 754 operation ordering unified between single/batch paths to prevent floating-point divergence
  • Deterministic rotation: Gram-Schmidt orthogonalisation from a seeded RNG ensures bit-identical codes across runs
  • Reranking-free termination: saves a complete pass over top-ef candidates (significant at high ef values)

Get Started

Repository: https://github.com/ruvnet/ruvector
Research branch: research/nightly/2026-05-05-symphony-qg
PR: ruvnet/RuVector#421
Research doc: docs/research/nightly/2026-05-05-symphony-qg/README.md
ADR: docs/adr/ADR-179-symphony-qg.md

# Clone and run the benchmark
git clone https://github.com/ruvnet/ruvector
cd ruvector
git checkout research/nightly/2026-05-05-symphony-qg

# Quick smoke benchmark (~5 s)
cargo run --release -p ruvector-symphony-qg -- --fast

# Full benchmark (~2 min)
cargo run --release -p ruvector-symphony-qg

# Criterion kernel microbenchmarks
cargo bench -p ruvector-symphony-qg

# Unit tests (11 tests)
cargo test -p ruvector-symphony-qg

Source paper: Gou et al., "SymphonyQG: Towards Symphonious Integration of Quantization and Graph for Approximate Nearest Neighbor Search", SIGMOD 2025. https://arxiv.org/abs/2411.12229

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment