ruvector PDX: Columnar Vector Layout — 2-3.4x faster Rust ANN scans, SIGMOD 2025, auto-vectorised, 100% recall, dimension-pruning search

ruvector 2026: PDX Columnar Vector Layout — High-Performance Rust Vector Search

2–3.4× faster ANN scans with zero code changes and 100% recall, using LLVM auto-vectorisation via columnar memory layout (SIGMOD 2025)

ruvector now ships ruvector-pdx: the first Rust implementation of the PDX (Partition-Dimension-eXchange) data layout from CWI Amsterdam's SIGMOD 2025 paper. By transposing vector storage from row-major to columnar within each partition block, the inner L2 distance loop becomes stride-1 and LLVM auto-vectorises it with AVX2 — no hand-written intrinsics, no unsafe code, no platform-specific dependencies.

Introduction

Modern vector databases (Pinecone, Qdrant, Weaviate, Milvus, LanceDB, FAISS) store embedding vectors in row-major layout: each vector is a contiguous row of D floats. When scanning N vectors to compute the nearest neighbour of a query, the inner loop must access dimension d of all N vectors — but those values are scattered D×4 bytes apart in memory. The CPU prefetcher and SIMD units hate this: they want contiguous data.

PDX fixes this at the data layout level. Within each partition block of N vectors, PDX stores dimension d as a contiguous column of N floats. The distance kernel becomes:

for dim in 0..D {
    let col = block.col(dim);   // &data[dim * N .. (dim+1) * N]  ← stride-1
    for i in 0..N {             // stride-1: LLVM can auto-vectorise this loop (AVX2 when enabled for the target)
        partial[i] += (query[dim] - col[i]).powi(2);
    }
}

No unsafe. No intrinsics. Just pure Rust — and 2–3.4× more throughput.
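To make the layout concrete, here is a minimal sketch of a columnar block and the scan over it. The names `PdxBlock`, `from_rows`, `col`, `l2_sq`, and `l2_sq_row_major` are hypothetical and chosen for illustration; they are not taken from the ruvector-pdx crate. The sketch assumes squared L2 distance and a single block holding all vectors.

```rust
/// Hypothetical columnar block (illustration only, not the ruvector-pdx API).
/// Holds `n` vectors of `d` dimensions; dimension `dim` occupies
/// data[dim * n .. (dim + 1) * n].
struct PdxBlock {
    n: usize,
    d: usize,
    data: Vec<f32>,
}

impl PdxBlock {
    /// Transpose row-major vectors (each of length `d`) into columnar storage.
    fn from_rows(rows: &[Vec<f32>], d: usize) -> Self {
        let n = rows.len();
        let mut data = vec![0.0f32; n * d];
        for (i, row) in rows.iter().enumerate() {
            assert_eq!(row.len(), d, "every vector must have d dimensions");
            for (dim, &v) in row.iter().enumerate() {
                data[dim * n + i] = v;
            }
        }
        PdxBlock { n, d, data }
    }

    /// Contiguous slice holding dimension `dim` of every vector in the block.
    fn col(&self, dim: usize) -> &[f32] {
        &self.data[dim * self.n..(dim + 1) * self.n]
    }

    /// Squared L2 distance from `query` to every vector in the block.
    fn l2_sq(&self, query: &[f32]) -> Vec<f32> {
        assert_eq!(query.len(), self.d);
        let mut partial = vec![0.0f32; self.n];
        for dim in 0..self.d {
            let q = query[dim];
            // Stride-1 pass over one column: a candidate for auto-vectorisation.
            for (p, &x) in partial.iter_mut().zip(self.col(dim)) {
                let diff = q - x;
                *p += diff * diff;
            }
        }
        partial
    }
}

/// Row-major reference scan, used here only to check the columnar result.
fn l2_sq_row_major(rows: &[Vec<f32>], query: &[f32]) -> Vec<f32> {
    rows.iter()
        .map(|r| r.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum::<f32>())
        .collect()
}
```

Both scans add the per-dimension terms for a given vector in the same order, which is why a layout change of this kind leaves the computed distances unchanged. Whether the compiler actually emits AVX2 instructions depends on the target features enabled at build time (for example via `-C target-cpu=native`).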

Features

  • Three swappable backends behind one AnnIndex: Send + Sync trait:
    • RowMajorIndex — exact row-major baseline (100% recall)
    • PdxFlatIndex — PDX columnar layout, no pruning (2.1–3.4× faster)
    • PdxPruneIndex — PDX + exponential lower-bound pruning (2.0–2.75× faster)
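  • Dimension pruning: monotone lower-bound early exit. The partial squared-L2 sum can only grow as more dimensions are added, so once partial_l2 > τ (the current k-th best distance) the vector cannot be in the top-k. Zero false negatives; 100% recall guaranteed.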
  • Safe Rust only: no unsafe, no C FFI, no platform feature gates
  • 12 integration tests (no mocks) — correctness, error handling, memory accounting
  • Drop-in trait: same AnnIndex interface as all other ruvector backends
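The lower-bound pruning described above can be sketched as follows. This is an illustration under stated assumptions, not the `PdxPruneIndex` implementation: the function name `scan_block_pruned` is hypothetical, τ is passed in as `tau` (taken to be the current k-th best squared distance), and checking the bound after 1, 2, 4, 8, ... dimensions is a guess at what "exponential" refers to.

```rust
/// Hypothetical pruned scan over one columnar block (illustration only).
/// `cols` stores dimension `dim` at cols[dim * n .. (dim + 1) * n].
/// Returns (index, squared L2) for every vector with distance <= tau.
fn scan_block_pruned(
    cols: &[f32],
    n: usize,
    d: usize,
    query: &[f32],
    tau: f32,
) -> Vec<(usize, f32)> {
    assert_eq!(cols.len(), n * d);
    assert_eq!(query.len(), d);
    let mut partial = vec![0.0f32; n];
    let mut next_check = 1; // assumed schedule: check after 1, 2, 4, 8, ... dims

    for dim in 0..d {
        let q = query[dim];
        let col = &cols[dim * n..(dim + 1) * n];
        // Accumulate every lane so the inner loop stays branch-free.
        for (p, &x) in partial.iter_mut().zip(col) {
            let diff = q - x;
            *p += diff * diff;
        }
        if dim + 1 == next_check {
            // Partial sums never decrease, so partial > tau is a safe lower
            // bound: such a vector can never come back under tau.
            let survivors = partial.iter().filter(|&&p| p <= tau).count();
            if survivors == 0 {
                return Vec::new(); // whole block pruned, skip remaining dims
            }
            next_check *= 2;
        }
    }

    // Only vectors whose full distance is within tau remain candidates.
    (0..n)
        .filter(|&i| partial[i] <= tau)
        .map(|i| (i, partial[i]))
        .collect()
}
```

In this sketch the saving comes from abandoning a block once every vector in it exceeds the bound. A finer-grained variant would also stop accumulating individual pruned vectors, at the cost of a less regular inner loop.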

Benchmarks (Real Numbers — x86_64 Linux, rustc --release, no hand-written SIMD)

Hardware: x86_64 Linux, AMD/Intel, rustc 1.77+ --release. Data: 50-cluster Gaussian, σ=0.5, n=10K–50K, D=96–384, k=10, 200 queries. Block size = 64.

| Variant | n | D | Recall@10 | QPS | Speedup |
|---|---|---|---|---|---|
| RowMajorIndex | 10,000 | 96 | 100.0% | 2,023 | 1.0× (baseline) |
| PdxFlatIndex | 10,000 | 96 | 100.0% | 4,726 | 2.34× |
| PdxPruneIndex | 10,000 | 96 | 100.0% | 4,057 | 2.01× |
| RowMajorIndex | 10,000 | 384 | 100.0% | 400 | 1.0× (baseline) |
| PdxFlatIndex | 10,000 | 384 | 100.0% | 1,148 | 2.87× |
| PdxPruneIndex | 10,000 | 384 | 100.0% | 1,002 | 2.50× |
| RowMajorIndex | 50,000 | 128 | 100.0% | 283 | 1.0× (baseline) |
| PdxFlatIndex | 50,000 | 128 | 100.0% | 610 | 2.16× |
| PdxPruneIndex | 50,000 | 128 | 100.0% | 572 | 2.02× |
| RowMajorIndex | 50,000 | 384 | 100.0% | 59 | 1.0× (baseline) |
| PdxFlatIndex | 50,000 | 384 | 100.0% | 202 | 3.42× |
| PdxPruneIndex | 50,000 | 384 | 100.0% | 162 | 2.75× |

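At fixed n, speedup grows with D in these runs, and the largest measured gain is at D=384 (a common text-embedding size). Larger dimensions such as D=1536 (OpenAI ada-002) were not part of this benchmark. All recall = 100%: both PDX variants return exact results.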
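For readers who want to check the Recall@10 column themselves, the sketch below uses the standard definition: the fraction of the true top-k (from an exact scan) that the index under test also returns. The helper names `top_k` and `recall_at_k` are hypothetical, and it is an assumption that the benchmark computes recall this way.

```rust
use std::collections::HashSet;

/// Indices of the `k` smallest distances (hypothetical helper).
fn top_k(dists: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..dists.len()).collect();
    // total_cmp gives a total order on f32, so the sort cannot panic on NaN.
    idx.sort_by(|&a, &b| dists[a].total_cmp(&dists[b]));
    idx.truncate(k);
    idx
}

/// Fraction of the ground-truth neighbours that also appear in `found`.
/// Assumes `found` contains no duplicate indices.
fn recall_at_k(truth: &[usize], found: &[usize]) -> f64 {
    if truth.is_empty() {
        return 1.0;
    }
    let truth_set: HashSet<usize> = truth.iter().copied().collect();
    let hits = found.iter().filter(|&&i| truth_set.contains(&i)).count();
    hits as f64 / truth.len() as f64
}
```

An exact method such as a layout change or a lower-bound prune should return the same top-k set as the row-major baseline, so the value is 1.0 for every query, up to ties in distance.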

Reproduce:

git clone https://github.com/ruvnet/ruvector
cd ruvector
git checkout research/nightly/2026-05-08-pdx-columnar-scan
cargo run --release -p ruvector-pdx

Benefits

| Benefit | Detail |
|---|---|
| 2–3.4× QPS | LLVM AVX2 auto-vectorisation of stride-1 inner loop |
| 100% recall | PDX is a pure memory-layout change, identical math |
| Zero unsafe | Pure Rust, no intrinsics, no C/C++ deps |
| Universal | Works with any embedding model (no MRL training required) |
| Drop-in | Same AnnIndex trait as all ruvector backends |
| WASM-ready | Architecture: WASM SIMD128 port planned for browser ML |

Comparison: ruvector-pdx vs Competitors

| System | Scan Layout | SIMD Strategy | Pruning | Recall | Rust |
|---|---|---|---|---|---|
| ruvector-pdx | Columnar (PDX) | Auto-vectorised | Lower-bound | 100% | ✅ Pure |
| FAISS | Row-major | Hand-written AVX2/AVX-512 | Partial | 100% | ❌ C++ |
| Qdrant | Row-major | simsimd (C library) | None (flat) | 100% | ⚠️ C FFI |
| Milvus (Knowhere) | Row-major | Hand-written SIMD | None | 100% | ❌ C++ |
| LanceDB | Arrow columnar | Arrow batch-level | None | 100% | |
| Weaviate | Row-major | CGo SIMD | None | 100% | ❌ Go/C |
| Pinecone | Proprietary | Proprietary | Proprietary | ~99% | |
| CWI PDX reference | Columnar | Hand-written C++ | ADSampling | 100% | ❌ C++ |

ruvector-pdx is the only pure-Rust columnar vector scan with explicit dimension pruning available on crates.io.

Optimizations Planned

| Priority | Optimization | Expected Gain |
|---|---|---|
| P0 | Block size 256 (Vec bitmask) | +20–40% throughput |
| P0 | ruvector-cluster integration | 2–3× speedup for all IVF queries |
| P1 | ADSampling χ² statistical bound | 2× more pruning at 99.5% recall |
| P2 | target_feature(enable="avx2") | Force AVX2 without RUSTFLAGS |
| P2 | Rayon parallel block scan | Linear scaling with core count |
| P3 | ruvector-pdx-wasm (SIMD128) | PDX in browser / edge inference |
| P3 | PDX + RaBitQ combination | 4× memory + 2–3× scan improvement |

Get Started

# Clone ruvector
git clone https://github.com/ruvnet/ruvector
cd ruvector

# Checkout the PDX research branch
git checkout research/nightly/2026-05-08-pdx-columnar-scan

# Build and run the benchmark
cargo run --release -p ruvector-pdx

# Run the test suite (12 tests, no mocks)
cargo test -p ruvector-pdx

Repository: https://github.com/ruvnet/ruvector
Research branch: research/nightly/2026-05-08-pdx-columnar-scan
ADR: docs/adr/ADR-193-pdx-columnar-scan.md
Research doc: docs/research/nightly/2026-05-08-pdx-columnar-scan/README.md
Paper: arXiv:2503.04422 — Kuffo, Krippner, Boncz (CWI Amsterdam, SIGMOD 2025)
