Thursday, May 21, 2026
HomeArtificial IntelligenceMeet Turbovec: A Rust Vector Index with Python Bindings, and Constructed on...

Meet Turbovec: A Rust Vector Index with Python Bindings, and Constructed on Google’s TurboQuant Algorithm

Vector search underpins most retrieval-augmented technology (RAG) pipelines. At scale, it will get costly. Storing 10 million doc embeddings in float32 consumes 31 GB of RAM. For dev groups working native or on-premise inference, that quantity creates actual constraints.

A brand new open-source library referred to as turbovec addresses this straight. It’s a vector index written in Rust with Python bindings. It’s constructed on TurboQuant, a quantization algorithm from Google Analysis. The identical 10-million-document corpus matches in 4 GB with turbovec. On ARM {hardware}, search pace beats FAISS IndexPQFastScan by 12–20%.

The TurboQuant Paper

TurboQuant was launched by Google’s analysis crew. The Google crew proposes TurboQuant as a data-oblivious quantizer. It achieves near-optimal distortion charges throughout all bit-widths and dimensions. It requires zero coaching and 0 passes over the info.

Most production-grade vector quantizers, together with FAISS’s Product Quantization, requires a codebook coaching step. You have to run k-means over a consultant pattern of your vectors earlier than indexing begins. In case your corpus grows or shifts, you might must retrain and rebuild the index solely. TurboQuant skips all of that. It makes use of an analytical property of rotated vectors as a substitute of a data-dependent calibration.

How turbovec Quantizes Vectors

The quantization pipeline has 4 steps:

(1) Every vector is normalized. The size (norm) is stripped and saved as a single float. Each vector turns into a unit path on a high-dimensional hypersphere.

(2) A random rotation is utilized. All vectors are multiplied by the identical random orthogonal matrix. After rotation, every coordinate independently follows a Beta distribution. In excessive dimensions, this converges to Gaussian N(0, 1/d). This holds for any enter information — the rotation makes the coordinate distribution predictable.

(3) Lloyd-Max scalar quantization is utilized. As a result of the distribution is understood analytically, the optimum bucket boundaries and centroids may be precomputed from the mathematics alone. For two-bit quantization, which means 4 buckets per coordinate. For 4-bit, it means 16 buckets. No information passes are wanted.

(4) The quantized coordinates are bit-packed into bytes. A 1536-dimensional vector shrinks from 6,144 bytes in FP32 to 384 bytes at 2-bit. That may be a 16x compression ratio.

At search time, the question is rotated as soon as into the identical area. Scoring occurs straight in opposition to the codebook values. The scoring kernel makes use of SIMD intrinsics — NEON on ARM and AVX-512BW on fashionable x86, with an AVX2 fallback — with nibble-split lookup tables for throughput.

TurboQuant achieves distortion inside roughly 2.7x of the information-theoretic Shannon decrease sure.

Recall and Pace: The Numbers

All benchmarks use 100K vectors, 1,000 queries, okay=64, and report the median of 5 runs.

For recall, turbovec compares in opposition to FAISS IndexPQ (LUT256, nbits=8, float32 LUT). This can be a sturdy baseline: FAISS makes use of a higher-precision LUT at scoring time and k-means++ for codebook coaching. Regardless of this, TurboQuant and FAISS are inside 0–1 level at R@1 for OpenAI embeddings at d=1536 and d=3072. Each converge to 1.0 recall by okay=4–8. GloVe at d=200 is tougher. At that dimension, TurboQuant trails FAISS by 3–6 factors at R@1, closing by okay≈16–32.

On pace, ARM outcomes (Apple M3 Max) present turbovec beating FAISS IndexPQFastScan by 12–20% throughout each configuration. On x86 (Intel Xeon Platinum 8481C / Sapphire Rapids, 8 vCPUs), turbovec wins each 4-bit configuration by 1–6%. It runs inside ~1% of FAISS on 2-bit single-threaded. Two configurations sit barely behind FAISS: 2-bit multi-threaded at d=1536 and d=3072. There, the internal accumulate loop is simply too quick for unrolling amortization. FAISS’s AVX-512 VBMI path holds the sting in these two circumstances (2–4%).

Python API

Set up is a single command: pip set up turbovec. The first class is TurboQuantIndex, initialized with a dimension and bit width.

from turbovec import TurboQuantIndex

index = TurboQuantIndex(dim=1536, bit_width=4)
index.add(vectors)
scores, indices = index.search(question, okay=10)
index.write("my_index.tq")

A second class, IdMapIndex, helps secure exterior uint64 IDs that survive deletes. Elimination is O(1) by ID. That is helpful for doc shops the place vectors are steadily up to date or deleted.

turbovec integrates with LangChain (pip set up turbovec[langchain]), LlamaIndex (pip set up turbovec[llama-index]), and Haystack (pip set up turbovec[haystack]). The Rust crate is accessible by way of cargo add turbovec.

Marktechpost’s Visible Explainer

What’s turbovec?

turbovec is a vector index written in Rust with Python bindings. It’s constructed on Google Analysis’s TurboQuant algorithm — a data-oblivious quantizer that requires zero codebook coaching. A ten million doc corpus that occupies 31 GB as float32 matches in 4 GB with turbovec.

16x compression at 2-bit

💨 Beats FAISS on ARM by 12–20%

🔒 Totally native — no information egress

📦 MIT licensed

Set up

Set up the Python package deal from PyPI with a single command. For Rust, add the crate by way of Cargo.

# Python
pip set up turbovec

# Rust
cargo add turbovec

Word: To construct from supply, set up maturin then run maturin construct –launch contained in the turbovec-python/ listing. For Rust, run cargo construct –launch.

Primary Utilization — TurboQuantIndex

TurboQuantIndex is the first class. Initialize it with a vector dim and a bit_width of two or 4. Vectors are listed instantly on add() — no coaching step required.

from turbovec import TurboQuantIndex

index = TurboQuantIndex(dim=1536, bit_width=4)

# Add vectors (numpy float32 array, form [n, dim])
index.add(vectors)
index.add(more_vectors)  # incremental provides are high-quality

# Search: returns top-k scores and positional indices
scores, indices = index.search(question, okay=10)

Steady IDs — IdMapIndex

Use IdMapIndex whenever you want exterior uint64 IDs that survive deletes. Elimination is O(1) by ID — helpful for doc shops the place vectors change over time.

import numpy as np
from turbovec import IdMapIndex

index = IdMapIndex(dim=1536, bit_width=4)

# Map vectors to your personal uint64 exterior IDs
index.add_with_ids(vectors, np.array([1001, 1002, 1003], dtype=np.uint64))

# Search returns your exterior IDs, not positional indices
scores, ids = index.search(question, okay=10)

# O(1) delete by exterior IDnindex.take away(1002)

Save & Load an Index

Each index varieties assist persistent storage. TurboQuantIndex writes to .tq recordsdata. IdMapIndex writes to .tvim recordsdata.

from turbovec import TurboQuantIndex, IdMapIndex

# TurboQuantIndex  —>  .tq
index.write("my_index.tq")
loaded = TurboQuantIndex.load("my_index.tq")

# IdMapIndex  —>  .tvim
index.write("my_index.tvim")
loaded = IdMapIndex.load("my_index.tvim")

Framework Integrations

turbovec ships optionally available extras for LangChain, LlamaIndex, and Haystack. Set up the additional that matches your stack.

# LangChain
pip set up turbovec[langchain]

# LlamaIndex
pip set up turbovec[llama-index]

# Haystack
pip set up turbovec[haystack]

Tip: Every integration plugs turbovec in as a drop-in vector retailer. See docs/integrations/ within the repo for full utilization examples with every framework.

Utilizing turbovec in Rust

The Rust API mirrors the Python API. Each TurboQuantIndex and IdMapIndex can be found. All x86_64 builds goal AVX2 as baseline; AVX-512 is enabled at runtime by way of characteristic detection.

use turbovec::TurboQuantIndex;

let mut index = TurboQuantIndex::new(1536, 4);
index.add(&vectors);

let outcomes = index.search(&queries, 10);

index.write("index.television").unwrap();
let loaded = TurboQuantIndex::load("index.television").unwrap();

📚 Full API: docs/api.md

⭐ github.com/RyanCodrai/turbovec

Key Takeaways

  • No codebook coaching. turbovec indexes vectors immediately — no k-means, no rebuilds because the corpus grows.
  • 16x compression. A 1536-dim float32 vector shrinks from 6,144 bytes to 384 bytes at 2-bit quantization.
  • Quicker than FAISS on ARM. turbovec beats FAISS IndexPQFastScan by 12–20% on ARM throughout each configuration.
  • Close to-optimal distortion. TurboQuant achieves distortion inside ~2.7x of the Shannon decrease sure — provably close to the theoretical restrict.
  • Totally native. No managed service, no information egress — pairs with any open-source embedding mannequin for an air-gapped RAG stack.

Try the Repo right hereAdditionally, be happy to comply with us on Twitter and don’t neglect to hitch our 150k+ ML SubReddit and Subscribe to our E-newsletter. Wait! are you on telegram? now you possibly can be part of us on telegram as effectively.

Have to companion with us for selling your GitHub Repo OR Hugging Face Web page OR Product Launch OR Webinar and so forth.? Join with us


RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments