Indexing
Similar to ivfflat, VectorChord's index type, RaBitQ (vchordrq), divides vectors into lists and then searches the subset of those lists that is closest to the query vector. It inherits the advantages of ivfflat, such as fast build times and low memory usage, while performing much better than both hnsw and ivfflat.
The RaBitQ(vchordrq) index is supported on some pgvector types and metrics:
Distance / operator | vector | halfvec | bit(n) | sparsevec |
---|---|---|---|---|
L2 distance / <-> | ✅ | ✅ | 🆖 | ❌ |
inner product / <#> | ✅ | ✅ | 🆖 | ❌ |
cosine distance / <=> | ✅ | ✅ | 🆖 | ❌ |
L1 distance / <+> | ❌ | ❌ | 🆖 | ❌ |
Hamming distance / <~> | 🆖 | 🆖 | ❌ | 🆖 |
Jaccard distance / <%> | 🆖 | 🆖 | ❌ | 🆖 |
Where:
- ✅ means supported by pgvector and VectorChord
- ❌ means supported by pgvector but not by VectorChord
- 🆖 means not planned by pgvector and VectorChord
- 🔜 means supported by pgvector now and will be supported by VectorChord soon
To create the VectorChord RaBitQ(vchordrq) index, you can use the following SQL.
L2 distance
CREATE INDEX ON items USING vchordrq (embedding vector_l2_ops) WITH (options = $$
residual_quantization = true
[build.internal]
lists = [1000]
spherical_centroids = false
$$);
NOTE
- `options` are specified using a TOML (Tom's Obvious Minimal Language) string.
- Set `residual_quantization` to `true` and `spherical_centroids` to `false` for L2 distance embeddings.
- Use `halfvec_l2_ops` for `halfvec`.
- The recommended `lists` value is rows / 1000 for up to 1M rows and 4 * sqrt(rows) for over 1M rows.
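The `lists` rule of thumb above can be worked through in plain SQL. This is only a sketch: the sample row counts are made up, and rounding up the square-root branch and clamping to at least one list are assumptions that the note does not state.

```sql
-- Sketch: apply the rows / 1000 and 4 * sqrt(rows) rule to sample row counts.
-- The row counts below are illustrative values, not measurements.
SELECT
    n AS row_count,
    CASE
        WHEN n <= 1000000 THEN GREATEST(n / 1000, 1)  -- integer division, at least 1 (assumption)
        ELSE CEIL(4 * SQRT(n))::bigint                -- rounded up (assumption)
    END AS suggested_lists
FROM (VALUES (500000::bigint), (4000000::bigint)) AS t(n);
-- 500000  -> 500
-- 4000000 -> 8000
```

To apply the same rule to a real table, the `VALUES` list could be replaced with `(SELECT count(*) AS n FROM items) AS t`.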
Inner product
CREATE INDEX ON items USING vchordrq (embedding vector_ip_ops) WITH (options = $$
residual_quantization = false
[build.internal]
lists = [1000]
spherical_centroids = true
$$);
Cosine distance
CREATE INDEX ON items USING vchordrq (embedding vector_cosine_ops) WITH (options = $$
residual_quantization = false
[build.internal]
lists = [1000]
spherical_centroids = true
$$);
NOTE
- Set `residual_quantization` to `false` and `spherical_centroids` to `true` for cosine similarity embeddings.
- Use `halfvec_cosine_ops` / `halfvec_ip_ops` for `halfvec`.
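As a sketch of the `halfvec` variant, the cosine example can be repeated with only the column type and operator class changed. The table name `items_half` is hypothetical, and the option values are simply copied from the cosine example above.

```sql
-- Hypothetical table with a halfvec column (requires a pgvector version that provides halfvec).
CREATE TABLE items_half (embedding halfvec(3));

-- Same options as the cosine example; only the operator class differs.
CREATE INDEX ON items_half USING vchordrq (embedding halfvec_cosine_ops) WITH (options = $$
residual_quantization = false
[build.internal]
lists = [1000]
spherical_centroids = true
$$);
```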
To construct an index for vectors, first create a table named `items` with a column named `embedding` of type `vector(n)`. Then, populate the table with generated data.
CREATE TABLE items (embedding vector(3));
INSERT INTO items (embedding) SELECT ARRAY[random(), random(), random()]::real[] FROM generate_series(1, 1000);
You can create a vector index for L2 (Euclidean) distance with the following SQL.
CREATE INDEX ON items USING vchordrq (embedding vector_l2_ops) WITH (options = $$
residual_quantization = true
[build.internal]
lists = [4096]
spherical_centroids = false
$$);
The `[build.internal]` section contains the following options:
- `lists`: The number of lists in the inverted file.
- `spherical_centroids`: Whether to use spherical centroids.

The index will be built internally.
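Once the index exists, a nearest-neighbor query against `items` might look like the following sketch. It uses the pgvector `<->` operator from the table at the top of this page; the query vector is an arbitrary illustrative value, and whether the planner picks the index depends on table size and settings.

```sql
-- Return the 5 rows whose embeddings are closest to the query vector by L2 distance.
-- '[0.1, 0.2, 0.3]' is an arbitrary example vector matching vector(3).
SELECT embedding, embedding <-> '[0.1, 0.2, 0.3]' AS distance
FROM items
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'
LIMIT 5;
```

The `ORDER BY ... LIMIT` form matters here: the operator in the `ORDER BY` clause has to match the operator class the index was built with (`vector_l2_ops` pairs with `<->`).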