Python binding for pgvecto.rs
PGVecto.rs support for Python
A Python library for PGVecto.rs with support for Django, SQLAlchemy, and Psycopg 3.
Library | Vector | Sparse Vector | Half-Precision Vector | Binary Vector |
---|---|---|---|---|
SQLAlchemy | ✅Insert | ✅Insert | ✅Insert | ✅Insert |
Psycopg3 | ✅Insert ✅Copy | ✅Insert ✅Copy | ✅Insert ✅Copy | ✅Insert ✅Copy |
Django | ✅Insert | ✅Insert | ✅Insert | ✅Insert |
Usage
Install from PyPI:
pip install pgvecto_rs
Then use it with your database library (SQLAlchemy, Psycopg3, or Django) or as a standalone SDK. Each option is covered in its own section below.
Requirements
To start a pgvecto.rs instance, you can run the official Docker image as described in the Quick start guide.
The latest image tags are listed on the Release page. For example:
docker run \
--name pgvecto-rs-demo \
-e POSTGRES_PASSWORD=mysecretpassword \
-p 5432:5432 \
-d tensorchord/pgvecto-rs:pg16-v0.3.0
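The connection URLs used in the sections below follow directly from these container settings. A minimal sketch of how they line up, assuming the default `postgres` user and database that the URLs in this document use (the helper `build_db_url` is hypothetical and not part of pgvecto_rs):

```python
from urllib.parse import quote


def build_db_url(password, host="localhost", port=5432,
                 user="postgres", database="postgres", driver=None):
    """Assemble a PostgreSQL URL from the settings passed to `docker run`.

    `driver` is an optional SQLAlchemy driver suffix such as "psycopg".
    """
    scheme = "postgresql" if driver is None else f"postgresql+{driver}"
    # Percent-encode the credentials so characters like "@" or "/" stay valid.
    return (
        f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Matches POSTGRES_PASSWORD and the published port from the command above.
print(build_db_url("mysecretpassword"))
print(build_db_url("mysecretpassword", driver="psycopg"))
```

The first form is the one used in the SQLAlchemy and Psycopg3 sections; the second, with an explicit driver, is the one used in the SDK section.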
SQLAlchemy
Install dependencies:
pip install "pgvecto_rs[sqlalchemy]"
Initialize a connection
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
URL = "postgresql://postgres:mysecretpassword@localhost:5432/postgres"
engine = create_engine(URL)
with Session(engine) as session:
    pass  # run the statements from the following steps inside this block
Enable the extension
from sqlalchemy import text
session.execute(text('CREATE EXTENSION IF NOT EXISTS vectors'))
Create a model
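from sqlalchemy.orm import mapped_column
from pgvecto_rs.sqlalchemy import Vector
# Base is your declarative base; a full model also needs __tablename__ and a primary key
class Item(Base):
    embedding = mapped_column(Vector(3))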
All supported types are shown in this table:
Native type | SQLAlchemy type | pgvector-python equivalent |
---|---|---|
vector | VECTOR | VECTOR |
svector | SVECTOR | SPARSEVEC |
vecf16 | VECF16 | HALFVEC |
bvector | BVECTOR | BIT |
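As a rough mental model of what the four native types hold, here is a plain-Python sketch. It does not use pgvecto_rs, and the helper names are hypothetical. It assumes that a sparse vector keeps only its non-zero entries, that a half-precision vector stores IEEE 754 16-bit floats, and that a binary vector stores one bit per dimension; the thresholding rule in `to_bits` is only an illustration.

```python
import struct


def to_sparse(dense):
    """Keep only the non-zero entries as {index: value} (svector-like)."""
    return {i: v for i, v in enumerate(dense) if v != 0}


def to_half(dense):
    """Round each value through IEEE 754 half precision (vecf16-like)."""
    return [struct.unpack("<e", struct.pack("<e", v))[0] for v in dense]


def to_bits(dense):
    """One bit per dimension, set where the value is positive (bvector-like)."""
    return [1 if v > 0 else 0 for v in dense]


dense = [0.0, 0.1, 0.0, 2.0]
print(to_sparse(dense))  # {1: 0.1, 3: 2.0}
print(to_half(dense))    # 0.1 is no longer exact in half precision
print(to_bits(dense))    # [0, 1, 0, 1]
```

The trade-off is storage against precision: the sparse and binary forms drop information that the dense form keeps, and the half-precision form rounds it.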
Insert a vector
from sqlalchemy import insert
stmt = insert(Item).values(embedding=[1, 2, 3])
session.execute(stmt)
session.commit()
Add an approximate index
from sqlalchemy import Index
from pgvecto_rs.types import IndexOption, Hnsw, Ivf
index = Index(
    "emb_idx_1",
    Item.embedding,
    postgresql_using="vectors",
    postgresql_with={
        "options": f"$${IndexOption(index=Ivf(), threads=1).dumps()}$$"
    },
    postgresql_ops={"embedding": "vector_l2_ops"},
)
# or
index = Index(
    "emb_idx_2",
    Item.embedding,
    postgresql_using="vectors",
    postgresql_with={
        "options": f"$${IndexOption(index=Hnsw()).dumps()}$$"
    },
    postgresql_ops={"embedding": "vector_l2_ops"},
)
# Apply changes
index.create(session.bind)
Get the nearest neighbors to a vector
from sqlalchemy import select
session.scalars(select(Item.embedding).order_by(Item.embedding.l2_distance(target.embedding)))
Also supports max_inner_product, cosine_distance, and jaccard_distance (for BVECTOR).
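For reference, the quantities these functions order by can be written out in plain Python. This is a sketch of the underlying math, not the library's implementation, and the function names are hypothetical. It assumes the following conventions, which should be checked against the pgvecto.rs documentation: the L2 distance is the squared Euclidean distance, the inner-product ordering uses the negated dot product (so that smaller means closer), the cosine distance is one minus the cosine similarity, and the Jaccard distance is one minus the ratio of shared set bits to all set bits.

```python
import math


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def negative_dot(a, b):
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    # Undefined for an all-zero vector (division by zero).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def jaccard_distance(a, b):
    """a and b are sequences of 0/1 bits with at least one bit set."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either


print(squared_l2([1, 2, 3], [3, 1, 2]))        # 4 + 1 + 1 = 6
print(negative_dot([1, 2, 3], [3, 1, 2]))      # -(3 + 2 + 6) = -11
print(cosine_distance([1, 0], [0, 1]))         # orthogonal vectors
print(jaccard_distance([1, 1, 0], [1, 0, 1]))  # 1 - 1/3
```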
Get items within a certain distance
session.scalars(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))
See examples/sqlalchemy_example.py and tests/test_sqlalchemy.py for more examples.
Psycopg3
Install dependencies:
pip install "pgvecto_rs[psycopg3]"
Initialize a connection
import psycopg
URL = "postgresql://postgres:mysecretpassword@localhost:5432/postgres"
with psycopg.connect(URL) as conn:
    pass  # run the statements from the following steps inside this block
Enable the extension and register vector types
from pgvecto_rs.psycopg import register_vector
conn.execute('CREATE EXTENSION IF NOT EXISTS vectors')
register_vector(conn)
# or asynchronously
# await register_vector_async(conn)
Create a table
conn.execute('CREATE TABLE items (embedding vector(3))')
Insert or copy vectors into table
import numpy as np
conn.execute('INSERT INTO items (embedding) VALUES (%s)', ([1, 2, 3],))
# or faster, copy it
with conn.cursor() as cursor, cursor.copy(
    "COPY items (embedding) FROM STDIN (FORMAT BINARY)"
) as copy:
    copy.write_row([np.array([1, 2, 3])])
Add an approximate index
from pgvecto_rs.types import IndexOption, Hnsw, Ivf
conn.execute(
    "CREATE INDEX emb_idx_1 ON items USING \
    vectors (embedding vector_l2_ops) WITH (options=$${}$$);".format(
        IndexOption(index=Hnsw(), threads=1).dumps()
    ),
)
# or
conn.execute(
    "CREATE INDEX emb_idx_2 ON items USING \
    vectors (embedding vector_l2_ops) WITH (options=$${}$$);".format(
        IndexOption(index=Ivf()).dumps()
    ),
)
# Apply all changes
conn.commit()
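The `$$ ... $$` around the options is PostgreSQL dollar quoting: the options are passed as a single string literal, and dollar quoting avoids having to escape quotes or newlines inside it. A small sketch of how such a statement is assembled, assuming for illustration that the options serialize to a TOML snippet like the one below. The literal `options` text and the helper `create_index_sql` are hypothetical; in practice the text comes from `IndexOption(...).dumps()`.

```python
def create_index_sql(index_name, table, column, opclass, options):
    """Build a CREATE INDEX statement with dollar-quoted options.

    The identifiers are interpolated as-is, so they must come from
    trusted code, never from user input.
    """
    if "$$" in options:
        # A "$$" inside the body would terminate the dollar-quoted literal early.
        raise ValueError("options must not contain '$$'")
    return (
        f"CREATE INDEX {index_name} ON {table} "
        f"USING vectors ({column} {opclass}) "
        f"WITH (options = $${options}$$);"
    )


# Hypothetical options text, standing in for IndexOption(...).dumps().
options = "[indexing.hnsw]\nm = 16\nef_construction = 100\n"
print(create_index_sql("emb_idx_1", "items", "embedding", "vector_l2_ops", options))
```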
Get the nearest neighbors to a vector
conn.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,)).fetchall()
Get the distance
conn.execute('SELECT embedding <-> %s FROM items \
ORDER BY embedding <-> %s', (embedding, embedding)).fetchall()
Get items within a certain distance
conn.execute('SELECT * FROM items WHERE embedding <-> %s < 1.0 \
ORDER BY embedding <-> %s', (embedding, embedding)).fetchall()
See examples/psycopg_example.py and tests/test_psycopg.py for more examples.
Django
Install dependencies:
pip install "pgvecto_rs[django]"
Create a migration to enable the extension
from django.db import migrations
from pgvecto_rs.django import VectorExtension
class Migration(migrations.Migration):
    operations = [
        VectorExtension()
    ]
Add a vector field to your model
from django.db import models
from pgvecto_rs.django import VectorField
class Item(models.Model):
    embedding = VectorField(dimensions=3)
All supported types are shown in this table:
Native type | Django field | pgvector-python equivalent |
---|---|---|
vector | VectorField | VectorField |
svector | SparseVectorField | SparseVectorField |
vecf16 | Float16VectorField | HalfVectorField |
bvector | BinaryVectorField | BitField |
Insert a vector
Item(embedding=[1, 2, 3]).save()
Add an approximate index
from django.db import models
from pgvecto_rs.django import HnswIndex, IvfIndex
from pgvecto_rs.types import IndexOption, Hnsw
class Item(models.Model):
    class Meta:
        indexes = [
            HnswIndex(
                name="emb_idx_1",
                fields=["embedding"],
                opclasses=["vector_l2_ops"],
                m=16,
                ef_construction=100,
                threads=1,
            ),
            # or
            IvfIndex(
                name="emb_idx_2",
                fields=["embedding"],
                nlist=3,
                opclasses=["vector_l2_ops"],
            ),
        ]
Get the nearest neighbors to a vector
from pgvecto_rs.django import L2Distance
Item.objects.order_by(L2Distance('embedding', [3, 1, 2]))[:5]
Also supports MaxInnerProduct, CosineDistance, and JaccardDistance (for BinaryVectorField).
Get the distance
Item.objects.annotate(distance=L2Distance('embedding', [3, 1, 2]))
Get items within a certain distance
Item.objects.alias(distance=L2Distance('embedding', [3, 1, 2])).filter(distance__lt=5)
See examples/django_example.py and tests/test_django.py for more examples.
SDK
The SDK is designed to work with pgvecto.rs out of the box. You can use it to run similarity searches, optionally with filters, without writing any SQL.
Install dependencies:
pip install "pgvecto_rs[sdk]"
A minimal example:
from pgvecto_rs.sdk import PGVectoRs, Record
# Create a client
client = PGVectoRs(
    db_url="postgresql+psycopg://postgres:mysecretpassword@localhost:5432/postgres",
    table_name="example",
    dimension=3,
)
try:
    # Add some records
    client.add_records(
        [
            Record.from_text("hello 1", [1, 2, 3]),
            Record.from_text("hello 2", [1, 2, 4]),
        ]
    )
    # Search with the default operator (sqrt_euclid).
    # The results are sorted by distance.
    for rec, dis in client.search([1, 2, 5]):
        print(rec.text)
        print(dis)
finally:
    # Clean up (i.e. drop the table)
    client.drop()
Output:
hello 2
1.0
hello 1
4.0
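The distances in this output can be checked by hand. A plain-Python sketch, assuming the default operator is the squared Euclidean distance, which is what the printed values 1.0 and 4.0 correspond to (the names below are illustrative and not part of the SDK):

```python
records = {
    "hello 1": [1, 2, 3],
    "hello 2": [1, 2, 4],
}
query = [1, 2, 5]


def squared_euclidean(a, b):
    return float(sum((x - y) ** 2 for x, y in zip(a, b)))


# Rank every record by its distance to the query, nearest first.
ranked = sorted(
    ((text, squared_euclidean(vec, query)) for text, vec in records.items()),
    key=lambda pair: pair[1],
)
for text, distance in ranked:
    print(text)
    print(distance)
```

"hello 1" differs from the query by 2 in the last component, so its squared distance is 4.0 rather than 2.0.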
See examples/sdk_example.py and tests/test_sdk.py for more examples.
Development
This package is managed by PDM.
Set up the environment:
pdm venv create
pdm use # select the venv inside the project path
pdm sync -d -G :all --no-isolation
# lock the requirements
# needs pdm >=2.17: https://pdm-project.org/latest/usage/lock-targets/#separate-lock-files-or-merge-into-one
pdm lock -d -G :all --python=">=3.9"
pdm lock -d -G :all --python="<3.9" --append
# install the package locally
# `--no-isolation` is required for scipy
pdm install -d --no-isolation
Run lint:
pdm run format
pdm run fix
pdm run check
Run the tests in the current environment:
pdm run test
Test
Tox is used to test the package locally.
Run the tests in all environments:
tox run
Acknowledgement
We would like to express our gratitude to the creators and contributors of the pgvector-python repository for their valuable code and architecture, which greatly influenced the development of this repository.