Python binding for pgvecto.rs

These details have not been verified by PyPI

Project description

PGVecto.rs support for Python

PGVecto.rs Python library, supports Django, SQLAlchemy, and Psycopg 3.

	Vector	Sparse Vector	Half-Precision Vector	Binary Vector
SQLAlchemy	✅Insert	✅Insert	✅Insert	✅Insert
Psycopg3	✅Insert ✅Copy	✅Insert ✅Copy	✅Insert ✅Copy	✅Insert ✅Copy
Django	✅Insert	✅Insert	✅Insert	✅Insert

Usage

Install from PyPI:

pip install pgvecto_rs

And use it with your database library:

SQLAlchemy
Psycopg3
Django

Or as a standalone SDK:

usage of SDK

Requirements

To initialize a pgvecto.rs instance, you can run our official image by Quick start:

You can get the latest tags from the Release page. For example, it might be:

docker run \
  --name pgvecto-rs-demo \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -p 5432:5432 \
  -d tensorchord/pgvecto-rs:pg16-v0.3.0

SQLAlchemy

Install dependencies:

pip install "pgvecto_rs[sqlalchemy]"

Initialize a connection

from sqlalchemy import create_engine
from sqlalchemy.orm import Session

URL = "postgresql://postgres:mysecretpassword@localhost:5432/postgres"
engine = create_engine(URL)
with Session(engine) as session:
    pass

Enable the extension

from sqlalchemy import text

session.execute(text('CREATE EXTENSION IF NOT EXISTS vectors'))

Create a model

from pgvecto_rs.sqlalchemy import Vector

class Item(Base):
    embedding = mapped_column(Vector(3))

All supported types are shown in this table

Native types	Types for SQLAlchemy	Correspond to pgvector-python
vector	VECTOR	VECTOR
svector	SVECTOR	SPARSEVEC
vecf16	VECF16	HALFVEC
bvector	BVECTOR	BIT

Insert a vector

from sqlalchemy import insert

stmt = insert(Item).values(embedding=[1, 2, 3])
session.execute(stmt)
session.commit()

Add an approximate index

from sqlalchemy import Index
from pgvecto_rs.types import IndexOption, Hnsw, Ivf

index = Index(
    "emb_idx_1",
    Item.embedding,
    postgresql_using="vectors",
    postgresql_with={
        "options": f"$${IndexOption(index=Ivf(), threads=1).dumps()}$$"
    },
    postgresql_ops={"embedding": "vector_l2_ops"},
)
# or
index = Index(
    "emb_idx_2",
    Item.embedding,
    postgresql_using="vectors",
    postgresql_with={
        "options": f"$${IndexOption(index=Hnsw()).dumps()}$$"
    },
    postgresql_ops={"embedding": "vector_l2_ops"},
)
# Apply changes
index.create(session.bind)

Get the nearest neighbors to a vector

from sqlalchemy import select

session.scalars(select(Item.embedding).order_by(Item.embedding.l2_distance(target.embedding)))

Also supports max_inner_product, cosine_distance and jaccard_distance(for BVECTOR)

Get items within a certain distance

session.scalars(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))

See examples/sqlalchemy_example.py and tests/test_sqlalchemy.py for more examples

Psycopg3

Install dependencies:

pip install "pgvecto_rs[psycopg3]"

Initialize a connection

import psycopg

URL = "postgresql://postgres:mysecretpassword@localhost:5432/postgres"
with psycopg.connect(URL) as conn:
    pass

Enable the extension and register vector types

from pgvecto_rs.psycopg import register_vector

conn.execute('CREATE EXTENSION IF NOT EXISTS vectors')
register_vector(conn)
# or asynchronously
# await register_vector_async(conn)

Create a table

conn.execute('CREATE TABLE items (embedding vector(3))')

Insert or copy vectors into table

conn.execute('INSERT INTO items (embedding) VALUES (%s)', ([1, 2, 3],))
# or faster, copy it
with conn.cursor() as cursor, cursor.copy(
    "COPY items (embedding) FROM STDIN (FORMAT BINARY)"
) as copy:
    copy.write_row([np.array([1, 2, 3])])

Add an approximate index

from pgvecto_rs.types import IndexOption, Hnsw, Ivf

conn.execute(
    "CREATE INDEX emb_idx_1 ON items USING \
        vectors (embedding vector_l2_ops) WITH (options=$${}$$);".format(
        IndexOption(index=Hnsw(), threads=1).dumps()
    ),
)
# or
conn.execute(
    "CREATE INDEX emb_idx_2 ON items USING \
        vectors (embedding vector_l2_ops) WITH (options=$${}$$);".format(
        IndexOption(index=Ivf()).dumps()
    ),
)
# Apply all changes
conn.commit()

Get the nearest neighbors to a vector

conn.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,)).fetchall()

Get the distance

conn.execute('SELECT embedding <-> %s FROM items \
    ORDER BY embedding <-> %s', (embedding, embedding)).fetchall()

Get items within a certain distance

conn.execute('SELECT * FROM items WHERE embedding <-> %s < 1.0 \
    ORDER BY embedding <-> %s', (embedding, embedding)).fetchall()

See examples/psycopg_example.py and tests/test_psycopg.py for more examples

Django

Install dependencies:

pip install "pgvecto_rs[django]"

Create a migration to enable the extension

from pgvecto_rs.django import VectorExtension

class Migration(migrations.Migration):
    operations = [
        VectorExtension()
    ]

Add a vector field to your model

from pgvecto_rs.django import VectorField

class Document(models.Model):
    embedding = VectorField(dimensions=3)

All supported types are shown in this table

Native types	Types for Django	Correspond to pgvector-python
vector	VectorField	VectorField
svector	SparseVectorField	SparseVectorField
vecf16	Float16VectorField	HalfVectorField
bvector	BinaryVectorField	BitField

Insert a vector

Item(embedding=[1, 2, 3]).save()

Add an approximate index

from django.db import models
from pgvecto_rs.django import HnswIndex, IvfIndex
from pgvecto_rs.types import IndexOption, Hnsw


class Item(models.Model):
    class Meta:
        indexes = [
            HnswIndex(
                name="emb_idx_1",
                fields=["embedding"],
                opclasses=["vector_l2_ops"],
                m=16,
                ef_construction=100,
                threads=1,
            )
            # or
            IvfIndex(
                name="emb_idx_2",
                fields=["embedding"],
                nlist=3,
                opclasses=["vector_l2_ops"],
            ),
        ]

Get the nearest neighbors to a vector

from pgvecto_rs.django import L2Distance

Item.objects.order_by(L2Distance('embedding', [3, 1, 2]))[:5]

Also supports MaxInnerProduct, CosineDistance and JaccardDistance(for BinaryVectorField)

Get the distance

Item.objects.annotate(distance=L2Distance('embedding', [3, 1, 2]))

Get items within a certain distance

Item.objects.alias(distance=L2Distance('embedding', [3, 1, 2])).filter(distance__lt=5)

See examples/django_example.py and tests/test_django.py for more examples.

SDK

Our SDK is designed to use the pgvecto.rs out-of-box. You can exploit the power of pgvecto.rs to do similarity search or retrieve with filters, without writing any SQL code.

Install dependencies:

pip install "pgvecto_rs[sdk]"

A minimal example:

from pgvecto_rs.sdk import PGVectoRs, Record

# Create a client
client = PGVectoRs(
    db_url="postgresql+psycopg://postgres:mysecretpassword@localhost:5432/postgres",
    table_name="example",
    dimension=3,
)

try:
    # Add some records
    client.add_records(
        [
            Record.from_text("hello 1", [1, 2, 3]),
            Record.from_text("hello 2", [1, 2, 4]),
        ]
    )

    # Search with default operator (sqrt_euclid).
    # The results is sorted by distance
    for rec, dis in client.search([1, 2, 5]):
        print(rec.text)
        print(dis)
finally:
    # Clean up (i.e. drop the table)
    client.drop()

Output:

hello 2
1.0
hello 1
4.0

See examples/sdk_example.py and tests/test_sdk.py for more examples.

Development

This package is managed by PDM.

Set up things:

pdm venv create
pdm use # select the venv inside the project path
pdm sync -d -G :all --no-isolation

# lock requirement
# need pdm >=2.17: https://pdm-project.org/latest/usage/lock-targets/#separate-lock-files-or-merge-into-one
pdm lock -d -G :all --python=">=3.9"
pdm lock -d -G :all --python="<3.9" --append
# install package to local
# `--no-isolation` is required for scipy
pdm install -d --no-isolation

Run lint:

pdm run format
pdm run fix
pdm run check

Run test in current environment:

pdm run test

Test

Tox is used to test the package locally.

Run test in all environment:

tox run

Acknowledgement

We would like to express our gratitude to the creators and contributors of the pgvector-python repository for their valuable code and architecture, which greatly influenced the development of this repository.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.2.2

Oct 8, 2024

This version

0.2.1

Jul 29, 2024

0.2.0

Jul 29, 2024

0.1.5

Jun 4, 2024

0.1.4

Nov 23, 2023

0.1.3

Nov 16, 2023

0.1.2

Oct 31, 2023

0.1.1

Oct 25, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pgvecto_rs-0.2.1.tar.gz (27.4 kB view hashes)

Uploaded Jul 29, 2024 Source

Built Distribution

pgvecto_rs-0.2.1-py3-none-any.whl (29.8 kB view hashes)

Uploaded Jul 29, 2024 Python 3

Hashes for pgvecto_rs-0.2.1.tar.gz

Hashes for pgvecto_rs-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`07046eaad2c4f75745f76de9ba483541909f1c595aced8d3434224a4f933daca`
MD5	`9cef20900b2c4f17258dd17a66fe1b85`
BLAKE2b-256	`64e3baeba765a27e7d0ef7421a9114da0e95724eef03499e13a17aa45fccd60a`

Hashes for pgvecto_rs-0.2.1-py3-none-any.whl

Hashes for pgvecto_rs-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b3ee2c465219469ad537b3efea2916477c6c576b3d6fd4298980d0733d12bb27`
MD5	`697dc9667f99d56bbf3ee4515a6911d0`
BLAKE2b-256	`cafce26b07e54ae51e7d490b22dcfa5d7e30a8ade22c2e4231063aaec74c0099`