Skip to main content

Python binding for pgvecto.rs

Project description

PGVecto.rs support for Python

discord invitation link trackgit-views trackgit-views trackgit-views

PGVecto.rs Python library, supports Django, SQLAlchemy, and Psycopg 3.

Vector Sparse Vector Half-Precision Vector Binary Vector
SQLAlchemy ✅Insert ✅Insert ✅Insert ✅Insert
Psycopg3 ✅Insert ✅Copy ✅Insert ✅Copy ✅Insert ✅Copy ✅Insert ✅Copy
Django ✅Insert ✅Insert ✅Insert ✅Insert

Usage

Install from PyPI:

pip install pgvecto_rs

And use it with your database library:

Or as a standalone SDK:

Requirements

To initialize a pgvecto.rs instance, you can run our official image by Quick start:

You can get the latest tags from the Release page. For example, it might be:

docker run \
  --name pgvecto-rs-demo \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -p 5432:5432 \
  -d tensorchord/pgvecto-rs:pg16-v0.3.0

SQLAlchemy

Install dependencies:

pip install "pgvecto_rs[sqlalchemy]"

Initialize a connection

from sqlalchemy import create_engine
from sqlalchemy.orm import Session

URL = "postgresql://postgres:mysecretpassword@localhost:5432/postgres"
engine = create_engine(URL)
with Session(engine) as session:
    pass

Enable the extension

from sqlalchemy import text

session.execute(text('CREATE EXTENSION IF NOT EXISTS vectors'))

Create a model

from pgvecto_rs.sqlalchemy import Vector

class Item(Base):
    embedding = mapped_column(Vector(3))

All supported types are shown in this table

Native types Types for SQLAlchemy Correspond to pgvector-python
vector VECTOR VECTOR
svector SVECTOR SPARSEVEC
vecf16 VECF16 HALFVEC
bvector BVECTOR BIT

Insert a vector

from sqlalchemy import insert

stmt = insert(Item).values(embedding=[1, 2, 3])
session.execute(stmt)
session.commit()

Add an approximate index

from sqlalchemy import Index
from pgvecto_rs.types import IndexOption, Hnsw, Ivf

index = Index(
    "emb_idx_1",
    Item.embedding,
    postgresql_using="vectors",
    postgresql_with={
        "options": f"$${IndexOption(index=Ivf(), threads=1).dumps()}$$"
    },
    postgresql_ops={"embedding": "vector_l2_ops"},
)
# or
index = Index(
    "emb_idx_2",
    Item.embedding,
    postgresql_using="vectors",
    postgresql_with={
        "options": f"$${IndexOption(index=Hnsw()).dumps()}$$"
    },
    postgresql_ops={"embedding": "vector_l2_ops"},
)
# Apply changes
index.create(session.bind)

Get the nearest neighbors to a vector

from sqlalchemy import select

session.scalars(select(Item.embedding).order_by(Item.embedding.l2_distance(target.embedding)))

Also supports max_inner_product, cosine_distance and jaccard_distance(for BVECTOR)

Get items within a certain distance

session.scalars(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))

See examples/sqlalchemy_example.py and tests/test_sqlalchemy.py for more examples

Psycopg3

Install dependencies:

pip install "pgvecto_rs[psycopg3]"

Initialize a connection

import psycopg

URL = "postgresql://postgres:mysecretpassword@localhost:5432/postgres"
with psycopg.connect(URL) as conn:
    pass

Enable the extension and register vector types

from pgvecto_rs.psycopg import register_vector

conn.execute('CREATE EXTENSION IF NOT EXISTS vectors')
register_vector(conn)
# or asynchronously
# await register_vector_async(conn)

Create a table

conn.execute('CREATE TABLE items (embedding vector(3))')

Insert or copy vectors into table

conn.execute('INSERT INTO items (embedding) VALUES (%s)', ([1, 2, 3],))
# or faster, copy it
with conn.cursor() as cursor, cursor.copy(
    "COPY items (embedding) FROM STDIN (FORMAT BINARY)"
) as copy:
    copy.write_row([np.array([1, 2, 3])])

Add an approximate index

from pgvecto_rs.types import IndexOption, Hnsw, Ivf

conn.execute(
    "CREATE INDEX emb_idx_1 ON items USING \
        vectors (embedding vector_l2_ops) WITH (options=$${}$$);".format(
        IndexOption(index=Hnsw(), threads=1).dumps()
    ),
)
# or
conn.execute(
    "CREATE INDEX emb_idx_2 ON items USING \
        vectors (embedding vector_l2_ops) WITH (options=$${}$$);".format(
        IndexOption(index=Ivf()).dumps()
    ),
)
# Apply all changes
conn.commit()

Get the nearest neighbors to a vector

conn.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,)).fetchall()

Get the distance

conn.execute('SELECT embedding <-> %s FROM items \
    ORDER BY embedding <-> %s', (embedding, embedding)).fetchall()

Get items within a certain distance

conn.execute('SELECT * FROM items WHERE embedding <-> %s < 1.0 \
    ORDER BY embedding <-> %s', (embedding, embedding)).fetchall()

See examples/psycopg_example.py and tests/test_psycopg.py for more examples

Django

Install dependencies:

pip install "pgvecto_rs[django]"

Create a migration to enable the extension

from pgvecto_rs.django import VectorExtension

class Migration(migrations.Migration):
    operations = [
        VectorExtension()
    ]

Add a vector field to your model

from pgvecto_rs.django import VectorField

class Document(models.Model):
    embedding = VectorField(dimensions=3)

All supported types are shown in this table

Native types Types for Django Correspond to pgvector-python
vector VectorField VectorField
svector SparseVectorField SparseVectorField
vecf16 Float16VectorField HalfVectorField
bvector BinaryVectorField BitField

Insert a vector

Item(embedding=[1, 2, 3]).save()

Add an approximate index

from django.db import models
from pgvecto_rs.django import HnswIndex, IvfIndex
from pgvecto_rs.types import IndexOption, Hnsw


class Item(models.Model):
    class Meta:
        indexes = [
            HnswIndex(
                name="emb_idx_1",
                fields=["embedding"],
                opclasses=["vector_l2_ops"],
                m=16,
                ef_construction=100,
                threads=1,
            )
            # or
            IvfIndex(
                name="emb_idx_2",
                fields=["embedding"],
                nlist=3,
                opclasses=["vector_l2_ops"],
            ),
        ]

Get the nearest neighbors to a vector

from pgvecto_rs.django import L2Distance

Item.objects.order_by(L2Distance('embedding', [3, 1, 2]))[:5]

Also supports MaxInnerProduct, CosineDistance and JaccardDistance(for BinaryVectorField)

Get the distance

Item.objects.annotate(distance=L2Distance('embedding', [3, 1, 2]))

Get items within a certain distance

Item.objects.alias(distance=L2Distance('embedding', [3, 1, 2])).filter(distance__lt=5)

See examples/django_example.py and tests/test_django.py for more examples.

SDK

Our SDK is designed to use the pgvecto.rs out-of-box. You can exploit the power of pgvecto.rs to do similarity search or retrieve with filters, without writing any SQL code.

Install dependencies:

pip install "pgvecto_rs[sdk]"

A minimal example:

from pgvecto_rs.sdk import PGVectoRs, Record

# Create a client
client = PGVectoRs(
    db_url="postgresql+psycopg://postgres:mysecretpassword@localhost:5432/postgres",
    table_name="example",
    dimension=3,
)

try:
    # Add some records
    client.add_records(
        [
            Record.from_text("hello 1", [1, 2, 3]),
            Record.from_text("hello 2", [1, 2, 4]),
        ]
    )

    # Search with default operator (sqrt_euclid).
    # The results is sorted by distance
    for rec, dis in client.search([1, 2, 5]):
        print(rec.text)
        print(dis)
finally:
    # Clean up (i.e. drop the table)
    client.drop()

Output:

hello 2
1.0
hello 1
4.0

See examples/sdk_example.py and tests/test_sdk.py for more examples.

Development

This package is managed by PDM.

Set up things:

pdm venv create
pdm use # select the venv inside the project path
pdm sync -d -G :all --no-isolation

# lock requirement
# need pdm >=2.17: https://pdm-project.org/latest/usage/lock-targets/#separate-lock-files-or-merge-into-one
pdm lock -d -G :all --python=">=3.9"
pdm lock -d -G :all --python="<3.9" --append
# install package to local
# `--no-isolation` is required for scipy
pdm install -d --no-isolation

Run lint:

pdm run format
pdm run fix
pdm run check

Run test in current environment:

pdm run test

Test

Tox is used to test the package locally.

Run test in all environment:

tox run

Acknowledgement

We would like to express our gratitude to the creators and contributors of the pgvector-python repository for their valuable code and architecture, which greatly influenced the development of this repository.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pgvecto_rs-0.2.0.tar.gz (27.4 kB view hashes)

Uploaded Source

Built Distribution

pgvecto_rs-0.2.0-py3-none-any.whl (29.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page