ColBERT Live! implements efficient ColBERT and ColPali search on top of vector indexes that support live updates (without rebuilding the entire index).

ColBERT Live!

ColBERT Live! implements efficient ColBERT search on top of vector indexes that support live updates (without rebuilding the entire index) as well as arbitrary predicates against other indexed fields.

Background

ColBERT (Contextualized Late Interaction over BERT) is a state-of-the-art semantic search model that combines the effectiveness of BERT-based language models with the performance required for practical, large-scale search applications.

Compared to traditional dense passage retrieval (i.e., one vector per passage), ColBERT is particularly strong at handling unusual terms and short queries.

It is reasonable to think of ColBERT as combining the best of semantic vector search with traditional keyword search à la BM25, but without having to tune the weighting of hybrid search or deal with corner cases where the vector and keyword sides play poorly together.
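To make the "late interaction" idea concrete, here is a minimal pure-Python sketch of the MaxSim scoring rule that ColBERT-style models use: each query token vector is matched to its most similar document token vector, and the per-token maxima are summed. The toy 2-dimensional vectors and the helper names (`normalize`, `maxsim_score`) are illustrative assumptions, not part of the ColBERT Live! API.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query token vector, take its best
    match among the document token vectors, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy 2-dimensional "token embeddings" (real models use many more dimensions).
query = [normalize(v) for v in [[1.0, 0.0], [0.0, 1.0]]]
doc_a = [normalize(v) for v in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]
doc_b = [normalize(v) for v in [[1.0, 1.0], [1.0, 1.0]]]

score_a = maxsim_score(query, doc_a)  # every query token has an exact match
score_b = maxsim_score(query, doc_b)  # every query token only partially matches
```

Because the document keeps one vector per token rather than a single pooled vector, `doc_a` can match both query tokens exactly and scores higher than `doc_b`, which is the intuition behind ColBERT's strength on unusual terms.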

However, the initial ColBERT implementation is designed around a custom index that cannot be updated incrementally and is difficult to combine with other indexes. Adding, modifying, or removing documents from the custom index requires reindexing the entire collection, which can be prohibitively slow for large datasets.

ColBERT Live!

ColBERT Live! implements ColBERT on any vector database. This means you can add, modify, or remove documents from your search system without the need for costly reindexing of the entire collection, making it ideal for dynamic content environments. It also means that you can easily apply other predicates such as access controls or metadata filters from your database to your vector searches.

ColBERT Live! features:

  • Efficient ColBERT search implementation
  • Support for live updates to the vector index
  • Abstraction layer for database backends, starting with AstraDB and SQLite
  • State of the art ColBERT techniques including:
    • Answer.AI ColBERT model for higher relevance
    • Document embedding pooling for reduced storage requirements
    • Query embedding pooling for improved search performance
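As a rough illustration of how embedding pooling can reduce storage, the sketch below greedily groups near-duplicate token vectors and replaces each group with its mean. This is an assumed, simplified scheme for illustration only; the clustering method, the `threshold` parameter, and the function names are hypothetical and may differ from what ColBERT Live! actually does.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def pool_embeddings(vectors, threshold=0.9):
    """Greedily group vectors whose cosine similarity to a cluster's first
    member is at least `threshold`, then replace each cluster by its mean."""
    clusters = []
    for v in vectors:
        for cluster in clusters:
            if cosine(v, cluster[0]) >= threshold:
                cluster.append(v)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([v])
    return [mean(c) for c in clusters]

doc_vectors = [
    [1.0, 0.0],
    [0.99, 0.05],  # nearly parallel to the first vector
    [0.0, 1.0],
    [0.02, 0.98],  # nearly parallel to the third vector
]
pooled = pool_embeddings(doc_vectors)
```

Here four token vectors collapse into two pooled vectors, halving the number of rows to store for this document while keeping one representative per direction.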

Installation

You can install ColBERT Live! using pip:

pip install colbert-live

Usage

  • Subclass your database backend and implement the required methods for retrieving embeddings:

    from colbert_live.db.astra import AstraCQL
    # or
    from colbert_live.db.sqlite import Sqlite3DB
    
    class MyDB(AstraCQL):
      ...
    
    db = MyDB()
    
  • Instantiate a model:

    import colbert_live.models

    model = colbert_live.models.ColbertModel()
    # or
    model = colbert_live.models.ColpaliModel()
    
  • Initialize the ColbertLive instance:

    colbert = ColbertLive(db, model)
    
  • Call search:

    colbert.search(query_str, top_k)
    

Two cheat sheets are available:

Supported databases

ColBERT Live! initially supports DataStax Astra and SQLite out of the box. Adding support for other databases is straightforward; check out the Astra implementation for an example to follow. If you're not concerned about making it reusable, you just have to implement the two methods of the base DB class.
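To give a feel for why a backend can get by with two operations, here is a self-contained toy sketch of a two-stage search over an in-memory "database": one method finds candidate documents by nearest token vectors, the other fetches all vectors of a candidate for full MaxSim scoring. The class `ToyVectorDB` and the method names `nearest_vectors` and `fetch_embeddings` are hypothetical stand-ins chosen for this example; they are not the actual methods of the ColBERT Live! base DB class, so consult the Astra implementation for the real interface.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class ToyVectorDB:
    """In-memory stand-in for a vector database (hypothetical interface)."""

    def __init__(self):
        self.rows = []  # one (doc_id, vector) row per token embedding

    def add_document(self, doc_id, vectors):
        # A live update is just an insert of new rows; nothing is rebuilt.
        self.rows.extend((doc_id, v) for v in vectors)

    def nearest_vectors(self, query_vec, limit):
        """Hypothetical method 1: doc ids owning the `limit` rows most
        similar to `query_vec` (brute force here; a real DB would use ANN)."""
        ranked = sorted(self.rows, key=lambda row: dot(query_vec, row[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:limit]]

    def fetch_embeddings(self, doc_id):
        """Hypothetical method 2: all token vectors stored for one document."""
        return [v for d, v in self.rows if d == doc_id]

def search(db, query_vecs, top_k, candidates_per_token=2):
    # Stage 1: gather candidate documents from per-token nearest-vector lookups.
    candidates = set()
    for q in query_vecs:
        candidates.update(db.nearest_vectors(q, candidates_per_token))
    # Stage 2: rerank candidates with the full MaxSim score.
    scored = []
    for doc_id in candidates:
        doc_vecs = db.fetch_embeddings(doc_id)
        score = sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)
        scored.append((doc_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

db = ToyVectorDB()
db.add_document("doc1", [[1.0, 0.0], [0.6, 0.8]])
query = [[0.0, 1.0]]
before = search(db, query, top_k=1)

# Insert a better-matching document; it is searchable immediately.
db.add_document("doc2", [[0.0, 1.0]])
after = search(db, query, top_k=1)
```

In this sketch the newly added `doc2` becomes the top result on the very next query, without any reindexing step, which mirrors the live-update behavior described above.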

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
