Scalable Nearest Neighbor search library

ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code release implements [1], which includes search space pruning and quantization for Maximum Inner Product Search and also supports other distance functions such as Euclidean distance. The implementation is designed for x86 processors with AVX2 support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com as shown on the glove-100-angular dataset below:

[Figure: ann-benchmarks.com results on the glove-100-angular dataset]

ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1]. The code is released for research purposes. For more details on the academic description of algorithms, please see [1].
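To make the search problem concrete, here is a minimal pure-Python sketch of exact (brute-force) nearest neighbor search under inner product and squared Euclidean distance. This is not ScaNN's algorithm: it is the exhaustive baseline that pruning and quantization are meant to approximate much faster. The function names and the measure labels are chosen for this illustration only.

```python
def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def squared_euclidean(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query, dataset, k, measure="dot_product"):
    """Return the indices of the k best matches by scanning every vector.

    For "dot_product" a larger score is better (Maximum Inner Product Search);
    for "squared_l2" a smaller distance is better.
    """
    if measure == "dot_product":
        # Negate so that sorting ascending puts the largest product first.
        scored = [(-inner_product(query, v), i) for i, v in enumerate(dataset)]
    elif measure == "squared_l2":
        scored = [(squared_euclidean(query, v), i) for i, v in enumerate(dataset)]
    else:
        raise ValueError(f"unknown measure: {measure}")
    scored.sort()
    return [i for _, i in scored[:k]]


dataset = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [0.9, 0.1]]
query = [1.0, 0.0]

mips_result = exact_top_k(query, dataset, 2, "dot_product")
l2_result = exact_top_k(query, dataset, 2, "squared_l2")
print(mips_result)  # [2, 0]
print(l2_result)    # [0, 3]
```

The two measures rank the same data differently: the long vector at index 2 wins under inner product but is far away in Euclidean terms. The cost of this scan grows linearly with the dataset size, which is what motivates approximate methods at scale.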

Reference [1]:

@inproceedings{avq_2020,
  title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization},
  author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv},
  booktitle={International Conference on Machine Learning},
  year={2020},
  URL={https://arxiv.org/abs/1908.10396}
}

Installation

manylinux_2_27-compatible wheels are available on PyPI:

pip install scann

ScaNN supports Linux environments running Python versions 3.9-3.12. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI.

In accordance with the manylinux_2_27 specification, ScaNN requires libstdc++ version 3.4.23 or above from the operating system. The installed version can be read from the GLIBCXX symbol versions in the system's libstdc++.so.6, and it can generally be upgraded by installing a newer version of g++.
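One way to check the libstdc++ requirement is to look for `GLIBCXX_x.y.z` version strings embedded in the shared library and compare the highest one against 3.4.23. The sketch below does this in Python. The helper names are made up for this illustration, and the library path in the usage note is an assumption that holds on many Debian/Ubuntu systems but varies by distribution.

```python
import re

REQUIRED_GLIBCXX = (3, 4, 23)


def glibcxx_versions(blob):
    """Extract GLIBCXX version tuples, e.g. (3, 4, 25), from raw library bytes."""
    found = set()
    for match in re.finditer(rb"GLIBCXX_(\d+(?:\.\d+)*)", blob):
        parts = match.group(1).decode("ascii").split(".")
        found.add(tuple(int(p) for p in parts))
    return sorted(found)


def meets_requirement(blob, required=REQUIRED_GLIBCXX):
    """True if the newest GLIBCXX version found is at least `required`."""
    versions = glibcxx_versions(blob)
    return bool(versions) and max(versions) >= required


def check_library(path):
    """Read a libstdc++ shared object from disk and test it."""
    with open(path, "rb") as f:
        return meets_requirement(f.read())
```

For example, `check_library("/usr/lib/x86_64-linux-gnu/libstdc++.so.6")` would return `True` on a system whose libstdc++ is new enough, assuming the library lives at that path. Versions are compared as integer tuples so that 3.4.9 correctly sorts below 3.4.23, which a plain string comparison would get wrong.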

Integration with TensorFlow Serving

We provide custom Docker images of TF Serving that are linked to the ScaNN TF ops. See the tf_serving directory for further information.

Building from source

To build ScaNN from source, first install the build tool Bazel, Clang 16, and libstdc++ headers for C++17 (which are provided with GCC 9). Additionally, ScaNN requires a modern version of Python (3.9.x or later) and TensorFlow 2.16 installed on that version of Python. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:

python configure.py
CC=clang-16 bazel build -c opt --features=thin_lto --copt=-mavx --copt=-mfma --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.

Usage

See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.
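As a rough orientation before opening the notebook, the sketch below shows one way the Python API can be used: build a searcher with partitioning, asymmetric hashing, and reordering, then measure recall against exact results. The `scann.scann_ops_pybind.builder(...)`, `.tree(...)`, `.score_ah(...)`, `.reorder(...)`, `.build()`, and `search_batched(...)` calls are written from the package's Python API as I understand it for recent releases; treat the parameter values as untuned placeholders and check docs/example.ipynb for the authoritative usage. The helper names `recall_at_k`, `search_with_scann`, and `demo` are hypothetical, and `demo` additionally assumes NumPy is installed.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of true neighbors that the approximate search recovered."""
    hits = 0
    total = 0
    for approx_row, exact_row in zip(approx_ids, exact_ids):
        hits += len(set(approx_row) & set(exact_row))
        total += len(exact_row)
    return hits / total


def search_with_scann(dataset, queries, k=10):
    """Build a ScaNN searcher and return (neighbor indices, distances).

    The import is local so recall_at_k stays usable without ScaNN installed.
    """
    import scann  # requires `pip install scann` on a supported Linux system

    searcher = (
        scann.scann_ops_pybind.builder(dataset, k, "dot_product")
        # Partition the dataset and search only a subset of partitions.
        .tree(num_leaves=100, num_leaves_to_search=10,
              training_sample_size=len(dataset))
        # Score candidates with quantized (asymmetric hashing) distances.
        .score_ah(2, anisotropic_quantization_threshold=0.2)
        # Re-rank the best 100 candidates with exact distances.
        .reorder(100)
        .build()
    )
    return searcher.search_batched(queries)


def demo():
    """Compare ScaNN results with exact search on random unit vectors."""
    import numpy as np

    rng = np.random.default_rng(0)
    dataset = rng.standard_normal((10000, 64)).astype(np.float32)
    dataset /= np.linalg.norm(dataset, axis=1, keepdims=True)
    queries = dataset[:5]

    ids, _ = search_with_scann(dataset, queries, k=10)
    exact = np.argsort(-(queries @ dataset.T), axis=1)[:, :10]
    print("recall@10:", recall_at_k(ids.tolist(), exact.tolist()))
```

Calling `demo()` on a machine with ScaNN installed prints the recall of the approximate search. Raising `num_leaves_to_search` or the reorder count generally trades speed for higher recall.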

Download files

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

scann-1.3.1-cp312-cp312-manylinux_2_27_x86_64.whl (10.6 MB)
CPython 3.12, manylinux (glibc 2.27+), x86-64

scann-1.3.1-cp311-cp311-manylinux_2_27_x86_64.whl (10.6 MB)
CPython 3.11, manylinux (glibc 2.27+), x86-64

scann-1.3.1-cp310-cp310-manylinux_2_27_x86_64.whl (10.6 MB)
CPython 3.10, manylinux (glibc 2.27+), x86-64

scann-1.3.1-cp39-cp39-manylinux_2_27_x86_64.whl (10.6 MB)
CPython 3.9, manylinux (glibc 2.27+), x86-64
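pip normally selects the right wheel automatically, but when downloading by hand it helps to know how the filename encodes compatibility. Wheel filenames follow the pattern `name-version-pythontag-abitag-platformtag.whl`, so the Python tag (for example `cp311`) can be matched against the interpreter version. The sketch below illustrates this with the four filenames listed above. The function names are hypothetical and the check looks only at the Python tag, not at the platform tag.

```python
import sys

WHEELS = [
    "scann-1.3.1-cp312-cp312-manylinux_2_27_x86_64.whl",
    "scann-1.3.1-cp311-cp311-manylinux_2_27_x86_64.whl",
    "scann-1.3.1-cp310-cp310-manylinux_2_27_x86_64.whl",
    "scann-1.3.1-cp39-cp39-manylinux_2_27_x86_64.whl",
]


def python_tag(filename):
    """Return the Python tag of a wheel filename, e.g. 'cp312'."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # The last three fields are always python tag, ABI tag, platform tag.
    return parts[-3]


def pick_wheel(filenames, major, minor):
    """Return the first wheel built for CPython major.minor, or None."""
    wanted = f"cp{major}{minor}"
    for name in filenames:
        if python_tag(name) == wanted:
            return name
    return None


# Wheel matching the interpreter that runs this script (None if unsupported).
current = pick_wheel(WHEELS, sys.version_info[0], sys.version_info[1])
print(current)
```

On an interpreter outside the 3.9 to 3.12 range, `current` is `None`, which mirrors the fact that no wheel is published for that version in this release.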

File details

Details for the file scann-1.3.1-cp312-cp312-manylinux_2_27_x86_64.whl.

File hashes

Hashes for scann-1.3.1-cp312-cp312-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 96d69523d4948ad22ed655ca47a0e7f81532bbe029c205441da6b18b294b59fa
MD5 40455e868cc58762bd278cc414bc732f
BLAKE2b-256 c4a2bf60731231b9dd790af31019ffb263a1066c47736536e5acf47fb2f4e5b9

File details

Details for the file scann-1.3.1-cp311-cp311-manylinux_2_27_x86_64.whl.

File hashes

Hashes for scann-1.3.1-cp311-cp311-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 d0e5f269e6a07b3b5c89fe3fe9b4a1c45f9b6f9006699d133506689953697e32
MD5 ed11b07a93e8351a90869c7b83004b39
BLAKE2b-256 943e649ed847e5617fe3deae5721aa8b3f1a24189251aa5a5c4760ece84a2075

File details

Details for the file scann-1.3.1-cp310-cp310-manylinux_2_27_x86_64.whl.

File hashes

Hashes for scann-1.3.1-cp310-cp310-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 3e317c5014d57e0e645acb099184803683d5adb7c972708171822f8faa2221d5
MD5 6305ae9b479a55d00ed7f04b795b754e
BLAKE2b-256 d68280d9571984d0f7205b63d531d18420956663e8ffbf23e476d6c122f23c24

File details

Details for the file scann-1.3.1-cp39-cp39-manylinux_2_27_x86_64.whl.

File hashes

Hashes for scann-1.3.1-cp39-cp39-manylinux_2_27_x86_64.whl
Algorithm Hash digest
SHA256 63c4c8514401617e61f463a9f7fe5b3238bb4046ea79f779f13a0a39f0db7244
MD5 23c343588ca89ec8df5443f8f0a3b819
BLAKE2b-256 8bd10fe750e5ee1175649106904ba0da35242161cbf87ccda01082eba5b61a32
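The digests above can be used to confirm that a downloaded wheel is intact. The following sketch computes a file's SHA256 with Python's standard `hashlib` module and compares it with the expected value. The constant holds the SHA256 listed above for the cp312 wheel. The function names and the local file path in the usage note are assumptions for illustration.

```python
import hashlib
import hmac

# SHA256 listed above for scann-1.3.1-cp312-cp312-manylinux_2_27_x86_64.whl
EXPECTED_SHA256 = "96d69523d4948ad22ed655ca47a0e7f81532bbe029c205441da6b18b294b59fa"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large wheels are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

For instance, `matches_digest("scann-1.3.1-cp312-cp312-manylinux_2_27_x86_64.whl", EXPECTED_SHA256)` should return `True` for an unmodified download saved under that name in the current directory. pip can also enforce hashes itself through its hash-checking mode.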
