Scalable Approximate Nearest Neighbor search library

ScaNN

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. This code release implements [1], which includes search space pruning and quantization for Maximum Inner Product Search and also supports other distance functions such as Euclidean distance. The implementation is designed for x86 processors with AVX2 support. ScaNN achieves state-of-the-art performance on ann-benchmarks.com as shown on the glove-100-angular dataset below:

[Figure: ann-benchmarks.com results on the glove-100-angular dataset]

ScaNN can be configured to fit datasets with different sizes and distributions. It has both TensorFlow and Python APIs. The library shows strong performance with large datasets [1]. The code is released for research purposes. For more details on the academic description of algorithms, please see [1].
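For context, the exact form of the Maximum Inner Product Search problem that ScaNN approximates can be sketched in a few lines. The snippet below is a brute-force baseline written for illustration only; it is not ScaNN's algorithm and uses no ScaNN code. The names `inner_product` and `brute_force_mips` are hypothetical helpers introduced here.

```python
import heapq
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return sum(x * y for x, y in zip(a, b))


def brute_force_mips(
    dataset: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Return the k (index, score) pairs with the largest inner product.

    Every database vector is scored, so the cost grows linearly with the
    dataset size. Approximate methods trade some accuracy to avoid this.
    """
    scored = ((i, inner_product(v, query)) for i, v in enumerate(dataset))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Tiny illustrative dataset (made up for this example).
dataset = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 2.0]]
top2 = brute_force_mips(dataset, query=[1.0, 0.25], k=2)
print(top2)  # [(3, 2.5), (0, 1.0)]
```

A baseline like this is also useful for producing ground-truth neighbors when measuring the recall of an approximate index.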

Reference [1]:

@inproceedings{avq_2020,
  title={Accelerating Large-Scale Inference with Anisotropic Vector Quantization},
  author={Guo, Ruiqi and Sun, Philip and Lindgren, Erik and Geng, Quan and Simcha, David and Chern, Felix and Kumar, Sanjiv},
  booktitle={International Conference on Machine Learning},
  year={2020},
  URL={https://arxiv.org/abs/1908.10396}
}

Installation

manylinux2014-compatible wheels are available on PyPI:

pip install scann

ScaNN supports Linux environments running Python versions 3.6-3.9. See docs/releases.md for release notes; the page also contains download links for ScaNN wheels prior to version 1.1.0, which were not released on PyPI.
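Before running pip, it can help to confirm that the current interpreter matches the constraints stated above. The following is a minimal sketch, assuming the bounds given in this README (Linux, x86_64 wheels, Python 3.6 through 3.9); `wheel_supported` is a hypothetical helper, not part of ScaNN, and other releases may have different bounds.

```python
import platform
import sys
from typing import Tuple

# Bounds taken from the README text for this release; other releases may differ.
MIN_PYTHON = (3, 6)
MAX_PYTHON = (3, 9)


def wheel_supported(system: str, machine: str, python_version: Tuple[int, int]) -> bool:
    """Check an environment against the platform constraints listed above."""
    return (
        system == "Linux"
        and machine == "x86_64"
        and MIN_PYTHON <= python_version <= MAX_PYTHON
    )


# Evaluate the check for the interpreter running this script.
current = (platform.system(), platform.machine(), tuple(sys.version_info[:2]))
print(current, "supported:", wheel_supported(*current))
```

This check does not test for AVX2 support or the libstdc++ requirement, both of which also apply.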

In accordance with the manylinux2014 specification, ScaNN requires libstdc++ version 3.4.19 or above from the operating system. See here for an example of how to find your system's libstdc++ version; it can generally be upgraded by installing a newer version of g++.
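One way to inspect the library version is to look for the `GLIBCXX_` version tags embedded in the shared library file. The sketch below assumes that the version number above corresponds to those tags, which is an interpretation and not something this README states. The helper names and the example path in the comment are hypothetical; the location of `libstdc++.so.6` varies by distribution.

```python
import re
from typing import Optional, Tuple

# Matches tags such as GLIBCXX_3.4.19; non-numeric tags are ignored.
_GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def max_glibcxx_version(library_bytes: bytes) -> Optional[Tuple[int, ...]]:
    """Return the highest GLIBCXX_x.y.z tag found in the bytes, or None."""
    versions = [
        tuple(int(part) for part in match.group(1).decode("ascii").split("."))
        for match in _GLIBCXX_TAG.finditer(library_bytes)
    ]
    return max(versions) if versions else None


def meets_requirement(
    library_bytes: bytes, required: Tuple[int, ...] = (3, 4, 19)
) -> bool:
    """True if the highest tag found is at least the required version."""
    found = max_glibcxx_version(library_bytes)
    return found is not None and found >= required


# Synthetic bytes standing in for the contents of a real library file.
sample = b"\x00GLIBCXX_3.4\x00GLIBCXX_3.4.9\x00GLIBCXX_3.4.19\x00GLIBCXX_DEBUG_MESSAGE_LENGTH\x00"
print(max_glibcxx_version(sample), meets_requirement(sample))

# On a real system (path is an assumption and differs between distributions):
# with open("/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "rb") as f:
#     print(meets_requirement(f.read()))
```

Versions are compared as integer tuples so that 3.4.19 correctly ranks above 3.4.9, which a plain string comparison would get wrong.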

Integration with TensorFlow Serving

We provide custom Docker images of TF Serving that are linked to the ScaNN TF ops. See the tf_serving directory for further information.

Building from source

To build ScaNN from source, first install the build tool bazel, Clang 8, and libstdc++ headers for C++17 (which are provided with GCC 9). Additionally, ScaNN requires a modern version of Python (3.6.x or later) and TensorFlow 2.6 installed on that version of Python. Once these prerequisites are satisfied, run the following commands in the root directory of the repository:

python configure.py
CC=clang-8 bazel build -c opt --features=thin_lto --copt=-mavx2 --copt=-mfma --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg
./bazel-bin/build_pip_pkg

A .whl file should appear in the root of the repository upon successful completion of these commands. This .whl can be installed via pip.
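As a small convenience, the step of locating the built wheel and installing it can be scripted. This is a sketch under the assumption that the wheel lands directly in the repository root, as described above; `find_wheels` and `pip_install_command` are hypothetical helpers, and the script only prints the pip command rather than running it.

```python
import glob
import os
import sys
from typing import List


def find_wheels(repo_root: str) -> List[str]:
    """List .whl files located directly in the given directory."""
    return sorted(glob.glob(os.path.join(repo_root, "*.whl")))


def pip_install_command(wheel_path: str) -> List[str]:
    """Build a pip invocation that targets the current interpreter."""
    return [sys.executable, "-m", "pip", "install", wheel_path]


# Print the install command for each wheel found in the current directory.
for wheel in find_wheels("."):
    print(" ".join(pip_install_command(wheel)))
```

Invoking pip through `sys.executable -m pip` keeps the installation tied to the same Python interpreter that TensorFlow was installed on. To execute a printed command from Python, the list can be passed to `subprocess.run(..., check=True)`.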

Usage

See the example in docs/example.ipynb. For a more in-depth explanation of ScaNN techniques, see docs/algorithms.md.
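For orientation, the general shape of the Python API can be sketched as follows. The builder chain is written from recollection of the pattern used in the example notebook, so method names and arguments may differ between versions and should be checked against docs/example.ipynb. The parameter values are illustrative assumptions that need tuning per dataset, and `build_searcher` and `compute_recall` are hypothetical wrapper names introduced here.

```python
def build_searcher(dataset, num_neighbors: int = 10):
    """Build a ScaNN searcher over a 2-D float32 NumPy array of vectors.

    All numeric settings below are illustrative, not recommendations.
    """
    # Imported lazily so the recall helper below works without ScaNN installed.
    import scann

    return (
        scann.scann_ops_pybind.builder(dataset, num_neighbors, "dot_product")
        # Partition the dataset and search only a subset of the partitions.
        .tree(num_leaves=2000, num_leaves_to_search=100, training_sample_size=250000)
        # Score candidates with quantized (asymmetric hashing) distances.
        .score_ah(2, anisotropic_quantization_threshold=0.2)
        # Re-rank the top candidates with exact distances.
        .reorder(100)
        .build()
    )


def compute_recall(neighbors, true_neighbors) -> float:
    """Fraction of ground-truth neighbors that appear in the returned results."""
    hits = 0
    total = 0
    for found, truth in zip(neighbors, true_neighbors):
        truth_set = set(truth)
        hits += len(truth_set.intersection(found))
        total += len(truth_set)
    return hits / total if total else 0.0


# Typical use, assuming `dataset` and `queries` are float32 NumPy arrays:
# searcher = build_searcher(dataset)
# neighbors, distances = searcher.search_batched(queries)
# print(compute_recall(neighbors, ground_truth_neighbors))
```

Measuring recall against exact neighbors is the usual way to judge how far the pruning and quantization settings can be pushed for a given dataset.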

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

scann-1.2.3-cp39-cp39-manylinux2014_x86_64.whl (10.6 MB)

Uploaded CPython 3.9

scann-1.2.3-cp38-cp38-manylinux2014_x86_64.whl (10.6 MB)

Uploaded CPython 3.8

scann-1.2.3-cp37-cp37m-manylinux2014_x86_64.whl (10.6 MB)

Uploaded CPython 3.7m

scann-1.2.3-cp36-cp36m-manylinux2014_x86_64.whl (10.6 MB)

Uploaded CPython 3.6m

File details

Details for the file scann-1.2.3-cp39-cp39-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scann-1.2.3-cp39-cp39-manylinux2014_x86_64.whl
  • Size: 10.6 MB
  • Tags: CPython 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.6.10

File hashes

Hashes for scann-1.2.3-cp39-cp39-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 bf0910f9d12e87c040af3e623bab279dc4dc67197eb02afd734da49f1a6d3e19
MD5 7466849addd1624d5146f329b1da31f6
BLAKE2b-256 da9f0a57177a4f78c18cabe69d5063bbf1801f9965637282f4c21ace003712bf

See more details on using hashes here.

File details

Details for the file scann-1.2.3-cp38-cp38-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scann-1.2.3-cp38-cp38-manylinux2014_x86_64.whl
  • Size: 10.6 MB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.6.10

File hashes

Hashes for scann-1.2.3-cp38-cp38-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7bd359f2add47fff4a44b6b8b6c624ea918dd08735526251e8bfe11c579ce1a9
MD5 f7558ab52835e92060773c4ceb9ad942
BLAKE2b-256 ffddd066d5948df51d15f4c2c522b84f1471d372e150552c74b72714153ae29b

File details

Details for the file scann-1.2.3-cp37-cp37m-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scann-1.2.3-cp37-cp37m-manylinux2014_x86_64.whl
  • Size: 10.6 MB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.6.10

File hashes

Hashes for scann-1.2.3-cp37-cp37m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 dcf42abb612855a3030b4b88aa403d44ccbc286142f7d9e15e43ff9f54a30a9f
MD5 a23f8aa2252070258e00aa498e730863
BLAKE2b-256 531eedf54df4ed902871db837fb790ac7c849de1fcdada2af713121bf18db55a

File details

Details for the file scann-1.2.3-cp36-cp36m-manylinux2014_x86_64.whl.

File metadata

  • Download URL: scann-1.2.3-cp36-cp36m-manylinux2014_x86_64.whl
  • Size: 10.6 MB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/3.6.10

File hashes

Hashes for scann-1.2.3-cp36-cp36m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e1e7b6651fa16e57d5a6c3cd4c8c66874b0d55eb7b22c3a2a2e69aac8a0d7c74
MD5 683174f50aaee5838ddf4cabbcee05db
BLAKE2b-256 7598676d99bbe747d6dc6906651b5d705c0eb5984c1af22bc2c2156a3bfd68bd
