naan

What is Naan?

  • Naan is a wrapper around FAISS indexes that provides metadata storage and retrieval for the vectors added to the index.
  • Naan's job is to spare you the tedious task of keeping track of the original content after it has been encoded and added to the index.
  • Naan is NOT a vector database. All vector-search operations are delegated to FAISS.
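
To make the bookkeeping in the points above concrete, here is a minimal sketch of what you would otherwise maintain by hand: a vector store plus a parallel mapping from vector id to original text. It uses plain Python with a brute-force squared-L2 search standing in for a flat FAISS index. `ManualStore` and its methods are hypothetical names chosen for illustration and are not part of Naan or FAISS, nor a description of how Naan is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ManualStore:
    """Hypothetical stand-in for the bookkeeping Naan automates (not Naan's code)."""

    vectors: list = field(default_factory=list)
    contents: dict = field(default_factory=dict)

    def add(self, embeddings, texts):
        # Every vector needs exactly one piece of original content.
        if len(embeddings) != len(texts):
            raise ValueError("need exactly one text per embedding")
        for vector, text in zip(embeddings, texts):
            vector_id = len(self.vectors)  # sequential ids, as in a flat index
            self.vectors.append(list(vector))
            self.contents[vector_id] = text

    def search(self, query, k):
        # Brute-force squared L2 distance, a stand-in for the real index.
        def squared_l2(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        ranked = sorted(
            range(len(self.vectors)),
            key=lambda i: squared_l2(query, self.vectors[i]),
        )
        return [(i, self.contents[i]) for i in ranked[:k]]


store = ManualStore()
store.add([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]], ["origin", "near", "far"])
print(store.search([0.9, 0.9], 2))
```

The point of the sketch is that the id-to-content mapping has to stay in sync with the index on every insert and has to be persisted alongside it, which is the part Naan takes over.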

Installation

pip install naan

Index data

To see Naan in action, let's first get some data to embed:

import requests


# Download the example sentences and fail early on an HTTP error
res = requests.get("https://raw.githubusercontent.com/masci/naan/main/example/sentences.json")
res.raise_for_status()
sentences = res.json()

Naan tries not to get in the way of how you manage your FAISS index, so the first step is always setting up the FAISS side of things:

from sentence_transformers import SentenceTransformer
import faiss


model = SentenceTransformer("bert-base-nli-mean-tokens")
sentence_embeddings = model.encode(sentences)
dim = sentence_embeddings.shape[1]
index = faiss.IndexFlatL2(dim)

Now it's time to wrap the FAISS index with Naan and use it to index data:

from naan import NaanDB


# Create a Naan database from scratch
db = NaanDB("db.naan", index, force_recreate=True)
db.add(sentence_embeddings, sentences)

Naan will add the vector embeddings to the FAISS index, and will also store the original sentences. This way, a vector search will look like this:

# Reopen an existing Naan database
db = NaanDB("db.naan")
query_embeddings = model.encode(["The book is on the table"])
# Naan's search API is the same as FAISS's; let's get the 3 closest vectors
results = db.search(query_embeddings, 3)
for result in results:
    print(result)
# Document(vector_id=11451, content='A group of people sitting around a desk.', embeddings=None)
# Document(vector_id=2754, content='A close-up picture of a desk with a computer and papers on it.', embeddings=None)
# Document(vector_id=11853, content='A computer on a desk.', embeddings=None)
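
For comparison, a plain FAISS `search` call returns two arrays, distances and vector ids, with one row per query and `-1` as the id wherever fewer than `k` neighbours were found. The sketch below shows, under stated assumptions, the kind of id-to-content lookup that turns such ids into records like the ones printed above. The `Document` class here is a stand-in that only copies the three field names from that output, and `to_documents`, `raw_ids`, and the sample sentences are hypothetical; none of this is taken from Naan's source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    # Field names copied from the printed output; the class is a stand-in.
    vector_id: int
    content: str
    embeddings: Optional[list] = None


def to_documents(ids_per_query, contents):
    """Flatten rows of neighbour ids into Documents, skipping -1 padding."""
    return [
        Document(vector_id=i, content=contents[i])
        for row in ids_per_query
        for i in row
        if i != -1
    ]


# Hypothetical ids for a single query with k=3 where only two vectors matched.
raw_ids = [[2, 0, -1]]
# Hypothetical id-to-text mapping kept next to the index.
contents = {0: "A computer on a desk.", 2: "A desk with papers on it."}

docs = to_documents(raw_ids, contents)
for doc in docs:
    print(doc)
```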

License

naan is distributed under the terms of the MIT license.
