Skip to main content

BK-tree data structure to allow fast querying of "close" matches

Project description

pybktree is a generic, pure Python implementation of a BK-tree data structure, which allows fast querying of “close” matches (for example, matches with small hamming distance or Levenshtein distance). This module is based on the algorithm by Nick Johnson in his blog article on BK-trees.

The library is on the Python Package Index (PyPI) and works on both Python 3 and Python 2.7. To install it, fire up a command prompt, activate your virtual environment if you’re using one, and type:

pip install pybktree

Example usage:

>>> tree = pybktree.BKTree(pybktree.hamming_distance, [0, 4, 5, 14])
>>> tree.add(15)              # add element 15
>>> sorted(tree)              # BKTree instances are iterable
[0, 4, 5, 14, 15]
>>> sorted(tree.find(13, 1))  # find elements at most 1 bit away from element 13
[(1, 5), (1, 15)]

For large trees and fairly small N when calling find(), using a BKTree is much faster than doing a linear search. This is especially good when you’re de-duping a few hundred thousand photos – with a linear search that would become a very slow, O(N²) operation. With a BKTree, it’s more like O(N log N).

Read the code in pybktree.py for more details – it’s pretty small!

Other BK-tree modules I found on GitHub while writing this one:

  • ahupp/bktree: this one is pretty good, but it’s not on PyPI, and it’s recursive

  • ryanfox/bktree: this one is hard to customize, search() doesn’t return distances, it’s slower, and was buggy (though I think he fixed it recently)

pybktree was written by Ben Hoyt for Jetsetter and is licensed with a permissive MIT license (see LICENSE.txt).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pybktree-1.1.tar.gz (4.5 kB view details)

Uploaded Source

File details

Details for the file pybktree-1.1.tar.gz.

File metadata

  • Download URL: pybktree-1.1.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for pybktree-1.1.tar.gz
Algorithm Hash digest
SHA256 eec0037cdd3d7553e6d72435a4379bede64be17c6712f149e485169638154d2b
MD5 9c809214fc979391fdd75dae1942cfb2
BLAKE2b-256 6b38a5fba29cf727ca8249d3840016e1f7919896111aa28d2cb96f013eaa480d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page