Skip to main content

HAT-Trie for Python

Project description

hat-trie

HAT-Trie structure for Python (2.x and 3.x).

This package is a Python wrapper for hat-trie C library.

https://travis-ci.org/kmike/hat-trie.svg?branch=master

Installation

pip install hat-trie

Usage

Create a new trie:

>>> from hat_trie import Trie
>>> trie = Trie()

trie variable is a dict-like object that support unicode keys and can have any Python object as a value. For keys that share prefixes it usually uses less memory than Python dict.

There is also hat_trie.IntTrie which only supports positive integers as values. It can be more efficient when you don’t need arbitrary objects as values. For example, if you need to store float values then storing them in an array (either numpy or stdlib’s array.array) and using IntTrie values as indices could be more memory efficient than storing Python float objects directly in hat_trie.Trie.

Another way to store float values is to use hat_trie.FloatTrie(). In this case precision is limited to float32.

Currently implemented methods are:

  • __getitem__()

  • __setitem__()

  • __contains__()

  • __len__()

  • get()

  • setdefault()

  • keys()

  • iterkeys()

Other methods are not implemented - contributions are welcome!

Performance

Performance is measured for hat_trie.Trie against Python’s dict with 100k unique unicode words (English and Russian) as keys and ‘1’ numbers as values.

Benchmark results for Python 3.3 (intel i5 1.8GHz, “1.000M ops/sec” == “1 000 000 operations per second”):

dict __getitem__ (hits)      6.874M ops/sec
trie __getitem__ (hits)      3.754M ops/sec
dict __contains__ (hits)     7.035M ops/sec
trie __contains__ (hits)     3.772M ops/sec
dict __contains__ (misses)   5.356M ops/sec
trie __contains__ (misses)   3.364M ops/sec
dict __len__                 785958.286 ops/sec
trie __len__                 574164.704 ops/sec
dict __setitem__ (updates)   6.830M ops/sec
trie __setitem__ (updates)   3.472M ops/sec
dict __setitem__ (inserts)   6.774M ops/sec
trie __setitem__ (inserts)   2.460M ops/sec
dict setdefault (updates)    3.522M ops/sec
trie setdefault (updates)    2.680M ops/sec
dict setdefault (inserts)    4.062M ops/sec
trie setdefault (inserts)    1.866M ops/sec
dict keys()                  189.564 ops/sec
trie keys()                  16.067 ops/sec

HAT-Trie is about 1.5x faster that datrie on all supported operations; it also supports fast inserts unlike datrie. On the other hand, datrie has more features (e.g. better iteration support and richer API); datrie is also more memory efficient.

If you need a memory efficient data structure and don’t need inserts then marisa-trie or DAWG should work better.

Contributing

Development happens at github:

Feel free to submit ideas, bugs, pull requests or regular patches.

Please don’t commit changes to generated C files; I will rebuild them myself.

Running tests and benchmarks

Make sure tox is installed and run

$ ./update_c.sh
$ tox

from the source checkout. You will need Cython to do that.

Tests should pass under python 2.7 and 3.3+.

$ tox -c bench.ini

runs benchmarks.

Authors & Contributors

This module wraps hat-trie C library by Daniel Jones & contributors.

License

Licensed under MIT License.

0.3 (2016-02-08)

  • hat-trie C library is updated to the latest version (thanks Michael Phan-Ba);

  • FloatTrie (thanks Michael Phan-Ba);

  • Python 2.6 and Python 3.2 support is dropped. hat-trie 0.3 likely still works in 2.6 and 3.2, but this is no longer checked by unit tests, and future compatibility is not guaranteed;

  • setup.py is switched to setuptools.

0.2 (2014-08-22)

  • Installation is simplified: Cython is no longer required;

  • get method for tries (thanks Brandon Forehand);

  • iterkeys method is fixed (thanks Brandon Forehand);

  • hat_trie.Trie can store any Python object as a value (thanks Brandon Forehand);

  • segfault is fixed for large int values (thanks Brandon Forehand);

  • hat-trie C library is updated to the latest version to fix some issues with 64bit builds and RHEL (thanks Brandon Forehand and Michael Heilman);

0.1 (2014-03-27)

Initial release.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hat-trie-0.3.tar.gz (79.9 kB view details)

Uploaded Source

Built Distributions

hat_trie-0.3-cp35-cp35m-macosx_10_11_x86_64.whl (48.6 kB view details)

Uploaded CPython 3.5m macOS 10.11+ x86-64

hat_trie-0.3-cp27-none-macosx_10_10_x86_64.whl (48.6 kB view details)

Uploaded CPython 2.7 macOS 10.10+ x86-64

File details

Details for the file hat-trie-0.3.tar.gz.

File metadata

  • Download URL: hat-trie-0.3.tar.gz
  • Upload date:
  • Size: 79.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for hat-trie-0.3.tar.gz
Algorithm Hash digest
SHA256 403643764538a3692de2664894d9a0567fe6449465d0a623aed514593e74a1f6
MD5 6c716db2f62d5a07ac55300b19c9a181
BLAKE2b-256 fb69846a812b7cad00ced7e8ed5c10ffd1eb6b210341d4ee65abbb31d2a0bc17

See more details on using hashes here.

File details

Details for the file hat_trie-0.3-cp35-cp35m-macosx_10_11_x86_64.whl.

File metadata

File hashes

Hashes for hat_trie-0.3-cp35-cp35m-macosx_10_11_x86_64.whl
Algorithm Hash digest
SHA256 b8103cb0c814e5fbb4dfa3704daf4c06299abf9e19edec0f1a4e4e0c1fe7cdd4
MD5 a358d0e5d25032e15193ae1b3a1e9f43
BLAKE2b-256 265f488ea9e2cfdfa84530b6eb04179148c425a297c7f9d94e17d9fbf2d3a63a

See more details on using hashes here.

File details

Details for the file hat_trie-0.3-cp27-none-macosx_10_10_x86_64.whl.

File metadata

File hashes

Hashes for hat_trie-0.3-cp27-none-macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 fa9b0ba04a76c9d30c9a057405ae2475acdd8e30dcc1debee38d1342606afad2
MD5 136f1ef8e1f36693e38e8b3635001a86
BLAKE2b-256 ba59dbd9b096cd80468b500137d3c56c6840ca9fd29fd163c0ad294034255396

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page