HAT-Trie for Python
hat-trie
HAT-Trie structure for Python (2.x and 3.x).
This package is a Python wrapper for the hat-trie C library.
Installation
pip install hat-trie
Usage
Create a new trie:
>>> from hat_trie import Trie
>>> trie = Trie()
The trie variable is a dict-like object that supports unicode keys and can hold any Python object as a value. For keys that share prefixes it usually uses less memory than a Python dict.
There is also hat_trie.IntTrie which only supports positive integers as values. It can be more efficient when you don’t need arbitrary objects as values. For example, if you need to store float values then storing them in an array (either numpy or stdlib’s array.array) and using IntTrie values as indices could be more memory efficient than storing Python float objects directly in hat_trie.Trie.
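The float-storage idea above can be sketched roughly as follows. This is only an illustration under some assumptions: `IntTrie()` is assumed to take no constructor arguments, like `Trie()`; the `FloatStore` class and its attribute names are hypothetical and not part of hat-trie; and a plain `dict` stands in when hat-trie is not installed, purely so the sketch stays runnable.

```python
from array import array

try:
    from hat_trie import IntTrie  # the real trie, if installed
except ImportError:
    IntTrie = dict  # stand-in so the sketch runs without hat-trie


class FloatStore(object):
    """Hypothetical helper: maps unicode keys to floats via array indices."""

    def __init__(self):
        self._index = IntTrie()           # key -> position in self._values
        # Slot 0 is an unused placeholder so every stored index is
        # strictly positive, matching the IntTrie value restriction.
        self._values = array('d', [0.0])  # packed C doubles

    def __setitem__(self, key, value):
        if key in self._index:
            # Existing key: overwrite the float in its slot.
            self._values[self._index[key]] = value
        else:
            # New key: remember the next free position, then append.
            self._index[key] = len(self._values)
            self._values.append(value)

    def __getitem__(self, key):
        return self._values[self._index[key]]

    def __len__(self):
        return len(self._index)


store = FloatStore()
store[u'pi'] = 3.14159
store[u'e'] = 2.71828
store[u'pi'] = 3.0  # an update reuses the existing slot
```

Only `__contains__`, `__getitem__`, `__setitem__` and `__len__` are used on the trie, all of which appear in the list of implemented methods. Whether this actually saves memory depends on the data, so it is worth measuring before adopting it.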
Currently implemented methods are:
__getitem__()
__setitem__()
__contains__()
__len__()
get()
setdefault()
keys()
iterkeys()
Other methods are not implemented - contributions are welcome!
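A short sketch of the methods listed above in use. It assumes `get()` accepts a default as its second argument in the same way as `dict.get()`; a plain `dict` stands in when hat-trie is not installed so the example still runs.

```python
try:
    from hat_trie import Trie
except ImportError:
    Trie = dict  # stand-in so the sketch runs without hat-trie

trie = Trie()
trie[u'foo'] = 1                      # __setitem__
trie[u'foobar'] = [1, 2, 3]           # any Python object can be a value

first = trie[u'foo']                  # __getitem__
has_foo = u'foo' in trie              # __contains__
size = len(trie)                      # __len__
missing = trie.get(u'bar', -1)        # get() with a default for an absent key
default = trie.setdefault(u'baz', 0)  # inserts 0 because u'baz' is absent
all_keys = sorted(trie.keys())        # keys(); sorted so order does not matter
```

The keys are sorted here because the sketch makes no assumption about the order in which `keys()` returns them.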
Performance
Performance is measured for hat_trie.Trie against Python’s dict with 100k unique unicode words (English and Russian) as keys and the number 1 as every value.
Benchmark results for Python 3.3 (Intel i5 1.8GHz, “1.000M ops/sec” == “1 000 000 operations per second”):
dict __getitem__ (hits)        6.874M ops/sec
trie __getitem__ (hits)        3.754M ops/sec
dict __contains__ (hits)       7.035M ops/sec
trie __contains__ (hits)       3.772M ops/sec
dict __contains__ (misses)     5.356M ops/sec
trie __contains__ (misses)     3.364M ops/sec
dict __len__                   785958.286 ops/sec
trie __len__                   574164.704 ops/sec
dict __setitem__ (updates)     6.830M ops/sec
trie __setitem__ (updates)     3.472M ops/sec
dict __setitem__ (inserts)     6.774M ops/sec
trie __setitem__ (inserts)     2.460M ops/sec
dict setdefault (updates)      3.522M ops/sec
trie setdefault (updates)      2.680M ops/sec
dict setdefault (inserts)      4.062M ops/sec
trie setdefault (inserts)      1.866M ops/sec
dict keys()                    189.564 ops/sec
trie keys()                    16.067 ops/sec
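For a rough idea of how an “ops/sec” figure like the ones above can be obtained, here is a minimal sketch using the standard `timeit` module. It is not the project's benchmark script: the `ops_per_sec` helper and the synthetic `word%d` keys are hypothetical, and a plain `dict` stands in when hat-trie is not installed.

```python
import timeit

try:
    from hat_trie import Trie
except ImportError:
    Trie = dict  # stand-in so the sketch runs without hat-trie


def ops_per_sec(mapping, keys, repeat=3):
    """Return __getitem__ hits per second, using the best of `repeat` runs."""
    def lookup_all():
        for key in keys:
            mapping[key]
    best = min(timeit.repeat(lookup_all, number=1, repeat=repeat))
    return len(keys) / best


# Hypothetical synthetic keys; the benchmark described above uses real words.
keys = [u'word%d' % i for i in range(10000)]

plain = dict((key, 1) for key in keys)
trie = Trie()
for key in keys:
    trie[key] = 1

results = {
    'dict __getitem__ (hits)': ops_per_sec(plain, keys),
    'trie __getitem__ (hits)': ops_per_sec(trie, keys),
}
for name, rate in sorted(results.items()):
    print('%-26s %.3fM ops/sec' % (name, rate / 1e6))
```

The timing includes the overhead of the Python `for` loop, so absolute numbers from this sketch will differ from the table; only the relative comparison between the two containers is meaningful.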
HAT-Trie is about 1.5x faster than datrie on all supported operations; unlike datrie, it also supports fast inserts. On the other hand, datrie has more features (e.g. better iteration support and a richer API) and is more memory efficient.
If you need a memory-efficient data structure and don’t need inserts, then marisa-trie or DAWG should work better.
Contributing
Development happens on GitHub.
Feel free to submit ideas, bugs, pull requests or regular patches.
Please don’t commit changes to generated C files; I will rebuild them myself.
Running tests and benchmarks
Make sure tox is installed and run
$ ./update_c.sh
$ tox
from the source checkout. You will need Cython to do that.
Tests should pass under Python 2.6, 2.7, 3.2, 3.3 and 3.4.
$ tox -c bench.ini
runs benchmarks.
License
Licensed under the MIT License.
0.2 (2014-08-22)
Installation is simplified: Cython is no longer required;
get method for tries (thanks Brandon Forehand);
iterkeys method is fixed (thanks Brandon Forehand);
hat_trie.Trie can store any Python object as a value (thanks Brandon Forehand);
segfault is fixed for large int values (thanks Brandon Forehand);
hat-trie C library is updated to the latest version to fix some issues with 64-bit builds and RHEL (thanks Brandon Forehand and Michael Heilman).
0.1 (2014-03-27)
Initial release.