A number array to/from bytes high performance encoder/decoder.
Project description
What is tencdec
tencdec takes a list of monotonically increasing integers and encodes it very quickly to a bytes object, in a compressed form using deltas.
You can then store that bytes object in a database or anywhere else, and when you need the list of integers back, you just decode it.
Example:
>>> numbers = [0, 1, 2, 3, 4, 28, 87, 87, 500, 501, 507, 2313]
>>> enc = tencdec.encode(numbers)
>>> enc
b'\x00\x01\x01\x01\x01\x18;\x00\x9d\x03\x01\x06\x8e\x0e'
>>> dec = tencdec.decode(enc)
>>> numbers == dec
True
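As a rough illustration of the "store it in a DB" idea, the sketch below puts the encoded bytes from the example into a SQLite BLOB column and reads them back. The table name, column names, and key are hypothetical, and the bytes are hard-coded so the snippet runs without tencdec installed; in real use they would come from tencdec.encode(numbers).

```python
import sqlite3

# Bytes shown in the example above; in real use: enc = tencdec.encode(numbers)
enc = b'\x00\x01\x01\x01\x01\x18;\x00\x9d\x03\x01\x06\x8e\x0e'

# Hypothetical schema: one BLOB column holding the encoded list.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (name TEXT PRIMARY KEY, data BLOB)")
conn.execute("INSERT INTO series (name, data) VALUES (?, ?)", ("example", enc))
conn.commit()

# Read the bytes back; they are ready to be passed to tencdec.decode().
(stored,) = conn.execute(
    "SELECT data FROM series WHERE name = ?", ("example",)
).fetchone()
conn.close()
```

The value read back is a plain bytes object identical to the one stored, so no extra serialization layer should be needed.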
And it's very fast!
Using the numbers from the example above, timeit shows around 2 microseconds to encode or decode (on an AMD Ryzen 7 PRO 4750U CPU):
$ python3 -m timeit -s "import tencdec; numbers = [0, 1, 2, 3, 4, 28, 87, 87, 500, 501, 507, 2313]" "tencdec.encode(numbers)"
100000 loops, best of 5: 2.28 usec per loop
$ python3 -m timeit -s "import tencdec; e = tencdec.encode([0, 1, 2, 3, 4, 28, 87, 87, 500, 501, 507, 2313])" "tencdec.decode(e)"
100000 loops, best of 5: 2.42 usec per loop
The restrictions are that the numbers must be integers (otherwise encoding fails with TypeError) and must be non-negative and monotonically increasing; the example above shows that zero and repeated values are accepted. This is verified, because otherwise the encoder would get into an infinite loop. The verification uses an assert, so you can disable it by running Python with -O if you are already sure that the list of numbers is valid.
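If you do run with -O, you may want your own check before encoding untrusted input. The helper below is a minimal sketch of such a check; check_numbers is a hypothetical name and is not part of tencdec. It raises explicit exceptions, which stay active under -O, unlike assert statements.

```python
def check_numbers(numbers):
    """Raise unless `numbers` holds non-negative, non-decreasing integers.

    Hypothetical helper for illustration; not part of tencdec.
    """
    previous = 0  # starting at 0 also rejects a negative first value
    for index, value in enumerate(numbers):
        if not isinstance(value, int):
            raise TypeError(f"item {index} is not an integer: {value!r}")
        if value < previous:
            raise ValueError(
                f"item {index} ({value}) is below the minimum allowed here ({previous})"
            )
        previous = value


# Valid input passes silently.
check_numbers([0, 1, 2, 87, 87, 500])
```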
Note that there are no external dependencies for this. It's just Python 3 and its standard library.
How it works
It encodes the deltas between consecutive numbers. Deltas must never be negative (that's why the source numbers must be monotonically increasing).
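The delta step can be sketched in a few lines. This is only an illustration, not the library's code; it assumes the first delta is taken relative to 0, which is consistent with the first byte of the encoded example being \x00.

```python
from itertools import accumulate

numbers = [0, 1, 2, 3, 4, 28, 87, 87, 500, 501, 507, 2313]

# Each delta is the difference from the previous number
# (the first one is taken relative to 0, an assumption of this sketch).
deltas = [cur - prev for prev, cur in zip([0] + numbers[:-1], numbers)]

# A running sum of the deltas gives back the original numbers.
restored = list(accumulate(deltas))
```

Most deltas here are much smaller than the numbers they came from, which is what makes the byte encoding described next compact.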
If a delta is less than or equal to 127, it is stored directly in a single byte. Otherwise it is stored in multiple bytes, using seven bits of each byte for data, with the most significant bit set to 1 when there are more bytes to process.
E.g. for a simple case:
0000 0100 -> 4 (in decimal)
Multiple bytes:
1111 0100
0000 0011
- the first byte indicates that the value goes on, the second byte indicates that it ends there
- bits are collected without using the most significant one, in reverse order:
000 0011 111 0100 -> 0000 0001 1111 0100 -> 500 (in decimal)
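The scheme described above can be sketched in pure Python as follows. This is an illustrative reimplementation written from the description, not tencdec's actual source; sketch_encode and sketch_decode are hypothetical names, and this version raises ValueError on bad input rather than using an assert.

```python
def sketch_encode(numbers):
    """Delta-encode `numbers`, seven bits per byte (illustrative sketch)."""
    out = bytearray()
    prev = 0
    for n in numbers:
        delta = n - prev
        if delta < 0:
            raise ValueError("numbers must be non-negative and non-decreasing")
        prev = n
        # Emit seven bits at a time, least significant group first.
        while delta > 0x7F:
            out.append((delta & 0x7F) | 0x80)  # high bit set: more bytes follow
            delta >>= 7
        out.append(delta)  # high bit clear: last byte of this delta
    return bytes(out)


def sketch_decode(data):
    """Inverse of sketch_encode (illustrative sketch)."""
    numbers = []
    total = 0  # running sum of the deltas decoded so far
    delta = 0  # delta currently being assembled
    shift = 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation bit: the next byte holds higher bits
        else:
            total += delta
            numbers.append(total)
            delta = 0
            shift = 0
    return numbers


# The two-byte case worked through above: 1111 0100, 0000 0011 <-> 500
two_bytes = sketch_encode([500])
```

For the list in the first example, this sketch yields the same bytes shown there, which suggests it follows the same layout, though the real implementation may differ in details.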
File details
Details for the file tencdec-0.0.1.tar.gz.
File metadata
- Download URL: tencdec-0.0.1.tar.gz
- Upload date:
- Size: 12.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2799016ebd1fe033c5fe93947f5fed5e5fed8adcb4bf60c869443834b80e2354
MD5 | f4d04379d645f5026737cd35a8bcd284
BLAKE2b-256 | f2214fc7831fc86b8344bab9ff16cd357953599d7a2ab1614f4c5cfc7ac59436
File details
Details for the file tencdec-0.0.1-py3-none-any.whl.
File metadata
- Download URL: tencdec-0.0.1-py3-none-any.whl
- Upload date:
- Size: 12.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 81a4da617baeaa0719ac71dbc0a8bc0b4d27a1e03c709eefc027d527ebd672d0
MD5 | bcdfed6c955f451049db899bd9ab1951
BLAKE2b-256 | fd4d038e0c3a0a322ffa4736c59d99187eb4b2eac80dca0a6e2080913f00bdf2