A number array to/from bytes high performance encoder/decoder.

What is tencdec

tencdec takes a list of monotonically increasing integers and very quickly encodes it to a bytes object, compressed using deltas.

You can then store that bytes object in a database or anywhere else, and when you need the list of integers back, you just decode it.

Example:

>>> import tencdec
>>> numbers = [0, 1, 2, 3, 4, 28, 87, 87, 500, 501, 507, 2313]
>>> enc = tencdec.encode(numbers)
>>> enc
b'\x00\x01\x01\x01\x01\x18;\x00\x9d\x03\x01\x06\x8e\x0e'
>>> dec = tencdec.decode(enc)
>>> numbers == dec
True
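
Since the encoded value is a plain bytes object, it can be stored like any other blob. The following is only a sketch using sqlite3 from the standard library: the table and column names are hypothetical, and the bytes literal is copied from the example above so the snippet runs without tencdec installed.

```python
import sqlite3

# Encoded bytes copied from the example above.
enc = b'\x00\x01\x01\x01\x01\x18;\x00\x9d\x03\x01\x06\x8e\x0e'

# Hypothetical table and column names; in-memory database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (name TEXT PRIMARY KEY, payload BLOB)")
conn.execute("INSERT INTO series VALUES (?, ?)", ("example", enc))

# Read the blob back; it comes out as bytes, ready for tencdec.decode().
(stored,) = conn.execute(
    "SELECT payload FROM series WHERE name = ?", ("example",)
).fetchone()
conn.close()
```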

And it's very fast!

Using the numbers from the example above, timeit shows around 2 microseconds to encode or decode (on an AMD Ryzen 7 PRO 4750U CPU):

$ python3 -m timeit -s "import tencdec; numbers = [0, 1, 2, 3, 4, 28, 87, 87, 500, 501, 507, 2313]" "tencdec.encode(numbers)"
100000 loops, best of 5: 2.28 usec per loop
$ python3 -m timeit -s "import tencdec; e = tencdec.encode([0, 1, 2, 3, 4, 28, 87, 87, 500, 501, 507, 2313])" "tencdec.decode(e)"
100000 loops, best of 5: 2.42 usec per loop

The restrictions are that the numbers must be integers (otherwise encoding fails with TypeError), non-negative, and monotonically increasing; repeated values are fine, as in the example above. This is verified, because invalid input would otherwise send the encoder into an infinite loop. The check is done with an assert, so you can disable it by running Python with -O if you are already sure that the list of numbers is valid.
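
If you plan to run with -O, you may want to validate input yourself at the point where it enters your system. The helper below is a hypothetical sketch, not part of tencdec; it assumes that "valid" means non-negative integers that never decrease, and unlike the library it reports non-integers by returning False instead of raising TypeError.

```python
def is_valid_input(numbers):
    """Return True if numbers are non-negative ints that never decrease."""
    previous = 0
    for number in numbers:
        # Reject non-integers, negatives, and any value below its predecessor.
        if not isinstance(number, int) or number < previous:
            return False
        previous = number
    return True
```

A caller could check `is_valid_input(numbers)` once when data arrives and then rely on the unchecked encoder afterwards.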

Note that there are no external dependencies for this. It's just Python 3 and its standard library.

How it works

It encodes the deltas between consecutive numbers. Deltas must never be negative (that's why the source numbers must be monotonically increasing).
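
As an illustration of the delta step only, here is a small sketch in plain Python. It assumes the first number is taken relative to 0, which is consistent with the encoded output shown in the example but is not confirmed from the library source.

```python
from itertools import accumulate

numbers = [0, 1, 2, 3, 4, 28, 87, 87, 500, 501, 507, 2313]

# Each delta is the difference to the previous number; the first number is
# taken relative to 0 (an assumption, see the note above).
deltas = [cur - prev for prev, cur in zip([0] + numbers, numbers)]

# A running sum over the deltas restores the original list.
restored = list(accumulate(deltas))
```

Most deltas here are far smaller than the numbers themselves, which is what makes the byte encoding compact.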

If the delta is less than or equal to 127, it's stored directly in a single byte. Otherwise it's stored in multiple bytes, using seven bits of each byte, with the most significant bit set to 1 when more bytes follow.

E.g. for a simple case:

0000 0100 -> 4 (in decimal)

Multiple bytes:

    1111 0100
    0000 0011
  • the first byte indicates that the value goes on, the second byte indicates that it ends there

  • bits are collected without using the most significant one of each byte, in reverse byte order:

    000 0011 111 0100 -> 0000 0001 1111 0100 -> 500 (in decimal)
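
The scheme described above can be sketched in a few lines of Python. This is an illustrative reimplementation under the assumptions stated in this section (deltas relative to a starting value of 0, seven-bit groups stored least significant first), not the actual tencdec source; the names `encode_sketch` and `decode_sketch` are hypothetical.

```python
def encode_sketch(numbers):
    """Delta-encode non-decreasing, non-negative ints into bytes."""
    out = bytearray()
    previous = 0
    for number in numbers:
        delta = number - previous
        if delta < 0:
            raise ValueError("numbers must not decrease")
        previous = number
        # Emit 7 bits at a time, least significant group first.
        while delta > 0x7F:
            out.append((delta & 0x7F) | 0x80)  # high bit set: more bytes follow
            delta >>= 7
        out.append(delta)  # high bit clear: last byte of this delta
    return bytes(out)


def decode_sketch(data):
    """Inverse of encode_sketch: rebuild the list of integers."""
    numbers = []
    total = 0  # running sum of the deltas decoded so far
    delta = 0
    shift = 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation byte: keep collecting bits
        else:
            total += delta
            numbers.append(total)
            delta = 0
            shift = 0
    return numbers
```

With this sketch, `encode_sketch([500])` gives the two bytes `1111 0100` and `0000 0011` from the walkthrough above.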
    
