A quicker pickle

Project description

quickle is a fast and small serialization format for a subset of Python types. It's based on pickle, but includes several optimizations and extensions that improve both performance and security. For supported types, serializing a message with quickle can be ~2-10x faster than with pickle.

Quickle currently supports serializing the following types (a round-trip sketch follows the list):

  • None

  • bool

  • int

  • float

  • complex

  • str

  • bytes

  • bytearray

  • tuple

  • list

  • dict

  • set

  • frozenset

  • PickleBuffer

  • quickle.Struct

  • enum.Enum
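
As a rough sketch of what round-tripping these types looks like, using the Encoder/Decoder API shown in the benchmarks below (the example sticks to builtin types, since quickle.Struct and enum.Enum subclasses may need to be registered with the encoder and decoder first; check the quickle docs for that):

import quickle

enc = quickle.Encoder()
dec = quickle.Decoder()

msg = {
    "id": 123,
    "label": None,
    "in_stock": True,
    "price": 0.5,
    "tags": ("fresh", "red"),
    "aliases": {"pomme", "mela"},
    "raw": b"\x00\x01",
    "counts": [1, 2, 3],
}

data = enc.dumps(msg)          # a bytes payload
print(dec.loads(data) == msg)  # True -- tuples, sets, and bytes survive the round trip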

FAQ

Why not just use pickle?

The builtin pickle module (and extensions like cloudpickle) certainly supports more types, but comes with security issues if you're unpickling untrusted data. From the official docs:

Warning

The pickle module is not secure. Only unpickle data you trust.

It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.

The pickle protocol contains instructions for loading and executing arbitrary Python code: a maliciously crafted pickle could wipe your machine or steal secrets. quickle does away with those instructions, removing that security issue.
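
To make that warning concrete, here's the classic demonstration using only the standard library (nothing quickle-specific; running it executes a harmless shell command):

import pickle


class Evil:
    def __reduce__(self):
        # Tells pickle to serialize "call os.system('echo pwned')" as the
        # reconstruction step for this object.
        import os
        return (os.system, ("echo pwned",))


payload = pickle.dumps(Evil())

# The receiving side doesn't need the Evil class at all; simply calling
# pickle.loads on the untrusted bytes runs the command.
pickle.loads(payload)

Because quickle's format has no instructions for importing or calling arbitrary objects, a decoder can't be steered into running code this way.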

The builtin pickle module also needs to support multiple protocols, and includes optimizations for writing to and reading from files that slow things down for users who want fast in-memory performance (as required by networked services). For common payloads quickle can be ~2-10x faster at writing and ~1-3x faster at reading. Here's a quick, non-scientific benchmark (on Python 3.8).

In [1]: import pickle, quickle

In [2]: encoder = quickle.Encoder()

In [3]: data = {"fruit": ["apple", "banana", "cherry", "durian"],
   ...:         "vegetables": ["asparagus", "broccoli", "cabbage"],
   ...:         "numbers": [1, 2, 3, 4, 5]}

In [4]: %timeit pickle.dumps(data)  # pickle
955 ns ± 2.97 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit encoder.dumps(data) # quickle
481 ns ± 1.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: import string, random

In [7]: data = [''.join(random.choices(string.ascii_letters, k=15)) for _ in range(100)]

In [8]: %timeit pickle.dumps(data)  # pickle
5.53 µs ± 35.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [9]: %timeit encoder.dumps(data)  # quickle
1.88 µs ± 5.88 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Why not msgpack, json, etc?

There are optimized versions of msgpack and json for Python that can be great for similar use cases. However, both msgpack and json have simpler object models than Python, which makes it tricky to roundtrip all the rich builtin types Python supports.

  • Both msgpack and json only support a single “array” type, which makes it hard to roundtrip messages where you want to distinguish lists from tuples (or sets); a short demonstration follows this list.

  • While msgpack supports both binary and unicode types, json requires all bytes to be encoded into something UTF-8 compatible.

  • Quickle supports “memoization” - if a message contains the same object instance multiple times, it will only be serialized once in the payload. For messages where this may happen, this can result in a significant reduction in payload size. (note that quickle also contains an option to disable memoization if you don’t need it, which can result in further speedups).

  • Quickle also supports recursive and self-referential objects, which cause recursion errors in serializers like msgpack and json. While uncommon, there are use cases for such data structures, and quickle supports them natively.

  • With the introduction of pickle protocol 5, pickle (and quickle) supports sending messages containing large binary payloads in a zero-copy fashion. This is hard (or impossible) to do with either msgpack or json.
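
Here's a short sketch of those differences. The identity checks assume quickle's memo references resolve back to shared objects on decoding, which is what the memoization description above implies:

import json

import quickle

enc = quickle.Encoder()
dec = quickle.Decoder()

# json has a single "array" type: the tuple comes back as a list...
doc = {"point": (1, 2)}
print(type(json.loads(json.dumps(doc))["point"]))  # <class 'list'>
# ...and sets aren't representable at all (json.dumps({1, 2}) raises TypeError).

# quickle preserves the distinction.
print(type(dec.loads(enc.dumps(doc))["point"]))    # <class 'tuple'>

# Memoization: a repeated object instance is written once, and the shared
# reference survives the round trip.
shared = ["a", "b", "c"]
out = dec.loads(enc.dumps([shared, shared]))
print(out[0] is out[1])  # True

# Self-referential objects round-trip as well.
cyclic = [1, 2, 3]
cyclic.append(cyclic)
out = dec.loads(enc.dumps(cyclic))
print(out[-1] is out)    # True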

quickle is also competitive with common Python msgpack and json implementations. Another non-scientific benchmark:

In [1]: import quickle, orjson, msgpack

In [2]: encoder = quickle.Encoder()

In [3]: packer = msgpack.Packer()

In [4]: data = {"fruit": ["apple", "banana", "cherry", "durian"],
   ...:         "vegetables": ["asparagus", "broccoli", "cabbage"],
   ...:         "numbers": [1, 2, 3, 4, 5]}

In [5]: %timeit encoder.dumps(data)  # quickle
482 ns ± 1.03 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: %timeit packer.pack(data)  # msgpack
852 ns ± 3.22 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [7]: %timeit orjson.dumps(data)  # json
834 ns ± 2.62 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [8]: decoder = quickle.Decoder()

In [9]: quickle_data = encoder.dumps(data)

In [10]: msgpack_data = packer.pack(data)

In [11]: json_data = orjson.dumps(data)

In [12]: %timeit decoder.loads(quickle_data)  # quickle
1.16 µs ± 7.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [13]: %timeit msgpack.loads(msgpack_data)  # msgpack
1.07 µs ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [14]: %timeit orjson.loads(json_data)  # json
1.16 µs ± 3.54 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

That said, if you're writing a network service that needs to talk to non-Python things, json or msgpack will definitely serve you better. Even if you're writing something only in Python, you might still want to consider using something more standardized like json or msgpack.

When would I use this?

I wanted this for writing RPC-style applications in Python. I was unsatisfied with json and msgpack, since they don't support all the rich types I'm used to in Python, and the existing pickle implementation added measurable per-message overhead for low-latency applications (not to mention the security issues). If you don't have a similar use case, you may be better served elsewhere.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

quickle-0.0.3.tar.gz (39.3 kB)

Uploaded Source

Built Distributions

quickle-0.0.3-cp38-cp38-manylinux2010_x86_64.whl (123.0 kB)

Uploaded CPython 3.8 manylinux: glibc 2.12+ x86-64

quickle-0.0.3-cp38-cp38-manylinux1_x86_64.whl (123.0 kB)

Uploaded CPython 3.8

quickle-0.0.3-cp38-cp38-macosx_10_9_x86_64.whl (38.7 kB)

Uploaded CPython 3.8 macOS 10.9+ x86-64

File details

Details for the file quickle-0.0.3.tar.gz.

File metadata

  • Download URL: quickle-0.0.3.tar.gz
  • Upload date:
  • Size: 39.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/46.2.0.post20200511 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.2

File hashes

Hashes for quickle-0.0.3.tar.gz
  • SHA256: 7878e1fe4ac419e6413734308fe21acc30e23696ae932203ec2df3cd1f92333c
  • MD5: 492c0ed7f29559225b54dbbb74c48c14
  • BLAKE2b-256: 13251c57d6d624d835b49978c9087708180f15c93a1f55abacd7efc2f6a80be0
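
To verify a download against the published hashes, a minimal check with the standard library looks like this (the local filename is assumed to be the sdist listed above):

import hashlib

EXPECTED_SHA256 = "7878e1fe4ac419e6413734308fe21acc30e23696ae932203ec2df3cd1f92333c"

# Hash the downloaded archive and compare against the published digest.
with open("quickle-0.0.3.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == EXPECTED_SHA256 else "hash mismatch")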

File details

Details for the file quickle-0.0.3-cp38-cp38-manylinux2010_x86_64.whl.

File metadata

  • Download URL: quickle-0.0.3-cp38-cp38-manylinux2010_x86_64.whl
  • Upload date:
  • Size: 123.0 kB
  • Tags: CPython 3.8, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/46.2.0.post20200511 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.2

File hashes

Hashes for quickle-0.0.3-cp38-cp38-manylinux2010_x86_64.whl
  • SHA256: 2b860522ceb4468e2a4fd85fe1ef86e17d6ca0ce4fc78e795530de4e1fd6e8fc
  • MD5: 025535246b1a84b68a2e52c035591e35
  • BLAKE2b-256: da7492d52dba48c1b5793152acefb346258f4723fd60aa2a369d625a3759df7f

File details

Details for the file quickle-0.0.3-cp38-cp38-manylinux1_x86_64.whl.

File metadata

  • Download URL: quickle-0.0.3-cp38-cp38-manylinux1_x86_64.whl
  • Upload date:
  • Size: 123.0 kB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/46.2.0.post20200511 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.2

File hashes

Hashes for quickle-0.0.3-cp38-cp38-manylinux1_x86_64.whl
  • SHA256: d484d92ead4224c2c6c3ffee430764fa695e9e26f08e0cbff5ce7c59a55a50b1
  • MD5: 604bf7cc189f73d7bd8a4eff90b55d63
  • BLAKE2b-256: b56b99edc979c07a49bc0dd44c009d5710715b0960396703b7d88882f63ddfcb

File details

Details for the file quickle-0.0.3-cp38-cp38-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: quickle-0.0.3-cp38-cp38-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 38.7 kB
  • Tags: CPython 3.8, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/46.2.0.post20200511 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.2

File hashes

Hashes for quickle-0.0.3-cp38-cp38-macosx_10_9_x86_64.whl
  • SHA256: 419fce27207cfb059cbfee6f65be87177eda929d4949933339339cf977b43eb1
  • MD5: 17c15b3af7cb8df5b22a3717ab09a370
  • BLAKE2b-256: 01efbc86540a44992503b513f75e0c41d88c283da3589a84f6cb5648f59f759d
