A quicker pickle
Project description
quickle is a fast and small serialization format for a subset of Python types. It's based on pickle, but includes several optimizations and extensions that improve both performance and security. For supported types, serializing a message with quickle can be ~2-10x faster than using pickle.
Quickle currently supports serializing the following types (a short round-trip example follows the list):
- None
- bool
- int
- float
- complex
- str
- bytes
- bytearray
- tuple
- list
- dict
- set
- frozenset
- PickleBuffer
- quickle.Struct
- enum.Enum
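Basic usage looks like the following; this is a minimal sketch built only from the Encoder/Decoder calls that appear in the benchmarks below:

```python
import quickle

enc = quickle.Encoder()
dec = quickle.Decoder()

msg = {
    "name": "demo",
    "point": (1.5, 2.5),        # tuples survive the round trip
    "tags": {"fast", "small"},  # so do sets
    "blob": b"\x00\x01\x02",    # and raw bytes
}

data = enc.dumps(msg)   # -> bytes
out = dec.loads(data)   # -> an equal dict with the same types
assert out == msg
```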
FAQ
Why not just use pickle?
The builtin pickle module (and extensions like cloudpickle) can certainly support more types, but they come with security issues if you're unpickling unknown data. From the official docs:
Warning
The pickle module is not secure. Only unpickle data you trust.
It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.
The pickle protocol contains instructions for loading and executing arbitrary Python code; a maliciously crafted pickle could wipe your machine or steal secrets. quickle does away with those instructions, removing that security issue.
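To make the risk concrete, here is the classic example of a malicious payload for the builtin pickle: its opcodes import and call os.system, so merely loading it runs a shell command. quickle's decoder has no such import-or-call instructions, so a payload like this can only fail to decode rather than execute anything:

```python
import pickle

# Classic malicious payload: the opcodes import os.system and call it,
# so simply *loading* the bytes would run a shell command.
evil = b"cos\nsystem\n(S'echo you have been owned'\ntR."

# pickle.loads(evil)   # DON'T: this line would execute the command above
```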
The builtin pickle module also needs to support multiple protocols, and includes some optimizations for writing to/reading from files that result in slowdowns for users wanting fast in-memory performance (as required by networked services). For common payloads quickle can be ~2-10x faster at writing and ~1-3x faster at reading. Here’s a quick non-scientific benchmark (on Python 3.8).
In [1]: import pickle, quickle
In [2]: encoder = quickle.Encoder()
In [3]: data = {"fruit": ["apple", "banana", "cherry", "durian"],
...: "vegetables": ["asparagus", "broccoli", "cabbage"],
...: "numbers": [1, 2, 3, 4, 5]}
In [4]: %timeit pickle.dumps(data) # pickle
955 ns ± 2.97 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit encoder.dumps(data) # quickle
481 ns ± 1.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [6]: import string, random
In [7]: data = [''.join(random.choices(string.ascii_letters, k=15)) for _ in range(100)]
In [8]: %timeit pickle.dumps(data) # pickle
5.53 µs ± 35.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [9]: %timeit encoder.dumps(data) # quickle
1.88 µs ± 5.88 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Why not msgpack, json, etc?
There are optimized versions of msgpack and json for Python that can be great for similar use cases. However, both msgpack and json have simpler object models than Python, which makes it tricky to roundtrip all the rich builtin types Python supports.
Both msgpack and json support only a single "array" type, which makes it hard to roundtrip messages where you want to distinguish lists from tuples, or to represent sets at all.
While msgpack supports both binary and unicode types, json requires all bytes to be encoded into something UTF-8 compatible.
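As a sketch of that difference, here is a round trip through quickle of a message that json cannot express directly, using the same Encoder/Decoder API as the benchmarks:

```python
import quickle

msg = {"point": (1, 2), "tags": {"py", "fast"}, "raw": b"\x00\xff"}

enc, dec = quickle.Encoder(), quickle.Decoder()
out = dec.loads(enc.dumps(msg))

assert out == msg                       # tuples, sets, and bytes all come back intact
assert isinstance(out["point"], tuple)  # not silently turned into a list

# By contrast, json.dumps(msg) raises TypeError (sets and bytes aren't JSON
# serializable), and a tuple would come back as a plain list.
```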
Quickle supports "memoization": if a message contains the same object instance multiple times, it will only be serialized once in the payload. For messages where this may happen, this can result in a significant reduction in payload size. (Note that quickle also has an option to disable memoization if you don't need it, which can result in further speedups.)
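A rough illustration of the payload-size effect is below. Note that the memoize keyword used here to disable memoization is an assumption about the Encoder option the paragraph above refers to; check quickle's documentation for the exact name:

```python
import quickle

row = "a moderately long string that appears many times"
data = [row] * 100                        # the same str object, 100 times

with_memo = quickle.Encoder()             # memoization enabled by default
no_memo = quickle.Encoder(memoize=False)  # keyword name assumed; see quickle docs

print(len(with_memo.dumps(data)))  # string written once, then short back-references
print(len(no_memo.dumps(data)))    # full string written out all 100 times
```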
Quickle also supports recursive and self-referential objects, which cause recursion errors in most other serializers. While uncommon, there are use cases for such data structures, and quickle handles them natively.
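For example, a self-referential list survives a round trip with its cycle intact (a sketch; the standard json module raises a "Circular reference detected" error on the same input):

```python
import quickle

lst = [1, 2]
lst.append(lst)                 # the list now contains itself

enc, dec = quickle.Encoder(), quickle.Decoder()
out = dec.loads(enc.dumps(lst))

assert out[0] == 1 and out[1] == 2
assert out[2] is out            # the self-reference is preserved
```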
With the introduction of pickle protocol 5, pickle (and quickle) can send messages containing large binary payloads in a zero-copy fashion. This is hard (or impossible) to do with either msgpack or json.
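Here is a sketch of the zero-copy mechanism using the standard library's pickle and PickleBuffer (a type quickle lists as supported above). quickle's own out-of-band API may differ in its details, so treat this as an illustration of the protocol 5 idea rather than quickle's exact interface:

```python
import pickle

big = bytearray(10_000_000)      # a large binary payload

buffers = []
data = pickle.dumps(
    pickle.PickleBuffer(big),
    protocol=5,
    buffer_callback=buffers.append,  # buffer handed over out-of-band, not copied
)
print(len(data))                 # just a few bytes of framing, not 10 MB

restored = pickle.loads(data, buffers=buffers)
assert bytes(restored) == bytes(big)  # the restored view shares the original memory
```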
quickle is also competitive with common Python msgpack and json implementations. Another non-scientific benchmark:
In [1]: import quickle, orjson, msgpack
In [2]: encoder = quickle.Encoder()
In [3]: packer = msgpack.Packer()
In [4]: data = {"fruit": ["apple", "banana", "cherry", "durian"],
...: "vegetables": ["asparagus", "broccoli", "cabbage"],
...: "numbers": [1, 2, 3, 4, 5]}
In [5]: %timeit encoder.dumps(data) # quickle
482 ns ± 1.03 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [6]: %timeit packer.pack(data) # msgpack
852 ns ± 3.22 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit orjson.dumps(data) # json
834 ns ± 2.62 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [8]: decoder = quickle.Decoder()
In [9]: quickle_data = encoder.dumps(data)
In [10]: msgpack_data = packer.pack(data)
In [11]: json_data = orjson.dumps(data)
In [12]: %timeit decoder.loads(quickle_data) # quickle
1.16 µs ± 7.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [13]: %timeit msgpack.loads(msgpack_data) # msgpack
1.07 µs ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [14]: %timeit orjson.loads(json_data) # json
1.16 µs ± 3.54 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
That said, if you’re writing a network service that needs to talk to non-python things, json or msgpack will definitely serve you better. Even if you’re writing something only in Python, you might still want to consider using something more standardized like json or msgpack.
When would I use this?
I wanted this for writing RPC-style applications in Python. I was unsatisfied with json or msgpack, since they didn’t support all the rich types I’m used to in Python. And the existing pickle implementation added measurable per-message overhead when writing low-latency applications (not to mention security issues). If you don’t have a similar use case, you may be better served elsewhere.
Project details
File details
Details for the file quickle-0.0.3.tar.gz.
File metadata
- Download URL: quickle-0.0.3.tar.gz
- Upload date:
- Size: 39.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/46.2.0.post20200511 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7878e1fe4ac419e6413734308fe21acc30e23696ae932203ec2df3cd1f92333c
MD5 | 492c0ed7f29559225b54dbbb74c48c14
BLAKE2b-256 | 13251c57d6d624d835b49978c9087708180f15c93a1f55abacd7efc2f6a80be0
File details
Details for the file quickle-0.0.3-cp38-cp38-manylinux2010_x86_64.whl.
File metadata
- Download URL: quickle-0.0.3-cp38-cp38-manylinux2010_x86_64.whl
- Upload date:
- Size: 123.0 kB
- Tags: CPython 3.8, manylinux: glibc 2.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/46.2.0.post20200511 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2b860522ceb4468e2a4fd85fe1ef86e17d6ca0ce4fc78e795530de4e1fd6e8fc
MD5 | 025535246b1a84b68a2e52c035591e35
BLAKE2b-256 | da7492d52dba48c1b5793152acefb346258f4723fd60aa2a369d625a3759df7f
File details
Details for the file quickle-0.0.3-cp38-cp38-manylinux1_x86_64.whl.
File metadata
- Download URL: quickle-0.0.3-cp38-cp38-manylinux1_x86_64.whl
- Upload date:
- Size: 123.0 kB
- Tags: CPython 3.8
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/46.2.0.post20200511 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | d484d92ead4224c2c6c3ffee430764fa695e9e26f08e0cbff5ce7c59a55a50b1
MD5 | 604bf7cc189f73d7bd8a4eff90b55d63
BLAKE2b-256 | b56b99edc979c07a49bc0dd44c009d5710715b0960396703b7d88882f63ddfcb
File details
Details for the file quickle-0.0.3-cp38-cp38-macosx_10_9_x86_64.whl.
File metadata
- Download URL: quickle-0.0.3-cp38-cp38-macosx_10_9_x86_64.whl
- Upload date:
- Size: 38.7 kB
- Tags: CPython 3.8, macOS 10.9+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/46.2.0.post20200511 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 419fce27207cfb059cbfee6f65be87177eda929d4949933339339cf977b43eb1
MD5 | 17c15b3af7cb8df5b22a3717ab09a370
BLAKE2b-256 | 01efbc86540a44992503b513f75e0c41d88c283da3589a84f6cb5648f59f759d