simdjson bindings for python
pysimdjson
Quick-n-dirty Python bindings for simdjson, just to see if going down this path might yield some parse-time improvements in real-world applications.
These bindings are currently only tested on OS X, but should work everywhere simdjson does, although you'll probably have to tweak your build flags.
Installation
There are binary wheels available for OS X 10.12. On other platforms you'll need a C++11-capable compiler.
pip install pysimdjson
or from source:
git clone https://github.com/TkTech/pysimdjson.git
cd pysimdjson
python setup.py install
Example
import pysimdjson

with open('sample.json', 'rb') as fin:
    doc = pysimdjson.loads(fin.read())
However, this doesn't really gain you that much over, say, ujson. You're still
loading the entire document and converting the entire thing into a series of
Python objects, which is very expensive. You can instead use items() to pull
only part of a document into Python.
Example document:
{
    "type": "search_results",
    "count": 2,
    "results": [
        {"username": "bob"},
        {"username": "tod"}
    ],
    "error": {
        "message": "All good captain"
    }
}
And now let's try some queries...
import pysimdjson

with open('sample.json', 'rb') as fin:
    # Calling ParsedJson with a document is a shortcut for
    # calling pj.allocate_capacity(<size>) and pj.parse(<doc>). If you're
    # parsing many JSON documents of similar sizes, you can allocate
    # a large buffer just once and keep re-using it instead.
    pj = pysimdjson.ParsedJson(fin.read())

    pj.items('.type') #> "search_results"
    pj.items('.count') #> 2
    pj.items('.results[].username') #> ["bob", "tod"]
    pj.items('.error.message') #> "All good captain"
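To make the query results above easier to follow, here is a rough pure-Python sketch of what those four paths select from the example document. It is only an illustration of the semantics, not how pysimdjson implements items(): the helpers `query` and `_walk` are hypothetical names, they handle just the `.key` and `[]` forms used above, and they parse the whole document with the built-in json module, which is exactly the cost items() is meant to avoid.

```python
import json

SAMPLE = """
{
    "type": "search_results",
    "count": 2,
    "results": [
        {"username": "bob"},
        {"username": "tod"}
    ],
    "error": {
        "message": "All good captain"
    }
}
"""


def _walk(node, segments):
    # Hypothetical helper: follow the remaining path segments from `node`.
    if not segments:
        return node
    head, rest = segments[0], segments[1:]
    if head.endswith('[]'):
        # `key[]` means: take the list at `key` and apply the rest of the
        # path to every element, collecting the results in a list.
        key = head[:-2]
        items = node[key] if key else node
        return [_walk(item, rest) for item in items]
    # A plain `key` descends into a dict.
    return _walk(node[head], rest)


def query(doc, path):
    # Hypothetical helper: '.a.b[].c' -> ['a', 'b[]', 'c'].
    segments = [s for s in path.split('.') if s]
    return _walk(doc, segments)


doc = json.loads(SAMPLE)
print(query(doc, '.type'))
print(query(doc, '.count'))
print(query(doc, '.results[].username'))
print(query(doc, '.error.message'))
```

The four calls select the same values listed in the `#>` comments above.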
AVX2
simdjson requires AVX2 support to function. Check to see if your OS/processor supports it:
- OS X:
sysctl -a | grep machdep.cpu.leaf7_features
- Linux:
grep avx2 /proc/cpuinfo
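If you'd rather do the Linux check from Python (say, before deciding whether to import the bindings), something like the sketch below should do. It just looks for `avx2` in the `flags` lines of /proc/cpuinfo-style text, same as the grep above. `has_avx2` is a made-up helper name, not part of pysimdjson, and this only covers Linux on x86.

```python
def has_avx2(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(':')
        # Only look at the CPU feature list, not e.g. the model name.
        if name.strip() == 'flags' and 'avx2' in value.split():
            return True
    return False


if __name__ == '__main__':
    try:
        with open('/proc/cpuinfo') as fin:
            print(has_avx2(fin.read()))
    except OSError:
        # /proc/cpuinfo does not exist on OS X; use the sysctl command there.
        print('no /proc/cpuinfo on this platform')
```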
Early Benchmark
Comparing the built-in json module loads on py3.7 to pysimdjson loads.
File | json time | pysimdjson time |
---|---|---|
jsonexamples/apache_builds.json | 0.09916733999999999 | 0.074089268 |
jsonexamples/canada.json | 5.305393378 | 1.6547515810000002 |
jsonexamples/citm_catalog.json | 1.3718639709999998 | 1.0438697340000003 |
jsonexamples/github_events.json | 0.04840242700000097 | 0.034239397999998644 |
jsonexamples/gsoc-2018.json | 1.5382746889999996 | 0.9597240750000005 |
jsonexamples/instruments.json | 0.24350973299999978 | 0.13639699600000021 |
jsonexamples/marine_ik.json | 4.505123285000002 | 2.8965093270000004 |
jsonexamples/mesh.json | 1.0325923849999974 | 0.38916503499999777 |
jsonexamples/mesh.pretty.json | 1.7129034710000006 | 0.46509220500000126 |
jsonexamples/numbers.json | 0.16577519699999854 | 0.04843887400000213 |
jsonexamples/random.json | 0.6930746310000018 | 0.6175370539999996 |
jsonexamples/twitter.json | 0.6069602610000011 | 0.41049074900000093 |
jsonexamples/twitterescaped.json | 0.7587005720000022 | 0.41576198399999953 |
jsonexamples/update-center.json | 0.5577604210000011 | 0.4961777420000004 |
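The table doesn't say how the timings were collected, so the following is only a guess at a comparable harness, not the script that produced these numbers. It times repeated loads calls with timeit. The helpers `time_loads` and `compare`, the sample document and the repeat count are all made up for illustration, and pysimdjson is skipped if it can't be imported under the module name used in the examples above.

```python
import json
import timeit


def time_loads(loads, data, number=100):
    """Return the total seconds taken to call loads(data) `number` times."""
    return timeit.timeit(lambda: loads(data), number=number)


def compare(data, number=100):
    """Time json.loads and, if it is importable, pysimdjson.loads."""
    results = {'json': time_loads(json.loads, data, number)}
    try:
        import pysimdjson  # module name as used in the examples above
    except ImportError:
        pass  # bindings not installed; report the built-in module only
    else:
        results['pysimdjson'] = time_loads(pysimdjson.loads, data, number)
    return results


# A small made-up document; point this at the jsonexamples files instead
# to get timings on the same inputs as the table.
sample = json.dumps({
    'type': 'search_results',
    'count': 2,
    'results': [{'username': 'bob'}, {'username': 'tod'}],
}).encode('utf-8')

timings = compare(sample, number=1000)
for name, seconds in timings.items():
    print(name, seconds)
```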