simdjson bindings for python

pysimdjson

Quick-n-dirty Python bindings for simdjson, written to see whether this approach yields parse-time improvements in real-world applications. So far the results are promising, especially when only part of a document is of interest.

These bindings are currently tested only on OS X and Windows. They should work everywhere simdjson does, although you will probably have to tweak your build flags.

See the latest documentation at http://pysimdjson.tkte.ch.

Installation

Binary wheels are available for Python 3.6 and 3.7 on OS X 10.12 and Windows. On other platforms you'll need a C++17-capable compiler.

pip install pysimdjson

or from source:

git clone https://github.com/TkTech/pysimdjson.git
cd pysimdjson
python setup.py install

Example

import simdjson

with open('sample.json', 'rb') as fin:
    doc = simdjson.loads(fin.read())
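
If the bindings cannot be installed everywhere your code runs, a guarded import is one way to stay portable. The sketch below assumes that simdjson.loads accepts the same bytes input as json.loads and returns equivalent Python objects; the parse helper is a hypothetical name used only for illustration.

```python
import json

try:
    # Prefer the simdjson bindings when they are installed.
    import simdjson
    _loads = simdjson.loads
except ImportError:
    # Otherwise fall back to the standard library parser.
    _loads = json.loads


def parse(raw):
    """Parse a JSON document given as bytes (hypothetical helper)."""
    return _loads(raw)


print(parse(b'{"type": "search_results", "count": 2}'))
```

Callers only ever see parse(), so swapping the underlying parser does not touch the rest of the code.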

However, loads() alone doesn't gain you much over, say, ujson. You're still loading the entire document and converting all of it into Python objects, which is very expensive. You can instead use items() to pull only part of a document into Python.

Example document:

{
    "type": "search_results",
    "count": 2,
    "results": [
        {"username": "bob"},
        {"username": "tod"}
    ],
    "error": {
        "message": "All good captain"
    }
}

And now let's try some queries:

import simdjson

with open('sample.json', 'rb') as fin:
    # Calling ParsedJson with a document is a shortcut for
    # calling pj.allocate_capacity(<size>) and pj.parse(<doc>). If you're
    # parsing many JSON documents of similar sizes, you can allocate
    # a large buffer just once and keep re-using it instead.
    pj = simdjson.ParsedJson(fin.read())

    pj.items('.type') #> "search_results"
    pj.items('.count') #> 2
    pj.items('.results[].username') #> ["bob", "tod"]
    pj.items('.error.message') #> "All good captain"
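
To make the query syntax concrete, here is a pure-Python model of what the four queries above return. It is not how pysimdjson implements items(): this sketch parses the whole document with the standard json module first, so it has none of the performance benefit. The query function and its token pattern are assumptions that cover only the two forms used above, .key and [].

```python
import json
import re

# Matches either ".key" or the "[]" wildcard (only the forms shown above).
_TOKEN = re.compile(r"\.(\w+)|(\[\])")

SAMPLE = """
{
    "type": "search_results",
    "count": 2,
    "results": [{"username": "bob"}, {"username": "tod"}],
    "error": {"message": "All good captain"}
}
"""


def _apply(value, tokens):
    """Walk `value` following the remaining query tokens."""
    if not tokens:
        return value
    (key, wildcard), rest = tokens[0], tokens[1:]
    if wildcard:
        # "[]" applies the rest of the query to every array element.
        return [_apply(item, rest) for item in value]
    return _apply(value[key], rest)


def query(doc, path):
    """Evaluate a query such as '.results[].username' on a parsed document."""
    return _apply(doc, _TOKEN.findall(path))


doc = json.loads(SAMPLE)
for path in (".type", ".count", ".results[].username", ".error.message"):
    print(path, "->", query(doc, path))
```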

AVX2

simdjson requires AVX2 support to function. Check to see if your OS/processor supports it:

  • OS X: sysctl -a | grep machdep.cpu.leaf7_features
  • Linux: grep avx2 /proc/cpuinfo
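
The same checks can be scripted from Python. This is a best-effort sketch using only the standard library; the function names are hypothetical, and it assumes that sysctl on OS X exposes machdep.cpu.leaf7_features as a space-separated list of feature names.

```python
import platform
import subprocess


def cpuinfo_has_avx2(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags" and "avx2" in value.split():
            return True
    return False


def has_avx2():
    """Best-effort AVX2 check; returns None on platforms not handled here."""
    system = platform.system()
    if system == "Linux":
        with open("/proc/cpuinfo") as fin:
            return cpuinfo_has_avx2(fin.read())
    if system == "Darwin":
        # Assumption: the key prints feature names separated by spaces.
        out = subprocess.run(
            ["sysctl", "-n", "machdep.cpu.leaf7_features"],
            capture_output=True, text=True, check=False,
        ).stdout
        return "AVX2" in out.split()
    return None
```

Calling has_avx2() before importing simdjson lets a program fall back to another parser on unsupported hardware.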

Low-level interface

You can use the low-level simdjson Iterator interface directly, but be aware that this interface can change at any time. If you depend on it, pin to a specific version of simdjson. You may need this interface if you're dealing with odd JSON, such as a document with repeated, non-unique keys.

import simdjson

with open('sample.json', 'rb') as fin:
    pj = simdjson.ParsedJson(fin.read())
    # Named `it` to avoid shadowing the built-in iter().
    it = simdjson.Iterator(pj)
    # Only descend if the document root is a JSON object.
    if it.is_object():
        # Step into the object; down() is falsy when there is nothing to step into.
        if it.down():
            print(it.get_string())
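
To see why repeated keys call for something other than a plain dict, here is a small illustration using only the standard json module. It does not use pysimdjson, and the sample document is made up for the example.

```python
import json

# A document that repeats the "tag" key (hypothetical sample).
doc = '{"id": 1, "tag": "a", "tag": "b"}'

# A plain dict keeps only the last value for a repeated key.
as_dict = json.loads(doc)

# object_pairs_hook receives every (key, value) pair in document order,
# so passing `list` preserves the duplicates.
as_pairs = json.loads(doc, object_pairs_hook=list)

print(as_dict)   # {'id': 1, 'tag': 'b'}
print(as_pairs)  # [('id', 1), ('tag', 'a'), ('tag', 'b')]
```

Any interface that builds dicts loses the first "tag" value, whereas walking the document element by element, as an iterator does, can see both.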

Early Benchmark

Comparing loads() from the built-in json module on Python 3.7 with loads() from simdjson.

| File | json time | pysimdjson time |
|---|---|---|
| jsonexamples/apache_builds.json | 0.09916733999999999 | 0.074089268 |
| jsonexamples/canada.json | 5.305393378 | 1.6547515810000002 |
| jsonexamples/citm_catalog.json | 1.3718639709999998 | 1.0438697340000003 |
| jsonexamples/github_events.json | 0.04840242700000097 | 0.034239397999998644 |
| jsonexamples/gsoc-2018.json | 1.5382746889999996 | 0.9597240750000005 |
| jsonexamples/instruments.json | 0.24350973299999978 | 0.13639699600000021 |
| jsonexamples/marine_ik.json | 4.505123285000002 | 2.8965093270000004 |
| jsonexamples/mesh.json | 1.0325923849999974 | 0.38916503499999777 |
| jsonexamples/mesh.pretty.json | 1.7129034710000006 | 0.46509220500000126 |
| jsonexamples/numbers.json | 0.16577519699999854 | 0.04843887400000213 |
| jsonexamples/random.json | 0.6930746310000018 | 0.6175370539999996 |
| jsonexamples/twitter.json | 0.6069602610000011 | 0.41049074900000093 |
| jsonexamples/twitterescaped.json | 0.7587005720000022 | 0.41576198399999953 |
| jsonexamples/update-center.json | 0.5577604210000011 | 0.4961777420000004 |
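
The script that produced these numbers is not shown here. A comparison along these lines could be sketched with timeit as below; the helper names, the repeat count, and the generated sample document are all assumptions, and the pysimdjson entry is only timed when the bindings import successfully.

```python
import json
import timeit


def time_loads(loads, raw, number=100):
    """Return the total seconds taken by `number` calls to loads(raw)."""
    return timeit.timeit(lambda: loads(raw), number=number)


def compare(raw, candidates, number=100):
    """Time each named loads() callable on the same bytes."""
    return {name: time_loads(fn, raw, number) for name, fn in candidates.items()}


# A generated stand-in document; real runs would read the jsonexamples files.
sample = json.dumps({"results": [{"username": "bob"}] * 1000}).encode()

candidates = {"json": json.loads}
try:
    import simdjson
    candidates["pysimdjson"] = simdjson.loads
except ImportError:
    pass  # bindings not installed; time the standard library only

timings = compare(sample, candidates)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f}")
```

Timing both parsers on identical bytes in the same process keeps the comparison fair, though absolute numbers will vary by machine.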
