simdjson bindings for Python

pysimdjson

Quick-n'dirty Python bindings for simdjson just to see if going down this path might yield some parse time improvements in real-world applications. So far, the results are promising, especially when only part of a document is of interest.

Bindings are currently tested on OS X, Linux, and Windows.

See the latest documentation at http://pysimdjson.tkte.ch.

Installation

There are binary wheels available for some platforms. On other platforms you'll need a C++17-capable compiler.

pip install pysimdjson

Binary wheels are available for:

Platform      py3.4  py3.5  py3.6  py3.7
OS X 10.12    no     no     no     yes
Windows       no     no     yes    yes
Linux         yes    yes    yes    yes

Or build from git:

git clone https://github.com/TkTech/pysimdjson.git
cd pysimdjson
python setup.py install

Example

import simdjson

with open('sample.json', 'rb') as fin:
    doc = simdjson.loads(fin.read())

However, this doesn't gain you much over, say, ujson. You're still loading the entire document and converting all of it into Python objects, which is very expensive. Instead, you can use items() to pull only part of a document into Python.

Example document:

{
    "type": "search_results",
    "count": 2,
    "results": [
        {"username": "bob"},
        {"username": "tod"}
    ],
    "error": {
        "message": "All good captain"
    }
}

And now let's try some queries...

import simdjson

with open('sample.json', 'rb') as fin:
    # Calling ParsedJson with a document is a shortcut for
    # calling pj.allocate_capacity(<size>) and pj.parse(<doc>). If you're
    # parsing many JSON documents of similar sizes, you can allocate
    # a large buffer just once and keep re-using it instead.
    pj = simdjson.ParsedJson(fin.read())

    pj.items('.type') #> "search_results"
    pj.items('.count') #> 2
    pj.items('.results[].username') #> ["bob", "tod"]
    pj.items('.error.message') #> "All good captain"
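To make the query syntax concrete, here is a minimal pure-Python sketch of how paths like `.results[].username` could be resolved against an already-decoded document. The `query` helper is hypothetical, supports only the subset of the syntax inferred from the examples above (`.key` and `[]`), and uses the standard library `json` module. It is not how pysimdjson implements `items()`, which avoids creating Python objects for fields you don't ask for.

```python
import json
import re

# Hypothetical helper covering only the path forms shown above:
#   ".key" selects a key from an object,
#   "[]"   applies the rest of the path to every element of a list.
_TOKEN = re.compile(r"\.([^.\[\]]+)|(\[\])")


def query(node, path):
    """Resolve a path such as '.results[].username' against decoded JSON."""
    return _walk(node, _TOKEN.findall(path))


def _walk(node, tokens):
    if not tokens:
        return node
    key, brackets = tokens[0]
    rest = tokens[1:]
    if brackets:
        # "[]" maps the remaining path over each list element.
        return [_walk(item, rest) for item in node]
    return _walk(node[key], rest)


DOC = json.loads("""
{
    "type": "search_results",
    "count": 2,
    "results": [
        {"username": "bob"},
        {"username": "tod"}
    ],
    "error": {
        "message": "All good captain"
    }
}
""")

print(query(DOC, ".type"))                # search_results
print(query(DOC, ".count"))               # 2
print(query(DOC, ".results[].username"))  # ['bob', 'tod']
print(query(DOC, ".error.message"))       # All good captain
```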

AVX2

simdjson requires AVX2 support to function. Check to see if your OS/processor supports it:

  • OS X: sysctl -a | grep machdep.cpu.leaf7_features
  • Linux: grep avx2 /proc/cpuinfo
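The same checks can be scripted. This is a minimal sketch, assuming Linux exposes CPU flags through `/proc/cpuinfo` and macOS through the `machdep.cpu.leaf7_features` sysctl key used above; the helper names are hypothetical, and other platforms (such as Windows) are not handled.

```python
import platform
import subprocess


def flags_have_avx2(text):
    """Return True if a CPU feature listing contains the token AVX2.

    Compares whole whitespace-separated tokens, ignoring case
    (macOS reports "AVX2", Linux reports "avx2").
    """
    return "avx2" in text.lower().split()


def cpu_feature_text():
    """Fetch the CPU feature listing from the same sources as the commands above."""
    system = platform.system()
    if system == "Linux":
        with open("/proc/cpuinfo") as f:
            return f.read()
    if system == "Darwin":
        # Queries the same key as: sysctl -a | grep machdep.cpu.leaf7_features
        return subprocess.check_output(
            ["sysctl", "machdep.cpu.leaf7_features"],
            universal_newlines=True,
        )
    raise NotImplementedError("no AVX2 check sketched for " + system)


if __name__ == "__main__":
    try:
        print("AVX2 supported:", flags_have_avx2(cpu_feature_text()))
    except (NotImplementedError, OSError, subprocess.CalledProcessError) as exc:
        print("Could not check for AVX2:", exc)
```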

Low-level interface

You can use the low-level simdjson Iterator interface directly, but be aware that this interface can change at any time. If you depend on it, pin to a specific version of simdjson. You may need this interface if you're dealing with odd JSON, such as a document with repeated (non-unique) keys.

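import simdjson

with open('sample.json', 'rb') as fin:
    pj = simdjson.ParsedJson(fin.read())
    # "it" rather than "iter" avoids shadowing the built-in iter().
    it = simdjson.Iterator(pj)
    if it.is_object():
        # Step into the object; on success the iterator sits on its first key.
        if it.down():
            print(it.get_string())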
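To see why repeated keys are a problem for dict-based interfaces, compare what the standard library's `json` module does by default with what it does when given an `object_pairs_hook`. This uses `json`, not simdjson, purely to illustrate the issue:

```python
import json

DOC = '{"tag": "first", "tag": "second"}'

# Decoding into a dict keeps only the last value for a repeated key.
as_dict = json.loads(DOC)
print(as_dict)  # {'tag': 'second'}

# object_pairs_hook receives every (key, value) pair in document order,
# so the earlier value is not silently dropped.
as_pairs = json.loads(DOC, object_pairs_hook=list)
print(as_pairs)  # [('tag', 'first'), ('tag', 'second')]
```

Walking a document key by key with the low-level Iterator serves a similar purpose: keys are visited one at a time instead of being collapsed into a dict.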

Early Benchmark

Comparing loads() from the built-in json module with simdjson.loads() on Python 3.7.

File                                json time   pysimdjson time
jsonexamples/apache_builds.json     0.0992      0.0741
jsonexamples/canada.json            5.3054      1.6548
jsonexamples/citm_catalog.json      1.3719      1.0439
jsonexamples/github_events.json     0.0484      0.0342
jsonexamples/gsoc-2018.json         1.5383      0.9597
jsonexamples/instruments.json       0.2435      0.1364
jsonexamples/marine_ik.json         4.5051      2.8965
jsonexamples/mesh.json              1.0326      0.3892
jsonexamples/mesh.pretty.json       1.7129      0.4651
jsonexamples/numbers.json           0.1658      0.0484
jsonexamples/random.json            0.6931      0.6175
jsonexamples/twitter.json           0.6070      0.4105
jsonexamples/twitterescaped.json    0.7587      0.4158
jsonexamples/update-center.json     0.5578      0.4962

Getting a subset of the document is significantly faster. For canada.json, fetching .type with the naive approach versus the items() approach, averaged over N=100:

Expression                                          Time
json.loads(canada_json)['type']                     5.7624
simdjson.loads(canada_json)['type']                 1.5984
simdjson.ParsedJson(canada_json).items('.type')     0.3950

This approach avoids creating Python objects for fields that aren't of interest, which is why it comes out ahead when you only care about a small part of the document.
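The timing code behind these numbers isn't shown here. As a rough sketch of how a comparison like this could be reproduced, the hypothetical `time_loads` and `compare` helpers below use the standard library `timeit` module; `number=100` is an assumption and not necessarily what produced the tables above. `simdjson.loads` is only timed if pysimdjson is importable.

```python
import json
import sys
import timeit


def time_loads(loads, data, number=100):
    """Return the total seconds spent calling loads(data) `number` times."""
    return timeit.timeit(lambda: loads(data), number=number)


def compare(path, number=100):
    """Time json.loads, and simdjson.loads if it is installed, on one file."""
    with open(path, "rb") as fin:
        data = fin.read()  # bytes; json.loads accepts bytes on Python 3.6+
    results = {"json": time_loads(json.loads, data, number)}
    try:
        import simdjson  # provided by pysimdjson
    except ImportError:
        return results
    results["simdjson"] = time_loads(simdjson.loads, data, number)
    return results


if __name__ == "__main__":
    # Hypothetical usage: python bench.py jsonexamples/canada.json ...
    for path in sys.argv[1:]:
        print(path, compare(path))
```

For the subset comparison, `time_loads` accepts any callable, e.g. `time_loads(lambda d: json.loads(d)['type'], data)`.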

Download files

Source Distribution

pysimdjson-1.4.1.tar.gz (222.7 kB view details)

Uploaded Source

Built Distributions

pysimdjson-1.4.1-py3.7-win-amd64.egg (125.4 kB view details)

Uploaded Source

pysimdjson-1.4.1-py3.6-win-amd64.egg (125.5 kB view details)

Uploaded Source

pysimdjson-1.4.1-cp37-cp37m-win_amd64.whl (128.7 kB view details)

Uploaded CPython 3.7m Windows x86-64

pysimdjson-1.4.1-cp37-cp37m-manylinux1_x86_64.whl (841.1 kB view details)

Uploaded CPython 3.7m

pysimdjson-1.4.1-cp37-cp37m-macosx_10_12_x86_64.whl (135.2 kB view details)

Uploaded CPython 3.7m macOS 10.12+ x86-64

pysimdjson-1.4.1-cp36-cp36m-win_amd64.whl (128.8 kB view details)

Uploaded CPython 3.6m Windows x86-64

pysimdjson-1.4.1-cp36-cp36m-manylinux1_x86_64.whl (841.5 kB view details)

Uploaded CPython 3.6m

pysimdjson-1.4.1-cp35-cp35m-manylinux1_x86_64.whl (828.5 kB view details)

Uploaded CPython 3.5m

pysimdjson-1.4.1-cp34-cp34m-manylinux1_x86_64.whl (830.7 kB view details)

Uploaded CPython 3.4m

File details

Details for the file pysimdjson-1.4.1.tar.gz.

File metadata

  • Download URL: pysimdjson-1.4.1.tar.gz
  • Upload date:
  • Size: 222.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.0

File hashes

Hashes for pysimdjson-1.4.1.tar.gz
Algorithm Hash digest
SHA256 d96b8c857c6009378fbd7f90bddfd5183b3e5426fa305cae9f473f470e3ad12d
MD5 cd3813d36ff4d328d0f1283755646aac
BLAKE2b-256 577823d10cc8486724e05d5ff7df7fed05fdedaf9b005a21d71c636696226c11

File details

Details for the file pysimdjson-1.4.1-py3.7-win-amd64.egg.

File metadata

  • Download URL: pysimdjson-1.4.1-py3.7-win-amd64.egg
  • Upload date:
  • Size: 125.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

File hashes

Hashes for pysimdjson-1.4.1-py3.7-win-amd64.egg
Algorithm Hash digest
SHA256 6fc60daf92a2c03298551320965fe94f158f325e3fac1b6653acde7a5ef68c46
MD5 cefe394b0188ccb66589c8c13268681f
BLAKE2b-256 c6afbd9f58d38c8b1e04df3a393197978bb0e78012f5a94428cbbf1e6a6247bb

File details

Details for the file pysimdjson-1.4.1-py3.6-win-amd64.egg.

File metadata

  • Download URL: pysimdjson-1.4.1-py3.6-win-amd64.egg
  • Upload date:
  • Size: 125.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.8

File hashes

Hashes for pysimdjson-1.4.1-py3.6-win-amd64.egg
Algorithm Hash digest
SHA256 8047c0d727d9be0910fb572245ede614e19c3b84eb7956792647e741bf6141d5
MD5 a1410def7257d44a898eaf19afecd8b7
BLAKE2b-256 abd775657386222d8c2fc6be5ba5ea38a5ee5fb6b1b1da9d4254afec070db5b6

File details

Details for the file pysimdjson-1.4.1-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: pysimdjson-1.4.1-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 128.7 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

File hashes

Hashes for pysimdjson-1.4.1-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 2b28937f680b386d6cb5789d34c91943e4d5ed1487fd3bbaada454c7c2a0eecf
MD5 9824edad69a6cf395aee1e1e85eec64e
BLAKE2b-256 042622e5e5f002cb641bf2bf4db9587c53c9f8d3961ba402d608bdb0ef5d0771

File details

Details for the file pysimdjson-1.4.1-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: pysimdjson-1.4.1-cp37-cp37m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 841.1 kB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.7.3 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

File hashes

Hashes for pysimdjson-1.4.1-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 f002ad4f1c81e369cb493426696c1f61a73b7c8c1be93e97685e89479eac03a4
MD5 c3b4fec2f9e81e2501228bfda8f4b27a
BLAKE2b-256 f8906e5debb2fb1be7a105d61803a53612caf4aca46b370b09102282c5fae678

File details

Details for the file pysimdjson-1.4.1-cp37-cp37m-macosx_10_12_x86_64.whl.

File metadata

  • Download URL: pysimdjson-1.4.1-cp37-cp37m-macosx_10_12_x86_64.whl
  • Upload date:
  • Size: 135.2 kB
  • Tags: CPython 3.7m, macOS 10.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.0

File hashes

Hashes for pysimdjson-1.4.1-cp37-cp37m-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 51038856d00c2cc2ceb5a8a43eb4c76bf77ac91fb2c3d3a53efaece4a49859bb
MD5 2930ec3e38b874393697589caceaa568
BLAKE2b-256 79ec59d0e404d53deb0fde60a54e972e39811813ab32e7810db898a9fd748326

File details

Details for the file pysimdjson-1.4.1-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: pysimdjson-1.4.1-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 128.8 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.8

File hashes

Hashes for pysimdjson-1.4.1-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 bcdaca430981864b716c62629e9e26e84a9acd8fc287833648e9d727d310d5d9
MD5 51bfd6631d059141a5591e6a8a6da9fa
BLAKE2b-256 0e65176f707609113f16bb958261553e304032f02d2385ceb3ce909019bac7c1

File details

Details for the file pysimdjson-1.4.1-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: pysimdjson-1.4.1-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 841.5 kB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.7.3 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

File hashes

Hashes for pysimdjson-1.4.1-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 20f2bd413338b13f354c81531a93338c7b48551aba54140b1f84a9522b67d88d
MD5 9721e5f58a9ecb934b2fafbc5baab1fb
BLAKE2b-256 c1679b032077ce998740b6f132b49f55c2c705b1125cd3d1c8c66633f8fda4cc

File details

Details for the file pysimdjson-1.4.1-cp35-cp35m-manylinux1_x86_64.whl.

File metadata

  • Download URL: pysimdjson-1.4.1-cp35-cp35m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 828.5 kB
  • Tags: CPython 3.5m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.7.3 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

File hashes

Hashes for pysimdjson-1.4.1-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 38b3f2407d35019c268e60244f275a22b71c6ae1a1aa8635b25863e5bc0a4bac
MD5 49768bf7cf0856a1ca6feda8db1c1ad4
BLAKE2b-256 1c9b671095cfec59157fe378c49c1d589a3564125812c9b8ddc4395ba102dbcd

File details

Details for the file pysimdjson-1.4.1-cp34-cp34m-manylinux1_x86_64.whl.

File metadata

  • Download URL: pysimdjson-1.4.1-cp34-cp34m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 830.7 kB
  • Tags: CPython 3.4m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.7.3 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

File hashes

Hashes for pysimdjson-1.4.1-cp34-cp34m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 8a75580b2d6d2339a93c1337eae2ded99c3bcd890d68e77f5ffb3572e8a63d56
MD5 8793920bebd97a8b00355feaf46e1afc
BLAKE2b-256 a6909ade24d3cf6a2aec74e56cf96280c4fd09f5891be56c2f92e5ca783eee2b
