simdjson bindings for python

pysimdjson

Quick-n-dirty Python bindings for simdjson, written to see whether going down this path might yield parse-time improvements in real-world applications. So far, the results are promising, especially when only part of a document is of interest.

Bindings are currently tested on OS X, Linux, and Windows.

See the latest documentation at http://pysimdjson.tkte.ch.

Installation

There are binary wheels available for some platforms. On other platforms you'll need a C++17-capable compiler.

pip install pysimdjson

Binary wheels are available for:

Platform     py3.4  py3.5  py3.6  py3.7
OS X 10.12   x      x      x      y
Windows      x      x      y      y
Linux        y      y      y      y

(y = wheel available, x = no wheel)

Or build from git:

git clone https://github.com/TkTech/pysimdjson.git
cd pysimdjson
python setup.py install
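
Because wheels are not available for every platform, an application may want to fall back to the standard library when pysimdjson cannot be installed. The following is a minimal sketch of that pattern, not something prescribed by this project. It assumes a failed install surfaces as an ImportError; the names `loads` and `BACKEND` are chosen here for illustration only.

```python
import json

try:
    # pysimdjson installs a module named `simdjson`.
    import simdjson
    loads = simdjson.loads
    BACKEND = "simdjson"
except ImportError:
    # Assumption: no wheel and no compiler were available, so the
    # module is simply missing. Fall back to the standard library.
    loads = json.loads
    BACKEND = "json"

doc = loads(b'{"type": "search_results", "count": 2}')
print(BACKEND, doc["count"])
```

Only loads() is swapped here, so the rest of the application does not need to know which parser is in use.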

Example

import simdjson

with open('sample.json', 'rb') as fin:
    doc = simdjson.loads(fin.read())

However, this doesn't gain you much over, say, ujson. You're still loading the entire document and converting all of it into Python objects, which is very expensive. You can instead use items() to pull only part of a document into Python.

Example document:

{
    "type": "search_results",
    "count": 2,
    "results": [
        {"username": "bob"},
        {"username": "tod"}
    ],
    "error": {
        "message": "All good captain"
    }
}

And now let's try some queries...

import simdjson

with open('sample.json', 'rb') as fin:
    # Calling ParsedJson with a document is a shortcut for
    # calling pj.allocate_capacity(<size>) and pj.parse(<doc>). If you're
    # parsing many JSON documents of similar sizes, you can allocate
    # a large buffer just once and keep re-using it instead.
    pj = simdjson.ParsedJson(fin.read())

    pj.items('.type') #> "search_results"
    pj.items('.count') #> 2
    pj.items('.results[].username') #> ["bob", "tod"]
    pj.items('.error.message') #> "All good captain"

AVX2

simdjson requires AVX2 support to function. Check to see if your OS/processor supports it:

  • OS X: sysctl -a | grep machdep.cpu.leaf7_features
  • Linux: grep avx2 /proc/cpuinfo
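
The same checks can be scripted. The sketch below wraps the two commands above in a helper that uses only the standard library; the name `has_avx2` is hypothetical, and the function returns None on platforms it does not know how to inspect (Windows, for example) rather than guessing.

```python
import platform
import subprocess

def has_avx2():
    """Return True or False when AVX2 support can be determined, else None."""
    system = platform.system()
    try:
        if system == "Linux":
            # Equivalent to: grep avx2 /proc/cpuinfo
            with open("/proc/cpuinfo") as fin:
                return "avx2" in fin.read().lower()
        if system == "Darwin":
            # Equivalent to: sysctl -a | grep machdep.cpu.leaf7_features
            out = subprocess.run(
                ["sysctl", "-n", "machdep.cpu.leaf7_features"],
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL,
                universal_newlines=True,
            ).stdout
            return "AVX2" in out.upper()
    except OSError:
        # The file or the sysctl binary is unavailable.
        pass
    return None

print(has_avx2())
```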

Low-level interface

You can also use the low-level simdjson Iterator interface directly, but be aware that this interface can change at any time. If you depend on it, pin to a specific version of simdjson. You may need this interface when dealing with odd JSON, such as a document with repeated keys.

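import simdjson

with open('sample.json', 'rb') as fin:
    pj = simdjson.ParsedJson(fin.read())
    # Named `it` to avoid shadowing the built-in iter().
    it = simdjson.Iterator(pj)
    # If the root is an object, step down into it and print the string
    # the iterator lands on.
    if it.is_object():
        if it.down():
            print(it.get_string())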
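
To see why repeated keys call for something other than a dict-based result, the following sketch uses only the standard library json module (not simdjson) on a made-up document. A dict can hold one value per key, so earlier values are silently dropped, whereas walking the document pair by pair keeps all of them.

```python
import json

raw = '{"id": 1, "id": 2}'

# A dict can hold only one value per key, so the last one wins.
as_dict = json.loads(raw)

# object_pairs_hook receives every (key, value) pair in document order.
as_pairs = json.loads(raw, object_pairs_hook=list)

print(as_dict)   # {'id': 2}
print(as_pairs)  # [('id', 1), ('id', 2)]
```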

Early Benchmark

Comparing loads() from the built-in json module with simdjson.loads() on Python 3.7.

File                              json time            pysimdjson time
jsonexamples/apache_builds.json   0.09916733999999999  0.074089268
jsonexamples/canada.json          5.305393378          1.6547515810000002
jsonexamples/citm_catalog.json    1.3718639709999998   1.0438697340000003
jsonexamples/github_events.json   0.04840242700000097  0.034239397999998644
jsonexamples/gsoc-2018.json       1.5382746889999996   0.9597240750000005
jsonexamples/instruments.json     0.24350973299999978  0.13639699600000021
jsonexamples/marine_ik.json       4.505123285000002    2.8965093270000004
jsonexamples/mesh.json            1.0325923849999974   0.38916503499999777
jsonexamples/mesh.pretty.json     1.7129034710000006   0.46509220500000126
jsonexamples/numbers.json         0.16577519699999854  0.04843887400000213
jsonexamples/random.json          0.6930746310000018   0.6175370539999996
jsonexamples/twitter.json         0.6069602610000011   0.41049074900000093
jsonexamples/twitterescaped.json  0.7587005720000022   0.41576198399999953
jsonexamples/update-center.json   0.5577604210000011   0.4961777420000004

Getting subsets of the document is significantly faster. The table below compares fetching .type from canada.json using the naive approach (parse everything, then index) and the items() approach, averaged over N=100 runs.

Python                                           Time
json.loads(canada_json)['type']                  5.76244878
simdjson.loads(canada_json)['type']              1.5984486990000004
simdjson.ParsedJson(canada_json).items('.type')  0.3949587819999998

This approach avoids creating Python objects for fields that aren't of interest. When you only care about a small part of the document, it will always be faster.
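
For readers who want to run a comparison of this kind on their own data, here is a minimal timing sketch using the standard library timeit module. It is not the script that produced the numbers above, and whether those numbers are totals or per-call averages is not stated. The helper `average_seconds` and the synthetic `payload` are hypothetical stand-ins; only the built-in json module is timed so the block runs without pysimdjson installed.

```python
import json
import timeit

def average_seconds(func, repeats=100):
    """Average wall-clock seconds per call over `repeats` calls."""
    return timeit.timeit(func, number=repeats) / repeats

# Synthetic stand-in for a large document such as canada.json.
payload = json.dumps({
    "type": "FeatureCollection",
    "features": [[i, i + 0.5] for i in range(10000)],
}).encode()

# Naive approach: parse everything, then pick out one field.
full = average_seconds(lambda: json.loads(payload)["type"])
print("json.loads(...)['type']: {:.6f} s".format(full))
```

With pysimdjson installed, the other two rows of the table can be measured by passing a different callable to the same helper.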
