simdjson bindings for python

pysimdjson

Quick-n'-dirty Python bindings for simdjson, just to see whether going down this path might yield parse-time improvements in real-world applications.

These bindings are currently only tested on OS X, but they should work everywhere simdjson does, although you'll probably have to tweak your build flags.

Installation

Binary wheels are available for CPython 3.7 on OS X 10.12 and later. On other platforms you'll need a C++11-capable compiler.

pip install pysimdjson

or from source:

git clone https://github.com/TkTech/pysimdjson.git
cd pysimdjson
python setup.py install

Example

import pysimdjson

with open('sample.json', 'rb') as fin:
    doc = pysimdjson.loads(fin.read())
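If the same code also has to run where pysimdjson isn't installed (for instance, a platform with no wheel and no suitable compiler), one common pattern is to fall back to the standard library's json.loads, which also accepts bytes on Python 3.6+. This is a minimal sketch, not part of pysimdjson itself: it relies only on the pysimdjson.loads call used above, and it only handles the package being absent. It does not detect a CPU without AVX2.

```python
import json

try:
    # pysimdjson.loads as used in the example above; only available if the
    # package is installed and importable.
    import pysimdjson
    loads = pysimdjson.loads
except ImportError:
    # Standard-library fallback: slower, but json.loads accepts bytes
    # (Python 3.6+), so callers can keep passing fin.read() from 'rb' files.
    loads = json.loads

doc = loads(b'{"type": "search_results", "count": 2}')
print(doc["count"])  # prints 2
```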

However, this doesn't gain you much over, say, ujson. You're still loading the entire document and converting all of it into Python objects, which is very expensive. Instead, you can use items() to pull only part of a document into Python.

Example document:

{
    "type": "search_results",
    "count": 2,
    "results": [
        {"username": "bob"},
        {"username": "tod"}
    ],
    "error": {
        "message": "All good captain"
    }
}

And now let's try some queries...

import pysimdjson

with open('sample.json', 'rb') as fin:
    # Calling ParsedJson with a document is a shortcut for
    # calling pj.allocate_capacity(<size>) and pj.parse(<doc>). If you're
    # parsing many JSON documents of similar sizes, you can allocate
    # a large buffer just once and keep re-using it instead.
    pj = pysimdjson.ParsedJson(fin.read())

    pj.items('.type') #> "search_results"
    pj.items('.count') #> 2
    pj.items('.results[].username') #> ["bob", "tod"]
    pj.items('.error.message') #> "All good captain"
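The queries above use a small path syntax: .key descends into an object, and a trailing [] applies the rest of the path to every element of an array, collecting the results into a list. To make those semantics concrete, here is a pure-Python sketch built around a hypothetical select() helper whose behaviour is inferred only from the four examples above. It is not pysimdjson's implementation, and because it runs on the output of json.loads it does not share the partial-conversion benefit of items().

```python
import json
import re

# Hypothetical grammar inferred from the examples: ".key" or ".key[]" segments.
_SEGMENT = re.compile(r"\.([^.\[\]]+)(\[\])?")


def select(value, path):
    """Illustrative path query over already-decoded JSON (assumed semantics)."""
    segments = []
    pos = 0
    while pos < len(path):
        m = _SEGMENT.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax at {path[pos:]!r}")
        # (key, iterate) where iterate is True when the segment ends in "[]".
        segments.append((m.group(1), m.group(2) is not None))
        pos = m.end()
    return _walk(value, segments)


def _walk(value, segments):
    if not segments:
        return value
    (key, iterate), rest = segments[0], segments[1:]
    child = value[key]
    if iterate:
        # "[]" fans out: apply the remaining path to each array element.
        return [_walk(item, rest) for item in child]
    return _walk(child, rest)


SAMPLE = b"""
{
    "type": "search_results",
    "count": 2,
    "results": [
        {"username": "bob"},
        {"username": "tod"}
    ],
    "error": {
        "message": "All good captain"
    }
}
"""

doc = json.loads(SAMPLE)
print(select(doc, ".type"))                # search_results
print(select(doc, ".results[].username"))  # ['bob', 'tod']
```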

AVX2

simdjson requires AVX2 support to function. Check whether your OS/processor supports it:

  • OS X: sysctl -a | grep machdep.cpu.leaf7_features
  • Linux: grep avx2 /proc/cpuinfo
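The same check can be scripted, for example as part of a build step. The sketch below is an illustration, not part of pysimdjson: it reads the same sources as the two commands above (/proc/cpuinfo on Linux, sysctl -a on OS X), keeps the relevant lines, and looks for an AVX2 token. It matches whole words so that flags such as avx512f are not mistaken for avx2. The function names are hypothetical.

```python
import platform
import subprocess


def mentions_avx2(text):
    """True if any whitespace-separated token in text is exactly AVX2 (any case)."""
    return any(token.lower() == "avx2" for token in text.split())


def cpu_feature_lines():
    """Collect the lines the README's commands would inspect on this machine."""
    system = platform.system()
    if system == "Linux":
        # Like "grep avx2 /proc/cpuinfo", but keep only the "flags" lines.
        with open("/proc/cpuinfo") as fin:
            return [line for line in fin if line.startswith("flags")]
    if system == "Darwin":
        # Like "sysctl -a | grep machdep.cpu.leaf7_features".
        out = subprocess.run(["sysctl", "-a"], capture_output=True,
                             text=True, check=True).stdout
        return [line for line in out.splitlines()
                if "machdep.cpu.leaf7_features" in line]
    raise NotImplementedError(f"no AVX2 check written for {system}")


def has_avx2():
    """Best-effort check; returns False if no matching feature lines are found."""
    return mentions_avx2(" ".join(cpu_feature_lines()))
```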

Early Benchmark

Comparing loads from the built-in json module on Python 3.7 against pysimdjson's loads. Lower times are better; values are rounded to four decimal places.

File                                json time   pysimdjson time
jsonexamples/apache_builds.json     0.0992      0.0741
jsonexamples/canada.json            5.3054      1.6548
jsonexamples/citm_catalog.json      1.3719      1.0439
jsonexamples/github_events.json     0.0484      0.0342
jsonexamples/gsoc-2018.json         1.5383      0.9597
jsonexamples/instruments.json       0.2435      0.1364
jsonexamples/marine_ik.json         4.5051      2.8965
jsonexamples/mesh.json              1.0326      0.3892
jsonexamples/mesh.pretty.json       1.7129      0.4651
jsonexamples/numbers.json           0.1658      0.0484
jsonexamples/random.json            0.6931      0.6175
jsonexamples/twitter.json           0.6070      0.4105
jsonexamples/twitterescaped.json    0.7587      0.4158
jsonexamples/update-center.json     0.5578      0.4962
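To read the benchmark as speedups, divide each json time by the corresponding pysimdjson time. The short script below does this with the figures copied verbatim from the table; it adds no new measurements, and the variable names are illustrative.

```python
# (json time, pysimdjson time) pairs copied from the benchmark table.
TIMINGS = {
    "apache_builds.json": (0.09916733999999999, 0.074089268),
    "canada.json": (5.305393378, 1.6547515810000002),
    "citm_catalog.json": (1.3718639709999998, 1.0438697340000003),
    "github_events.json": (0.04840242700000097, 0.034239397999998644),
    "gsoc-2018.json": (1.5382746889999996, 0.9597240750000005),
    "instruments.json": (0.24350973299999978, 0.13639699600000021),
    "marine_ik.json": (4.505123285000002, 2.8965093270000004),
    "mesh.json": (1.0325923849999974, 0.38916503499999777),
    "mesh.pretty.json": (1.7129034710000006, 0.46509220500000126),
    "numbers.json": (0.16577519699999854, 0.04843887400000213),
    "random.json": (0.6930746310000018, 0.6175370539999996),
    "twitter.json": (0.6069602610000011, 0.41049074900000093),
    "twitterescaped.json": (0.7587005720000022, 0.41576198399999953),
    "update-center.json": (0.5577604210000011, 0.4961777420000004),
}

# Speedup > 1 means pysimdjson was faster than the built-in json module.
speedups = {name: json_t / simd_t for name, (json_t, simd_t) in TIMINGS.items()}

# Print from largest to smallest speedup.
for name, ratio in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} {ratio:.2f}x")
```

On these numbers the gain ranges from roughly 1.1x (random.json) to roughly 3.7x (mesh.pretty.json).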

Download files

Source Distribution

pysimdjson-1.1.0.tar.gz (56.9 kB)

Uploaded Source

Built Distribution

pysimdjson-1.1.0-cp37-cp37m-macosx_10_12_x86_64.whl (123.6 kB)

Uploaded CPython 3.7m macOS 10.12+ x86-64

File details

Details for the file pysimdjson-1.1.0.tar.gz.

File metadata

  • Download URL: pysimdjson-1.1.0.tar.gz
  • Upload date:
  • Size: 56.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.0

File hashes

Hashes for pysimdjson-1.1.0.tar.gz
Algorithm Hash digest
SHA256 5aa0f3f9fec236f5cae78b0bd23eb0542c5d69f6e30a1f0b8267acabe47fde7c
MD5 2604e502562c5cf85eaf6cdc23b62adf
BLAKE2b-256 325c4af2f45c12686125f022bb670581de870198a85beb8aae19778341249bf7

File details

Details for the file pysimdjson-1.1.0-cp37-cp37m-macosx_10_12_x86_64.whl.

File metadata

  • Download URL: pysimdjson-1.1.0-cp37-cp37m-macosx_10_12_x86_64.whl
  • Upload date:
  • Size: 123.6 kB
  • Tags: CPython 3.7m, macOS 10.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.0

File hashes

Hashes for pysimdjson-1.1.0-cp37-cp37m-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 f95446b813b2c24eaf1b3d3d9c77dadff6dd0049a47605ab39dc667486d2c52e
MD5 083e063d2422b9dfde6b7abc2478e418
BLAKE2b-256 e93a9ce8bc5447ffcb0ab99bbd1f1d34fc31b08478c43bbc6111869047b94275
