
simdjson bindings for Python



pysimdjson

Quick-n'dirty Python bindings for simdjson just to see if going down this path might yield some parse time improvements in real-world applications. So far, the results are promising, especially when only part of a document is of interest.

Bindings are currently tested on OS X, Linux, and Windows.

See the latest documentation at http://pysimdjson.tkte.ch.

Installation

Binary wheels are available for py3.6/py3.7 on Windows and for py3.7 on OS X 10.12+. On other platforms you'll need a C++17-capable compiler.

pip install pysimdjson

If you get errors when installing from pip, there's probably no binary package for your combination of platform & Python version. As long as you have a C++17 compiler installed you can still use pip; you just need to provide a few extra compiler flags. The most common are:

  • gcc/clang: CFLAGS="-march=native -std=c++17" pip install pysimdjson

  • msvc (Visual Studio 2017):

    SET CL=/std:c++17 /arch:AVX2
    pip install pysimdjson
    

or from git:

git clone https://github.com/TkTech/pysimdjson.git
cd pysimdjson
python setup.py install
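If pysimdjson can't be installed on some machines, code can fall back to the standard library's json module. A minimal sketch, assuming only that simdjson.loads() (used in the example below) and json.loads() both accept bytes, which json.loads() does from Python 3.6; `fastjson` and `load_file` are hypothetical names chosen for illustration:

```python
# Prefer pysimdjson when it imports; otherwise use the standard library.
# (fastjson is a hypothetical alias, not part of either package.)
try:
    import simdjson as fastjson
except ImportError:
    import json as fastjson


def load_file(path):
    """Read a JSON file in binary mode and parse it with whichever module loaded."""
    with open(path, 'rb') as fin:
        return fastjson.loads(fin.read())
```

Both branches return plain Python objects from loads(), so callers don't need to know which parser was used.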

Example

import simdjson

with open('sample.json', 'rb') as fin:
    doc = simdjson.loads(fin.read())

However, this doesn't gain you much over, say, ujson. You're still loading the entire document and converting all of it into Python objects, which is expensive. You can instead use items() to pull only part of a document into Python.

Example document:

{
    "type": "search_results",
    "count": 2,
    "results": [
        {"username": "bob"},
        {"username": "tod"}
    ],
    "error": {
        "message": "All good captain"
    }
}

And now let's try some queries...

import simdjson

with open('sample.json', 'rb') as fin:
    # Calling ParsedJson with a document is a shortcut for
    # calling pj.allocate_capacity(<size>) and pj.parse(<doc>). If you're
    # parsing many JSON documents of similar sizes, you can allocate
    # a large buffer just once and keep re-using it instead.
    pj = simdjson.ParsedJson(fin.read())

    pj.items('.type') #> "search_results"
    pj.items('.count') #> 2
    pj.items('.results[].username') #> ["bob", "tod"]
    pj.items('.error.message') #> "All good captain"
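The path strings passed to items() read like jq: `.key` steps into an object and `[]` applies the rest of the path to every element of an array. The pure-Python sketch below reproduces the four results above on an already-decoded document. It is a hypothetical illustration of the query semantics, inferred from those example outputs; `select_path` is not part of pysimdjson and is not how the library evaluates paths.

```python
import json
import re

# Tokenize a path such as ".results[].username" into ["results", "[]", "username"].
_TOKEN = re.compile(r'\.([^.\[\]]+)|(\[\])')


def select_path(doc, path):
    """Hypothetical helper: evaluate a jq-like path against decoded JSON."""
    tokens = []
    pos = 0
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError(f'bad path syntax at {pos}: {path!r}')
        tokens.append(m.group(1) if m.group(1) is not None else '[]')
        pos = m.end()
    return _walk(doc, tokens)


def _walk(value, tokens):
    if not tokens:
        return value
    head, rest = tokens[0], tokens[1:]
    if head == '[]':
        # Map the remainder of the path over every array element.
        return [_walk(item, rest) for item in value]
    # Otherwise step into the object by key.
    return _walk(value[head], rest)


sample = json.loads('''{"type": "search_results", "count": 2,
    "results": [{"username": "bob"}, {"username": "tod"}],
    "error": {"message": "All good captain"}}''')
print(select_path(sample, '.results[].username'))  # ['bob', 'tod']
```

The difference with items() is where the work happens: this sketch needs the whole document decoded first, which is exactly the cost items() is meant to avoid.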

AVX2

simdjson requires AVX2 support to function. Check to see if your OS/processor supports it:

  • OS X: sysctl -a | grep machdep.cpu.leaf7_features
  • Linux: grep avx2 /proc/cpuinfo
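On Linux the same check can be done from Python by looking for `avx2` in the `flags` lines of /proc/cpuinfo. A minimal sketch, assuming the x86 /proc/cpuinfo layout (`flags : fpu vme ... avx2 ...`); `cpu_has_avx2` is a hypothetical helper that returns None when it can't tell:

```python
import os


def cpu_has_avx2(cpuinfo_text=None):
    """Return True/False from the 'flags' lines of /proc/cpuinfo (Linux, x86).

    Returns None when /proc/cpuinfo isn't available, e.g. on OS X, where
    the sysctl command above is the way to check.
    """
    if cpuinfo_text is None:
        if not os.path.exists('/proc/cpuinfo'):
            return None
        with open('/proc/cpuinfo') as f:
            cpuinfo_text = f.read()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(':')
        # Each logical CPU has a line like "flags : fpu vme ... avx2 ...".
        if key.strip() == 'flags' and 'avx2' in value.split():
            return True
    return False
```

Splitting the flags into whole words avoids false matches on longer flag names that merely contain "avx2" as a substring.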

Low-level interface

You can use the low-level simdjson Iterator interface directly; just be aware that this interface can change at any time. If you depend on it, pin a specific version of simdjson. You may need this interface when dealing with unusual JSON, such as a document with repeated (non-unique) keys.

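import simdjson

with open('sample.json', 'rb') as fin:
    pj = simdjson.ParsedJson(fin.read())
    # Named "it" to avoid shadowing the built-in iter().
    it = simdjson.Iterator(pj)
    if it.is_object():
        # Step into the object; the iterator now points at its first key.
        if it.down():
            print(it.get_string())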
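To see why repeated keys need special handling, note that any API that decodes objects into dicts can keep only one value per key. This stdlib-only sketch shows the problem with the built-in json module, which keeps the last value by default; its `object_pairs_hook` parameter is one standard-library way to see every pair. It illustrates the issue only and does not use pysimdjson's Iterator:

```python
import json

raw = '{"id": 1, "id": 2, "name": "bob"}'

# A plain dict can hold only one value per key; json keeps the last one.
as_dict = json.loads(raw)
print(as_dict)  # {'id': 2, 'name': 'bob'}

# object_pairs_hook receives the raw (key, value) list, duplicates included.
as_pairs = json.loads(raw, object_pairs_hook=list)
print(as_pairs)  # [('id', 1), ('id', 2), ('name', 'bob')]
```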

Early Benchmark

Comparing the built-in json module's loads() to simdjson's loads() on py3.7 (lower is better).

File                              json.loads()  simdjson.loads()
jsonexamples/apache_builds.json         0.0992            0.0741
jsonexamples/canada.json                5.3054            1.6548
jsonexamples/citm_catalog.json          1.3719            1.0439
jsonexamples/github_events.json         0.0484            0.0342
jsonexamples/gsoc-2018.json             1.5383            0.9597
jsonexamples/instruments.json           0.2435            0.1364
jsonexamples/marine_ik.json             4.5051            2.8965
jsonexamples/mesh.json                  1.0326            0.3892
jsonexamples/mesh.pretty.json           1.7129            0.4651
jsonexamples/numbers.json               0.1658            0.0484
jsonexamples/random.json                0.6931            0.6175
jsonexamples/twitter.json               0.6070            0.4105
jsonexamples/twitterescaped.json        0.7587            0.4158
jsonexamples/update-center.json         0.5578            0.4962
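The benchmark script itself isn't included, so its exact settings aren't known. A comparison of the same shape can be sketched with the standard library's timeit; `time_loads`, `compare`, and the default of 100 iterations are hypothetical choices, not the original configuration:

```python
import json
import timeit


def time_loads(loads, data, number=100):
    """Return the total seconds spent calling loads(data) `number` times."""
    return timeit.timeit(lambda: loads(data), number=number)


def compare(loaders, data, number=100):
    """Time each named loader on the same input; returns {name: seconds}."""
    return {name: time_loads(fn, data, number) for name, fn in loaders.items()}


print(compare({'json': json.loads}, b'{"a": [1, 2, 3]}', number=10))
```

With pysimdjson installed, the same helper can be called as `compare({'json': json.loads, 'simdjson': simdjson.loads}, data)`, where `data` is one of the jsonexamples files read in binary mode.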

Extracting only part of a document is significantly faster. For canada.json, the table below compares getting .type with the naive approach and with the items() approach, timed over N=100 runs.

Expression                                          Time
json.loads(canada_json)['type']                   5.7624
simdjson.loads(canada_json)['type']               1.5984
simdjson.ParsedJson(canada_json).items('.type')   0.3950

This approach avoids creating Python objects for fields that aren't of interest. When you only care about a small part of a document, it is usually much faster than decoding the whole thing.
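Expressed as ratios against json.loads(), the canada.json timings work out to roughly a 3.6x speedup for simdjson.loads() and roughly 14.6x for items(). The arithmetic, with the values copied from the comparison above:

```python
# Timings for fetching .type from canada.json, copied from the table above.
timings = {
    "json.loads(canada_json)['type']": 5.76244878,
    "simdjson.loads(canada_json)['type']": 1.5984486990000004,
    "simdjson.ParsedJson(canada_json).items('.type')": 0.3949587819999998,
}

baseline = timings["json.loads(canada_json)['type']"]
for expr, elapsed in timings.items():
    # Speedup = baseline time / this expression's time.
    print(f"{baseline / elapsed:5.1f}x  {expr}")
```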

Download files


Source Distribution

pysimdjson-1.3.0.tar.gz (211.7 kB)

Uploaded Source

Built Distributions

pysimdjson-1.3.0-py3.7-win-amd64.egg (120.5 kB)

Uploaded Source

pysimdjson-1.3.0-py3.6-win-amd64.egg (120.6 kB)

Uploaded Source

pysimdjson-1.3.0-cp37-cp37m-win_amd64.whl (124.4 kB)

Uploaded CPython 3.7m Windows x86-64

pysimdjson-1.3.0-cp37-cp37m-macosx_10_12_x86_64.whl (129.5 kB)

Uploaded CPython 3.7m macOS 10.12+ x86-64

pysimdjson-1.3.0-cp36-cp36m-win_amd64.whl (124.5 kB)

Uploaded CPython 3.6m Windows x86-64

File details

Details for the file pysimdjson-1.3.0.tar.gz.

File metadata

  • Download URL: pysimdjson-1.3.0.tar.gz
  • Upload date:
  • Size: 211.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.0

File hashes

Hashes for pysimdjson-1.3.0.tar.gz
Algorithm Hash digest
SHA256 ec62d1e061366fc09612144c2de2c936bc80661639042b0f88fdb2189decc767
MD5 3da5df43c4ade3bde4430cd82f27250a
BLAKE2b-256 89f2dc4dd7687b4d3a3f5bdfd2839c3789a5eefdb20b7ed9da0bf5c294988dfd


File details

Details for the file pysimdjson-1.3.0-py3.7-win-amd64.egg.

File metadata

  • Download URL: pysimdjson-1.3.0-py3.7-win-amd64.egg
  • Upload date:
  • Size: 120.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

File hashes

Hashes for pysimdjson-1.3.0-py3.7-win-amd64.egg
Algorithm Hash digest
SHA256 ce304e21530f3cefddce901d4f6c3bcc0be9face4d30d4a0154b5ed776c58da0
MD5 b8f1ae2d5f6908a0194a86736b710549
BLAKE2b-256 3a3f1a5971df9fd8e46ff9d60bbaeae99406c6f46f53f397a46e469d18e4733b


File details

Details for the file pysimdjson-1.3.0-py3.6-win-amd64.egg.

File metadata

  • Download URL: pysimdjson-1.3.0-py3.6-win-amd64.egg
  • Upload date:
  • Size: 120.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.8

File hashes

Hashes for pysimdjson-1.3.0-py3.6-win-amd64.egg
Algorithm Hash digest
SHA256 a03ae929cc3422f15fd30c160d44d80ada88c144dce22bc5c66f2aff40db3eed
MD5 6c017b383981de5bd8d1074f3450968a
BLAKE2b-256 f8dcea4a082807abc67f3697d6efc3ace4d944be659a6edf02a7602626866111


File details

Details for the file pysimdjson-1.3.0-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: pysimdjson-1.3.0-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 124.4 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2

File hashes

Hashes for pysimdjson-1.3.0-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 5d7c32ba556aec28f34e4df2e86f88cfebde8666ff392ec80396ea137276358e
MD5 87a8e1b9d5f01bd904fcf15241a1b7a2
BLAKE2b-256 8c4b9161ba735dde8dc9d0d698762f4453624fce88feca6e38f0b95c176ea33a


File details

Details for the file pysimdjson-1.3.0-cp37-cp37m-macosx_10_12_x86_64.whl.

File metadata

  • Download URL: pysimdjson-1.3.0-cp37-cp37m-macosx_10_12_x86_64.whl
  • Upload date:
  • Size: 129.5 kB
  • Tags: CPython 3.7m, macOS 10.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.0

File hashes

Hashes for pysimdjson-1.3.0-cp37-cp37m-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 31edb207c799364e0b4b9429b9e9e21bae04e98e461afa833583d6b245d67b0e
MD5 9a3ed5ae20028852eec826c6dae432f0
BLAKE2b-256 521b6eb02ff9c736ff7031527b3ce9c513e7a31bbc8024af5db925cbf7145862


File details

Details for the file pysimdjson-1.3.0-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: pysimdjson-1.3.0-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 124.5 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.8

File hashes

Hashes for pysimdjson-1.3.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 dca9ed97fcccad71463e5d874da113964977e332a63e0466863d6bba1f227238
MD5 8f0564bdb414974516eae717a5eae8c9
BLAKE2b-256 b965340c7be94abbf4a246899e0e35355e10072d67800355934d1e1e60226979

