Python bindings for simdjson, using libpy
libpy Simdjson
Status: Working Alpha
Python bindings for simdjson using libpy.
Requirements
- OS: macOS>10.15, Linux.
- Compiler: gcc>=9, clang >= 10 (C++17 code)
- Python: libpy>=0.2.3, numpy.
Usage
from pathlib import Path
import libpy_simdjson as json
doc = json.load(Path("twitter.json"))
# or json.load(b"twitter.json")
# or json.load("twitter.json")
# we also support `loads` for strings.
doc is an Object. Objects act like Python dicts with special methods.
isinstance(doc, json.Object)
True
We can grab keys, get the length, grab items, and access specific keys:
len(doc)
2
doc.keys()
[b'statuses', b'search_metadata']
doc[b'search_metadata'].items()
[(b'completed_in', 0.087),
(b'max_id', 505874924095815700),
(b'max_id_str', b'505874924095815681'),
(b'next_results',
b'?max_id=505874847260352512&q=%E4%B8%80&count=100&include_entities=1'),
(b'query', b'%E4%B8%80'),
(b'refresh_url',
b'?since_id=505874924095815681&q=%E4%B8%80&include_entities=1'),
(b'count', 100),
(b'since_id', 0),
(b'since_id_str', b'0')]
If you ever want an actual Python dictionary, use as_dict:
doc[b'search_metadata'].as_dict()
{b'completed_in': 0.087,
b'max_id': 505874924095815700,
b'max_id_str': b'505874924095815681',
b'next_results': b'?max_id=505874847260352512&q=%E4%B8%80&count=100&include_entities=1',
b'query': b'%E4%B8%80',
b'refresh_url': b'?since_id=505874924095815681&q=%E4%B8%80&include_entities=1',
b'count': 100,
b'since_id': 0,
b'since_id_str': b'0'}
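The dictionary above uses bytes for both keys and string values. If downstream code expects str, one option is a recursive conversion. The sketch below is plain Python: decode_bytes is a hypothetical helper, not part of libpy_simdjson, and it assumes the bytes are UTF-8 encoded.

```python
def decode_bytes(value, encoding="utf-8"):
    """Recursively convert bytes keys and values to str (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {
            decode_bytes(key, encoding): decode_bytes(item, encoding)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [decode_bytes(item, encoding) for item in value]
    # Numbers, booleans and None pass through unchanged.
    return value


# A few entries copied from the as_dict() output above.
metadata = {b"query": b"%E4%B8%80", b"count": 100, b"completed_in": 0.087}
decoded = decode_bytes(metadata)
print(decoded)
```

This copies every string, so it is best applied to the small sub-dictionary you actually need rather than to the whole document.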
However, we also support JSON Pointer syntax via at. This will be much faster if you know what you're looking for:
doc.at(b"statuses/50/created_at")
b'Sun Aug 31 00:29:04 +0000 2014'
doc.at(b"statuses/50/text").decode()
'RT @Ang_Angel73: 逢坂「くっ…僕の秘められし右目が…!」\n一同「……………。」'
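To make the path syntax concrete, here is a plain-Python sketch of what a lookup such as statuses/50/created_at amounts to on ordinary dicts and lists. resolve_path is a hypothetical helper written for illustration; it is not part of libpy_simdjson, it says nothing about how at is implemented, and it ignores the JSON Pointer escape sequences (~0 and ~1).

```python
def resolve_path(value, path):
    """Follow a slash-separated bytes path through nested dicts and lists."""
    for segment in path.split(b"/"):
        if isinstance(value, list):
            # Inside a list, the segment is a decimal index.
            value = value[int(segment)]
        else:
            # Inside a dict, the segment is a bytes key.
            value = value[segment]
    return value


# Small stand-in for a parsed document with bytes keys, as in the examples above.
sample = {b"statuses": [{b"created_at": b"Sun Aug 31 00:29:04 +0000 2014"}]}
created = resolve_path(sample, b"statuses/0/created_at")
print(created)
```

The difference is where the work happens: this helper needs fully built Python dicts and lists, whereas a single at call asks only for the one value at the given path.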
Let's look at statuses:
statuses = doc[b'statuses']
statuses is an Array. Arrays act like Python lists with special methods.
Note: statuses and doc share a single parser instance. We cannot parse a new document while these objects are alive (though we can create new parsers via libpy_simdjson.Parser.load).
isinstance(statuses, json.Array)
True
Arrays support length, indexing, and iteration:
len(statuses)
100
statuses[0][b'text'].decode()
'@aym0566x \n\n名前:前田あゆみ\n第一印象:なんか怖っ!\n今の印象:とりあえずキモい。噛み合わない\n好きなところ:ぶすでキモいとこ😋✨✨\n思い出:んーーー、ありすぎ😊❤️\nLINE交換できる?:あぁ……ごめん✋\nトプ画をみて:照れますがな😘✨\n一言:お前は一生もんのダチ💖'
for status in statuses:
    # this is a bad example but you get the picture
    if status[b'id'] % 2 == 0:
        print(status[b"text"].decode())
        break
else:
    print("no even ids?")
@aym0566x
名前:前田あゆみ
第一印象:なんか怖っ!
今の印象:とりあえずキモい。噛み合わない
好きなところ:ぶすでキモいとこ😋✨✨
思い出:んーーー、ありすぎ😊❤️
LINE交換できる?:あぁ……ごめん✋
トプ画をみて:照れますがな😘✨
一言:お前は一生もんのダチ💖
If you need to, you can convert an Array to a list using as_list:
statuses.as_list()[1][b'metadata']
{b'result_type': b'recent', b'iso_language_code': b'ja'}
However, just like for Objects, we support JSON Pointers via at, which is much faster:
statuses.at(b"33/created_at")
b'Sun Aug 31 00:29:06 +0000 2014'
Benchmarks
---------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/canada.json': 6 tests ----------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path0-libpy_simdjson-loads] 3.4478 (1.0) 10.1485 (1.0) 4.0615 (1.0) 0.6386 (1.0) 3.9595 (1.0) 0.3985 (1.0) 8;6 246.2156 (1.0) 149 1
test_benchmark_load[path0-orjson-loads] 14.7421 (4.28) 31.9980 (3.15) 21.1131 (5.20) 4.7609 (7.45) 21.8631 (5.52) 8.2455 (20.69) 23;0 47.3639 (0.19) 61 1
test_benchmark_load[path0-pysimdjson-loads] 15.5617 (4.51) 30.0839 (2.96) 22.2207 (5.47) 4.3227 (6.77) 23.6153 (5.96) 8.4906 (21.31) 12;0 45.0031 (0.18) 30 1
test_benchmark_load[path0-ujson-loads] 20.0784 (5.82) 37.2904 (3.67) 27.4904 (6.77) 4.6357 (7.26) 27.7301 (7.00) 8.1542 (20.46) 9;0 36.3763 (0.15) 26 1
test_benchmark_load[path0-rapidjson-loads] 44.7989 (12.99) 69.9204 (6.89) 53.8819 (13.27) 6.2806 (9.83) 54.5078 (13.77) 10.5220 (26.40) 6;0 18.5591 (0.08) 20 1
test_benchmark_load[path0-python_json-loads] 45.6048 (13.23) 58.9150 (5.81) 52.6407 (12.96) 4.2356 (6.63) 53.2421 (13.45) 7.6745 (19.26) 9;0 18.9967 (0.08) 21 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------ benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/citm_catalog.json': 6 tests -------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path3-libpy_simdjson-loads] 973.0290 (1.0) 1,696.1500 (1.0) 1,106.7939 (1.0) 70.3023 (1.0) 1,096.5330 (1.0) 55.0015 (1.0) 107;65 903.5106 (1.0) 496 1
test_benchmark_load[path3-orjson-loads] 6,271.9950 (6.45) 18,752.0820 (11.06) 9,199.1053 (8.31) 3,332.8687 (47.41) 7,502.8330 (6.84) 3,940.9760 (71.65) 32;1 108.7062 (0.12) 128 1
test_benchmark_load[path3-pysimdjson-loads] 7,448.6360 (7.66) 21,308.7680 (12.56) 10,668.5839 (9.64) 3,595.1711 (51.14) 8,919.9800 (8.13) 1,307.4410 (23.77) 24;24 93.7332 (0.10) 102 1
test_benchmark_load[path3-ujson-loads] 7,774.9390 (7.99) 17,898.5500 (10.55) 10,364.6843 (9.36) 3,222.6374 (45.84) 8,751.2690 (7.98) 1,562.5480 (28.41) 26;26 96.4815 (0.11) 115 1
test_benchmark_load[path3-python_json-loads] 11,643.7470 (11.97) 23,959.7150 (14.13) 15,714.9961 (14.20) 3,806.9531 (54.15) 13,973.4170 (12.74) 6,292.6375 (114.41) 12;0 63.6335 (0.07) 41 1
test_benchmark_load[path3-rapidjson-loads] 13,983.3210 (14.37) 27,216.4270 (16.05) 17,630.6505 (15.93) 4,016.1918 (57.13) 15,564.2690 (14.19) 2,136.0153 (38.84) 15;15 56.7194 (0.06) 65 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/github_events.json': 6 tests ------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path2-libpy_simdjson-loads] 31.8010 (1.0) 5,766.8830 (6.11) 37.5110 (1.0) 59.9135 (1.24) 37.0010 (1.0) 0.2000 (1.0) 9;3552 26.6588 (1.0) 9200 1
test_benchmark_load[path2-orjson-loads] 229.6080 (7.22) 4,736.2550 (5.02) 266.4467 (7.10) 94.5404 (1.96) 266.1090 (7.19) 40.8512 (204.26) 56;75 3.7531 (0.14) 3243 1
test_benchmark_load[path2-pysimdjson-loads] 291.1090 (9.15) 1,112.7370 (1.18) 340.7878 (9.09) 48.2980 (1.0) 336.6110 (9.10) 33.8510 (169.25) 214;48 2.9344 (0.11) 2187 1
test_benchmark_load[path2-ujson-loads] 300.1100 (9.44) 4,311.1400 (4.57) 342.2005 (9.12) 93.3709 (1.93) 346.5110 (9.36) 50.4020 (252.01) 26;36 2.9223 (0.11) 2258 1
test_benchmark_load[path2-rapidjson-loads] 379.0120 (11.92) 4,312.8390 (4.57) 518.6963 (13.83) 117.7450 (2.44) 507.6160 (13.72) 51.0268 (255.13) 37;40 1.9279 (0.07) 1717 1
test_benchmark_load[path2-python_json-loads] 382.2120 (12.02) 943.6300 (1.0) 439.8152 (11.72) 50.1689 (1.04) 443.7140 (11.99) 82.9020 (414.51) 665;18 2.2737 (0.09) 1894 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/mesh.json': 6 tests --------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path4-libpy_simdjson-loads] 993.7280 (1.0) 2,153.3610 (1.0) 1,113.0914 (1.0) 125.6128 (1.0) 1,122.9820 (1.0) 147.0050 (1.0) 64;16 898.3988 (1.0) 898 1
test_benchmark_load[path4-pysimdjson-loads] 3,019.2900 (3.04) 13,713.0090 (6.37) 3,958.4115 (3.56) 1,763.1884 (14.04) 3,619.4070 (3.22) 300.4090 (2.04) 10;14 252.6266 (0.28) 226 1
test_benchmark_load[path4-orjson-loads] 3,075.6900 (3.10) 12,985.8830 (6.03) 4,371.5742 (3.93) 1,528.5850 (12.17) 4,067.1200 (3.62) 444.3125 (3.02) 10;14 228.7506 (0.25) 240 1
test_benchmark_load[path4-ujson-loads] 3,947.6150 (3.97) 13,696.0010 (6.36) 4,954.1335 (4.45) 1,521.1764 (12.11) 4,690.3375 (4.18) 390.0120 (2.65) 8;9 201.8516 (0.22) 218 1
test_benchmark_load[path4-python_json-loads] 7,593.0170 (7.64) 19,002.5420 (8.82) 9,068.5910 (8.15) 1,944.1363 (15.48) 8,763.6505 (7.80) 649.6190 (4.42) 5;5 110.2707 (0.12) 122 1
test_benchmark_load[path4-rapidjson-loads] 8,291.5380 (8.34) 19,017.8470 (8.83) 9,628.5255 (8.65) 1,797.5745 (14.31) 9,276.3670 (8.26) 872.3250 (5.93) 4;4 103.8581 (0.12) 102 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/twitter.json': 6 tests -------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path1-libpy_simdjson-loads] 374.2130 (1.0) 10,169.1400 (1.0) 445.6502 (1.0) 237.7491 (1.0) 443.3150 (1.0) 66.3020 (1.0) 19;29 2,243.9125 (1.0) 1790 1
test_benchmark_load[path1-orjson-loads] 2,788.1970 (7.45) 11,687.4110 (1.15) 3,351.3276 (7.52) 1,117.1151 (4.70) 3,198.9625 (7.22) 351.0120 (5.29) 10;12 298.3892 (0.13) 294 1
test_benchmark_load[path1-ujson-loads] 3,312.1150 (8.85) 12,571.4370 (1.24) 3,973.3347 (8.92) 1,221.4127 (5.14) 3,805.8815 (8.59) 447.3170 (6.75) 7;9 251.6778 (0.11) 258 1
test_benchmark_load[path1-pysimdjson-loads] 3,586.0280 (9.58) 18,704.8590 (1.84) 4,553.9661 (10.22) 1,772.5065 (7.46) 4,182.3480 (9.43) 331.1612 (4.99) 7;17 219.5888 (0.10) 169 1
test_benchmark_load[path1-python_json-loads] 4,573.6530 (12.22) 13,900.1650 (1.37) 5,396.5765 (12.11) 1,236.4753 (5.20) 5,222.7750 (11.78) 554.0430 (8.36) 6;7 185.3027 (0.08) 189 1
test_benchmark_load[path1-rapidjson-loads] 5,447.2870 (14.56) 16,226.5570 (1.60) 6,506.3766 (14.60) 1,495.7694 (6.29) 6,322.1140 (14.26) 544.9407 (8.22) 6;7 153.6954 (0.07) 165 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Legend:
Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
OPS: Operations Per Second, computed as 1 / Mean
================== 71 passed, 1 xfailed, 1 warning in 29.65s ===================
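As a reading aid for the tables above, the following sketch recomputes two of the derived columns for canada.json from the reported means: OPS (1 / Mean, with the mean converted to seconds) and the relative factor shown in parentheses (each mean divided by the fastest mean). The numbers are copied from the first table; ops_per_second is a hypothetical helper name, and small differences from the printed OPS values are expected because the means are rounded.

```python
# Mean load times for canada.json in milliseconds, copied from the table above.
mean_ms = {
    "libpy_simdjson": 4.0615,
    "orjson": 21.1131,
    "pysimdjson": 22.2207,
    "ujson": 27.4904,
    "python_json": 52.6407,
    "rapidjson": 53.8819,
}


def ops_per_second(mean_in_ms):
    """Operations per second, computed as 1 / mean with the mean in seconds."""
    return 1.0 / (mean_in_ms / 1000.0)


fastest = min(mean_ms.values())
for name, mean in sorted(mean_ms.items(), key=lambda item: item[1]):
    relative = mean / fastest  # the value shown in parentheses in the Mean column
    print(f"{name:15s} {ops_per_second(mean):9.2f} ops/s  ({relative:.2f}x)")
```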
Source Distribution
File details
Details for the file libpy_simdjson-0.1.0.tar.gz.
File metadata
- Download URL: libpy_simdjson-0.1.0.tar.gz
- Upload date:
- Size: 488.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | c8e8f59c3d26dbab84ffb0b0bd98f062419a8a64bae17884d4d0a932f19e682e
MD5 | 4aba0401b0a9e2aa0d9c8bc2edc18d8f
BLAKE2b-256 | 25436d041cab8462e0501dc848a5893b91296f177ed6c7eacae6d0232f8ba714