Streaming newline delimited JSON I/O
Examples
Read and write files with a single JSON object on every line. See the sample-data directory for valid input examples.
One dictionary per line:
from pprint import pprint
import newlinejson
# 'w+' creates the output file if needed and allows reading it back
with open('sample-data/dictionaries.json') as i_f, open('outfile.json', 'w+') as o_f:
    writer = newlinejson.Writer(o_f)
    for line in newlinejson.Reader(i_f):
        writer.write(line)
    o_f.seek(0)
    pprint(newlinejson.load(o_f))
[{'field2': 'l1f2', 'field3': 'l1f3', 'field1': 'l1f1'},
{'field2': 'l2f2', 'field3': 'l3f3', 'field1': 'l2f1'},
{'field2': 'l3f2', 'field3': 'l3f3', 'field1': 'l3f1'},
{'field2': 'l4f2', 'field3': 'l4f3', 'field1': 'l4f1'},
{'field2': 'l5f2', 'field3': 'l5f3', 'field1': 'l5f1'}]
One list per line:
import newlinejson
with open('sample-data/lists-no-header.json') as f:
    for line in newlinejson.Reader(f):
        print(line)
['l1f2', 'l1f3', 'l1f1']
['l2f2', 'l3f3', 'l2f1']
['l3f2', 'l3f3', 'l3f1']
['l4f2', 'l4f3', 'l4f1']
['l5f2', 'l5f3', 'l5f1']
Mixed content:
import newlinejson
with open('sample-data/mixed-content.json') as f:
    for line in newlinejson.Reader(f):
        print(line)
{'field2': 'l1f2', 'field3': 'l1f3', 'field1': 'l1f1'}
['l1f2', 'l1f3', 'l1f1']
{'field2': 'l2f2', 'field3': 'l3f3', 'field1': 'l2f1'}
['l2f2', 'l3f3', 'l2f1']
{'field2': 'l3f2', 'field3': 'l3f3', 'field1': 'l3f1'}
['l3f2', 'l3f3', 'l3f1']
{'field2': 'l4f2', 'field3': 'l4f3', 'field1': 'l4f1'}
['l4f2', 'l4f3', 'l4f1']
{'field2': 'l5f2', 'field3': 'l5f3', 'field1': 'l5f1'}
['l5f2', 'l5f3', 'l5f1']
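To make the streaming behaviour concrete, here is a minimal sketch of what a line-by-line reader and writer could look like using only Python's built-in json module. It is not NewlineJSON's implementation: the names iter_ndjson and write_ndjson are hypothetical, and io.StringIO stands in for a real file.

```python
import io
import json


def iter_ndjson(stream):
    """Yield one decoded JSON value per non-blank line (hypothetical helper)."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def write_ndjson(rows, stream):
    """Write each value as one JSON document followed by a newline."""
    count = 0
    for row in rows:
        stream.write(json.dumps(row) + '\n')
        count += 1
    return count


# Mixed content: a dictionary and a list on consecutive lines.
source = io.StringIO(
    '{"field1": "l1f1", "field2": "l1f2"}\n'
    '["l1f1", "l1f2"]\n'
)
destination = io.StringIO()

# Rows are decoded and written one at a time rather than collected in a list.
written = write_ndjson(iter_ndjson(source), destination)
```

Because the reader is a generator, memory use depends on the longest line rather than on the size of the file.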
The standard JSON functions load/s() and dump/s() are still available but should only be used on small files, because they hold every line in memory at once. The load/s() functions return a list of JSON objects, and dump/s() take the same format as input.
Load from a file:
from pprint import pprint
import newlinejson
with open('sample-data/dictionaries.json') as f:
    pprint(newlinejson.load(f))
[{'field2': 'l1f2', 'field3': 'l1f3', 'field1': 'l1f1'},
{'field2': 'l2f2', 'field3': 'l3f3', 'field1': 'l2f1'},
{'field2': 'l3f2', 'field3': 'l3f3', 'field1': 'l3f1'},
{'field2': 'l4f2', 'field3': 'l4f3', 'field1': 'l4f1'},
{'field2': 'l5f2', 'field3': 'l5f3', 'field1': 'l5f1'}]
Load from a string:
from pprint import pprint
import newlinejson
with open('sample-data/dictionaries.json') as f:
    pprint(newlinejson.loads(f.read()))
[{'field2': 'l1f2', 'field3': 'l1f3', 'field1': 'l1f1'},
{'field2': 'l2f2', 'field3': 'l3f3', 'field1': 'l2f1'},
{'field2': 'l3f2', 'field3': 'l3f3', 'field1': 'l3f1'},
{'field2': 'l4f2', 'field3': 'l4f3', 'field1': 'l4f1'},
{'field2': 'l5f2', 'field3': 'l5f3', 'field1': 'l5f1'}]
Dump to a file or a string:
from pprint import pprint
import newlinejson
lines = [
    {'field2': 'l1f2', 'field3': 'l1f3', 'field1': 'l1f1'},
    {'field2': 'l2f2', 'field3': 'l3f3', 'field1': 'l2f1'},
    {'field2': 'l3f2', 'field3': 'l3f3', 'field1': 'l3f1'},
    {'field2': 'l4f2', 'field3': 'l4f3', 'field1': 'l4f1'},
    {'field2': 'l5f2', 'field3': 'l5f3', 'field1': 'l5f1'}
]
# 'w+' creates the file if needed and allows reading it back
with open('output.json', 'w+') as f:
    newlinejson.dump(lines, f)
    f.seek(0)
    # loads() decodes the text that dump() wrote; dumps(lines) would instead
    # return the encoded text as a string without touching a file
    pprint(newlinejson.loads(f.read()))
[{'field2': 'l1f2', 'field3': 'l1f3', 'field1': 'l1f1'},
{'field2': 'l2f2', 'field3': 'l3f3', 'field1': 'l2f1'},
{'field2': 'l3f2', 'field3': 'l3f3', 'field1': 'l3f1'},
{'field2': 'l4f2', 'field3': 'l4f3', 'field1': 'l4f1'},
{'field2': 'l5f2', 'field3': 'l5f3', 'field1': 'l5f1'}]
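The same round trip can be done entirely in memory with strings. The sketch below uses only the standard json module to mimic the behaviour described above (a list of objects in, newline-delimited text out, and back again). The names dumps_lines and loads_lines are hypothetical stand-ins, not NewlineJSON functions.

```python
import json


def dumps_lines(rows):
    """Encode each row as one JSON document per line (hypothetical helper)."""
    return ''.join(json.dumps(row) + '\n' for row in rows)


def loads_lines(text):
    """Decode newline-delimited JSON text into a list, skipping blank lines."""
    return [json.loads(line) for line in text.split('\n') if line.strip()]


lines = [
    {'field1': 'l1f1', 'field2': 'l1f2', 'field3': 'l1f3'},
    ['l1f1', 'l1f2', 'l1f3'],
]

text = dumps_lines(lines)      # whole document held in memory as one string
restored = loads_lines(text)   # whole document held in memory as one list
```

Both helpers materialise the full data set, which is why this style only suits small inputs.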
Dependencies
NewlineJSON has no dependencies, but if Python’s built-in JSON library is too slow it can be used in conjunction with a third-party library like ujson or simplejson. When available, all unit tests are run against json, ujson, simplejson, yajl, and jsonlib2. The internal JSON library can be specified like so:
import newlinejson
import ujson
newlinejson.JSON = ujson
with open('sample-data/dictionaries.json') as f:
    reader = newlinejson.Reader(f)
    print(reader.json_lib.__name__)
ujson
The library can also be specified for load/s(), dump/s(), Reader, and Writer via a json_lib keyword argument:
from pprint import pprint
import newlinejson
import ujson
with open('sample-data/dictionaries.json') as f:
    reader = newlinejson.Reader(f, json_lib=ujson)
    print(reader.json_lib.__name__)
ujson
with open('sample-data/dictionaries.json') as f:
    pprint(newlinejson.load(f, json_lib=ujson))
[{'field1': 'l1f1', 'field2': 'l1f2', 'field3': 'l1f3'},
{'field1': 'l2f1', 'field2': 'l2f2', 'field3': 'l2f3'},
{'field1': 'l3f1', 'field2': 'l3f2', 'field3': 'l3f3'},
{'field1': 'l4f1', 'field2': 'l4f2', 'field3': 'l4f3'},
{'field1': 'l5f1', 'field2': 'l5f2', 'field3': 'l5f3'}]
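A common pattern is to use a faster library when it is installed and fall back to the built-in module otherwise. The sketch below assumes only that the chosen module exposes loads() like the standard json module does. decode_lines is a hypothetical helper used for illustration; the selected module is the kind of object that would be passed as json_lib.

```python
import json

try:
    import ujson as json_lib  # optional third-party library, if installed
except ImportError:
    json_lib = json           # built-in fallback


def decode_lines(lines, lib=json_lib):
    """Decode each line with whichever JSON module was selected."""
    return [lib.loads(line) for line in lines]


rows = decode_lines(['{"field1": "l1f1"}', '["l1f1", "l1f2"]'])
print(json_lib.__name__)
```

Selecting the module once at import time keeps the rest of the code independent of which library is actually present.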
Installing
Via pip:
$ pip install newlinejson
From master:
$ git clone https://github.com/geowurster/NewlineJSON.git
$ cd NewlineJSON
$ python setup.py install
Developing
Install:
$ pip install virtualenv
$ git clone https://github.com/geowurster/NewlineJSON
$ cd NewlineJSON
$ virtualenv venv
$ source venv/bin/activate
$ pip install -e .
$ nosetests --with-coverage
Profiling
The profiling script attempts to profile against json, jsonlib2, simplejson, ujson, and yajl. By default it uses a small-ish file from sample-data, but any newline delimited JSON file can be given as the first argument.
$ ./utils/profile.py
Profiling json ...
Start time: 23:25:47
End time: 23:25:49
Elapsed secs: 1.654891
Num rows: 10000
Profiling jsonlib2 ...
Start time: 23:25:49
End time: 23:25:52
Elapsed secs: 2.780862
Num rows: 10000
Profiling simplejson ...
Start time: 23:25:52
End time: 23:25:55
Elapsed secs: 2.905002
Num rows: 10000
Profiling ujson ...
Start time: 23:25:55
End time: 23:25:56
Elapsed secs: 0.927346
Num rows: 10000
Profiling yajl ...
Start time: 23:25:56
End time: 23:25:58
Elapsed secs: 2.620200
Num rows: 10000
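A comparable measurement is simple to reproduce. The following is a rough sketch, not the contents of utils/profile.py: it times how long each importable library takes to decode the same in-memory lines and skips any library that is missing. The profile function and the generated test rows are illustrative assumptions.

```python
import importlib
import json
import time


def profile(lib_name, lines):
    """Return (elapsed_seconds, row_count), or None if the library is unusable."""
    try:
        lib = importlib.import_module(lib_name)
    except ImportError:
        return None
    if not hasattr(lib, 'loads'):
        return None
    start = time.perf_counter()
    num_rows = 0
    for line in lines:
        lib.loads(line)
        num_rows += 1
    return time.perf_counter() - start, num_rows


# Generated stand-in data; a real run would read lines from a file instead.
lines = [json.dumps({'field1': i, 'field2': str(i)}) for i in range(10000)]

results = {}
for name in ('json', 'jsonlib2', 'simplejson', 'ujson', 'yajl'):
    outcome = profile(name, lines)
    if outcome is not None:
        results[name] = outcome
        print('Profiling %s ... %.6f secs, %d rows' % (name, outcome[0], outcome[1]))
```

Timings from a sketch like this vary with the machine, the library versions, and the shape of the data, so they are only meaningful relative to one another within a single run.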