basic streaming text processing
It’s like sed, but Python!
This project is under active development, so don’t get too attached to any part of the API or command-line syntax until v1.0.
Examples
See the Cookbook for more examples.
Change the newline character in a CSV:
$ more sample-data/csv-with-header.csv \
| pyin "line.replace('\n', '\r\n')" > output.csv
Extract a BigQuery schema from an existing table and pretty print it:
$ bq show --format=json ${DATASET}.${TABLE} \
| pyin -m json -m pprint "pprint.pformat(json.loads(line)['schema']['fields'])"
[{u'mode': u'NULLABLE', u'name': u'mmsi', u'type': u'STRING'},
{u'mode': u'NULLABLE', u'name': u'longitude', u'type': u'FLOAT'},
{u'mode': u'NULLABLE', u'name': u'latitude', u'type': u'FLOAT'}
...]
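For comparison, the expression in the command above can be tried in plain Python. This is only a sketch: the sample JSON document is made up for illustration, and it assumes the whole JSON document arrives as a single line, since the expression calls json.loads() on each line separately.

```python
import json
import pprint

# Hypothetical single-line JSON document standing in for the `bq show` output.
line = json.dumps(
    {"schema": {"fields": [{"mode": "NULLABLE", "name": "mmsi", "type": "STRING"}]}}
)

# The same expression that is passed to pyin, with `line` bound by hand.
formatted = pprint.pformat(json.loads(line)["schema"]["fields"])
print(formatted)
```

The u'' prefixes in the output above come from Python 2; under Python 3 the same expression prints plain string literals.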
Read the first 100K lines of a CSV and write only the lines where the column ‘Msg type’ equals 5:
$ pyin -i ${INFILE} -o ${OUTFILE} \
--true \
--lines 100000 \
--reader csv.DictReader \
--import csv \
--import newlinejson \
--writer newlinejson.Writer \
"line['Msg type'] == '5'"
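A rough standard-library equivalent of the command above may help show what each flag contributes. It is a sketch under assumptions: filter_rows and max_lines are hypothetical names, the standard json module stands in for newlinejson.Writer, and the 100000 limit is assumed to count rows produced by the reader rather than raw lines of the file.

```python
import csv
import io
import itertools
import json


def filter_rows(infile, outfile, max_lines=100000):
    """Write rows whose 'Msg type' is '5' as newline-delimited JSON."""
    reader = csv.DictReader(infile)  # the --reader csv.DictReader part
    for row in itertools.islice(reader, max_lines):  # the --lines part
        # csv.DictReader yields every value as a string, hence '5' and not 5.
        if row["Msg type"] == "5":  # the expression, kept only when True
            outfile.write(json.dumps(row) + "\n")


# Small in-memory sample; the column names are made up for illustration.
src = io.StringIO("Msg type,mmsi\n5,123\n1,456\n5,789\n")
dst = io.StringIO()
filter_rows(src, dst)
print(dst.getvalue(), end="")
```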
Installing
Via pip:
$ pip install git+https://github.com/geowurster/pyin.git
From master branch:
$ git clone https://github.com/geowurster/pyin
$ cd pyin && pip install .
Gotchas
It’s easy to completely modify the line content:
$ pyin -i sample-data/csv-with-header.csv "'operation'"
operationoperationoperationoperationoperationoperation
Forgetting -t prints the result of the expression for every line instead of only the lines that evaluate as True:
$ pyin -i LICENSE.txt "'are' in line"
FalseFalseFalseFalseFalseFalseTrueFalseFalseFalseFalseFalseFalseFalseFalseFalseTrueFalseFalseFalseFalseFalseFalseFalseFalseFalseFalseFalse
$ pyin -i LICENSE.txt "'are' in line" -t
modification, are permitted provided that the following conditions are met:
derived from this software without specific prior written permission.
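Both gotchas follow from the same rule: whatever the expression returns is what gets written, unless -t is given. The sketch below models the behavior that the output above suggests. It is not pyin's actual implementation; evaluate and only_true are hypothetical names, and treating -t as a truthiness test is an assumption.

```python
def evaluate(expression, lines, only_true=False):
    """Evaluate `expression` once per line with the name `line` in scope."""
    code = compile(expression, "<expression>", "eval")
    for line in lines:
        result = eval(code, {}, {"line": line})
        if only_true:
            # Assumed -t behavior: keep the original line when the result is truthy.
            if result:
                yield line
        else:
            # Otherwise the result itself is written, with no newline added.
            yield str(result)


lines = ["conditions are met:\n", "prior written permission.\n"]
replaced = "".join(evaluate("'operation'", lines))
unfiltered = "".join(evaluate("'are' in line", lines))
filtered = "".join(evaluate("'are' in line", lines, only_true=True))
```

Here replaced is 'operationoperation' and unfiltered is 'TrueFalse', both run together on one line like the output above, while filtered contains only the original matching line.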
The --reader-option key=val values are parsed to their Python type, so they can only carry simple values. To specify something like which JSON library a newlinejson.Reader() instance should use, set it with the --statement option instead:
$ pyin -i ${INFILE} -o ${OUTFILE} \
--true \
--import newlinejson \
--import ujson \
--reader newlinejson.Reader \
--writer newlinejson.Writer \
--statement "newlinejson.JSON = ujson" \
"'type' in line and line['type'] == 5"
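To see why a key=val option cannot carry a module, consider one plausible way of parsing such values. This is a sketch, not pyin's parser: parse_option is a hypothetical helper, the option names are made up, and the use of ast.literal_eval() is an assumption.

```python
import ast


def parse_option(text):
    """Split 'key=val' and convert val to a Python literal when possible."""
    key, _, raw = text.partition("=")
    try:
        value = ast.literal_eval(raw)  # numbers, booleans, None, quoted strings
    except (ValueError, SyntaxError):
        value = raw  # anything else stays a plain string
    return key, value


print(parse_option("skip_lines=2"))
print(parse_option("strict=True"))
print(parse_option("json_lib=ujson"))
```

Under this scheme the last call yields the string 'ujson' rather than the ujson module, which is why an assignment such as newlinejson.JSON = ujson has to run as a statement.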
Developing
Install, then run the tests and the style check:
$ git clone https://github.com/geowurster/pyin
$ cd pyin
$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements-dev.txt
$ pip install -e .
$ nosetests --with-coverage
$ pep8 --max-line-length=120 pyin.py