basic streaming text processing

It’s like sed, but Python!

Why?

There are plenty of Unix tools, like sed and awk, for processing text data from stdin or a file on disk, but the syntax can be unfriendly, and sometimes it's just easier to write a really simple script with a for loop and some if statements. This project seeks to drop you in the middle of that for loop and let you write your own Python expressions to quickly get the job done without actually writing a script, handling I/O, etc.
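
For comparison, here is a minimal sketch of the kind of throwaway script described above: a for loop and an if statement that keep only the lines containing a given word. The function name keep_lines_containing and its parameters are illustrative and are not part of pyin.

```python
import io
import sys


def keep_lines_containing(word, infile=sys.stdin, outfile=sys.stdout):
    """Copy lines containing `word` from infile to outfile.

    Hypothetical helper for illustration only; not part of pyin.
    """
    for line in infile:
        line = line.rstrip("\r\n")  # drop the trailing newline before testing
        if word in line:
            outfile.write(line + "\n")  # add the newline back on write


if __name__ == "__main__":
    # Demo on in-memory text so the sketch runs without an input file.
    demo_out = io.StringIO()
    keep_lines_containing("word", io.StringIO("a word here\nnothing\n"), demo_out)
    print(demo_out.getvalue(), end="")
```

With pyin, the loop, the newline handling, and the I/O are already in place, so only the test itself ('word' in line) has to be written.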

Command Line Interface

This project is intended to be used from the included utility pyin, although the underlying pyin.core.pmap() function could be used elsewhere with non-string objects.

Usage: pyin [OPTIONS] EXPRESSIONS...

  It's like sed, but Python!

  Map Python expressions across lines of text.  If an expression evaluates as
  'False' or 'None' then the current line is thrown away.  If an expression
  evaluates as 'True' then the next expression is evaluated.  If a list or
  dictionary is encountered it is JSON encoded.  All other objects are cast
  to string.

  Newline characters are stripped from the end of each line before processing
  and are added on write unless disabled with '--no-newline'.

  This utility employs 'eval()' internally with a limited scope to help
  prevent accidental side effects. There are plenty of ways around this, so
  don't pass anything through pyin that you wouldn't pass through
  'eval()'.

  Remove lines that do not contain a specific word:

      $ cat INFILE | pyin "'word' in line"

  Capitalize lines containing a specific word:

      $ cat INFILE | pyin "line.upper() if 'word' in line else line"

  Only print every other word from lines that contain a specific word:

      $ cat INFILE | pyin \
      > "'word' in line" \      # Get lines with 'word' in them
      > "line.split()[::2]" \   # Grab every other word
      > "' '.join(line)"         # Convert list from previous expr to str

  Process all input text as though it was a single line to replace carriage
  returns with the system newline character:

      $ cat INFILE | pyin --block \
      > "line.replace('\r\n', os.linesep)"

  For a more in-depth explanation about exactly what's going on under the
  hood, see the docstring in 'pyin.core.pmap()':

      $ python -c "help('pyin.core.pmap')"

Options:
  --version           Show the version and exit.
  -i, --infile PATH   Input text file. [default: stdin]
  -o, --outfile PATH  Output text file. [default: stdout]
  --block             Operate on all input text as though it was a single
                      line.
  --no-newline        Don't ensure each line ends with a newline character.
  --help              Show this message and exit.
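
The evaluation rules in the help text above can be sketched in a few lines of Python. This is an illustration written from the help text alone, not pyin's actual implementation: the function name apply_expressions is hypothetical, the empty builtins scope is an assumption about what "limited scope" might mean, and 'False' and 'None' are matched by identity, which is also an assumption.

```python
import json


def apply_expressions(lines, expressions):
    """Sketch of the rules in the help text; NOT pyin's real code.

    Each expression is evaluated with the current value bound to `line`.
    """
    for line in lines:
        value = line.rstrip("\r\n")  # newlines are stripped before processing
        keep = True
        for expr in expressions:
            # Assumption: an empty builtins table as the "limited scope".
            # As the help text warns, this does not make eval() safe.
            scope = {"__builtins__": {}, "line": value}
            result = eval(expr, scope)
            if result is False or result is None:
                keep = False  # throw the current line away
                break
            if result is True:
                continue  # pure filter: value passes on unchanged
            value = result  # otherwise the result feeds the next expression
        if not keep:
            continue
        if isinstance(value, (list, dict)):
            yield json.dumps(value)  # lists and dicts are JSON encoded
        else:
            yield str(value)  # everything else is cast to string


if __name__ == "__main__":
    exprs = ["'word' in line", "line.split()[::2]", "' '.join(line)"]
    for out in apply_expressions(["a word b c\n", "skip me\n"], exprs):
        print(out)
```

The generator yields text without a trailing newline, leaving it to the caller to add one on write, which mirrors the default behavior described for '--no-newline'.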

Installing

Via pip:

$ pip install pyin

From master branch:

$ git clone https://github.com/geowurster/pyin
$ cd pyin && python setup.py install

What about py -x?

Most of this project was written with very little knowledge of py and no knowledge of py -x, which serves almost exactly the same purpose. The primary difference between the two projects is that pyin requires I/O and has some smarter filtering for expressions that evaluate as True or False.

Developing

Install:

$ git clone https://github.com/geowurster/pyin
$ cd pyin
$ virtualenv venv && source venv/bin/activate
$ pip install -e .\[dev\]
$ py.test tests --cov pyin --cov-report term-missing

License

See LICENSE.txt
