
awkg is an awk-like text-processing tool powered by Python


awkg

awkg is an awk-like utility built on modern Python. awk is amazingly simple, fast, and quite handy. However, its domain-specific constraints sometimes get in the way. awkg follows awk's design (including its naming convention 😉) and exposes the full power of modern Python. Python's large collection of off-the-shelf libraries can, of course, be imported and used.

Installation

# Install from PyPI
$ pip install awkg

# Install from github
$ pip install git+https://github.com/thammegowda/awkg.git

CLI usage:

$ awkg -h 
usage: awkg [-h] [-i INP] [-o OUT] [-F FS] [-OFS OFS] [-ORS ORS]
            [-b BEGIN_SCRIPT] [-e END_SCRIPT] [-im IMPORTS] [-it INIT_PATH]
            [-v]
            inline_script

awkg is an awk-like text-processing tool powered by python language

positional arguments:
  inline_script         Inline python script

optional arguments:
  -h, --help            show this help message and exit
  -i INP, --inp INP     Input file path; None=STDIN
  -o OUT, --out OUT     Output file path; None=STDOUT
  -F FS, -FS FS, --field-sep FS
                        the input field separator. Default=None implies white
                        space
  -OFS OFS, --out-field-sep OFS
                        the out field separator. Default=None implies same as
                        input FS.
  -ORS ORS, --out-rec-sep ORS
                        the output record separator. Default=None implies same
                        as input RS.
  -b BEGIN_SCRIPT, --begin BEGIN_SCRIPT
                        BEGIN block. initialize variables or whatever
  -e END_SCRIPT, --end END_SCRIPT
                        END block. Print summaries or whatever
  -im IMPORTS, --import IMPORTS
                        Imports block. Specify a list of module names to be
                        imported.Semicolon (;) is the delimiter. Ex:
                        json;numpy as np
  -it INIT_PATH, --init INIT_PATH
                        The rc file that initializes environment.Default is
                        $HOME/.awkg.py
  -v, --version         show program's version number and exit
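
The -im option above takes a semicolon-delimited list such as json;numpy as np. The following is a minimal sketch of how such a string could be turned into imported modules. It is not awkg's actual implementation: the load_imports helper is hypothetical, and it simplifies dotted module names by binding the full dotted name instead of the top-level package.

```python
import importlib

def load_imports(spec):
    """Hypothetical helper: turn 'json;math as m' into a {name: module} dict."""
    namespace = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing ';'
        # Split an optional 'module as alias' form; alias is '' when absent.
        module_name, _, alias = item.partition(" as ")
        module_name = module_name.strip()
        alias = alias.strip() or module_name
        namespace[alias] = importlib.import_module(module_name)
    return namespace

ns = load_imports("json;math as m")
print(sorted(ns))  # ['json', 'm']
```

A dictionary like this could then be merged into the namespace in which the inline script runs.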

Example

Compute mean and std of words per sequence

cat data/train.src | awkg -b 'arr=[]; import numpy as np' 'arr.append(NF)' \
   -e 'arr=np.array(arr); print(f"{NR} lines from {FNAME}, mean={arr.mean():.2f}; std={arr.std():.4f}")'
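
For comparison, here is a rough plain-Python equivalent of the one-liner above, using only the standard library. It is a sketch under assumptions: the words_per_line_stats name and the sample lines are made up for illustration, and statistics.pstdev is used because NumPy's std() computes the population standard deviation by default.

```python
import statistics

def words_per_line_stats(lines):
    """Return (line count, mean, population std) of per-line field counts."""
    # len(line.split()) plays the role of NF with the default whitespace separator.
    counts = [len(line.split()) for line in lines]
    mean = statistics.fmean(counts)
    std = statistics.pstdev(counts)  # population std, like ndarray.std()
    return len(counts), mean, std

sample = ["a b c", "d e", "f g h i"]  # illustrative data, not from the project
n, mean, std = words_per_line_stats(sample)
print(f"{n} lines, mean={mean:.2f}; std={std:.4f}")  # 3 lines, mean=3.00; std=0.8165
```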

Filter records

# use print() explicitly
cat data/train.src  | awkg  'if NF >= 25: print(*R)' 

# Assign a boolean expression to the special variable RET to trigger an implicit print
cat data/train.src  | awkg  'RET = NF >= 25'

# print respects the OFS value
cat data/train.src  |  awkg  'if NF >= 25: print(NR, NF)' -OFS='\t'

Special Variables

  • NF : Number of fields
  • NR : Record number
  • R : A list holding all the fields (columns) of the current record.
  • R0 : Analogous to awk's $0; it stores the input line before it is split into R. Since Python does not permit $ in identifiers, it is named R0.
  • RET : When this variable is set to a truthy value, an implicit print(*R) is triggered.
  • FS : Input field separator
  • OFS : Output field separator; unless explicitly set, OFS=FS
  • ORS : Output record separator
  • RS : Input record separator (currently not in use)
  • _locals , _globals : All variables in local and global scope

You may use any valid Python identifiers other than the variables above.
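
To make the roles of these variables concrete, here is a minimal sketch of a per-record loop that sets NR, NF, R, R0 and honours RET. It only illustrates the semantics described above and is not awkg's actual code. The run_records function is hypothetical, and falling back to a single space for OFS when FS is unset is an assumption.

```python
def run_records(lines, script, fs=None, ofs=None):
    """Hypothetical sketch: run `script` once per record, collect implicit prints."""
    # Assumption: OFS falls back to FS, or to a single space when FS is unset.
    ofs = ofs if ofs is not None else (fs if fs is not None else " ")
    code = compile(script, "<inline>", "exec")
    env = {}  # shared across records so user-defined variables persist
    out = []
    for nr, line in enumerate(lines, start=1):
        r0 = line.rstrip("\n")
        fields = r0.split(fs)  # fs=None splits on runs of whitespace
        env.update(R0=r0, R=fields, NF=len(fields), NR=nr,
                   FS=fs, OFS=ofs, RET=None)
        exec(code, env)
        if env["RET"]:  # a truthy RET triggers the implicit print of R
            out.append(ofs.join(env["R"]))
    return out

lines = ["a b c\n", "d e\n", "f g h i\n"]  # illustrative input
result = run_records(lines, "RET = NF >= 3")
print(result)  # ['a b c', 'f g h i']
```

Because the namespace is reused for every record, a counter or list created by the script on one record remains visible on the next, which is what a BEGIN/END style workflow relies on.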

Default import modules

These modules are imported by default

  • sys
  • os
  • re
  • from pathlib import Path


Related tools

  • pawk: similar to this project, with a slightly different implementation.
  • gawk: GNU awk
