
Ever wished to use Python syntax to work in Bash? Then pythonize it, piping the contents through your tiny Python script with the `pz` utility!


pz


Ever wished to use Python in Bash? Would you choose Python syntax over sed, awk, ...? If you know exactly which command you would use in Python, yet end up querying man again and again, read on. Pythonize it! Pipe the contents through pz, loaded with your tiny Python script.

How? Simply meddle with the s variable. Example: appending '.com' to every line.

$ echo -e "example\nwikipedia" | pz 's += ".com"'
example.com
wikipedia.com
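To make the idea concrete, here is a rough Python sketch of what running a command once per line amounts to. It is not pz's actual implementation; the helper name `pythonize` is hypothetical and only illustrates how a command can rebind the variable `s`.

```python
def pythonize(lines, command):
    """Run `command` once per line with the line bound to `s` (illustrative only)."""
    results = []
    for line in lines:
        scope = {"s": line}      # the command sees the current line as `s`
        exec(command, scope)     # e.g. 's += ".com"' rebinds `s`
        results.append(scope["s"])
    return results


print(pythonize(["example", "wikipedia"], 's += ".com"'))
```

The sketch uses a fresh scope for every line; the global variables described in the Docs section would need a scope shared across lines instead.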

Installation

Install with a single command from PyPI.

pip3 install pz    

Or download and launch the pz file from here.

Examples

How does your data look when pythonized with pz? Which Bash programs can the utility replace?

Extract a substring

Just use the [:] notation.

echo "hello world" | pz s[6:]  # world

Prepend to every line in a stream

We prepend the length of the line.

tail -f /var/log/syslog | pz 'f"{len(s)}: {s}"'

Converting to lowercase

Replacing | tr '[:upper:]' '[:lower:]'.

echo "HELLO" | pz s.lower  # "hello"

Parsing numbers

Replacing cut. Note you can chain multiple pz calls. Split by comma ',', then use n to access the line converted to a number.

echo "hello,5" | pz 's.split(",")[1]' | pz n+7  # 12
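For comparison, the same two steps can be written in plain Python. This is only a sketch: the helper `to_number` is a hypothetical name, and the int-then-float fallback is an assumption based on the description of `n` in the Docs section.

```python
def to_number(text):
    """Convert text to an int, falling back to a float (assumed behaviour of `n`)."""
    try:
        return int(text)
    except ValueError:
        return float(text)


field = "hello,5".split(",")[1]   # first pz call: keep the second field
result = to_number(field) + 7     # second pz call: n+7
print(result)
```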

Find out all URLs in a text

Replacing sed. All functions from the re library are already available, e.g. findall.

# either use the `--findall` flag
pz --findall "(https?://[^\s]+)" < file.log

# or spell out the full command that the `--findall` flag is equivalent to
pz "findall(r'(https?://[^\s]+)', s)" < file.log

If chained, you can open all the URLs in the current web browser. Note that the function webbrowser.open gets auto-imported from the standard library.

pz --findall "(https?://[^\s]+)" < file.log | pz webbrowser.open
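The regular expression used above behaves the same way when called from a regular Python script. The sketch below applies `re.findall` to a made-up sample line; the sample text and the helper name `find_urls` are assumptions for illustration.

```python
import re

URL_PATTERN = r"(https?://[^\s]+)"


def find_urls(line):
    """Return every substring of `line` that looks like an http(s) URL."""
    return re.findall(URL_PATTERN, line)


sample = "see https://example.com and http://example.org/page for details"
print(find_urls(sample))
```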

Sum numbers

Replacing | awk '{count+=$1} END{print count}' or | paste -sd+ | bc. Just use sum in the --finally clause.

echo -e "1\n2\n3\n4" | pz --finally sum  # 10

Keep unique lines

Replacing | sort | uniq makes little sense but the demonstration gives you the idea. We initialize a set c (like a collection). When processing a line, skip is set to True if already seen.

$ echo -e "1\n2\n2\n3" | pz "skip = s in c; c.add(s)"  --setup "c=set()"
1
2
3

However, an advantage over | sort | uniq comes when handling a stream. You see unique lines instantly, without waiting for the stream to finish. This is useful in combination with tail --follow.
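The streaming behaviour can be sketched in Python with a generator that yields each new line as soon as it arrives. This mirrors the skip-if-seen logic of the command above; the function name `unique` is hypothetical.

```python
def unique(stream):
    """Yield each line the first time it is seen, without waiting for the end."""
    seen = set()
    for line in stream:
        if line in seen:
            continue          # corresponds to skip = True
        seen.add(line)
        yield line


print(list(unique(["1", "2", "2", "3"])))
```

Because the generator is lazy, it also works on an endless iterator such as a followed log file.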

Alternatively, to ensure the values are sorted, we can make use of the --finally flag, which produces the output after the processing has finished.

echo -e "1\n2\n2\n3" | pz "S.add(s)" --finally "sorted(S)"  -0

Note that we used the variable S, which is initialized by default to an empty set (hence we do not have to use --setup at all), and the flag -0 to suppress the per-line output (so we do not have to use the skip variable).

(Strictly speaking we could omit -0 too. If you use the most verbose -vvv flag, you would see the command changed to s = S.add(s) internally. And since set.add produces None output, it is the same as if it was skipped.)

We can omit (s) in the command clause and hence get rid of the quotes altogether.

echo -e "1\n2\n2\n3" | pz S.add --finally "sorted(S)"

Nevertheless, the most straightforward approach would involve the lines variable, available when using the --finally clause.

echo -e "1\n2\n2\n3" | pz --finally "sorted(set(lines))"

Counting words

We split the line to get the words and put them in S, a global instance of the set. Then, we print the set length to get the number of unique words.

echo -e "red green\nblue red green" | pz 'S.update(s.split())' --finally 'len(S)'  # 3

But what if we want to get the most common words and the number of times each is used? Let's use C, a global instance of collections.Counter. We then see that red is the most common word and has been used 2 times.

echo -e "red green\nblue red green" | pz 'C.update(s.split())' --finally C.most_common
red, 2
green, 2
blue, 1
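Both word-count variants map directly onto the standard library. The following sketch reproduces them with `collections.Counter` outside of pz; the variable name `counter` is just illustrative.

```python
from collections import Counter

counter = Counter()
for line in ["red green", "blue red green"]:
    counter.update(line.split())   # count every word on the line

print(len(counter))                # number of unique words
print(counter.most_common())       # words ordered by descending count
```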

Fetching web content

Accessing the internet is easy thanks to the requests library. Here, we fetch example.com, grep it for all lines containing "href" and print them out while stripping spaces.

$ echo "http://example.com" | pz 'requests.get(s).content' | grep href | pz s.strip 
<p><a href="https://www.iana.org/domains/example">More information...</a></p>

To see how auto-imports are resolved, use verbose mode. (Notice the line Importing requests.)

$ echo "http://example.com" | pz 'requests.get(s).content' -vvv | grep href | pz s.strip 
Changing the command clause to: s = requests.get(s).content
Importing requests
<p><a href="https://www.iana.org/domains/example">More information...</a></p>

Handling nested quotes

To match every line that contains a quoted expression and print out the quoted contents, you may make use of Python triple quotes. In the example below, an apostrophe delimits the COMMAND argument. If we used an apostrophe inside the text, we would have to escape it with a backslash. Triple quotes might improve readability instead.

echo -e 'hello "world".' | pz 'match(r"""[^"]*"(.*)".""", s)' # world

In that case, even better is to use the --match flag to get rid of the quoting as much as possible.

echo -e 'hello "world".' | pz --match '[^"]*"(.*)"'  # world
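The two patterns above can be checked in plain Python with `re.match`. This is a small sketch showing that both capture the same group; the variable names are illustrative.

```python
import re

line = 'hello "world".'

# Triple quotes let the pattern contain double quotes without escaping.
full = re.match(r"""[^"]*"(.*)".""", line)

# The shorter pattern passed to --match captures the same group.
short = re.match(r'[^"]*"(.*)"', line)

print(full.group(1), short.group(1))
```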

Docs

Scope variables

In the script scope, you have access to the following variables:

s – current line

Change it according to your needs

echo 5 | pz 's += "4"'  # 54 

n – current line converted to an int (or float) if possible

echo 5 | pz n+2  # 7
echo 5.2 | pz n+2  # 7.2

text – whole text, all lines together

Available only with the --whole flag set.
Ex: get character count (an alternative to | wc -c).

echo -e "hello\nworld" | pz --finally 'len(text)' --whole  # 12

lines – list of lines so far processed

Available only with the --lines flag set.
Ex: returning the last line

# the `--lines` flag is automatically on when `--finally` used
echo -e "hello\nworld" | pz --finally lines[-1]  # "world"

numbers – list of numbers so far processed

Available only with the --lines flag set.
Ex: show current average of the stream. More specifically, we print out tuples: line count, current line, average.

$ echo -e "20\n40\n25\n28" | pz 'i+=1; s = i, s, sum(numbers)/i' --lines
1, 20, 20.0
2, 40, 30.0
3, 25, 28.333333333333332
4, 28, 28.25
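The running average can be reproduced in plain Python as follows. It is a sketch of the same arithmetic, not of pz internals; `running_average` is a hypothetical helper name.

```python
def running_average(values):
    """Yield (count, current value, average so far) for each incoming value."""
    numbers = []
    for i, value in enumerate(values, start=1):
        numbers.append(value)            # plays the role of `numbers`
        yield i, value, sum(numbers) / i


for row in running_average([20, 40, 25, 28]):
    print(*row, sep=", ")
```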

skip line

If set to True, current line will not be output. If set to False when using the -0 flag, the line will be output regardless.

i, S, L, D, C – other global variables

Some variables are initialized and ready to be used globally. They are common for all the lines.

  • i = 0
  • S = set()
  • L = list()
  • D = dict()
  • C = Counter()

Admittedly, using uppercase names does not conform to the naming convention. However, in these tiny scripts readability is the chief principle and every character counts.

Using a set S. In the example, we add every line to the set and finally print it out in a sorted manner.

echo -e "2\n1\n2\n3\n1" | pz "S.add(s)" --finally "sorted(S)"
1
2
3  

Using a list L. Append lines that contain a number bigger than one and finally print their count. As only the final count matters, suppress the line output with the flag -0.

echo -e "2\n1\n2\n3\n1" | pz "if n > 1: L.append(s)" --finally "len(L)" -0
3  

Auto-import

  • You can always import libraries you need manually. (Put import statement into the command.)
  • Some libraries are ready to be used: re.* (match, search, findall), math.* (sqrt,...), datetime.* (datetime.now, ...), defaultdict
  • Some others are auto-imported whenever their use is detected. In such a case, the line is reprocessed.
    • Functions: b64decode, b64encode, (requests).get, Path, randint, sleep
    • Modules: base64, collections, humanize, jsonpickle, pathlib, random, requests, time, webbrowser

Caveat: When a name is accessed for the first time, the auto-import causes the row to be reprocessed. This may affect your global variables. Use verbose output to see if something has been auto-imported.

echo -e "hey\nbuddy" | pz 'a+=1; sleep(1); b+=1; s = a,b ' --setup "a=0;b=0;" -vv
Importing sleep from time
2, 1
3, 2

As seen, a was incremented 3 times and b only twice, because the first line had to be processed twice in order to auto-import sleep. In the first run, the processing raised an exception because sleep was not known. To prevent that, explicitly append from time import sleep to the --setup flag.
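The double increment is easy to reproduce with a retry-on-NameError loop. The sketch below is an assumption about the general mechanism, not pz's real code: the lookup table `KNOWN` and the function `run_line` are hypothetical, and `sleep(0)` is used to keep the example instant.

```python
import importlib

# Hypothetical lookup table: bare name -> module that provides it.
KNOWN = {"sleep": "time"}


def run_line(command, scope):
    """Execute `command`; on an unknown name, import it and run the line again."""
    try:
        exec(command, scope)
    except NameError as err:
        name = str(err).split("'")[1]        # "name 'sleep' is not defined"
        if name not in KNOWN:
            raise
        module = importlib.import_module(KNOWN[name])
        scope[name] = getattr(module, name)
        exec(command, scope)                 # the whole line is reprocessed


scope = {"a": 0, "b": 0}
for line in ["hey", "buddy"]:
    scope["s"] = line
    run_line("a += 1; sleep(0); b += 1; s = a, b", scope)
    print(scope["s"])
```

Statements placed before the unknown name run twice on the first line, which is why `a` ends up one ahead of `b`.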

Output

  • Explicit assignment: By default, we output s.

    echo "5" | pz 's = len(s)' # 1
    
  • Single expression: If not set explicitly, we assign the expression to s automatically.

    echo "5" | pz 'len(s)'  # 1 (command internally changed to `s = len(s)`)
    
  • Tuple, generator: If s ends up as a tuple, its items get joined by a comma and a space.

    $ echo "5" | pz 's, len(s)'
    5, 1 
    

    Consider piping two lines 'hey' and 'buddy'. We return three elements, original text, reversed text and its length.

    $ echo -e "hey\nbuddy" | pz 's,s[::-1],len(s)' 
    hey, yeh, 3
    buddy, yddub, 5
    
  • List: When s ends up as a list, its elements are printed to independent lines.

    $ echo "5" | pz '[s, len(s)]'
    5
    1 
    
  • Regular match: All groups are treated as a tuple. If no group used, we print the entire matched string.

    # no group → print entire matched string
    echo "hello world" | pz 'search(r"\s.*", s)' # " world"
    
    # single matched group
    echo "hello world" | pz 'search(r"\s(.*)", s)' # "world"
    
    # matched groups treated as tuple
    echo "hello world" | pz 'search(r"(.*)\s(.*)", s)'  # "hello, world"
    
  • Callable: It gets called. Very useful when handling a simple function: without the need to explicitly add parentheses to call the function, we can omit quoting in Bash (the expression s.lower() would have had to be quoted). Use three verbose flags -vvv to inspect the internal change of the command.

    # internally changed to `s = s.lower()`
    echo "HEllO" | pz s.lower  # "hello"
      
    # internally changed to `s = len(s)`
    echo "HEllO" | pz len  # "5"
    
    # internally changed to `s = base64.b64encode(s.encode('utf-8'))`
    echo "HEllO" | pz b64encode  # "SEVsbE8="
    
    # internally changed to `s = math.sqrt(n)`
    # and then to `s = round(n)`
    echo "25" | pz sqrt | pz round  # "5"
    
    # internally changed to `s = sum(numbers)`
    # `numbers` are available only when `--lines` or `--finally` set
    echo -e "1\n2\n3\n4" | pz sum --lines
    1
    3
    6
    10
    
    # internally changed to `' - '.join(lines)`
    # `lines` are available only when `--lines` or `--finally` set  
    echo -e "1\n2\n3\n4" | pz --finally "' - '.join"
    1 - 2 - 3 - 4
    

    As you see in the examples, if a TypeError is raised, we try to reprocess the row while adding the current line as the argument:

    • either its basic form s
    • the numbers if available
    • using its numeral representation n if available
    • encoded to bytes s.encode('utf-8')

    In the --finally clause, we additionally try lines.
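The fallback order for callables can be pictured with the following sketch. It is an assumed reconstruction of the idea rather than pz's implementation: the helper `call_with_fallback` is hypothetical, and the `numbers` and `lines` attempts are left out for brevity.

```python
import base64
import math


def call_with_fallback(func, s):
    """Call `func` bare first, then retry with progressively converted arguments."""
    attempts = [(), (s,)]                     # no argument, then the raw line
    try:
        attempts.append((int(s),))            # numeric form, if the line is a number
    except ValueError:
        try:
            attempts.append((float(s),))
        except ValueError:
            pass                              # not a number: skip this attempt
    attempts.append((s.encode("utf-8"),))     # finally the line encoded to bytes
    for args in attempts:
        try:
            return func(*args)
        except TypeError:
            continue
    raise TypeError(f"{func!r} accepts none of the attempted arguments")


print(call_with_fallback("HEllO".lower, "HEllO"))
print(call_with_fallback(len, "HEllO"))
print(call_with_fallback(math.sqrt, "25"))
print(call_with_fallback(base64.b64encode, "HEllO"))
```

One consequence of this design is that a TypeError raised for an unrelated reason inside the function is also swallowed, so verbose output remains the way to see what actually happened.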

CLI flags

  • command: Any Python script (multiple statements allowed)

  • --setup: Any Python script, executed before processing. Useful for variable initializing. Ex: prepend line numbers by incrementing a variable count.

    $ echo -e "row\nanother row" | pz 'count+=1;s = f"{count}: {s}"'  --setup 'count=0'
    1: row
    2: another row
    

    Yes, we could use globally initialized variable i instead of using --setup.

  • --finally: Any Python script, executed after processing. Useful for final output. Turns on the --lines automatically because we do not expect an infinite stream.

    $ echo -e "1\n2\n3\n4" | pz --finally sum
    10
    $ echo -e "1\n2\n3\n4" | pz s --finally sum
    1
    2
    3
    4
    10  
    $ echo -e "1\n2\n3\n4" | pz sum --finally sum
    1
    3
    6
    10
    10
    
  • --verbose: If you end up with no output, turn on to see what happened. Used once: show command exceptions. Twice: show automatic imports. Thrice: see internal command modification (attempts to make it callable and prepending s = if omitted).

    $ echo -e "hello" | pz 'invalid command' # empty result
    $ echo -e "hello" | pz 'invalid command' -v
    Exception: <class 'SyntaxError'> invalid syntax (<string>, line 1) on line: hello
    $ echo -e "hello" | pz 'sleep(1)' -vv
    Importing sleep from time
    
  • --filter: The line is piped out unchanged, but only if the command evaluates to True. When piping in the numbers 1 to 5, we pass only those bigger than 3.

    $ echo -e "1\n2\n3\n4\n5" | pz "n > 3"  --filter
    4
    5
    

    The statement is equivalent to using skip (and not using --filter).

    $ echo -e "1\n2\n3\n4\n5" | pz "skip = not n > 3"
    4
    5
    

    When not using filter, s evaluates to True / False. By default, False or empty values are not output.

    $ echo -e "1\n2\n3\n4\n5" | pz "n > 3"   
    True
    True
    
  • -n NUM: Process only the given number of lines. Roughly equivalent to head -n.

  • -1: Process just the first line. Useful in combination with --whole.

  • --whole: Fetch the whole text first before processing. The variable text then contains the whole text. You might want to add the -1 flag.

    $ echo -e "1\n2\n3" | pz 'len(text)' 
    Did not you forget to use --whole?
    

    Appending --whole helps but the result is processed for every line again.

    $ echo -e "1\n2\n3" | pz 'len(text)' -w 
    6
    6
    6
    

    Appending -1 makes sure the statement gets computed only once.

    $ echo -e "1\n2\n3" | pz 'len(text)' -w1
    6    
    
  • --lines: Populate lines and numbers with lines. This is off by default since this would cause an overflow when handling an infinite input.

    $ echo -e "1\n2\n3\n4" | pz sum  --lines  # (internally changed to `s = sum(numbers)`)
    1
    3
    6
    10      
    
  • --empty: Output even empty lines. (By default skipped.)
    Consider shortening the text by its last 3 letters. The first line hey then disappears completely.

    $ echo -e "hey\nbuddy" | pz 's[:-3]'
    bu
    

    Should we insist on displaying, we see an empty line now.

    $ echo -e "hey\nbuddy" | pz 's[:-3]' --empty
    
    bu
    
  • -0: Skip all lines output. (Useful in combination with --finally.)
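How the setup, command and finally clauses relate can be sketched as three phases sharing one scope. This is a simplified model under stated assumptions, not the actual pz code: the function `run` and its parameters are hypothetical, and only `lines` is made available to the final expression.

```python
def run(lines, command, setup="", final=None):
    """Run `setup` once, `command` per line, then evaluate `final` (illustrative)."""
    scope = {}
    exec(setup, scope)                  # --setup: runs once, before any line
    output, seen = [], []
    for line in lines:
        seen.append(line)
        scope["s"] = line
        exec(command, scope)            # command: runs once per line
        output.append(scope["s"])
    if final is not None:
        scope["lines"] = seen           # --finally sees every processed line
        output.append(eval(final, scope))
    return output


print(run(["row", "another row"], 'count += 1; s = f"{count}: {s}"', setup="count = 0"))
print(run(["1", "2", "3", "4"], "s = s", final="sum(int(x) for x in lines)"))
```

Keeping every line in `seen` is what makes a final clause possible, and also why collecting lines is a poor fit for an infinite stream.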

Regular flags

  • --search: Equivalent to search(COMMAND, s)

    $ echo -e "hello world\nanother words" | pz --search ".*\s"
    hello
    another
    
  • --match: Equivalent to match(COMMAND, s)

  • --findall: Equivalent to findall(COMMAND, s)

  • --sub SUBSTITUTION: Equivalent to sub(COMMAND, SUBSTITUTION, s)

    $ echo -e "hello world\nanother words" | pz --sub ":" ".*\s"
    :world
    :words
    

    Using groups

    $ echo -e "hello world\nanother words" | pz --sub "\1" "(.*)\s"
    helloworld
    anotherwords
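Since the flags are described as equivalents of functions from the re module, the examples above can be double-checked in plain Python. The sketch below calls `re.search` and `re.sub` with the same patterns; the variable names are illustrative.

```python
import re

lines = ["hello world", "another words"]

# --search ".*\s": the matched text, including the trailing whitespace
searched = [re.search(r".*\s", line).group(0) for line in lines]

# --sub ":" ".*\s": replace the match with a colon
replaced = [re.sub(r".*\s", ":", line) for line in lines]

# --sub "\1" "(.*)\s": keep the captured group, drop the whitespace
grouped = [re.sub(r"(.*)\s", r"\1", line) for line in lines]

print(searched, replaced, grouped)
```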
    

