Ever wished to use the Python syntax to work in Bash? Then pythonize it, piping the contents through your tiny Python script with `pz` utility!
Project description
pz
Ever wished to use Python in Bash? Would you choose the Python syntax over sed
, awk
, ...? Should you exactly know what command would you use in Python but you end up querying man
again and again, read further. Then p
ythoniz
e it! Pipe the contents through pz
, loaded with your tiny Python script.
How? Simply meddle with the s
variable. Example: appending '.com' to every line.
$ echo -e "example\nwikipedia" | pz 's += ".com"'
example.com
wikipedia.com
Installation
Install with a single command from PyPi.
pip3 install pz
Or download and launch the pz
file from here.
Examples
How does your data look when pz
? Which Bash programs may the utility substitute?
Extract a substring
Just use the [:]
notation.
bash
echo "hello world" | pz s[6:] # hello
Prepend to every line in a stream
We prepend the length of the line.
tail -f /var/log/syslog | pz 'f"{len(s)}: {s}"'
Converting to uppercase
Replacing | tr '[:upper:]' '[:lower:]'
.
echo "HELLO" | pz s.lower # "hello"
Parsing numbers
Replacing cut
. Note you can chain multiple pz
calls. Split by comma ',
', then use n
to access the line converted to a number.
echo "hello,5" | pz 's.split(",")[1]' | pz n+7 # 12
Find out all URLs in a text
Replacing sed
. We know that all functions from the re
library are already included, ex: "findall".
# either use the `--findall` flag
pz --findall "(https?://[^\s]+)" < file.log
# or expand the full command to which is the `--findall` flag equivalent
pz "findall(r'(https?://[^\s]+)', s)" < file.log
If chained, you can open all the URLs in the current web browser. Note that the function webbrowser.open
gets auto-imported from the standard library.
pz --findall "(https?://[^\s]+)" < file.log | pz webbrowser.open
Sum numbers
Replacing | awk '{count+=$1} END{print count}'
or | paste -sd+ | bc
. Just use sum
in the --finally
clause.
echo -e "1\n2\n3\n4" | pz --finally sum # 10
Keep unique lines
Replacing | sort | uniq
makes little sense but the demonstration gives you the idea. We initialize a set c
(like a collection). When processing a line, skip
is set to True
if already seen.
$ echo -e "1\n2\n2\n3" | pz "skip = s in c; c.add(s)" --setup "c=set()"
1
2
3
However, an advantage over | sort | uniq
comes when handling a stream. You see unique lines instantly, without waiting a stream to finish. Useful when using with tail --follow
.
Alternatively, to assure the values are sorted, we can make a use of --finally
flag that produces the output after the processing finished.
echo -e "1\n2\n2\n3" | pz "S.add(s)" --finally "sorted(S)" -0
Note that we used the variable S
which is initialized by default to an empty set (hence we do not have to use --setup
at all) and the flag -0
to prevent the processing from output (we do not have to use skip
parameter then).
(Strictly speaking we could omit -0
too. If you use the most verbose -vvv
flag, you would see the command changed to s = S.add(s)
internally. And since set.add
produces None
output, it is the same as if it was skipped.)
We can omit (s)
in the command
clause and hence get rid of the quotes all together.
echo -e "1\n2\n2\n3" | pz S.add --finally "sorted(S)"
Nevertheless, the most straightforward approach would involve the lines
variable, available when using the --finally
clause.
echo -e "1\n2\n2\n3" | pz --finally "sorted(set(lines))"
Counting words
We split the line to get the words and put them in S
, a global instance of the set
. Then, we print the set length to get the number of unique words.
echo -e "red green\nblue red green" | pz 'S.update(s.split())' --finally 'len(S)' # 3
But what if we want to get the most common words and the count of its usages? Lets use C
, a global instance of the collections.Counter
. We see then the red
is the most_common word and has been used 2 times.
echo -e "red green\nblue red green" | pz 'C.update(s.split())' --finally C.most_common
red, 2
green, 2
blue, 1
Fetching web content
Accessing internet is easy thanks to the requests
library. Here, we fetch example.com
, grep it for all lines containing "href" and print them out while stripping spaces.
$ echo "http://example.com" | pz 'requests.get(s).content' | grep href | pz s.strip
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
Too see how auto-import are resolved, use verbose mode. (Notice the line Importing requests
.)
$ echo "http://example.com" | pz 'requests.get(s).content' -vvv | grep href | pz s.strip
Changing the command clause to: s = requests.get(s).content
Importing requests
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
Handling nested quotes
To match every line that has a quoted expressions and print out the quoted contents, you may serve yourself of Python triple quotes. In the example below, an apostrophe is used to delimite the COMMAND
flag. If we used an apostrophe in the text, we had have to slash it. Instead, triple quotes might improve readability.
echo -e 'hello "world".' | pz 'match(r"""[^"]*"(.*)".""", s)' # world
In that case, even better is to use the --match
flag to get rid of the quoting as much as possible.
echo -e 'hello "world".' | pz --match '[^"]*"(.*)"' # world
Docs
Scope variables
In the script scope, you have access to the following variables:
s
– current line
Change it according to your needs
echo 5 | pz 's += "4"' # 54
n
– current line converted to an int
(or float
) if possible
echo 5 | pz n+2 # 7
echo 5.2 | pz n+2 # 7.2
text
– whole text, all lines together
Available only with the --whole
flag set.
Ex: get character count (an alternative to | wc -c
).
echo -e "hello\nworld" | pz --finally 'len(text)' --whole # 12
lines
– list of lines so far processed
Available only with the --lines
flag set.
Ex: returning the last line
# the `--lines` flag is automatically on when `--finally` used
echo -e "hello\nworld" | pz --finally lines[-1] # "world"
numbers
– list of numbers so far processed
Available only with the --lines
flag set.
Ex: show current average of the stream. More specifically, we print out tuples: line count, current line, average
.
$ echo -e "20\n40\n25\n28" | pz 'i+=1; s = i, s, sum(numbers)/i' --lines
1, 20, 20.0
2, 40, 30.0
3, 25, 28.333333333333332
4, 28, 28.25
skip
line
If set to True
, current line will not be output. If set to False
when using the -0
flag, the line will be output regardless.
i
, S
, L
, D
, C
– other global variables
Some variables are initialized and ready to be used globally. They are common for all the lines.
i = 0
S = set()
L = list()
D = dict()
C = Counter()
It is true that using uppercase is not conforming the naming convention. However in these tiny scripts the readability is the chief principle, every character counts.
Using a set S
. In the example, we add every line to the set and finally print it out in a sorted manner.
echo -e "2\n1\n2\n3\n1" | pz "S.add(s)" --finally "sorted(S)"
1
2
3
Using a list L
. Append lines that contains a number bigger than one and finally, print their count. As only the final count matters, suppress the line output with the flag -0
.
echo -e "2\n1\n2\n3\n1" | pz "if n > 1: L.append(s)" --finally "len(L)" -0
3
Auto-import
- You can always import libraries you need manually. (Put
import
statement into the command.) - Some libraries are ready to be used:
re.* (match, search, findall), math.* (sqrt,...), datetime.* (datetime.now, ...), defaultdict
- Some others are auto-imported whenever its use has been detected. In such case, the line is reprocessed.
- Functions:
b64decode, b64encode, (requests).get, Path, randint, sleep
- Modules:
base64, collections, humanize, jsonpickle, pathlib, random, requests, time, webbrowser
- Functions:
Caveat: When accessed first time, the auto-import makes the row reprocessed. It may influence your global variables. Use verbose output to see if something has been auto-imported.
echo -e "hey\nbuddy" | pz 'a+=1; sleep(1); b+=1; s = a,b ' --setup "a=0;b=0;" -vv
Importing sleep from time
2, 1
3, 2
As seen, a
was incremented 3× times and b
on twice because we had to process the first line twice in order to auto-import sleep. In the first run, the processing raised an exception because sleep
was not known. To prevent that, explicitly appending from time import sleep
to the --setup
flag would do.
Output
-
Explicit assignment: By default, we output the
s
.echo "5" | pz 's = len(s)' # 1
-
Single expression: If not set explicitly, we assign the expression to
s
automatically.echo "5" | pz 'len(s)' # 1 (command internally changed to `s = len(s)`)
-
Tuple, generator: If
s
ends up as a tuple, its get joined by spaces.$ echo "5" | pz 's, len(s)' 5, 1
Consider piping two lines 'hey' and 'buddy'. We return three elements, original text, reversed text and its length.
$ echo -e "hey\nbuddy" | pz 's,s[::-1],len(s)' hey, yeh, 3 buddy, yddub, 5
-
List: When
s
ends up as a list, its elements are printed to independent lines.$ echo "5" | pz '[s, len(s)]' 5 1
-
Regular match: All groups are treated as a tuple. If no group used, we print the entire matched string.
# no group → print entire matched string echo "hello world" | pz 'search(r"\s.*", s)' # " world" # single matched group echo "hello world" | pyed 'search(r"\s(.*)", s)' # "world" # matched groups treated as tuple echo "hello world" | pyed 'search(r"(.*)\s(.*)", s)' # "hello, world"
-
Callable: It gets called. Very useful when handling simple function – without the need of explicitly putting parenthesis to call the function, we can omit quoting in Bash (expression
s.lower()
would have had to be quoted.) Use 3 verbose flags-vvv
to inspect the internal change of the command.# internally changed to `s = s.lower()` echo "HEllO" | pyed s.lower # "hello" # internally changed to `s = len(s)` echo "HEllO" | pyed len # "5" # internally changed to `s = base64.b64encode(s.encode('utf-8'))` echo "HEllO" | pyed b64encode # "SEVsbE8=" # internally changed to `s = math.sqrt(n)` # and then to `s = round(n)` echo "25" | pyed sqrt | pyed round # "5" # internally changed to `s = sum(numbers)` # `numbers` are available only when `--lines` or `--finally` set echo -e "1\n2\n3\n4" | pyed sum --lines 1 3 6 10 # internally changed to `' - '.join(lines)` # `lines` are available only when `--lines` or `--finally` set echo -e "1\n2\n3\n4" | pyed --finally "' - '.join" 1 - 2 - 3 - 4
As you see in the examples, if
TypeError
raised, we try to reprocess the row while adding current line as the argument:- either its basic form
s
- the
numbers
if available - using its numeral representation
n
if available - encoded to bytes
s.encode('utf-8')
In the
--finally
clause, we try furthermore thelines
. - either its basic form
CLI flags
-
command
: Any Python script (multiple statements allowed) -
--setup
: Any Python script, executed before processing. Useful for variable initializing. Ex: prepend line numbers by incrementing a variablecount
.$ echo -e "row\nanother row" | pyed 'count+=1;s = f"{count}: {s}"' --setup 'count=0' 1: row 2: another row
Yes, we could use globally initialized variable
i
instead of using--setup
. -
--finally
: Any Python script, executed after processing. Useful for final output. Turns on the--lines
automatically because we do not expect an infinite stream.$ echo -e "1\n2\n3\n4" | pyed --finally sum 10 $ echo -e "1\n2\n3\n4" | pyed s --finally sum 1 2 3 4 10 $ echo -e "1\n2\n3\n4" | pyed sum --finally sum 1 3 6 10 10
-
--verbose
: If you end up with no output, turn on to see what happened. Used once: show command exceptions. Twice: show automatic imports. Thrice: see internal command modification (attempts to make it callable and prependings =
if omitted).$ echo -e "hello" | pyed 'invalid command' # empty result $ echo -e "hello" | pyed 'invalid command' -v Exception: <class 'SyntaxError'> invalid syntax (<string>, line 1) on line: hello $ echo -e "hello" | pyed 'sleep(1)' -vv Importing sleep from time
-
--filter
: Line is piped out unchanged, however only if evaluated toTrue
. When piping in numbers to 5, we pass only such bigger than 3.$ echo -e "1\n2\n3\n4\n5" | pyed "n > 3" --filter 4 5
The statement is equivalent to using
skip
(and not using--filter
).$ echo -e "1\n2\n3\n4\n5" | pyed "skip = not n > 3" 4 5
When not using filter,
s
evaluates toTrue
/False
. By default,False
or empty values are not output.$ echo -e "1\n2\n3\n4\n5" | pyed "n > 3" True True
-
n
: Process only such number of lines. Roughly equivalent tohead -n
. -
-1
: Process just the first line. Useful in combination with--whole
. -
--whole
: Fetch the whole text first before processing. Variabletext
is available containing whole text. You might want to add-1
flag.$ echo -e "1\n2\n3" | pyed 'len(text)' Did not you forget to use --whole?
Appending
--whole
helps but the result is processed for every line again.$ echo -e "1\n2\n3" | pyed 'len(text)' -w 6 6 6
Appending
-1
makes sure the statement gets computed only once.$ echo -e "1\n2\n3" | pyed 'len(text)' -w1 6
-
--lines
: Populatelines
andnumbers
with lines. This is off by default since this would cause an overflow when handling an infinite input.$ echo -e "1\n2\n3\n4" | pyed sum --lines # (internally changed to `s = sum(numbers)` 1 3 6 10
-
--empty
Output even empty lines. (By default skipped.)
Consider shortening the text by 3 last letters. First linehey
disappears completely then.$ echo -e "hey\nbuddy" | pyed 's[:-3]' bu
Should we insist on displaying, we see an empty line now.
$ echo -e "hey\nbuddy" | pyed 's[:-3]' --empty bu
-
-0
: Skip all lines output. (Useful in combination with--finally
.)
Regular flags
-
--search
: Equivalent tosearch(COMMAND, s)
$ echo -e "hello world\nanother words" | pyed --search ".*\s" hello another
-
--match
: Equivalent tomatch(COMMAND, s)
-
--findall
: Equivalent tofindall(COMMAND, s)
-
--sub SUBSTITUTION
: Equivalent tosub(COMMAND, SUBSTITUTION, s)
$ echo -e "hello world\nanother words" | pyed --sub ":" ".*\s" :world :words
Using groups
$ echo -e "hello world\nanother words" | pyed --sub "\1" "(.*)\s" helloworld anotherwords
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file pz-0.9.tar.gz
.
File metadata
- Download URL: pz-0.9.tar.gz
- Upload date:
- Size: 19.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.23.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.53.0 CPython/3.8.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | bd4db36bcd15581b1eb02ed9739396581b601e9d565161a81a3ecd9dff5c5f32 |
|
MD5 | be63d541a0198a1534384a80eb701c43 |
|
BLAKE2b-256 | fedcdce5fa5c272ece0f65bab469baea3ba416d66f26343fa65d76b94481e6df |