# DataFlows Shell
DataFlows is a *"novel and intuitive way of building data processing flows."*
DataFlows Shell leverages DataFlows to bring the same intuitive data processing flows to shell automation.
## Introduction
A lot of work on the shell, especially "DevOps" / automation work, deals with data processing.
The first command a shell user learns is `ls` - which returns a set of data.
The second might be `grep` or `cp` - which filter or act on that data set.
DataFlows Shell acts as a very minimal and intuitive layer between the shell and [the DataFlows framework](https://github.com/datahq/dataflows).
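To give a sense of what the shell layer wraps, here is a minimal DataFlows flow written against the plain Python API (a sketch only - the file name and field values are hypothetical placeholders):

```
from dataflows import Flow, load, filter_rows, printer

# A tiny flow: load a tabular source, keep matching rows, print the result.
# 'pods.csv' and the 'kind' field are placeholder example values.
Flow(
    load('pods.csv'),
    filter_rows(equals=[{'kind': 'Pod'}]),
    printer(),
).process()
```

DataFlows Shell exposes the same processors as shell commands, so a chain like this can be composed directly on the command line, as the demo below shows.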
## Demo
The following example demonstrates importing some processors into the local shell, using them to run a processor chain, and printing the output. It uses the `kubectl` processor to get the list of pods from a Kubernetes cluster and filters them based on a condition defined with a Python lambda function.
```
$ source <(dfs import printer filter_rows kubectl)
$ kubectl get pods -c -q \
| dfs 'lambda row: row.update(is_ckan="ckan" in str(row["volumes"]))' --fields=+is_ckan:boolean -q \
| filter_rows --args='[[{"is_ckan":true}]]' -q
{'count_of_rows': 12, 'bytes': 57584, 'hash': '5febe0c3cfe75d174e242f290f00c289', 'dataset_name': None}
checkpoint:1
{'count_of_rows': 12, 'bytes': 57876, 'hash': '17f446a8f562f10cccc1de1a33c48d91', 'dataset_name': None}
checkpoint:2
{'count_of_rows': 6, 'bytes': 40797, 'hash': '6ab4290efd82478b1677d1f226c4199a', 'dataset_name': None}
checkpoint:3
$ printer --kwargs='{"fields":["kind","name","namespace"]}'
saving checkpoint to: .dfs-checkpoints/__9
using checkpoint data from .dfs-checkpoints/__8
res_1:
  # kind       name                         namespace
    (string)   (string)                     (string)
--- ---------- ---------------------------- -----------
  1 Pod        ckan-5d74747649-92z9x        odata-blue
  2 Pod        ckan-5d74747649-fzvd6        odata-blue
  3 Pod        ckan-jobs-5d895695cf-wgrzr   odata-blue
  4 Pod        datastore-db-944bfbc74-2nc7b odata-blue
  5 Pod        db-7dd99b8547-vpf57          odata-blue
  6 Pod        pipelines-9f4466db-vlzm8     odata-blue
checkpoint saved: __9
{'count_of_rows': 6, 'bytes': 40798, 'hash': 'adc31744dfc99a0d8cbe7b081f31d78b', 'dataset_name': None}
checkpoint:9
```
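For readers who know DataFlows itself, the chain above corresponds roughly to the following Python flow. This is only a sketch: the `kubectl` step is a DataFlows Shell processor with no plain-DataFlows counterpart, so a `checkpoint` source stands in for its output, and the processor keyword arguments are assumed from the DataFlows processors reference:

```
from dataflows import Flow, checkpoint, add_field, filter_rows, printer

def mark_ckan(row):
    # Same logic as the inline lambda above: flag rows whose volumes mention "ckan"
    row['is_ckan'] = 'ckan' in str(row['volumes'])

Flow(
    checkpoint('pods'),                             # stand-in for the kubectl processor's output
    add_field('is_ckan', 'boolean'),                # what --fields=+is_ckan:boolean expresses
    mark_ckan,
    filter_rows(equals=[{'is_ckan': True}]),        # what --args='[[{"is_ckan":true}]]' passes
    printer(fields=['kind', 'name', 'namespace']),  # what --kwargs='{"fields":[...]}' passes
).process()
```

Note that every step in the shell chain also saved its output as a checkpoint (the `checkpoint:N` lines and the `.dfs-checkpoints/` paths above), which is how the separate `printer` command picks up the data produced by the previous command.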
## Install
The only required core dependencies are Bash and Python 3.7+.
To get a compatible Python you can use [Miniconda](https://conda.io/miniconda.html):
```
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
$ wget https://raw.githubusercontent.com/OriHoch/dataflows-shell/master/environment.yaml
$ conda env create -f environment.yaml
$ conda activate dataflows-shell
```
Install the dataflows-shell package:
```
$ python3 -m pip install -U dataflows-shell
```
Start an interactive DataFlows Shell session:
```
$ dfs
DataFlows Shell
press <CTRL + C> to exit the shell
press <Enter> to switch between DataFlows shell and system shell
type '--help' for the DataFlows Shell reference
dfs >
```
## Documentation
* [DataFlows Shell Tutorial](TUTORIAL.md)
* [DataFlows Shell Reference](REFERENCE.md)
* [DataFlows Shell Processors Reference](dataflows_shell/processors/README.md)
* [DataFlows Processors Reference](https://github.com/datahq/dataflows/blob/master/PROCESSORS.md)