
# DataFlows Shell

DataFlows is a *"novel and intuitive way of building data processing flows."*

DataFlows Shell leverages DataFlows to bring the same intuitive data processing flows to shell automation.

## Introduction

A lot of work on the shell, especially for "DevOps" / automation tasks, deals with data processing.
The first command a shell user learns is `ls` - which returns a set of data.
The second might be `grep` or `cp` - which filters this data set or performs actions based on it.

DataFlows Shell acts as a very minimal and intuitive layer between the shell and [the DataFlows framework](https://github.com/datahq/dataflows).

## Demo

The following example demonstrates importing some processors into the local shell, using them to run a processor chain, and printing the output. It uses the `kubectl` processor to get a list of pods from a Kubernetes cluster and then filters them based on a condition defined using a Python lambda function.

```
$ source <(dfs import printer filter_rows kubectl)

$ kubectl get pods -c -q \
| dfs 'lambda row: row.update(is_ckan="ckan" in str(row["volumes"]))' --fields=+is_ckan:boolean -q \
| filter_rows --args='[[{"is_ckan":true}]]' -q

{'count_of_rows': 12, 'bytes': 57584, 'hash': '5febe0c3cfe75d174e242f290f00c289', 'dataset_name': None}
checkpoint:1
{'count_of_rows': 12, 'bytes': 57876, 'hash': '17f446a8f562f10cccc1de1a33c48d91', 'dataset_name': None}
checkpoint:2
{'count_of_rows': 6, 'bytes': 40797, 'hash': '6ab4290efd82478b1677d1f226c4199a', 'dataset_name': None}
checkpoint:3

$ printer --kwargs='{"fields":["kind","name","namespace"]}'

saving checkpoint to: .dfs-checkpoints/__9
using checkpoint data from .dfs-checkpoints/__8
res_1:
  #  kind        name                          namespace
     (string)    (string)                      (string)
---  ----------  ----------------------------  -----------
  1  Pod         ckan-5d74747649-92z9x         odata-blue
  2  Pod         ckan-5d74747649-fzvd6         odata-blue
  3  Pod         ckan-jobs-5d895695cf-wgrzr    odata-blue
  4  Pod         datastore-db-944bfbc74-2nc7b  odata-blue
  5  Pod         db-7dd99b8547-vpf57           odata-blue
  6  Pod         pipelines-9f4466db-vlzm8      odata-blue
checkpoint saved: __9
{'count_of_rows': 6, 'bytes': 40798, 'hash': 'adc31744dfc99a0d8cbe7b081f31d78b', 'dataset_name': None}
checkpoint:9
```
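
Each `dfs` step in the chain saves its output as a numbered checkpoint under `.dfs-checkpoints/` (the `checkpoint:1` .. `checkpoint:3` lines above), which lets a later standalone command like `printer` pick up from the latest checkpoint (`using checkpoint data from .dfs-checkpoints/__8`).

For comparison, here is a rough sketch of a similar flow written directly against the DataFlows Python API. The `kubectl` processor belongs to dataflows-shell, so the sketch assumes the pod list was already dumped to a hypothetical `pods.csv`:

```
# Sketch only: 'pods.csv' is a hypothetical dump of the kubectl pod list.
from dataflows import Flow, load, add_field, filter_rows, printer

Flow(
    load('pods.csv'),                               # stand-in for the kubectl processor
    # equivalent to --fields=+is_ckan:boolean with the lambda from the demo
    add_field('is_ckan', 'boolean',
              lambda row: 'ckan' in str(row['volumes'])),
    filter_rows(equals=[{'is_ckan': True}]),        # --args='[[{"is_ckan":true}]]'
    printer(fields=['kind', 'name', 'namespace']),  # --kwargs='{"fields":[...]}'
).process()
```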

## Install

The only required core dependencies are Bash and Python 3.7+.

To get a compatible Python you can use [Miniconda](https://conda.io/miniconda.html):

```
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
$ wget https://raw.githubusercontent.com/OriHoch/dataflows-shell/master/environment.yaml
$ conda env create -f environment.yaml
$ conda activate dataflows-shell
```

Install the dataflows-shell package:

```
$ python3 -m pip install -U dataflows-shell
```

Start an interactive DataFlows shell session:

```
$ dfs

DataFlows Shell

press <CTRL + C> to exit the shell
press <Enter> to switch between DataFlows shell and system shell
type '--help' for the DataFlows Shell reference

dfs >
```

## Documentation

* [DataFlows Shell Tutorial](TUTORIAL.md)
* [DataFlows Shell Reference](REFERENCE.md)
* [DataFlows Shell Processors Reference](dataflows_shell/processors/README.md)
* [DataFlows Processors Reference](https://github.com/datahq/dataflows/blob/master/PROCESSORS.md)
