dataflows

A nifty data processing framework, based on data packages

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3.6
Topic
- Software Development :: Libraries :: Python Modules

Project description

# ![logo](logo-s.png) DataFlows

[![Travis](https://img.shields.io/travis/datahq/dataflows/master.svg)](https://travis-ci.org/datahq/dataflows)
[![Coveralls](http://img.shields.io/coveralls/datahq/dataflows.svg?branch=master)](https://coveralls.io/r/datahq/dataflows?branch=master)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/dataflows.svg)
[![Gitter chat](https://badges.gitter.im/dataflows-chat/Lobby.png)](https://gitter.im/dataflows-chat/Lobby)

DataFlows is a simple and intuitive way of building data processing flows.

- It's built for small-to-medium data processing: data that fits on your hard drive but is too big to load into Excel or as-is into Python, yet not big enough to require spinning up a Hadoop cluster.
- It's built on the foundation of the Frictionless Data project, which means that all data produced by these flows is easily reusable by others.
- It's a pattern, not a heavyweight framework: if you already have a bunch of download and extract scripts, this will be a natural fit.

Read more in the [Features section below](#features).

## QuickStart

Install `dataflows` via `pip install`.

Then use the command-line interface to bootstrap a basic processing script for any remote data file:

```bash

# Install from PyPI
$ pip install dataflows

# Inspect a remote CSV file
$ dataflows init https://raw.githubusercontent.com/datahq/dataflows/master/data/academy.csv
Writing processing code into academy_csv.py
Running academy_csv.py
academy:
#     Year        Ceremony     Award                             Winner      Name                            Film
      (string)    (integer)    (string)                          (string)    (string)                        (string)
----  ----------  -----------  --------------------------------  ----------  ------------------------------  -------------------
1     1927/1928   1            Actor                                         Richard Barthelmess             The Noose
2     1927/1928   1            Actor                             1           Emil Jannings                   The Last Command
3     1927/1928   1            Actress                                       Louise Dresser                  A Ship Comes In
4     1927/1928   1            Actress                           1           Janet Gaynor                    7th Heaven
5     1927/1928   1            Actress                                       Gloria Swanson                  Sadie Thompson
6     1927/1928   1            Art Direction                                 Rochus Gliese                   Sunrise
7     1927/1928   1            Art Direction                     1           William Cameron Menzies         The Dove; Tempest
...

# dataflows creates a local package of the data and a reusable processing script that you can tinker with
$ tree
.
├── academy_csv
│ ├── academy.csv
│ └── datapackage.json
└── academy_csv.py

1 directory, 3 files

# The resulting 'Data Package' is super easy to use in Python
$ python
Python 3.6.1 (default, Mar 27 2017, 00:25:54)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from datapackage import Package
>>> pkg = Package('academy_csv/datapackage.json')
>>> it = pkg.resources[0].iter(keyed=True)
>>> next(it)
{'Year': '1927/1928', 'Ceremony': 1, 'Award': 'Actor', 'Winner': None, 'Name': 'Richard Barthelmess', 'Film': 'The Noose'}
>>> next(it)
{'Year': '1927/1928', 'Ceremony': 1, 'Award': 'Actor', 'Winner': '1', 'Name': 'Emil Jannings', 'Film': 'The Last Command'}

# You can now run `academy_csv.py` to repeat the process,
# and modify it to add data modification steps
```
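
The session above reads the package through the `datapackage` library. Because a Data Package is a JSON descriptor next to plain data files, the same rows can also be read with the standard library alone. The following is a minimal sketch, not how `datapackage` or `dataflows` work internally. It assumes the descriptor follows the usual Data Package layout (`resources`, each with a `path` and a `schema.fields` list of `name`/`type` pairs). The helpers `read_package`, `cast` and `write_sample_package` are hypothetical names, and the two-row sample is a stand-in built from the output shown above; a descriptor written by `dataflows` may contain more keys.

```python
import csv
import json
import tempfile
from pathlib import Path


def cast(value, field_type):
    """Convert a raw CSV string using the field's declared type."""
    if value == "":
        return None  # an empty cell is treated as a missing value
    if field_type == "integer":
        return int(value)
    return value  # every other type is left as a string in this sketch


def read_package(descriptor_path):
    """Return the rows of the first resource as a list of dicts."""
    descriptor_path = Path(descriptor_path)
    descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))
    resource = descriptor["resources"][0]
    types = {f["name"]: f["type"] for f in resource["schema"]["fields"]}
    # The resource path is relative to the directory holding the descriptor.
    data_path = descriptor_path.parent / resource["path"]
    with data_path.open(newline="", encoding="utf-8") as handle:
        return [
            {name: cast(value, types.get(name)) for name, value in row.items()}
            for row in csv.DictReader(handle)
        ]


def write_sample_package(directory):
    """Write a two-row stand-in package (assumed layout); return the descriptor path."""
    directory = Path(directory)
    (directory / "academy.csv").write_text(
        "Year,Ceremony,Award,Winner,Name,Film\n"
        "1927/1928,1,Actor,,Richard Barthelmess,The Noose\n"
        "1927/1928,1,Actor,1,Emil Jannings,The Last Command\n",
        encoding="utf-8",
    )
    fields = [
        {"name": "Year", "type": "string"},
        {"name": "Ceremony", "type": "integer"},
        {"name": "Award", "type": "string"},
        {"name": "Winner", "type": "string"},
        {"name": "Name", "type": "string"},
        {"name": "Film", "type": "string"},
    ]
    descriptor = {
        "name": "academy_csv",
        "resources": [
            {"name": "academy", "path": "academy.csv", "schema": {"fields": fields}}
        ],
    }
    path = directory / "datapackage.json"
    path.write_text(json.dumps(descriptor), encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for row in read_package(write_sample_package(tmp)):
            print(row)
```

To try this against a real package, pass the path of its `datapackage.json` to `read_package` instead of the sample descriptor.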

## Features

* Trivial to get started and easy to scale up
* Set up and run from the command line in seconds
  * `dataflows init` => `flow.py`
  * `python flow.py`
* Validate input (and especially the source) quickly (non-zero length, right structure, etc.)
* Supports caching data from the source and even between steps
  * so that we can run and test quickly (retrieving is slow)
* An immediate test run lets you look at the output right away
  * Log, debug, rerun
* Degrades to simple Python
* Convention over configuration
* Log exceptions and/or terminate
* The input to each stage is a Data Package or Data Resource (not a previous task)
  * Data Package based and compatible
* Processors can be a function (or a class) processing row-by-row, resource-by-resource or a full package
* A pre-existing, decent contrib library of Readers (Collectors), Processors and Writers
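
The points about processors being plain functions and about degrading to simple Python can be illustrated without the library at all. The sketch below shows only the general pattern of chaining row-level and stream-level steps with generators; it is not the `dataflows` API. The names `per_row`, `add_start_year`, `only_winners` and `run_flow` are hypothetical, and the sample rows are taken from the QuickStart output.

```python
def per_row(func):
    """Lift a function that transforms one row into a step over a stream of rows."""
    def step(rows):
        for row in rows:
            yield func(row)
    return step


def add_start_year(row):
    """Row-level processor: derive an integer year from a value like '1927/1928'."""
    return {**row, "StartYear": int(row["Year"][:4])}


def only_winners(rows):
    """Stream-level processor: it can drop rows, which a per-row function cannot."""
    return (row for row in rows if row["Winner"] == "1")


def run_flow(source, *steps):
    """Chain the steps lazily over the source and collect the resulting rows."""
    rows = iter(source)
    for step in steps:
        rows = step(rows)
    return list(rows)


SAMPLE_ROWS = [
    {"Year": "1927/1928", "Award": "Actor", "Winner": None, "Name": "Richard Barthelmess"},
    {"Year": "1927/1928", "Award": "Actor", "Winner": "1", "Name": "Emil Jannings"},
]

if __name__ == "__main__":
    for row in run_flow(SAMPLE_ROWS, per_row(add_start_year), only_winners):
        print(row)
```

Because every step consumes and returns an iterator, rows stream through the chain one at a time, which is what keeps this style workable for data too large to load at once.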

## Learn more

Dive into the [Tutorial](TUTORIAL.md) to get a deeper glimpse into everything that `dataflows` can do.
See also the list of [Built-in Processors](PROCESSORS.md), which includes an API reference for each of them.

Release history

0.5.5

Apr 1, 2024

0.5.4

Mar 22, 2024

0.5.3

Mar 22, 2024

0.5.2

Mar 22, 2024

0.5.1

Mar 22, 2024

0.5.0

Mar 20, 2024

0.4.14

Mar 13, 2024

0.4.12

Mar 13, 2024

0.4.11

Mar 13, 2024

0.4.10

Mar 13, 2024

0.4.9

Mar 12, 2024

0.4.8

Mar 12, 2024

0.4.7

Mar 12, 2024

0.4.5

Oct 11, 2023

0.4.3

Sep 26, 2023

0.4.2

Sep 26, 2023

0.4.1

Sep 26, 2023

0.4.0

Jul 19, 2023

0.3.23

Jul 18, 2023

0.3.22

Apr 17, 2023

0.3.20

Feb 21, 2023

0.3.19

Feb 20, 2023

0.3.18

Feb 20, 2023

0.3.16

Aug 18, 2022

0.3.15

Jul 31, 2022

0.3.14

Jul 26, 2022

0.3.13

Jul 4, 2022

0.3.12

May 29, 2022

0.3.11

Jan 26, 2022

0.3.8

Oct 18, 2021

0.3.7

Oct 17, 2021

0.3.6

Oct 17, 2021

0.3.4

Oct 6, 2021

0.3.3

Sep 30, 2021

0.3.2

Sep 24, 2021

0.3.1

Aug 23, 2021

0.3.0

Aug 22, 2021

0.2.18

Aug 4, 2021

0.2.17

May 31, 2021

0.2.16

May 15, 2021

0.2.15

May 14, 2021

0.2.14

May 14, 2021

0.2.13

May 3, 2021

0.2.12

Apr 12, 2021

0.2.11

Apr 7, 2021

0.2.10

Apr 6, 2021

0.2.9

Mar 27, 2021

0.2.8

Mar 21, 2021

0.2.7

Mar 15, 2021

0.2.5

Feb 17, 2021

0.2.4

Feb 17, 2021

0.2.3

Feb 17, 2021

0.2.2

Dec 22, 2020

0.2.1

Dec 6, 2020

0.2.0

Nov 23, 2020

0.1.15

Nov 17, 2020

0.1.14

Nov 17, 2020

0.1.13

Nov 8, 2020

0.1.12

Nov 7, 2020

0.1.11

Nov 5, 2020

0.1.10

Oct 20, 2020

0.1.9

Oct 16, 2020

0.1.8

Oct 11, 2020

0.1.7

Oct 7, 2020

0.1.6

Aug 23, 2020

0.1.5

Aug 11, 2020

0.1.4

Jul 30, 2020

0.1.3

Jul 29, 2020

0.1.2

Jun 21, 2020

0.1.1

Jun 13, 2020

0.1.0

May 26, 2020

0.0.74

May 25, 2020

0.0.73

May 25, 2020

0.0.72

May 15, 2020

0.0.71

Feb 20, 2020

0.0.68

Feb 5, 2020

0.0.67

Jan 19, 2020

0.0.66

Jan 14, 2020

0.0.65

Dec 26, 2019

0.0.64

Nov 17, 2019

0.0.63

Oct 8, 2019

0.0.62

Oct 7, 2019

0.0.60

Oct 3, 2019

0.0.59

Oct 3, 2019

0.0.58

Sep 2, 2019

0.0.57

Jul 2, 2019

0.0.56

Jun 16, 2019

0.0.55

May 27, 2019

0.0.54

May 27, 2019

0.0.53

May 23, 2019

0.0.52

May 13, 2019

0.0.51

May 2, 2019

0.0.50

Apr 28, 2019

0.0.49

Apr 28, 2019

0.0.48

Apr 6, 2019

0.0.47

Apr 5, 2019

0.0.46

Mar 30, 2019

0.0.45

Mar 25, 2019

0.0.44

Mar 9, 2019

0.0.43

Mar 9, 2019

This version

0.0.42

Mar 9, 2019

0.0.39

Jan 20, 2019

0.0.38

Jan 13, 2019

0.0.37

Nov 27, 2018

0.0.36

Nov 26, 2018

0.0.35

Nov 22, 2018

0.0.34

Nov 22, 2018

0.0.33

Nov 18, 2018

0.0.32

Oct 29, 2018

0.0.31

Oct 21, 2018

0.0.30

Oct 19, 2018

0.0.29

Oct 18, 2018

0.0.28

Oct 17, 2018

0.0.27

Oct 17, 2018

0.0.26

Oct 17, 2018

0.0.25

Oct 17, 2018

0.0.24

Oct 17, 2018

0.0.23

Oct 16, 2018

0.0.22

Oct 16, 2018

0.0.21

Oct 16, 2018

0.0.20

Oct 15, 2018

0.0.19

Oct 10, 2018

0.0.18

Oct 10, 2018

0.0.17

Oct 10, 2018

0.0.16

Oct 10, 2018

0.0.15

Oct 9, 2018

0.0.14

Oct 8, 2018

0.0.13

Oct 7, 2018

0.0.12

Oct 3, 2018

0.0.11

Oct 3, 2018

0.0.10

Sep 13, 2018

0.0.9

Sep 13, 2018

0.0.8

Sep 8, 2018

0.0.7

Aug 1, 2018

0.0.6

Jul 12, 2018

0.0.5

Jul 7, 2018

0.0.4

Jul 7, 2018

0.0.3

Jun 27, 2018

0.0.2

Jun 20, 2018

0.0.1

Jun 7, 2018

Download files

Source Distribution

dataflows-0.0.42.tar.gz (28.8 kB)

Uploaded Mar 9, 2019 Source

Hashes for dataflows-0.0.42.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `6cd530ba6e8ef86dd74458b96b26d543fd21d24b180bf124fcd68633b6e2e6f5` |
| MD5 | `33b7e2b34b6322c42312006c90fde9c1` |
| BLAKE2b-256 | `4351d735e30cd83b2cdcd1fe416a97728b956ee292bbb4b6c9b48f2e4bb1c06e` |