
A microframework to build source -> filter -> action workflows.

Project description

Badges

Documentation Status, Release Notes, Travis-CI Build Status, Coverage Status, Code Quality Status, Scrutinizer Status, PyPI Package latest release, PyPI Package monthly downloads, PyPI Wheel, Supported versions, Supported implementations

Simple rules

Python processor is a tool for creating chained pipelines for data processing. It has very few key concepts:

Data object

Any python dict with two required fields: source and type.
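For instance, a minimal data object might look like this (the subject and body fields are purely illustrative; only source and type are required):

{'source': 'imap',
 'type': 'email',
 'subject': 'Hello',
 'body': 'A plain text body.'}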

Source

Any function which returns iterable sequence of data objects. See full list of sources in the docs.
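As a sketch, a hand-written source can be an ordinary generator function (the my_source name and the extra text field are assumptions for illustration):

def my_source():
    # Yield any number of data objects; each must carry 'source' and 'type'.
    yield {'source': 'my_source', 'type': 'greeting', 'text': 'Hello, world!'}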

Output

A function which accepts a data object as input and could return another (or the same) data object as result. See the full list of outputs in the docs.
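A hand-written output is just a function of one data object, roughly like this sketch (the text field is illustrative):

def print_text(data_object):
    # Side effect: print one field of the data object.
    print(data_object.get('text'))
    # Return a (possibly new) data object so the next output in the chain can use it.
    return data_object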

Predicate

A pipeline consists of sources and outputs, but a predicate decides which data object should be processed by which output.
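A predicate is simply a function from a data object to a boolean, for example (the 'email' value is illustrative):

is_email = lambda data_object: data_object.get('type') == 'email'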

Quick example

Here is an example of a pipeline which reads an IMAP folder and sends all emails to a Slack chat:

run_pipeline(
    sources=[sources.imap('imap.gmail.com',
                          'username',
                          'password',
                          'INBOX')],
    rules=[(for_any_message, [email_to_slack, outputs.slack(SLACK_URL)])])

Here you construct a pipeline which uses sources.imap to read the IMAP folder “INBOX” of username@gmail.com. The function for_any_message is a predicate which amounts to lambda data_object: True. In more complex cases, predicates can be used for routing data objects to different processors, as in the sketch below.
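For instance, the rules argument can route different kinds of data objects to different chains of outputs; in this sketch, is_email and is_tweet are hypothetical predicates and email_to_slack is the transform described below:

rules=[(is_email, [email_to_slack, outputs.slack(SLACK_URL)]),
       (is_tweet, [outputs.debug()])]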

Functions email_to_slack and outputs.slack(SLACK_URL) are processors. The first one is a simple function which accepts the data object returned by the imap source and transforms it into a data object which can be used by the slack output. We need that because Slack requires a different set of fields. The call to outputs.slack(SLACK_URL) returns a function which takes an object and sends it to the specified Slack endpoint.
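A sketch of such a transform might look like the following; the exact fields provided by the imap source and expected by the slack output are described in the docs, so the field names here are assumptions:

def email_to_slack(email):
    # Build a new data object carrying the text for the slack output to post.
    # 'subject' and 'body' are assumed names for fields of the email data object.
    return {'source': email['source'],
            'type': 'slack-message',
            'text': '{0}: {1}'.format(email.get('subject'), email.get('body'))}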

This is just an example; for working snippets, continue reading this documentation ;-)

Installation

Create a virtual environment with python3::

virtualenv --python=python3 env
source env/bin/activate

If you are on OSX, then install lxml separately::

STATIC_DEPS=true pip install lxml

Then install the processor::

pip install processor

Usage

Now create an executable python script where you’ll place your pipeline’s configuration. For example, this simple code creates a pipeline which searches for new results on Twitter and outputs them to the console. Of course, you can output them not only to the console, but also send them by email, post them to a Slack chat, or anywhere else for which there is an output:

#!env/bin/python3
from processor import run_pipeline, sources, outputs
from twiggy_goodies.setup import setup_logging


# Predicate: accept every data object produced by the sources.
for_any_message = lambda msg: True

def prepare(tweet):
    # Transform a tweet data object into a minimal dict for the debug output.
    return {'text': tweet['text'],
            'from': tweet['user']['screen_name']}

setup_logging('twitter.log')

run_pipeline(
    sources=[sources.twitter.search(
        'My Company',
        consumer_key='***', consumer_secret='***',
        access_token='***', access_secret='***',
        )],
    rules=[(for_any_message, [prepare, outputs.debug()])])

Running this code will fetch new results for the search query My Company and output them on the screen. Of course, you could use any other output supported by the processor. Browse the online documentation to find out which sources and outputs are supported and how to configure them.
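For example, to post the same search results to Slack instead of printing them, you could change only the rules line (a sketch; the transform step may need to produce the fields the slack output expects, see the docs):

rules=[(for_any_message, [prepare, outputs.slack(SLACK_URL)])]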

Documentation

https://python-processor.readthedocs.org/

Development

To run all the tests, run:

tox

Authors

Changelog

0.2.1 (2015-03-30)

Fixed an error in the import-or-error macro which prevented the use of third-party libraries.

0.2.0 (2015-03-30)

Most third-party libraries are optional now. If you want to use an extension which requires an external library, it will issue an error and call sys.exit(1) until you satisfy this requirement.

This should make life easier for those who do not want to use the rss output, which requires feedgen, which requires lxml, which is hard to build because it is a C extension.

0.1.0 (2015-03-18)

  • First release on PyPI.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

processor-0.2.1.tar.gz (23.9 kB)

Uploaded Source

Built Distribution

processor-0.2.1-py2.py3-none-any.whl (15.9 kB)

Uploaded Python 2 Python 3

File details

Details for the file processor-0.2.1.tar.gz.

File metadata

  • Download URL: processor-0.2.1.tar.gz
  • Upload date:
  • Size: 23.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for processor-0.2.1.tar.gz
Algorithm Hash digest
SHA256 59dab356bb313a897dbdb172a90c5e0979e0f6240d52ffc3a3afa41164f33415
MD5 894140ce68f83800ef90818667c7c238
BLAKE2b-256 6751ce6ef138b3f200fae359b476e80beef789f63eec99f88e1903f084f31852

See more details on using hashes here.

File details

Details for the file processor-0.2.1-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for processor-0.2.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 5162d25088c5b860fc8b47bc7df199b6ab104d2829e499d9ce22c088c6a6cb35
MD5 971179ceba633614f183c0c92e91d9a9
BLAKE2b-256 19d96ccf071d96187cc315c695dbb7ca91f8d3d4778952c7c5abbb1dfb77254c

See more details on using hashes here.
