A general-purpose Python ETL/pipeline utility library, intended especially for use with Hive Streaming.
Project description
transformpy is a Python 2/3 module for applying transforms to “streams” of data. The transforms can be applied to any Python iterable, so they work on continuous real-time streams as well as static streams (such as the lines of a file). It is designed to use very little memory, except where clustering and/or aggregation routines need to buffer data. It was originally designed to allow Python transformations (maps and reductions) of data stored in Hive, using the Hadoop streaming paradigm.
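To make the streaming idea concrete, here is a minimal sketch of a map-then-reduce pipeline over an iterable, written with plain Python generators. It does not use the transformpy API; the function names (`parse`, `sum_by_key`, `format_rows`) and the tab-separated input layout are assumptions chosen for illustration. It assumes the input arrives already clustered by key, as it would after a Hive `CLUSTER BY`, so only one group is held in memory at a time.

```python
import itertools


def parse(lines):
    """Map step: turn each tab-separated line into a (key, int value) pair."""
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        yield key, int(value)


def sum_by_key(records):
    """Reduce step: sum values for each run of consecutive equal keys.

    Assumes records are already clustered by key; groupby only merges
    adjacent records, so unsorted input would yield repeated keys.
    """
    for key, group in itertools.groupby(records, key=lambda r: r[0]):
        yield key, sum(value for _, value in group)


def format_rows(rows):
    """Output step: serialize each (key, total) pair as a tab-separated line."""
    for key, total in rows:
        yield "{}\t{}".format(key, total)


# Hypothetical input; in a Hadoop streaming job this would be sys.stdin.
lines = ["a\t1\n", "a\t2\n", "b\t5\n"]
result = list(format_rows(sum_by_key(parse(lines))))
```

Because every stage is a generator, records flow through one at a time, and the same chain works unchanged whether `lines` is a list, an open file, or `sys.stdin`.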
NOTE: TransformPy is not guaranteed to be API-stable before version 1.0, but any changes to the current version should be small.