ServiceX Data Transformer for HEP Data
ServiceX_transformer Library
Library of common classes for building ServiceX transformers.
Minimum Requirements
Works with Python version 2.7 and above
Download from PyPI
To use this library:
pip install servicex-transformer
Standard Command Line Arguments
This library provides a subclass of ArgParse that standardizes command line arguments across all transformer implementations.
Available arguments are:
Transformed Result Output
Command line arguments determine a destination for the results as well as an output format.
- Kafka - Streaming system. Write messages formatted as Arrow tables. The chunks parameter determines how many events are included in each message.
- Object Store - Each transformed file is written as an object to an S3 compatible object store. The only currently supported output file format is parquet. The objects are stored in a bucket named after the transformation request ID.
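To illustrate what the chunks parameter controls, here is a minimal sketch of splitting a sequence of events into per-message groups. The function name `chunk_events` is hypothetical and is not part of this library; the real transformers build Arrow tables rather than plain lists.

```python
def chunk_events(events, chunks):
    """Yield successive groups of at most `chunks` events, one group per message.

    Hypothetical helper for illustration only; not the library's implementation.
    """
    if chunks <= 0:
        raise ValueError("chunks must be a positive integer")
    for start in range(0, len(events), chunks):
        # The final group may be shorter when len(events) is not a multiple of chunks.
        yield list(events[start:start + chunks])


# Ten events with chunks=4 produce three messages.
sizes = [len(batch) for batch in chunk_events(list(range(10)), 4)]
print(sizes)  # [4, 4, 2]
```

A larger chunks value means fewer, bigger messages, so it has to be balanced against the maximum message size accepted by the broker.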
Command Line Reference
Option | Description | Default |
---|---|---|
--brokerlist BROKERLIST | List of Kafka brokers to connect to if streaming is selected | servicex-kafka-0.slateci.net:19092, servicex-kafka-1.slateci.net:19092, servicex-kafka-2.slateci.net:19092 |
--topic TOPIC | Kafka topic to publish arrays to | servicex |
--chunks CHUNKS | Number of events to include in each message. If omitted, the transformer computes a best guess based on heuristics and the maximum message size | None |
--tree TREE | Root Tree to extract data from. Only valid for uproot transformer | Events |
--path PATH | Path to single Root file to transform. Any file path readable by xrootd | |
--limit LIMIT | Max number of events to process | |
--result-destination DEST | Where to send the results: kafka, object-store, or output-dir | kafka |
--output-dir | Local directory where the result will be written. Use this to run standalone without other serviceX infrastructure | None |
--result-format | Binary format for the results: arrow, parquet, or root-file | arrow |
--max-message-size | Maximum size for any message in Megabytes | 14.5 MB |
--rabbit-uri URI | RabbitMQ Connection URI | host.docker.internal |
--request-id GUID | ID associated with this transformation request. Used as RabbitMQ Topic Name as well as object-store bucket | servicex |
--subdir SUBDIR | Subdirectory in the persistent volume to write result to | |
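As a rough illustration of how a shared parser can standardize these options, the sketch below subclasses `argparse.ArgumentParser` and registers a subset of the arguments with the defaults listed in the table. The class name `TransformerArgumentParser` is hypothetical and the library's own class may be named and structured differently; the choices lists are assumptions taken from the option descriptions.

```python
import argparse


class TransformerArgumentParser(argparse.ArgumentParser):
    """Hypothetical sketch of a standardized parser; not the library's actual class."""

    def __init__(self, **kwargs):
        super(TransformerArgumentParser, self).__init__(**kwargs)
        self.add_argument("--topic", default="servicex",
                          help="Kafka topic to publish arrays to")
        self.add_argument("--chunks", type=int, default=None,
                          help="Number of events to include in each message")
        self.add_argument("--tree", default="Events",
                          help="Root Tree to extract data from")
        self.add_argument("--path", help="Path to single Root file to transform")
        self.add_argument("--limit", type=int,
                          help="Max number of events to process")
        # Allowed values are assumed from the table's description column.
        self.add_argument("--result-destination", default="kafka",
                          choices=["kafka", "object-store", "output-dir"],
                          help="Where to send the results")
        self.add_argument("--output-dir", default=None,
                          help="Local directory where the result will be written")
        self.add_argument("--result-format", default="arrow",
                          choices=["arrow", "parquet", "root-file"],
                          help="Binary format for the results")
        self.add_argument("--max-message-size", type=float, default=14.5,
                          help="Maximum size for any message in Megabytes")
        self.add_argument("--request-id", default="servicex",
                          help="ID associated with this transformation request")


# Parse an explicit argument list, as a standalone run might supply it.
args = TransformerArgumentParser().parse_args(
    ["--path", "file.root", "--result-destination", "output-dir",
     "--output-dir", "/tmp/out"]
)
print(args.result_destination, args.output_dir, args.chunks)
```

Registering the options in the constructor means every transformer that instantiates the parser gets the same flags and defaults without repeating the definitions.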
Running Tests
Validation of the code logic is performed using pytest and pytest-mock. Unit test fixtures are in test directories inside each package.
The tests are instrumented with code coverage reporting via codecov. The Travis job has the codecov upload token set as an environment variable, which is passed into the Docker container so that the report can be uploaded once the tests complete successfully.
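For readers new to this style of testing, a unit test that replaces an external dependency with a mock might look like the sketch below. The helper `publish_all` and the `producer.send` interface are hypothetical and only stand in for real messaging code. The example uses `unittest.mock` from the Python 3 standard library so that it runs without plugins; with pytest-mock installed, a test would typically request the `mocker` fixture and create the stand-in through it instead.

```python
from unittest import mock


def publish_all(producer, topic, messages):
    """Hypothetical helper: send each message to `topic` and return how many were sent."""
    count = 0
    for message in messages:
        producer.send(topic, message)
        count += 1
    return count


def test_publish_all_sends_every_message():
    # The MagicMock records calls, so no real broker is needed.
    producer = mock.MagicMock()
    sent = publish_all(producer, "servicex", [b"a", b"b", b"c"])
    assert sent == 3
    assert producer.send.call_count == 3
    # assert_called_with checks the most recent call only.
    producer.send.assert_called_with("servicex", b"c")
```

pytest collects any function whose name starts with `test_`, so placing a file like this in a package's test directory is enough for it to be picked up.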
Coding Standards
To make it easier for multiple people to work on the codebase, we enforce PEP8
standards, verified by flake8. The community has found that the 80 character
limit is a bit awkward, so we have a local config setting the max_line_length
to 99.