
Offline processing and pipeline management for HERA data analysis

Project description

hera_opm


hera_opm provides a convenient and flexible framework for developing data analysis pipelines that operate on HERA data. It facilitates "offline processing", and is portable enough to run on computer clusters with batch submission systems or on local machines.

How It Works

The hera_opm package uses the makeflow system, which is part of the Cooperative Computing Tools (CCTools) package developed by the Cooperative Computing Lab. hera_opm converts a pipeline defined in a configuration file into a format that makeflow can parse. This conversion is aware of aspects specific to HERA data, such as the polarization features of the data, so that it can build an appropriate software pipeline. Once the makeflow instructions file has been generated, the makeflow program itself executes the steps in the pipeline.
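To illustrate the kind of output this conversion produces, the sketch below emits Make-style makeflow rules (output files, a colon, input files, then a tab-indented command) for a single data file. The function name `makeflow_rules`, the task names, the wrapper-script names, and the `<obsid>.<task>.out` output convention are all hypothetical; the sketch ignores polarization and resource settings and is not hera_opm's actual generator.

```python
from typing import Dict, List


def makeflow_rules(obsid: str, tasks: List[str], prereqs: Dict[str, List[str]]) -> str:
    """Return makeflow text with one rule per task for a single data file.

    Hypothetical convention: each task writes a sentinel file
    "<obsid>.<task>.out", and a task depends on its wrapper script plus the
    sentinel files of its prerequisite tasks.
    """
    rules = []
    for task in tasks:
        script = f"{task.lower()}.sh"  # hypothetical wrapper script name
        target = f"{obsid}.{task}.out"
        sources = [script] + [f"{obsid}.{p}.out" for p in prereqs.get(task, [])]
        # Make-style rule: "targets: sources", then a tab-indented command.
        rules.append(f"{target}: {' '.join(sources)}\n\t./{script} {obsid} > {target}")
    return "\n\n".join(rules) + "\n"
```

For example, `makeflow_rules("obs1", ["FIRSTCAL", "OMNICAL"], {"OMNICAL": ["FIRSTCAL"]})` yields two rules, where the OMNICAL rule lists `obs1.FIRSTCAL.out` as an input, so makeflow will not start it until FIRSTCAL has finished.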

There are generally 5 steps required to "build a pipeline":

  1. Write task scripts that will be executed by makeflow for a given stage in the pipeline. These scripts should generally be as atomic as possible, and perform only a single logical component of a pipeline (though it may in turn call several supporting scripts or commands).
  2. Write a configuration file which defines the order of tasks to be completed. This configuration file defines the logical flow of the pipeline, as well as the prerequisites for each task. It can also specify compute and memory requirements on systems that support resource management.
  3. Use the provided build_makeflow_from_config.py script to build a makeflow instruction file that specifies the pipeline tasks applied to the data files.
  4. Use the provided makeflow_nrao.sh or makeflow_local.sh to execute the pipeline in either the NRAO batch scheduler environment, or on a local machine, respectively.
  5. (Optional) Use the provided clean_up_makeflow.py to clean up the work directory for makeflow. This will remove the wrapper scripts and output files, and generate a single log file for all jobs in the makeflow.
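Steps 1 and 2 amount to a dependency graph: each task lists its prerequisites, and any valid run order must place a task after everything it depends on. As a minimal sketch (not hera_opm code), Python 3.9+'s standard `graphlib` module can produce such an order from a hypothetical `prereqs` mapping and reject circular prerequisites. In practice, makeflow derives the order itself from the file dependencies in its rules.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def execution_order(prereqs: Dict[str, List[str]]) -> List[str]:
    """Return tasks ordered so that every task follows all of its prerequisites.

    `prereqs` maps each (hypothetical) task name to the tasks it depends on.
    Raises ValueError if the prerequisites form a cycle, since such a
    pipeline could never run to completion.
    """
    try:
        return list(TopologicalSorter(prereqs).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"circular task prerequisites: {err.args[1]}") from err
```

For example, `execution_order({"metrics": [], "firstcal": ["metrics"]})` places `"metrics"` before `"firstcal"`.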

Installation

To install the hera_opm package, run the following from the root of the repository:

pip install .

As mentioned above, hera_opm uses makeflow as the backing pipeline management software. As such, makeflow must be installed. To install makeflow in your home directory:

git clone https://github.com/cooperative-computing-lab/cctools.git
cd cctools
./configure --prefix=${HOME}/cctools
make clean
make install
export PATH=${PATH}:${HOME}/cctools/bin

For convenience, it is helpful to add the export statement to your .bashrc file, so that the makeflow commands are always on your PATH.
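Before running a pipeline, it can help to confirm that makeflow is actually reachable. Below is a minimal sketch using the standard library's `shutil.which`; the helper name `find_makeflow` is hypothetical, and `~/cctools/bin` is simply the install prefix used in the commands above.

```python
import os
import shutil
from typing import Optional


def find_makeflow(extra_dir: Optional[str] = None) -> Optional[str]:
    """Return the full path to the makeflow executable, or None if not found.

    Searches the current PATH plus, optionally, `extra_dir` (for example the
    ~/cctools/bin directory created by the install steps).
    """
    dirs = [os.environ.get("PATH", "")]
    if extra_dir:
        dirs.append(os.path.expanduser(extra_dir))
    # Drop empty entries so an unset PATH does not silently search the cwd.
    search_path = os.pathsep.join(d for d in dirs if d)
    return shutil.which("makeflow", path=search_path)
```

For example, `find_makeflow("~/cctools/bin")` returns the executable's path, or `None` if makeflow is neither on your PATH nor in that directory.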

Dependencies

When installing the package, setuptools will attempt to download and install any missing dependencies. If you prefer to manage your own python environment (through conda or pip or some other manager), you can install them yourself.

Required

  • toml >= 0.9.4
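The toml requirement reflects that pipeline configuration files are TOML documents. The sketch below parses a small config and maps each task to its prerequisites. The table and key names (`[WorkFlow]`, `actions`, `prereqs`) and the helper `workflow_prereqs` are assumptions for illustration only; consult the config file docs page for the schema hera_opm actually expects. The sketch uses the standard-library `tomllib` on Python 3.11+ and falls back to the `toml` package otherwise.

```python
from typing import Dict, List

try:
    import tomllib as toml_parser  # standard library on Python 3.11+
except ImportError:
    import toml as toml_parser  # the toml package listed above, for older Pythons

# Hypothetical config: table and key names are illustrative assumptions.
EXAMPLE_CONFIG = """
[WorkFlow]
actions = ["ANT_METRICS", "FIRSTCAL", "OMNICAL"]

[FIRSTCAL]
prereqs = "ANT_METRICS"

[OMNICAL]
prereqs = ["ANT_METRICS", "FIRSTCAL"]
"""


def workflow_prereqs(config_text: str) -> Dict[str, List[str]]:
    """Map each action, in workflow order, to its list of prerequisite actions."""
    config = toml_parser.loads(config_text)
    result: Dict[str, List[str]] = {}
    for action in config["WorkFlow"]["actions"]:
        raw = config.get(action, {}).get("prereqs", [])
        # Accept a single prerequisite written as a bare string.
        result[action] = [raw] if isinstance(raw, str) else list(raw)
    return result
```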

Optional

Generating an lstbin pipeline (as opposed to an analysis pipeline) requires that hera_cal be installed. The main package and its tests run without it.
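One way to honor this optional requirement is to check for hera_cal only when an lstbin pipeline is requested, so analysis pipelines never need it. The helper `check_optional_deps` is hypothetical and not part of hera_opm, and the `makeflow_type` strings are assumptions.

```python
import importlib.util


def check_optional_deps(makeflow_type: str) -> None:
    """Raise ImportError if the requested pipeline type needs a missing package.

    Only lstbin pipelines need hera_cal; any other type passes unconditionally.
    """
    if makeflow_type == "lstbin" and importlib.util.find_spec("hera_cal") is None:
        raise ImportError("building an lstbin pipeline requires hera_cal to be installed")
```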

Task Scripts and Config Files

For documentation on building task scripts, see the task scripts docs page. For documentation on config files, see the config file docs page.

Testing

hera_opm uses pytest as its testing framework. To run the test suite, do:

pytest

from the root repo directory. This may require running pip install .[test] to install testing dependencies.



Download files


Source Distribution

hera_opm-1.2.1.tar.gz (7.3 MB view details)

Uploaded Source

Built Distribution

hera_opm-1.2.1-py3-none-any.whl (7.3 MB view details)

Uploaded Python 3

File details

Details for the file hera_opm-1.2.1.tar.gz.

File metadata

  • Download URL: hera_opm-1.2.1.tar.gz
  • Size: 7.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for hera_opm-1.2.1.tar.gz
Algorithm Hash digest
SHA256 23d4f3f31090ac822560390c037da8ae558fd537b27a08277f3a6d89d77e71c4
MD5 94d9fb2738a3ada21b24d5a32fdd275e
BLAKE2b-256 3d2485eb1a73a82c0d8765758742193963412fdc4217dfbcd72e4c068c12358a


File details

Details for the file hera_opm-1.2.1-py3-none-any.whl.

File metadata

  • Download URL: hera_opm-1.2.1-py3-none-any.whl
  • Size: 7.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for hera_opm-1.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b81c89bb4878c074bf161d44e5258012da55012b3e43c4ff19fc0a7981c2d9a8
MD5 8bdf6d6b406eb26b302afa6249efda6c
BLAKE2b-256 925f8d26d19ee10a7dd822333cfdf171751283fe010fa236a8d7fcaf4e2ee61b
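The digests above let you verify a download before installing it. Below is a minimal sketch using `hashlib`; the helper names `sha256_of` and `verify` are illustrative. pip can also enforce digests itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 digest with a published one (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For an intact download of the source distribution, `verify("hera_opm-1.2.1.tar.gz", "23d4f3f31090ac822560390c037da8ae558fd537b27a08277f3a6d89d77e71c4")` should return `True`.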
