Troika
Submit, monitor and kill jobs on local and remote hosts
Requirements
- Python 3.8 or higher
- pyyaml (https://pypi-hypernode.com/project/PyYAML/)
- For testing: pytest (https://pypi-hypernode.com/project/pytest/)
- For building the documentation: sphinx (https://www.sphinx-doc.org)
Installing
python3 -m venv troika
source troika/bin/activate
python3 -m pip install troika
Running the tests
Once Troika is installed in your environment, the tests can be run using pytest:
python3 -m pytest -v tests/
Building documentation
The documentation uses sphinx. To generate the HTML docs:
cd docs/
make html
Presentation
Slides and recording of the "Troika: Submit, monitor, and interrupt jobs on any HPC system with the same interface" talk at FOSDEM'23 are available via https://fosdem.org/2023/schedule/event/troika_hpc_jobs .
Getting started
Concepts
Troika holds a list of sites onto which jobs can be submitted. A site is
defined by two main parameters: a connection type (local or ssh), and a
site type (e.g. direct or slurm). Every site is identified by a name
given in the configuration file.
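As a rough mental model, a site can be pictured as a named record pairing a site type with a connection type. The sketch below is purely illustrative: the Site class and the find_site helper are hypothetical names chosen for this example and are not part of Troika's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical model of a site; not Troika's actual class."""

    name: str        # identifier given in the configuration file
    type: str        # how jobs are run, e.g. "direct" or "slurm"
    connection: str  # how the target is reached: "local" or "ssh"


def find_site(sites, name):
    """Return the site registered under `name`, or raise KeyError."""
    by_name = {site.name: site for site in sites}
    return by_name[name]


sites = [
    Site("localhost", type="direct", connection="local"),
    Site("slurm_cluster", type="slurm", connection="ssh"),
]
print(find_site(sites, "slurm_cluster"))
```

The two parameters are independent: the connection describes how the target is reached, and the site type describes how jobs are run once there.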
Example configuration file
---
sites:
  localhost:
    type: direct         # jobs are run directly on the target
    connection: local    # the target is the current host
  remote:
    type: direct         # jobs are run directly on the target
    connection: ssh      # connect to the target via ssh
    host: remotebox      # ssh host
    copy_script: true    # if false, the script will be piped through ssh
    at_startup: ["check_connection"]
  slurm_cluster:
    type: slurm          # jobs are submitted to Slurm
    connection: ssh      # connect to the target via ssh
    host: remotecluster  # ssh host
    copy_script: true    # if false, the script will be piped through ssh
    at_startup: ["check_connection"]
    pre_submit: ["create_output_dir"]
    at_exit: ["copy_submit_logfile"]
  pbs_cluster:
    type: pbs            # jobs are submitted to PBS
    connection: ssh      # connect to the target via ssh
    host: othercluster   # ssh host
    copy_script: true    # if false, the script will be piped through ssh
    at_startup: ["check_connection"]
    pre_submit: ["create_output_dir"]
    at_exit: ["copy_submit_logfile"]
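Before handing a configuration to Troika, it can be handy to sanity-check the parsed file yourself. The following is a minimal sketch, not Troika's own validation: check_sites is a hypothetical helper, the dictionary stands in for what yaml.safe_load() would return for a file shaped like the example, and the rule that an ssh site needs a host entry is an assumption inferred from the example rather than a documented requirement.

```python
def check_sites(config):
    """Return a list of problems found in a parsed configuration mapping."""
    problems = []
    for name, site in config.get("sites", {}).items():
        # Every site in the example defines both of these keys.
        for key in ("type", "connection"):
            if key not in site:
                problems.append(f"{name}: missing '{key}'")
        # Assumption: an ssh connection needs a 'host' entry, as in the example.
        if site.get("connection") == "ssh" and "host" not in site:
            problems.append(f"{name}: ssh connection without 'host'")
    return problems


# Abridged stand-in for the parsed example file, plus one deliberately broken site.
config = {
    "sites": {
        "localhost": {"type": "direct", "connection": "local"},
        "remote": {"type": "direct", "connection": "ssh", "host": "remotebox"},
        "broken": {"type": "slurm", "connection": "ssh"},
    }
}
print(check_sites(config))
```

Only the "broken" entry is reported, because it uses an ssh connection without naming a host.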
The configuration can be checked using the list-sites command:
$ troika -c config.yml list-sites
Available sites:
Name                         Type            Connection
------------------------------------------------------------
localhost                    direct          local
remote                       direct          ssh
slurm_cluster                slurm           ssh
pbs_cluster                  pbs             ssh
Available options
$ troika --help
Main commands
Submit a job on cluster:
$ troika -c config.yaml submit -o /path/to/output/file cluster job.sh
Query the status of the job:
$ troika -c config.yaml monitor cluster job.sh
Kill the job:
$ troika -c config.yaml kill cluster job.sh
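When these commands are driven from a script, building the argument list in one place keeps the three calls consistent. The sketch below only assembles the command lines shown above; troika_command is a hypothetical helper, not something Troika provides, and it deliberately accepts -o only for submit because that is the only place the examples use it.

```python
import shlex


def troika_command(action, site, script, config="config.yaml", output=None):
    """Build the argument list for one of the commands shown above."""
    if action not in ("submit", "monitor", "kill"):
        raise ValueError(f"unsupported action: {action}")
    argv = ["troika", "-c", config, action]
    if output is not None:
        # The examples above only pass -o to 'submit', so the sketch does too.
        if action != "submit":
            raise ValueError("-o is only shown with 'submit' above")
        argv += ["-o", output]
    argv += [site, script]
    return argv


argv = troika_command("submit", "cluster", "job.sh", output="/path/to/output/file")
print(shlex.join(argv))
# prints: troika -c config.yaml submit -o /path/to/output/file cluster job.sh
```

To actually run the command, the list can be passed to subprocess.run(argv, check=True); keeping it as a list avoids shell quoting issues with paths that contain spaces.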