A Python package that enables users to build a custom Singularity image on an HPC cluster
Project description
Building a Singularity container for HPC using globus-compute
Context
- One of the execution configurations of globus-compute requires a registered container, which is spun up to execute the user function on the HPC cluster.
- HPC clusters do not run Docker containers (for security reasons, as discussed here) and support only Apptainer/Singularity images.
- Installing Apptainer to build a Singularity image locally is not a straightforward process, especially on Windows and macOS systems, as discussed in the documentation.

Using this Python library, the user can specify a custom image specification to build an Apptainer/Singularity image, which is in turn used to run their functions on globus-compute. The library registers the container and returns a container id, which the globus-compute executor uses to execute the user function.
Prerequisite.
A globus-compute endpoint set up on an HPC cluster.
The following steps create an endpoint on the NCSA Delta cluster; you can modify the configuration based on your use case:
Note.
For the following to work, globus-compute-sdk version 2.2.0 must be used when setting up the endpoint. Python 3.9 is recommended both for the endpoint and for the client.
- Create a conda virtual env. We created a custom-image-builder-py-3.9 conda env on the Delta cluster as follows:

```shell
conda create --name custom-image-builder-py-3.9 python=3.9
conda activate custom-image-builder-py-3.9
pip install globus-compute-endpoint==2.2.0
```
- Create a globus-compute endpoint:

```shell
globus-compute-endpoint configure custom-image-builder
```

Update the endpoint config at ~/.globus_compute/custom-image-builder/config.py to:
```python
from parsl.addresses import address_by_interface
from parsl.launchers import SrunLauncher
from parsl.providers import SlurmProvider

from globus_compute_endpoint.endpoint.utils.config import Config
from globus_compute_endpoint.executors import HighThroughputExecutor

user_opts = {
    'delta': {
        # Command run before starting a worker, e.g.
        # 'module load Anaconda; source activate parsl_env'.
        'worker_init': 'conda activate custom-image-builder-py-3.9',
        # String prepended to the #SBATCH block in the submit script,
        # e.g. '#SBATCH --constraint=knl,quad,cache'
        'scheduler_options': '#SBATCH --account=bbmi-delta-cpu',
    }
}

config = Config(
    executors=[
        HighThroughputExecutor(
            max_workers_per_node=10,
            address=address_by_interface('hsn0'),
            scheduler_mode='soft',
            worker_mode='singularity_reuse',
            container_type='singularity',
            container_cmd_options="",
            provider=SlurmProvider(
                partition='cpu',
                launcher=SrunLauncher(),
                scheduler_options=user_opts['delta']['scheduler_options'],
                worker_init=user_opts['delta']['worker_init'],
                # Scale between 0 and 1 blocks, with 1 node per block
                nodes_per_block=1,
                init_blocks=0,
                min_blocks=0,
                max_blocks=1,
                # Hold blocks for at most 30 minutes
                walltime='00:30:00'
            ),
        )
    ],
)
```
- Start the endpoint and note the endpoint id, which is used in the following example (you can also retrieve it later with `globus-compute-endpoint list`):

```shell
globus-compute-endpoint start custom-image-builder
```
Example
Consider the following use case: the user wants to execute a pandas operation on the HPC cluster using globus-compute. They need a Singularity image that the globus-compute executor will use. The library can be leveraged as follows.

Locally, you need to install the following packages; you can create a virtual env as follows:

```shell
cd example/
python3.9 -m venv venv
source venv/bin/activate
pip install globus-compute-sdk==2.2.0
pip install custom-image-builder
```
```python
from custom_image_builder import build_and_register_container
from globus_compute_sdk import Client, Executor


def transform():
    import pandas as pd

    data = {'Column1': [1, 2, 3],
            'Column2': [4, 5, 6]}
    df = pd.DataFrame(data)
    return "Successfully created df"


def main():
    image_builder_endpoint = "bc106b18-c8b2-45a3-aaf0-75eebc2bef80"
    gcc_client = Client()

    container_id = build_and_register_container(gcc_client=gcc_client,
                                                endpoint_id=image_builder_endpoint,
                                                image_file_name="my-pandas-image",
                                                base_image_type="docker",
                                                base_image="python:3.8",
                                                pip_packages=["pandas"])

    print("The container id is", container_id)

    with Executor(endpoint_id=image_builder_endpoint,
                  container_id=container_id) as ex:
        fut = ex.submit(transform)
        print(fut.result())


if __name__ == "__main__":
    main()
```
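The `transform` function itself has no globus-compute dependency, so it can be sanity-checked locally before building and registering an image. Assuming pandas is installed in your local virtual env, a quick local run looks like this:

```python
def transform():
    # Same function as in the example above: builds a small DataFrame
    # and reports success. pandas is imported inside the function so the
    # import happens on the worker (or, here, in the local interpreter).
    import pandas as pd

    data = {'Column1': [1, 2, 3],
            'Column2': [4, 5, 6]}
    df = pd.DataFrame(data)
    return "Successfully created df"


# Run it locally before submitting it to the endpoint.
print(transform())  # Successfully created df
```

Only once this behaves as expected is it worth submitting the function through the globus-compute Executor.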
Note.
The Singularity image requires globus-compute-endpoint as one of its packages in order to run the workers inside the custom Singularity container. Hence, by default, Python is required as part of the image so that globus-compute-endpoint can be installed.
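For reference, a Singularity/Apptainer image is built from a definition file. The exact file the library generates is internal to it, but for the example above (a python:3.8 Docker base image with pandas and globus-compute-endpoint installed) a hand-written sketch of such a definition would resemble:

```
Bootstrap: docker
From: python:3.8

%post
    pip install globus-compute-endpoint==2.2.0
    pip install pandas
```

This is an illustrative sketch only, not the library's literal output.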