
A Python package that enables users to build a custom Singularity image on an HPC cluster

Project description

Building a singularity container for HPC using globus-compute

Context

  • One of the execution configurations of globus-compute requires a registered container, which is spun up to execute the user function on the HPC.

  • HPCs do not run Docker containers (for security reasons, as discussed here) and support only an apptainer/singularity image.

  • Installing the apptainer setup to build a singularity image locally is not a straightforward process, especially on Windows and Mac systems, as discussed in the documentation.

Using this Python library, the user can specify a custom image specification to build an apptainer/singularity image, which is in turn used to run their functions on globus-compute. The library registers the container and returns the container id, which the globus-compute executor then uses to execute the user function.

Prerequisite

A globus-compute-endpoint set up on an HPC cluster.

The following steps create an endpoint on the NCSA Delta cluster; you can modify the configuration for your use case:

  1. Create a conda virtual env. We created a custom-image-builder conda env on the Delta cluster as follows:
conda create --name custom-image-builder python=3.11

conda activate custom-image-builder

pip install globus-compute-endpoint==2.2.0

pip install custom-image-builder
  2. Create a globus-compute endpoint:
globus-compute-endpoint configure custom-image-builder

Update the endpoint config at ~/.globus_compute/custom-image-builder/config.py to:

from parsl.addresses import address_by_interface
from parsl.launchers import SrunLauncher
from parsl.providers import SlurmProvider

from globus_compute_endpoint.endpoint.utils.config import Config
from globus_compute_endpoint.executors import HighThroughputExecutor


user_opts = {
    'delta': {
        'worker_init': 'conda activate custom-image-builder',
        'scheduler_options': '#SBATCH --account=bbmi-delta-cpu',
    }
}

config = Config(
    executors=[
        HighThroughputExecutor(
            max_workers_per_node=10,
            address=address_by_interface('hsn0'),
            scheduler_mode='soft',
            worker_mode='singularity_reuse',
            container_type='singularity',
            container_cmd_options="",
            provider=SlurmProvider(
                partition='cpu',
                launcher=SrunLauncher(),

                # String to prepend to #SBATCH blocks in the submit
                # script, e.g. '#SBATCH --constraint=knl,quad,cache'
                scheduler_options=user_opts['delta']['scheduler_options'],

                # Command run before starting a worker, e.g.
                # 'module load Anaconda; source activate parsl_env'
                worker_init=user_opts['delta']['worker_init'],

                # Scale between 0 and 1 blocks, with 1 node per block
                nodes_per_block=1,
                init_blocks=0,
                min_blocks=0,
                max_blocks=1,

                # Hold blocks for 30 minutes
                walltime='00:30:00'
            ),
        )
    ],
)
  3. Start the endpoint and store the endpoint id, to be used in the following example:
globus-compute-endpoint start custom-image-builder

Example

Consider the following use case: a user wants to execute a pandas operation on HPC using globus-compute, and needs a singularity image for the globus-compute executor to use. The library can be leveraged as follows:

Locally, you need to install the following packages; you can create a virtual env as follows:

cd example/

python3.9 -m venv venv

source venv/bin/activate

pip install globus-compute-sdk==2.2.0

pip install custom-image-builder

Then run the following script:

from custom_image_builder import build_and_register_container
from globus_compute_sdk import Client, Executor


def transform():
    import pandas as pd
    data = {'Column1': [1, 2, 3],
            'Column2': [4, 5, 6]}

    df = pd.DataFrame(data)

    return f"Successfully created df with shape {df.shape}"


def main():
    image_builder_endpoint = "81b21a94-0e18-457d-98b5-05672a8a3b60"
    gcc_client = Client()

    container_id = build_and_register_container(gcc_client=gcc_client,
                                                endpoint_id=image_builder_endpoint,
                                                image_file_name="my-pandas-image",
                                                base_image_type="docker",
                                                base_image="python:3.8",
                                                pip_packages=["pandas"])

    print("The container id is", container_id)

    example_endpoint = "0b4e042b-edd5-4951-9ce5-6608c2ef6cb8"

    with Executor(endpoint_id=example_endpoint,
                  container_id=container_id) as ex:
        fut = ex.submit(transform)

    print(fut.result())


if __name__ == "__main__":
    main()
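Rebuilding and re-registering the image on every run is unnecessary once it exists. One option (a sketch, not part of the library; `get_or_build_container` and the cache file name are hypothetical) is to cache the returned container id locally and rebuild only on a cache miss:

```python
import json
from pathlib import Path

CACHE_FILE = Path("container_cache.json")


def get_or_build_container(build_fn, image_name):
    """Return a cached container id for image_name, building only on a cache miss.

    build_fn is any zero-argument callable that builds and registers the image
    and returns its container id, e.g. a lambda wrapping
    build_and_register_container with your endpoint and package list.
    """
    cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
    if image_name not in cache:
        cache[image_name] = build_fn()
        CACHE_FILE.write_text(json.dumps(cache))
    return cache[image_name]
```

In `main()` above, the `build_and_register_container(...)` call would then be wrapped as `build_fn` so repeated runs reuse the stored id.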

Note

For the example above to work, globus-compute-sdk version 2.2.0 must be used when setting up the endpoint.

The singularity image requires globus-compute-endpoint as one of its packages in order to run the workers inside the custom singularity container; hence, by default, Python must be part of the image so that globus-compute-endpoint can be installed.
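For reference, the kind of Apptainer/Singularity definition this implies (a hand-written sketch based on the example's parameters; the definition the library actually generates may differ) looks like:

```
Bootstrap: docker
From: python:3.8

%post
    pip install pandas
    pip install globus-compute-endpoint==2.2.0
```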
