miniwdl AWS backend (Batch+EFS)
miniwdl AWS plugin
Extends miniwdl to run workflows on AWS Batch and EFS
This miniwdl plugin enables miniwdl to submit AWS Batch jobs that execute WDL tasks. It uses EFS for work-in-progress file I/O, with optional S3 rails for workflow-level I/O.
Use within Amazon SageMaker Studio
Start with the companion miniwdl-aws-studio recipe to install miniwdl for interactive use within Amazon SageMaker Studio, a web IDE with a terminal and filesystem browser. You can use the terminal to operate `miniwdl run` against AWS Batch, the filesystem browser to manage the inputs and outputs on EFS, and the Jupyter notebooks to further analyze the outputs.
That's the best way to try miniwdl-aws and get familiar with how it works with Batch and EFS. Read on for non-interactive deployment, which is a bit more complicated.
Unattended operations
For non-interactive use, a command-line wrapper `miniwdl-aws-submit` launches miniwdl in its own small Batch job to orchestrate the workflow. This workflow job then spawns task jobs as needed, without needing the submitting computer (e.g. your laptop) to remain connected for the duration. Separate Batch compute environments handle workflow & task jobs, using lightweight Fargate resources for workflow jobs. (See below for detailed infra specs.)
Submitting workflow jobs
First, `pip3 install miniwdl-aws` locally to make the `miniwdl-aws-submit` program available. The following example launches a viral genome assembly that should run in 10-15 minutes:
```
miniwdl-aws-submit \
    https://github.com/broadinstitute/viral-pipelines/raw/v2.1.28.0/pipes/WDL/workflows/assemble_refbased.wdl \
    reads_unmapped_bams=https://github.com/broadinstitute/viral-pipelines/raw/v2.1.19.0/test/input/G5012.3.testreads.bam \
    reference_fasta=https://github.com/broadinstitute/viral-pipelines/raw/v2.1.19.0/test/input/ebov-makona.fasta \
    sample_name=G5012.3 \
    --workflow-queue miniwdl_workflow \
    --task-queue miniwdl_task \
    --fsap fsap-xxxx \
    --s3upload s3://MY_BUCKET/assemble_refbased_test \
    --follow
```
The command line resembles `miniwdl run`'s, with extra AWS-related arguments:
command-line argument | equivalent environment variable | |
---|---|---|
`--workflow-queue` | `MINIWDL__AWS__WORKFLOW_QUEUE` | Batch job queue on which to schedule the workflow job |
`--task-queue` | `MINIWDL__AWS__TASK_QUEUE` | Batch job queue on which to schedule task jobs |
`--fsap` | `MINIWDL__AWS__FSAP` | EFS Access Point ID, which workflow and task jobs will mount at `/mnt/efs` |
`--s3upload` | | (optional) S3 URI prefix under which to upload the workflow products, including the log and output files |
Adding `--wait` makes the tool await the workflow job's success or failure, reproducing miniwdl's exit code. `--follow` does the same and also live-streams the workflow log. Without `--wait` or `--follow`, the tool displays the workflow job UUID and exits immediately.
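Because `--wait` (or `--follow`) reproduces miniwdl's exit code, a calling script can branch on the submit command directly. A minimal sketch, where `run_and_report` is a hypothetical helper and `true` stands in for a real `miniwdl-aws-submit ... --wait` invocation:

```shell
# run_and_report is a made-up wrapper: it runs any command and reports
# success or failure based on the command's exit status.
run_and_report() {
  if "$@"; then
    echo "workflow succeeded"
  else
    echo "workflow failed"
  fi
}

# `true` stands in here for: miniwdl-aws-submit ... --wait
run_and_report true
```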
Arguments not consumed by `miniwdl-aws-submit` are passed through to `miniwdl run` inside the workflow job, as are environment variables whose names begin with `MINIWDL__`, allowing override of any miniwdl configuration option (disable with `--no-env`). See miniwdl_aws.cfg for various options preconfigured in the workflow job container.
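Per miniwdl's configuration convention, an option `key` in config section `[section]` maps to the environment variable `MINIWDL__SECTION__KEY` (all uppercase, with double underscores). A small sketch of that naming rule, using `aws` / `task_queue` as the example section and key:

```shell
# Derive the environment-variable name for config section "aws",
# option "task_queue", per miniwdl's MINIWDL__SECTION__KEY convention.
section=aws
key=task_queue
var="MINIWDL__$(printf '%s' "$section" | tr '[:lower:]' '[:upper:]')__$(printf '%s' "$key" | tr '[:lower:]' '[:upper:]')"
echo "$var"   # MINIWDL__AWS__TASK_QUEUE
```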
Run directories on EFS
Miniwdl runs the workflow in a directory beneath `/mnt/efs/miniwdl_run` (override with `--dir`). The outputs also remain cached there for potential reuse in future runs.
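Miniwdl names each run directory with a timestamp prefix, so the most recent run sorts last lexicographically. A sketch of browsing on that assumption, simulated in a temporary directory standing in for `/mnt/efs/miniwdl_run` (the directory names below are made up):

```shell
# Simulate a run-directory tree; in real use $base would be /mnt/efs/miniwdl_run.
base=$(mktemp -d)
mkdir -p "$base/20240101_120000_assemble_refbased" \
         "$base/20240102_090000_assemble_refbased"

# Timestamped names sort chronologically, so the last entry is the latest run.
latest=$(ls "$base" | sort | tail -n 1)
echo "$latest"
```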
Given the EFS-centric I/O model, you'll need a way to manage the filesystem contents remotely. Deploy an instance or container mounting your EFS, to access via SSH or web app (e.g. JupyterHub, Cloud Commander, VS Code server).
You can also automate cleanup of EFS run directories by setting `miniwdl-aws-submit --s3upload` and:

- `--delete-after success` to delete the run directory immediately after successful output upload
- `--delete-after failure` to delete the directory after failure
- `--delete-after always` to delete it in either case
- (or set environment variable `MINIWDL__AWS__DELETE_AFTER_S3_UPLOAD`)
Deleting a run directory after success prevents the outputs from being reused in future runs. Deleting it after failures can make debugging more difficult (although logs are retained, see below).
Logs & troubleshooting
If the terminal log isn't available (through Studio or `miniwdl_submit_awsbatch --follow`) to trace a workflow failure, look for miniwdl's usual log files written in the run directory on EFS.
Each task job's log is also forwarded to CloudWatch Logs under the `/aws/batch/job` group, with a log stream name reported in miniwdl's log. Using `miniwdl_submit_awsbatch`, the workflow job's log is forwarded as well. CloudWatch Logs indexes the logs for structured search through the AWS Console & API.
Misconfigured infrastructure might prevent logs from being written to EFS or CloudWatch at all. In that case, use the AWS Batch console/API to find status messages for the workflow or task jobs.
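When the logs on EFS are reachable, grepping the run directory's log file for error entries is often the quickest triage step. A sketch under the assumption that miniwdl's usual log file is named workflow.log, simulated here with a stand-in file and made-up log lines:

```shell
# Stand-in for <run directory>/workflow.log on EFS; the log lines are fabricated.
log=$(mktemp)
printf '2024-01-01 NOTICE miniwdl run\n2024-01-01 ERROR task failed\n' > "$log"

# Count lines containing ERROR -- a nonzero count points at where to dig further.
grep -c ERROR "$log"
```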
Appendix: expected AWS infrastructure
Requirements:

- VPC in desired region
- EFS + Access Point providing uid=0 access
  - set `--fsap` or `MINIWDL__AWS__FSAP` to the access point ID (fsap-xxxx)
- Task execution:
  - Batch Compute Environment
    - spot instances (including `AmazonEC2SpotFleetTaggingRole`)
    - instance role
      - `service-role/AmazonEC2ContainerServiceforEC2Role`
      - `AmazonElasticFileSystemClientReadWriteAccess`
      - `AmazonEC2ContainerRegistryReadOnly`
      - `s3:Get*`, `s3:List*` on any desired S3 buckets
      - `s3:Put*` on S3 bucket(s) for `--s3upload`
  - Batch Job Queue connected to the compute environment
    - set `--task-queue` or `MINIWDL__AWS__TASK_QUEUE` to the queue name
- Unattended workflow execution with `miniwdl_submit_awsbatch`:
  - Batch Compute Environment
    - Fargate resources
    - execution role
      - `service-role/AmazonECSTaskExecutionRolePolicy`
      - `AWSBatchFullAccess`
      - `AmazonElasticFileSystemFullAccess`
      - `AmazonEC2ContainerRegistryPowerUser`
      - `s3:Get*`, `s3:List*` on any desired S3 buckets
      - `s3:Put*` on S3 bucket(s) for `--s3upload`
  - Batch Job Queue connected to the compute environment
    - tag the queue with `WorkflowEngineRoleArn` set to the execution role's ARN
    - set `--workflow-queue` or `MINIWDL__AWS__WORKFLOW_QUEUE` to the queue name
Recommendations:

- Compute environments
  - Modify the spot instance launch template to relocate docker operations onto NVMe or an autoscaling volume instead of the root EBS volume
  - Set task environment allocation strategy to `SPOT_CAPACITY_OPTIMIZED`
  - Set task environment spot `bidPercentage` as needed
  - Temporarily set task environment `minvCpus` to reduce scheduling delays during periods of interactive use
- EFS
  - Create in One Zone mode, and restrict compute environments to the same zone (trading off availability for cost)
  - Create in Max I/O mode
  - Configure the EFS Access Point to use a nonzero user ID, with an owned filesystem root directory
  - Temporarily provision throughput if starting an intensive workload without much data already stored
  - Use a non-default VPC security group for EFS & compute environments
    - EFS must be accessible to all containers through TCP port 2049