"Staging" for Snakemake

This package provides a mechanism for Snakemake workflows to explicitly "stage out" the output files from certain rules to a public repository like Zenodo to allow faster re-execution of the workflow, using these previously generated artifacts. This can be especially useful for workflows with computationally expensive rules that don't need to be frequently re-run.

snakemake-staging is a spin-off of the showyourwork project, which provides a "caching" framework for Snakemake workflows, to transparently avoid re-execution of rules that have been cached to Zenodo. The implementation of this logic in showyourwork is, however, somewhat fragile and unpredictable. In snakemake-staging, we take a more explicit approach, where "staged" rules are always either explicitly executed or restored.

Installation

To use snakemake-staging in your workflow, you can install it using pip (it's probably best to set up your Snakemake installation following the Snakemake docs first):

python -m pip install snakemake-staging

Quickstart

The Snakefile

While testing, it's probably best to use the Zenodo Sandbox, rather than the main site, since any archive published to Zenodo is permanent. To use the sandbox, you'll need a personal access token stored in the SANDBOX_TOKEN environment variable. You can generate a new token from the applications page of your Zenodo Sandbox account settings.
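Before running the workflow, it can help to confirm that the token is actually visible to the process. The following is a minimal sketch of such a check; the helper name has_sandbox_token is hypothetical and not part of snakemake-staging, which reads the token on its own.

```python
import os


def has_sandbox_token(environ=os.environ):
    """Return True if SANDBOX_TOKEN is set to a non-empty value.

    Hypothetical pre-flight helper; it only inspects the environment
    and never contacts Zenodo.
    """
    return bool(environ.get("SANDBOX_TOKEN", "").strip())


if __name__ == "__main__":
    if has_sandbox_token():
        print("SANDBOX_TOKEN found")
    else:
        print("SANDBOX_TOKEN is not set; export it before running snakemake")
```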

Once you've added this token to your environment, you can edit the Snakefile for your workflow to use snakemake-staging as follows. First, towards the top of your Snakefile, add:

import snakemake_staging as staging

stage = staging.ZenodoStage(
    "zenodo-stage",
    config.get("restore", False)
)

to create a new stage called zenodo-stage. Note that here we're extracting a restore flag from the Snakemake config, which will be used to determine whether to restore files for the stage. This means that you can control the behavior of this stage from the command line. By passing --config restore=True to the snakemake command line interface, all files staged out by the zenodo-stage stage will be restored from the archive rather than generated.
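To make the role of the restore flag concrete, here is a small sketch that uses plain dictionaries as stand-ins for Snakemake's config object. It assumes that Snakemake converts restore=True on the command line into the Python boolean True; the function name restore_requested is purely illustrative.

```python
def restore_requested(config):
    """Mirror the lookup used in the Snakefile: config.get("restore", False)."""
    return config.get("restore", False)


# Assumed stand-ins for the `config` dict Snakemake would provide.
config_default = {}                 # no --config argument given
config_restore = {"restore": True}  # --config restore=True (assumed to parse as a bool)

if __name__ == "__main__":
    print(restore_requested(config_default))
    print(restore_requested(config_restore))
```

With no restore key in the config, the lookup falls back to False and the staged rules run as usual.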

Then, to stage out a rule, you can apply the stage as follows:

rule expensive:
    input:
        ...
    output:
        stage(
            "path/to/output1.txt",
            "path/to/output2.txt",
        )
    shell:
        ...

Finally, after defining all the rules that you want to stage out, you must add the following include which defines all the staging rules:

include: staging.snakefile()

At this point, here's the full Snakefile:

import snakemake_staging as staging

stage = staging.ZenodoStage(
    "zenodo-stage",
    config.get("restore", False)
)

rule expensive:
    input:
        ...
    output:
        stage(
            "path/to/output1.txt",
            "path/to/output2.txt",
        )
    shell:
        ...

include: staging.snakefile()

Usage

With the Snakefile defined in the previous section, you can now run your workflow in 3 ways:

  1. Normal execution: Running something like snakemake path/to/output1.txt (with the usual --cores and --use-conda arguments omitted here) executes the workflow as normal, without staging out any files.

  2. Stage upload: If you instead have Snakemake target the zenodo-stage.zenodo.json file (this filename can be changed by passing the info_file argument to the ZenodoStage constructor), the expensive rule will be executed, and the outputs will be uploaded to Zenodo, saving the record information to zenodo-stage.zenodo.json.

  3. Stage restore: Finally, after these outputs have been uploaded to Zenodo, you can run snakemake with --config restore=True to disable the expensive rule and force the outputs to be restored from Zenodo.
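The three modes above can be sketched as command lines. The helper below, snakemake_command, is a hypothetical convenience for illustration only; it builds the argument list without running anything, and the --cores value of 1 is an assumption. The target is placed before --config because, as far as I know, --config accepts several KEY=VALUE items and could otherwise consume the target.

```python
import shlex


def snakemake_command(target, restore=False, cores=1):
    """Build a snakemake argument list for one of the three modes.

    Hypothetical helper: it returns the command and does not execute it.
    """
    cmd = ["snakemake", target, "--cores", str(cores)]
    if restore:
        # Keep --config last so the target is not read as a config item.
        cmd += ["--config", "restore=True"]
    return cmd


if __name__ == "__main__":
    # 1. Normal execution
    print(shlex.join(snakemake_command("path/to/output1.txt")))
    # 2. Stage upload: target the stage's info file
    print(shlex.join(snakemake_command("zenodo-stage.zenodo.json")))
    # 3. Stage restore
    print(shlex.join(snakemake_command("path/to/output1.txt", restore=True)))
```

To actually run one of these, the list could be passed to subprocess.run from the directory containing the Snakefile.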
