
Snakemake-like pipeline manager for reproducible Jupyter Notebooks


Snakemake-like pipelines for Jupyter Notebooks

Install & general remarks

These are still early days for this software, so please bear in mind that it is not yet ready for packaging and distribution. If you wish to evaluate it as-is, please follow these steps:

Note: for simplicity I assume that you are using a recent Ubuntu with git installed.

pip install nbpipeline

Development install

To install the latest development version, use:

git clone https://github.com/krassowski/nbpipeline
cd nbpipeline
pip install -r requirements.txt
ln -s $(pwd)/nbpipeline/nbpipeline.py ~/bin/nbpipeline

Quickstart

Create a pipeline.py file with a list of rules for your pipeline. For example:

from rules import NotebookRule

NotebookRule(
    'Extract protein data',  # a nice name for the step
    input={'protein_data_path': 'data/raw/Protein/data_from_wetlab.xlsx'},
    output={'output_path': 'data/clean/protein/levels.csv'},
    notebook='protein/Data_extraction.ipynb',
    group='Proteomics',  # this is optional
)


NotebookRule(
    'Quality control and PCA on proteins',
    input={'protein_levels_path': 'data/clean/protein/levels.csv'},
    output={'qc_report_path': 'reports/proteins_failing_qc.csv'},
    notebook='protein/Exploration_and_quality_control.ipynb',
    group='Proteomics'
)

The keys of the input and output dictionaries should correspond to variables defined in one of the first cells of the corresponding notebook, and that cell should be tagged as “parameters”. You will be warned if your notebook has no cell tagged as “parameters”.
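For the first rule above, the “parameters” cell of protein/Data_extraction.ipynb would define protein_data_path and output_path. The sketch below is a hypothetical check, not nbpipeline's own code: it assumes the nbformat 4 JSON layout (cell tags stored under metadata.tags), finds the first cell tagged “parameters” and reports which rule keys it does not assign. The helper names are made up for illustration, and the assignment detection is a crude regex rather than real Python parsing.

```python
import json
import re


def load_notebook(path):
    """Read an .ipynb file as plain JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_parameters_cell(notebook):
    """Return the source of the first cell tagged "parameters", or None."""
    for cell in notebook.get("cells", []):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            source = cell.get("source", "")
            # nbformat stores source either as one string or a list of lines
            return "".join(source) if isinstance(source, list) else source
    return None


def missing_parameters(notebook, rule_keys):
    """List rule keys that are not assigned at the start of any line."""
    source = find_parameters_cell(notebook)
    if source is None:
        raise ValueError('notebook has no cell tagged as "parameters"')
    # Matches `name = ...` but not comparisons such as `name == ...`
    assigned = set(re.findall(r"^(\w+)\s*=(?!=)", source, flags=re.MULTILINE))
    return [key for key in rule_keys if key not in assigned]


# In-memory stand-in for protein/Data_extraction.ipynb
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": [
                "protein_data_path = 'data/raw/Protein/data_from_wetlab.xlsx'\n",
                "output_path = 'data/clean/protein/levels.csv'\n",
            ],
        }
    ]
}
print(missing_parameters(notebook, ["protein_data_path", "output_path"]))  # → []
```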

Run the pipeline:

nbpipeline
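The rules in pipeline.py are not listed in any required order. A Snakemake-like way to order them, assumed here rather than taken from nbpipeline's source, is to make each rule depend on the rules whose output paths appear among its inputs and then run them in topological order. A minimal sketch using the standard library's graphlib (Python 3.9+); Rule is a hypothetical stand-in for NotebookRule that keeps only the fields needed for ordering:

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Rule:
    """Hypothetical stand-in for NotebookRule (name, input, output only)."""
    name: str
    input: dict = field(default_factory=dict)
    output: dict = field(default_factory=dict)


def execution_order(rules):
    # Map every output path to the name of the rule that produces it
    producers = {path: rule.name for rule in rules for path in rule.output.values()}
    graph = TopologicalSorter()
    for rule in rules:
        # Inputs that no rule produces (e.g. raw wet-lab data) add no edge
        deps = {producers[p] for p in rule.input.values() if p in producers}
        graph.add(rule.name, *deps)
    # static_order() raises graphlib.CycleError if the rules form a cycle
    return list(graph.static_order())


rules = [
    Rule('Quality control and PCA on proteins',
         input={'protein_levels_path': 'data/clean/protein/levels.csv'},
         output={'qc_report_path': 'reports/proteins_failing_qc.csv'}),
    Rule('Extract protein data',
         input={'protein_data_path': 'data/raw/Protein/data_from_wetlab.xlsx'},
         output={'output_path': 'data/clean/protein/levels.csv'}),
]
print(execution_order(rules))
# → ['Extract protein data', 'Quality control and PCA on proteins']
```

Even though the quality-control rule is listed first, the shared path data/clean/protein/levels.csv forces the extraction notebook to run before it.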

On subsequent runs, notebooks that did not change will not be run again. To disable this cache, use the --disable_cache switch.
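How exactly nbpipeline decides that a notebook "did not change" is not described here. One common way to build such a cache, shown as an assumption and not as the tool's actual mechanism, is to fingerprint the notebook together with its input files and parameters, and to skip the run when the fingerprint matches the one stored after the previous run. The cache file name and the helpers fingerprint, needs_run and record_run are hypothetical:

```python
import hashlib
import json
import tempfile
from pathlib import Path

CACHE_FILE = Path(".nbpipeline_cache.json")  # hypothetical location


def fingerprint(notebook_path, input_paths, parameters):
    """Hash the notebook, its input files and its parameters together."""
    digest = hashlib.sha256()
    for path in [str(notebook_path), *sorted(map(str, input_paths))]:
        digest.update(path.encode())
        digest.update(Path(path).read_bytes())
    digest.update(json.dumps(parameters, sort_keys=True).encode())
    return digest.hexdigest()


def _read_cache(cache_file):
    return json.loads(cache_file.read_text()) if cache_file.exists() else {}


def needs_run(rule_name, current, cache_file=CACHE_FILE):
    """True if the rule was never run or its fingerprint has changed."""
    return _read_cache(cache_file).get(rule_name) != current


def record_run(rule_name, current, cache_file=CACHE_FILE):
    """Store the fingerprint after a successful run."""
    cache = _read_cache(cache_file)
    cache[rule_name] = current
    cache_file.write_text(json.dumps(cache, indent=2))


# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    nb = Path(tmp, "notebook.ipynb")
    nb.write_text("{}")
    data = Path(tmp, "input.csv")
    data.write_text("a,b\n1,2\n")
    cache = Path(tmp, "cache.json")
    params = {"output_path": "out.csv"}

    fp = fingerprint(nb, [data], params)
    first = needs_run("demo", fp, cache)    # nothing cached yet
    record_run("demo", fp, cache)
    second = needs_run("demo", fp, cache)   # unchanged since last run
    data.write_text("a,b\n1,3\n")           # modify an input file
    third = needs_run("demo", fingerprint(nb, [data], params), cache)

print(first, second, third)  # → True False True
```

In this sketch, disabling the cache would amount to skipping the needs_run check and executing every notebook.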

To generate an interactive diagram of the rules graph, together with a reproducibility report, add the -i switch:

nbpipeline -i

By default the graph visualization is displayed in google-chrome; this can be changed with a command-line option.

If you named your definitions file differently (e.g. my_rules.py instead of pipeline.py), use:

nbpipeline --definitions_file my_rules.py
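In the pipeline.py example the NotebookRule(...) calls are never assigned to variables, which suggests the rules are collected as a side effect of executing the definitions file. The sketch below illustrates that self-registering pattern under this assumption; the NotebookRule class and load_rules helper here are hypothetical and are not nbpipeline's implementation. It executes the file with runpy.run_path and injects the class into the file's globals, so the demo definitions file omits the `from rules import NotebookRule` line:

```python
import runpy
import tempfile
from pathlib import Path


class NotebookRule:
    """Hypothetical self-registering rule; not nbpipeline's real class."""
    registry = []

    def __init__(self, name, input=None, output=None, notebook=None, group=None):
        self.name = name
        self.input = input or {}
        self.output = output or {}
        self.notebook = notebook
        self.group = group
        # Creating a rule is enough to register it, so a definitions file
        # does not need to keep a reference to it.
        NotebookRule.registry.append(self)


def load_rules(definitions_file):
    NotebookRule.registry.clear()
    # Inject NotebookRule into the file's globals so the demo file can use
    # it without an import statement.
    runpy.run_path(str(definitions_file), init_globals={"NotebookRule": NotebookRule})
    return list(NotebookRule.registry)


with tempfile.TemporaryDirectory() as tmp:
    definitions = Path(tmp, "my_rules.py")
    definitions.write_text(
        "NotebookRule('Extract protein data',\n"
        "             notebook='protein/Data_extraction.ipynb')\n"
    )
    loaded = load_rules(definitions)

print([rule.name for rule in loaded])  # → ['Extract protein data']
```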

To display all command line options use:

nbpipeline -h



Download files


Source Distribution

nbpipeline-0.1.2.tar.gz (11.9 kB view details)


File details

Details for the file nbpipeline-0.1.2.tar.gz.

File metadata

  • Download URL: nbpipeline-0.1.2.tar.gz
  • Upload date:
  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.2

File hashes

Hashes for nbpipeline-0.1.2.tar.gz
Algorithm     Hash digest
SHA256        8fd9e5a9b660528e331101c4a60de2f43e7f2f2c94a89d10fb952c5ea28a9490
MD5           08445b8dd865fd750de633e9e1026e6f
BLAKE2b-256   a345669c22a0891785116fea0a7beb485972d451e7e05c932665fbc9ed552322

