A command line client for the Global Pathogen Analysis Service

The command line client for interacting with the Global Pathogen Analysis Service. Tested with Ubuntu Linux, macOS, and Windows via Windows Subsystem for Linux (WSL2). The client uses parallelisation and asynchronous requests for fast client-side decontamination and upload, and automatically renames downloaded output files with original sample identifiers for convenience while preserving privacy.

Command line interface    Python API
gpas upload               lib.Batch(upload_csv, token).upload()
gpas download             lib.download_async()
gpas validate             validation.validate()
gpas status               lib.fetch_status(), lib.fetch_status_async()

Install

Installation using Conda or Miniconda is recommended (Miniconda installation guide). On a Mac with Apple silicon, install conda and gpas-cli inside a Rosetta Terminal. Alternatively, pip install the PyPI package and manually install the samtools and read-it-and-keep (readItAndKeep) binary dependencies.

Upload functionality (mirroring gpas upload) is also available as a static binary for each release, but offers slower performance than the Python package.

With conda (recommended)

# Install inside a new Conda environment
curl https://raw.githubusercontent.com/GlobalPathogenAnalysisService/gpas-cli/main/environment.yml --output environment.yml
conda env create -f environment.yml

# Activate and use
conda activate gpas-cli
gpas --version

With pip

Requires Python 3.10+ and separate installation of Samtools and read-it-and-keep.

# Install inside a new Python environment
python3 -m venv gpas-cli
source gpas-cli/bin/activate
pip install gpas

# Activate and use
source gpas-cli/bin/activate
gpas --version

# If samtools and read-it-and-keep are not in $PATH, tell gpas-cli where to find them
export GPAS_SAMTOOLS_PATH=path/to/samtools
export GPAS_READITANDKEEP_PATH=path/to/readItAndKeep
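
As a pre-flight check before running gpas upload, the lookup described above can be approximated in a few lines of Python. This is a minimal sketch, not gpas-cli's own resolution logic: the helper name resolve_binary is hypothetical, and it assumes that an environment variable, when set, takes precedence over $PATH. Only the two environment variable names and binary names come from this README.

```python
import os
import shutil


def resolve_binary(env_var, default_name):
    """Return a usable path for a dependency, or None if it cannot be found.

    Assumption: a path given via env_var takes precedence over a $PATH lookup.
    """
    override = os.environ.get(env_var)
    if override:
        # Accept the override only if it points at an executable file
        is_executable = os.path.isfile(override) and os.access(override, os.X_OK)
        return override if is_executable else None
    # Otherwise search $PATH for the conventional binary name
    return shutil.which(default_name)


if __name__ == "__main__":
    for env_var, name in [
        ("GPAS_SAMTOOLS_PATH", "samtools"),
        ("GPAS_READITANDKEEP_PATH", "readItAndKeep"),
    ]:
        print(f"{name}: {resolve_binary(env_var, name) or 'NOT FOUND'}")
```

Running the script prints one line per dependency, which makes a missing or mistyped path visible before a long upload starts.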

Authentication

Most gpas-cli actions require a valid API token (token.json). This can be saved using the 'Get API token' button on the 'Upload Client' page of the GPAS portal. If you can't see this button, please ask the team to enable it for you. If you'd like to try GPAS, please get in touch!

Command line usage

gpas validate

Validates an upload_csv and checks that the fastq or bam files it references exist.

gpas validate large-nanopore-fastq.csv

# Validate supplied tags
gpas validate --environment dev --token token.json large-nanopore-fastq.csv
% gpas validate -h
usage: gpas validate [-h] [--token TOKEN] [--environment {dev,staging,prod}] [--json-messages] upload_csv

Validate an upload CSV. Validates tags remotely if supplied with an authentication token

positional arguments:
  upload_csv            Path of upload CSV

options:
  -h, --help            show this help message and exit
  --token TOKEN         Path of auth token available from GPAS Portal
                        (default: None)
  --environment {dev,staging,prod}
                        GPAS environment to use
                        (default: prod)
  --json-messages       Emit JSON to stdout
                        (default: False)

gpas upload

Validates, decontaminates and uploads the reads specified in upload_csv to the specified GPAS environment.

gpas upload --environment dev --token token.json large-illumina-bam.csv

# Dry run; skip submission
gpas upload --dry-run --environment dev --token token.json large-illumina-bam.csv

# Offline mode; quit after decontamination
gpas upload tests/test-data/large-nanopore-fastq.csv
% gpas upload -h
usage: gpas upload [-h] [--token TOKEN] [--working-dir WORKING_DIR] [--out-dir OUT_DIR] [--processes PROCESSES] [--dry-run]
                   [--debug] [--environment {dev,staging,prod}] [--json-messages]
                   upload_csv

Validate, decontaminate and upload reads to the GPAS platform

positional arguments:
  upload_csv            Path of upload csv

options:
  -h, --help            show this help message and exit
  --token TOKEN         Path of auth token available from GPAS Portal
                        (default: None)
  --working-dir WORKING_DIR
                        Path of directory in which to make intermediate files
                        (default: /tmp)
  --out-dir OUT_DIR     Path of directory in which to save mapping CSV
                        (default: .)
  --processes PROCESSES
                        Number of tasks to execute in parallel. 0 = auto
                        (default: 0)
  --dry-run             Exit before submitting files
                        (default: False)
  --debug               Emit verbose debug messages
                        (default: False)
  --environment {dev,staging,prod}
                        GPAS environment to use
                        (default: prod)
  --json-messages       Emit JSON to stdout
                        (default: False)

gpas download

Downloads json, fasta, vcf and bam outputs from the GPAS platform by passing either a mapping_csv generated during batch upload, or a comma-separated list of sample guids. By passing both --mapping-csv and --rename, output files are saved using local sample names without the platform's knowledge.

# Download and rename BAMs for a previous upload
gpas download --rename --mapping-csv C-a06cbab8.mapping.csv --file-types bam token.json

# Download all outputs for a single guid
gpas download --guids 6e024eb1-432c-4b1b-8f57-3911fe87555f --file-types json,vcf,bam,fasta token.json
% gpas download -h
usage: gpas download [-h] [--mapping-csv MAPPING_CSV] [--guids GUIDS] [--file-types FILE_TYPES] [--out-dir OUT_DIR] [--rename]
                     [--debug] [--environment {dev,staging,prod}]
                     token

Download analytical outputs from the GPAS platform for given a mapping csv or list of guids

positional arguments:
  token                 Path of auth token (available from GPAS Portal)

options:
  -h, --help            show this help message and exit
  --mapping-csv MAPPING_CSV
                        Path of mapping CSV generated at upload time
                        (default: None)
  --guids GUIDS         Comma-separated list of GPAS sample guids
                        (default: )
  --file-types FILE_TYPES
                        Comma separated list of outputs to download (json,fasta,bam,vcf)
                        (default: fasta)
  --out-dir OUT_DIR     Path of output directory
                        (default: /Users/bede/Research/Git/gpas-cli)
  --rename              Rename outputs using local sample names (requires --mapping-csv)
                        (default: False)
  --debug               Emit verbose debug messages
                        (default: False)
  --environment {dev,staging,prod}
                        GPAS environment to use
                        (default: prod)
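
When calling gpas download from a script, the constraints stated in the help text above (four permitted file types, --rename requiring --mapping-csv, and a mapping CSV or guids as input) can be checked before the client is invoked. The following is a hedged sketch: download_command is a hypothetical helper, not part of the gpas package, and it assumes gpas is on $PATH. It only builds the argument list, using flags that appear in the help output.

```python
ALLOWED_FILE_TYPES = {"json", "fasta", "bam", "vcf"}


def download_command(token, mapping_csv=None, guids=(), file_types=("fasta",),
                     rename=False, environment="prod"):
    """Build an argv list for `gpas download`, rejecting invalid combinations."""
    if not mapping_csv and not guids:
        raise ValueError("supply a mapping CSV or at least one guid")
    if rename and not mapping_csv:
        raise ValueError("--rename requires --mapping-csv")
    unknown = set(file_types) - ALLOWED_FILE_TYPES
    if unknown:
        raise ValueError(f"unsupported file types: {sorted(unknown)}")

    cmd = ["gpas", "download", "--environment", environment,
           "--file-types", ",".join(file_types)]
    if mapping_csv:
        cmd += ["--mapping-csv", mapping_csv]
    if guids:
        cmd += ["--guids", ",".join(guids)]
    if rename:
        cmd.append("--rename")
    cmd.append(token)  # the token path is the positional argument
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True); building it separately keeps the validation testable without network access or a token.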

gpas status

Check the processing status of an uploaded batch by passing either a mapping_csv generated at upload time, or a comma-separated list of sample guids.

gpas status --mapping-csv example_mapping.csv --environment dev token.json
gpas status --guids 6e024eb1-432c-4b1b-8f57-3911fe87555f --format json token.json
% gpas status -h
usage: gpas status [-h] [--mapping-csv MAPPING_CSV] [--guids GUIDS] [--format {table,csv,json}] [--rename] [--raw]
                   [--environment {dev,staging,prod}]
                   token

Check the status of samples submitted to the GPAS platform

positional arguments:
  token                 Path of auth token available from GPAS Portal

options:
  -h, --help            show this help message and exit
  --mapping-csv MAPPING_CSV
                        Path of mapping CSV generated at upload time
                        (default: None)
  --guids GUIDS         Comma-separated list of GPAS sample guids
                        (default: )
  --format {table,csv,json}
                        Output format
                        (default: table)
  --rename              Use local sample names (requires --mapping-csv)
                        (default: False)
  --raw                 Emit raw response
                        (default: False)
  --environment {dev,staging,prod}
                        GPAS environment to use
                        (default: prod)
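
For scripted monitoring, the json output format can be parsed directly. This is a minimal sketch, assuming gpas is on $PATH and that --format json writes a single JSON document to stdout; the structure of that document is not described in this README, so the code returns it unmodified. The function name fetch_status_json and its runner parameter (present so the call can be stubbed in tests) are illustrative and not part of the gpas package.

```python
import json
import subprocess


def fetch_status_json(token, guids, environment="prod", runner=subprocess.run):
    """Run `gpas status --format json` for the given guids and parse stdout."""
    cmd = ["gpas", "status", "--guids", ",".join(guids),
           "--format", "json", "--environment", environment, token]
    # check=True raises CalledProcessError if the client exits with an error
    result = runner(cmd, capture_output=True, text=True, check=True)
    # Assumption: stdout holds exactly one JSON document
    return json.loads(result.stdout)
```

Within Python code, the lib.fetch_status() function listed in the table at the top of this page is the more direct route; the subprocess approach is only useful when the CLI is the sole supported interface in a given environment.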

Development and testing

Use pre-commit to apply black style at commit time (should happen automatically)

git clone https://github.com/GlobalPathogenAnalysisService/gpas-cli
cd gpas-cli
conda env create -f environment-dev.yml
conda activate gpas-cli-dev
pip install --upgrade --force-reinstall --editable ./

# Offline unit tests
pytest tests/test_gpas.py

# The full test suite requires a valid token for dev inside tests/test-data
pytest --cov=gpas

Binary distribution

The functionality of gpas upload is also distributed as a binary packaged with PyInstaller. This is a portable, standalone executable. These binaries can be downloaded from the 'Artifacts' section of each workflow run listed here: https://github.com/GlobalPathogenAnalysisService/gpas-cli/actions/workflows/distribute.yml

Usage

cli-upload --environment dev --token token.json large-nanopore-bam.csv --json-messages --processes 1

If you encounter exceptions related to running samtools and readItAndKeep, set the environment variables GPAS_READITANDKEEP_PATH and GPAS_SAMTOOLS_PATH to the respective binary paths. Note that unlike the Python distribution, the PyInstaller binary currently only supports serial decontamination and bam conversion (--processes 1).

Creation

conda env create -f environment-dev.yml
conda activate gpas-cli-dev
pyinstaller --onefile --name cli-upload --add-data src/gpas/data:data --noconfirm src/gpas/cli-upload.py

Authors: Bede Constantinides and Philip Fowler
