Python code to integrate results of tb-pipeline and provide an antibiogram, mutations and variants

gnomon

Python code to integrate results of tb-pipeline and provide an antibiogram, mutations and variations

Provides a library of functions for use within scripts, as well as a CLI tool that chains these functions together to produce output.

Usage

usage: gnomon [-h] --vcf_file VCF_FILE --genome_object GENOME_OBJECT [--catalogue_file CATALOGUE_FILE]
              [--ignore_vcf_filter] [--progress] [--output_dir OUTPUT_DIR] [--json] [--alt_json] [--fasta FASTA]

options:
  -h, --help            show this help message and exit
  --vcf_file VCF_FILE   the path to a single VCF file
  --genome_object GENOME_OBJECT
                        the path to a compressed gumpy Genome object or a genbank file
  --catalogue_file CATALOGUE_FILE
                        the path to the resistance catalogue
  --ignore_vcf_filter   whether to ignore the FILTER field in the vcf (e.g. necessary for some versions of
                        Clockwork VCFs)
  --progress            whether to show progress using tqdm
  --output_dir OUTPUT_DIR
                        Directory to save output files to. Defaults to wherever the script is run from.
  --json                Flag to create a single JSON output as well as the CSVs
  --alt_json            Whether to produce the alternate JSON format. Requires the --json flag too
  --fasta FASTA         Use to output a FASTA file of the resultant genome. Specify either 'fixed' or 'variable'
                        for fixed length and variable length FASTA respectively.
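
The flags above can also be assembled programmatically, for example when calling gnomon from a pipeline step. The following is a minimal sketch, not part of gnomon itself: build_gnomon_command is a hypothetical helper and the file names are placeholders. It only uses flags listed in the usage text, and enforces the two constraints stated there (--alt_json requires --json, and --fasta takes 'fixed' or 'variable').

```python
import shlex
from typing import List, Optional


def build_gnomon_command(
    vcf_file: str,
    genome_object: str,
    catalogue_file: Optional[str] = None,
    output_dir: Optional[str] = None,
    as_json: bool = False,
    alt_json: bool = False,
    fasta: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for the gnomon CLI (hypothetical helper)."""
    # The usage text states that --alt_json requires --json.
    if alt_json and not as_json:
        raise ValueError("--alt_json requires --json")
    # The usage text states that --fasta accepts 'fixed' or 'variable'.
    if fasta not in (None, "fixed", "variable"):
        raise ValueError("--fasta must be 'fixed' or 'variable'")

    cmd = ["gnomon", "--vcf_file", vcf_file, "--genome_object", genome_object]
    if catalogue_file is not None:
        cmd += ["--catalogue_file", catalogue_file]
    if output_dir is not None:
        cmd += ["--output_dir", output_dir]
    if as_json:
        cmd.append("--json")
    if alt_json:
        cmd.append("--alt_json")
    if fasta is not None:
        cmd += ["--fasta", fasta]
    return cmd


# Placeholder file names, for illustration only.
command = build_gnomon_command(
    "sample.vcf", "genome.pkl.gz", catalogue_file="catalogue.csv", as_json=True
)
print(shlex.join(command))
# To actually run it: subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.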

Helper usage

The main script can load pickled gumpy.Genome objects, so a helper script is supplied that converts a GenBank file into a pickled gumpy.Genome, which saves significant time on later runs. Because unpickling untrusted data can execute arbitrary code, DO NOT SEND/RECEIVE PICKLES. Run this script on the host VM before running Nextflow to avoid re-instantiating the Genome object. The --compress flag enables gzip compression, which reduces file size significantly.

usage: gbkToPkl FILENAME [--compress]
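
To illustrate the general idea of caching a slow-to-build object as an optionally gzip-compressed pickle, here is a minimal sketch using only the standard library. It is not the gbkToPkl implementation: save_pickle and load_pickle are hypothetical names, and a plain dict stands in for a gumpy.Genome so that the example runs without gumpy installed.

```python
import gzip
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path, compress=False):
    """Pickle obj to path, through gzip when compress is True."""
    opener = gzip.open if compress else open
    with opener(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_pickle(path, compress=False):
    """Load a pickle written by save_pickle. Only load pickles you created yourself."""
    opener = gzip.open if compress else open
    with opener(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a parsed genome (assumption: the real object is a gumpy.Genome).
genome_like = {"name": "example", "sequence": "ACGT" * 10000}

with tempfile.TemporaryDirectory() as tmp:
    plain_path = Path(tmp) / "genome.pkl"
    gzip_path = Path(tmp) / "genome.pkl.gz"
    save_pickle(genome_like, plain_path)
    save_pickle(genome_like, gzip_path, compress=True)

    plain_size = plain_path.stat().st_size
    gzip_size = gzip_path.stat().st_size
    restored = load_pickle(gzip_path, compress=True)

print(f"uncompressed: {plain_size} bytes, gzip: {gzip_size} bytes")
```

Genome sequences are highly repetitive at the byte level, which is why gzip tends to shrink such pickles considerably.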

Install

The versions of gumpy and piezo currently on PyPI may have issues, so you may need to install both from git before installing gnomon.

git clone git@github.com:GlobalPathogenAnalysisService/gnomon.git
cd gnomon
pip install .

TODO: PyPi

Docker

A Docker image should be built for each release. To open a shell with gnomon installed:

docker run -it oxfordmmm/gnomon:latest

User stories

  1. As a bioinformatician, I want to be able to run gnomon on the command line, passing it (i) a GenBank file (or pickled gumpy.Genome object), (ii) a resistance catalogue and (iii) a VCF file, and get back pandas.DataFrames of the genetic variants, mutations, effects and predictions/antibiogram. The latter is for all the drugs described in the passed resistance catalogue.

  2. As a GPAS developer, I want to be able to embed gnomon in a Docker image/Nextflow pipeline that consumes the outputs of tb-pipeline and emits a structured, well-designed JSON object describing the genetic variants, mutations, effects and predictions/antibiogram.

  3. In general, I would also like the option to output fixed- and variable-length FASTA files (the latter takes into account insertions and deletions described in any input VCF file).
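
The difference between fixed- and variable-length FASTA output in the third story can be sketched as follows. This is an illustration under stated assumptions, not gnomon's implementation: apply_variants and to_fasta are hypothetical helpers, variants are simplified to non-overlapping (1-based position, ref, alt) tuples, and the fixed-length mode here simply skips insertions and deletions, which may differ from how gnomon treats them.

```python
from typing import List, Tuple

Variant = Tuple[int, str, str]  # (1-based position, ref allele, alt allele)


def apply_variants(reference: str, variants: List[Variant], allow_indels: bool) -> str:
    """Return the reference with variants applied (assumes they do not overlap)."""
    seq = reference
    # Work from right to left so earlier coordinates stay valid after length changes.
    for pos, ref, alt in sorted(variants, reverse=True):
        if len(ref) != len(alt) and not allow_indels:
            continue  # fixed-length mode: skip insertions and deletions
        start = pos - 1
        if seq[start:start + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq = seq[:start] + alt + seq[start + len(ref):]
    return seq


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with wrapped lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


reference = "ACGTACGTAC"
variants = [
    (2, "C", "T"),    # SNP
    (5, "A", "AGG"),  # insertion of GG
    (8, "TA", "T"),   # deletion of A
]

fixed = apply_variants(reference, variants, allow_indels=False)
variable = apply_variants(reference, variants, allow_indels=True)

print(to_fasta("sample_fixed", fixed), end="")        # sequence: ATGTACGTAC
print(to_fasta("sample_variable", variable), end="")  # sequence: ATGTAGGCGTC
```

The fixed-length sequence keeps the reference coordinate system, so positions can be compared directly across samples. The variable-length sequence reflects the sample's actual length.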

Unit testing

For speed, the unit tests use SARS-CoV-2 rather than NC_000962.3 (i.e. H37Rv M. tuberculosis). A fictitious drug resistance catalogue has been created for it, along with some VCF files and the expected outputs, all in tests/.

These can be run with pytest -vv
