tools to support genome and metagenome analysis

genome-grist - map Illumina metagenomes to GenBank genomes

License: 3-Clause BSD

In brief

genome-grist is a toolkit to do the following:

  1. download a metagenome
  2. process it into trimmed reads, and make a sourmash signature
  3. search the sourmash signature with 'gather' against sourmash databases, e.g. all of GenBank
  4. download the matching genomes from GenBank
  5. map all metagenome reads to genomes using minimap
  6. extract matching reads iteratively in gather order, at each step removing reads that already matched an earlier gather match
  7. run mapping on “leftover” reads to genomes
  8. summarize all mapping results
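
Step 6 is the least obvious item in this list. The sketch below illustrates the idea on toy data: genomes are visited in gather rank order, and each read is assigned to the first genome it maps to, so later genomes only see reads that earlier matches did not claim. The function name, the data layout (a set of read names per genome), and the genome and read names are all hypothetical; this is an illustration, not genome-grist's actual implementation.

```python
def assign_reads_in_gather_order(gather_order, mapped_reads):
    """Assign each read to the first genome (in gather order) it maps to.

    gather_order: genome names, best gather match first (hypothetical input).
    mapped_reads: dict of genome name -> set of read names mapping to it.
    Returns (assignments, claimed): per-genome read sets, plus the set of
    all reads assigned to any genome.
    """
    claimed = set()
    assignments = {}
    for genome in gather_order:
        # Keep only reads not already claimed by a higher-ranked match.
        new_reads = mapped_reads.get(genome, set()) - claimed
        assignments[genome] = new_reads
        claimed |= new_reads
    return assignments, claimed


# Toy data: read r2 maps to both genomes but is counted only for genomeA.
order = ["genomeA", "genomeB"]
mapped = {"genomeA": {"r1", "r2"}, "genomeB": {"r2", "r3"}}
assignments, claimed = assign_reads_in_gather_order(order, mapped)
```

Processing in rank order means a read shared between two genomes is credited only to the higher-ranked gather match.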

Installation

The command:

python -m pip install genome-grist

will install the latest version. Please use Python 3.7 or later. We suggest using an isolated conda environment.
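
If you are unsure which interpreter an environment provides, a quick check like the one below can be run before installing. It is a minimal sketch: the helper name `python_is_supported` is made up for this example, and the only fact it encodes is the 3.7 minimum stated above.

```python
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in the installation note


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) interpreter meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) against the minimum.
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    print("Python 3.7 or later is needed; found", sys.version.split()[0])
```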

Quick start

Run the following three commands.

First, download SRA sample HSMA33MX, trim reads, and build a sourmash signature:

genome-grist process HSMA33MX smash_reads

Next, search the sourmash signature against GenBank with gather:

genome-grist process HSMA33MX gather_genbank

NOTE: this step depends on the latest GenBank genomes and won't work for most people just yet. For now, use the cached results from the repo:

cp tests/test-data/HSMA33MX.x.genbank.gather.csv outputs/genbank/
touch outputs/genbank/HSMA33MX.x.genbank.gather.out
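
The same cached-results workaround can be sketched in Python. The helper name `stage_cached_gather` and its parameters are hypothetical; the file names and directories are the ones shown above. One difference from the shell commands: the sketch creates `outputs/genbank/` if it does not exist, whereas `cp` with a trailing-slash destination requires the directory to be there already.

```python
import shutil
from pathlib import Path


def stage_cached_gather(sample="HSMA33MX", repo_root="."):
    """Copy the cached gather CSV into place and create the .out marker.

    Assumes it is pointed at a checkout of the genome-grist repository,
    where tests/test-data/ holds the cached CSV.
    """
    root = Path(repo_root)
    src = root / "tests" / "test-data" / f"{sample}.x.genbank.gather.csv"
    dest_dir = root / "outputs" / "genbank"
    # Unlike the plain `cp`, make sure the destination directory exists.
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(src, dest_dir)
    # Equivalent of `touch`: create the (empty) gather output file.
    (dest_dir / f"{sample}.x.genbank.gather.out").touch()
    return dest_dir


# Usage, from the top of a repository checkout:
#     stage_cached_gather()
```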

Finally, download the reference genomes, map reads and produce a summary report:

genome-grist process HSMA33MX summarize -j 8
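
To script the quick start, the three steps can be wrapped in a small driver. This is a sketch under stated assumptions: the helper names are made up, the second step is assumed to use the `gather_genbank` target from the target list, and the cached-results workaround from the note above is not included. By default it only prints the commands; pass `dry_run=False` to execute them.

```python
import subprocess


def quickstart_commands(sample="HSMA33MX", jobs=8):
    """Build the three quick-start commands as argument lists."""
    base = ["genome-grist", "process", sample]
    return [
        base + ["smash_reads"],
        base + ["gather_genbank"],  # assumed target for the gather step
        base + ["summarize", "-j", str(jobs)],
    ]


def run_quickstart(sample="HSMA33MX", jobs=8, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for cmd in quickstart_commands(sample, jobs):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)


run_quickstart()  # dry run: prints the commands without executing them
```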

Full set of top-level process targets

  • download_reads
  • trim_reads
  • smash_reads
  • gather_genbank
  • download_matching_genomes
  • map_reads
  • summarize
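
The quick start suggests that a later target also covers the earlier steps (for example, `smash_reads` downloads and trims the reads before building the signature). Assuming the targets form a simple chain in the order listed, which this document does not state explicitly, the steps implied by a target can be sketched as follows. The helper `targets_up_to` is hypothetical and not part of genome-grist.

```python
# Target names copied from the list above. Treating the list order as a
# dependency chain is an assumption made for this illustration.
PROCESS_TARGETS = [
    "download_reads",
    "trim_reads",
    "smash_reads",
    "gather_genbank",
    "download_matching_genomes",
    "map_reads",
    "summarize",
]


def targets_up_to(target):
    """Return the targets from the start of the list through `target`."""
    if target not in PROCESS_TARGETS:
        raise ValueError(f"unknown target: {target!r}")
    return PROCESS_TARGETS[: PROCESS_TARGETS.index(target) + 1]


steps = targets_up_to("smash_reads")
```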

Support

genome-grist is alpha-level software. Please be patient and kind :).

Please ask questions and add comments by filing GitHub issues.

Why the name grist?

'grist' is in the sourmash family of names (sourmash, wort, distillerycats, etc.). See Grist.

(It is not the computing grist!)


CTB Nov 7, 2020
