tools to support genome and metagenome analysis

Environment
- Console
- MacOS X
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Natural Language
- English
Operating System
- MacOS :: MacOS X
- POSIX :: Linux
Programming Language
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

genome-grist - reference-based exploration of Illumina metagenomes

In brief

genome-grist automates a number of tasks around genome-based metagenome interpretation.

One key point of genome-grist is this: we can take advantage of sourmash gather to find the smallest set of genomes to which to map metagenome reads. genome-grist automates all the stuff AROUND doing that!

So, genome-grist is a toolkit to do the following:

- download a metagenome
- process it into trimmed reads, and make a sourmash signature
- search the sourmash signature with 'gather' against sourmash databases, e.g. all of genbank
- download the matching genomes from genbank
- map all metagenome reads to genomes using minimap
- extract matching reads iteratively based on gather, successively eliminating reads that matched to previous gather matches
- run mapping on "leftover" reads to genomes
- summarize all mapping results
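
The idea of picking a small set of genomes, and then giving each later genome credit only for what earlier matches did not already explain, can be illustrated with a toy greedy set cover. This is a sketch of the general technique only, not sourmash's implementation: the function `greedy_gather`, the genome names, and the integer "k-mers" are all hypothetical stand-ins.

```python
def greedy_gather(query, genomes):
    """Greedily pick genomes that explain the most not-yet-covered query items.

    query: set of hashable items (toy stand-ins for k-mers).
    genomes: dict mapping a genome name to its set of items.
    Returns a list of (name, newly_covered_count) in pick order.
    """
    remaining = set(query)
    picks = []
    while remaining and genomes:
        # Pick the genome with the largest overlap with what is still unexplained.
        # Names are sorted first so that ties are broken deterministically.
        best = max(sorted(genomes), key=lambda name: len(genomes[name] & remaining))
        overlap = genomes[best] & remaining
        if not overlap:
            break  # nothing left in the query matches any genome
        picks.append((best, len(overlap)))
        # Remove what this genome explains, so later picks only get credit
        # for items that no earlier pick covered.
        remaining -= overlap
    return picks


toy_query = {1, 2, 3, 4, 5, 6, 7}
toy_genomes = {
    "genomeA": {1, 2, 3, 4},
    "genomeB": {3, 4, 5},
    "genomeC": {5, 6},
}
result = greedy_gather(toy_query, toy_genomes)
print(result)
```

In this toy input, genomeB is never picked because everything it matches is already explained by genomeA and genomeC, and item 7 stays unexplained, much like "leftover" reads.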

Installation

The command:

python -m pip install genome-grist

will install the latest version. Please use Python 3.7 or later. We suggest using an isolated conda environment; the following commands should work for conda:

conda create -n grist python=3.7 pip
conda activate grist
python -m pip install genome-grist
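
As a quick sanity check after installing, something like the following could confirm that the interpreter meets the Python 3.7 minimum and that the package can be found. The import name `genome_grist` is an assumption (a package's import name need not match its PyPI name), and `check_environment` is a hypothetical helper, not part of genome-grist.

```python
import importlib.util
import sys


def check_environment(module_name="genome_grist", min_version=(3, 7)):
    """Report whether Python is new enough and whether a module can be located."""
    python_ok = sys.version_info[:2] >= tuple(min_version)
    # find_spec returns None when a top-level module cannot be located.
    module_found = importlib.util.find_spec(module_name) is not None
    return {"python_ok": python_ok, "module_found": module_found}


status = check_environment()
print(status)
```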

Quick start:

Run the following three commands.

First, download SRA sample HSMA33MX, trim reads, and build a sourmash signature:

genome-grist process HSMA33MX smash_reads

Next, search the sourmash signature against genbank:

genome-grist process HSMA33MX gather_genbank

(NOTE: this depends on the latest genbank genomes and won't work for most people just yet. For now, use the cached results from the repo by running the two commands that follow.)

cp tests/test-data/HSMA33MX.x.genbank.gather.csv outputs/genbank/
touch outputs/genbank/HSMA33MX.x.genbank.gather.out

Finally, download the reference genomes, map reads and produce a summary report:

genome-grist process HSMA33MX summarize -j 8

(You can run all of the above with make test in the repo.)

The summary report will be in outputs/reports/report-HSMA33MX.html.
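
If you would rather drive the three quick start commands from a script, a minimal sketch might look like the following. The helper names (`build_command`, `quick_start_commands`, `run_quick_start`) are hypothetical. The command shapes are copied from the commands above, and `-j` is only shown there with `summarize`, so passing it to other targets is an untested assumption. By default the script only prints the commands (dry run).

```python
import shlex
import subprocess

# Target names copied from the README's list of top-level `process` targets.
KNOWN_TARGETS = (
    "download_reads",
    "trim_reads",
    "smash_reads",
    "gather_genbank",
    "download_matching_genomes",
    "map_reads",
    "summarize",
)


def build_command(sample, target, jobs=None):
    """Build the argument list for `genome-grist process <sample> <target>`."""
    if target not in KNOWN_TARGETS:
        raise ValueError(f"unknown target: {target!r}")
    cmd = ["genome-grist", "process", sample, target]
    if jobs is not None:
        # The README only shows -j together with `summarize`.
        cmd += ["-j", str(jobs)]
    return cmd


def quick_start_commands(sample, jobs=8):
    """Return the three quick start commands, in order."""
    return [
        build_command(sample, "smash_reads"),
        build_command(sample, "gather_genbank"),
        build_command(sample, "summarize", jobs=jobs),
    ]


def run_quick_start(sample, jobs=8, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in quick_start_commands(sample, jobs):
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)


run_quick_start("HSMA33MX")
```

Calling `run_quick_start("HSMA33MX", dry_run=False)` would actually execute the steps, which requires genome-grist to be installed and on your PATH.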

You can see some example reports for this and other data sets online:

Compute requirements

You'll need enough disk space to store about 5 copies of your raw metagenome.

The peak memory requirement is in the k-mer trimming and sourmash gather steps. You'll probably want between 30 and 60 GB of RAM for those, although for smaller or less diverse metagenomes, you will use a lot less.
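
To turn the disk space rule of thumb into a quick check before starting a run, a small sketch such as this one may help. The factor of 5 comes from the text above; the function names and the example size are hypothetical, and the estimate is only as good as that rule of thumb.

```python
import shutil

COPIES_NEEDED = 5  # "about 5 copies of your raw metagenome"


def required_bytes(raw_size_bytes, copies=COPIES_NEEDED):
    """Estimate the disk space needed for a raw metagenome of the given size."""
    return raw_size_bytes * copies


def has_enough_disk(raw_size_bytes, path=".", copies=COPIES_NEEDED):
    """Compare the estimate against the free space on the filesystem holding `path`."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes(raw_size_bytes, copies)


raw_size = 20 * 10**9  # a hypothetical 20 GB raw metagenome
print(required_bytes(raw_size) // 10**9, "GB estimated")
print("enough free space here:", has_enough_disk(raw_size))
```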

Full set of top-level `process` targets

download_reads
trim_reads
smash_reads
gather_genbank
download_matching_genomes
map_reads
summarize

Support

genome-grist is alpha-level software. Please be patient and kind :).

Please ask questions and add comments by filing github issues.

Why the name `grist`?

'grist' is in the sourmash family of names (sourmash, wort, distillerycats, etc.) See Grist.

(It is not the computing grist!)

CTB Nov 8, 2020

Project details


Release history

- 0.9.3: Dec 7, 2022
- 0.9.2: Dec 6, 2022
- 0.9.1: Dec 4, 2022
- 0.9.0: Sep 30, 2022
- 0.8.4: Jul 3, 2022
- 0.8.3: Feb 16, 2022
- 0.8.2: Feb 12, 2022
- 0.8.1: Jan 30, 2022
- 0.8.0: Jan 17, 2022
- 0.7.4: Dec 19, 2021
- 0.7.3: Nov 3, 2021
- 0.7.2: May 24, 2021
- 0.7.1: May 19, 2021
- 0.7: Feb 15, 2021
- 0.6.1: Jan 27, 2021
- 0.6: Jan 27, 2021
- 0.5: Nov 21, 2020 (this version)
- 0.4: Nov 16, 2020
- 0.3.2: Nov 8, 2020
- 0.3.1: Nov 7, 2020
- 0.3: Nov 7, 2020
- 0.2.2: Nov 6, 2020
- 0.1.1: Oct 27, 2020
- 0.1: Oct 27, 2020

Download files


Source Distribution

genome-grist-0.5.tar.gz (8.5 MB)

Uploaded Nov 21, 2020 Source

File details

Details for the file genome-grist-0.5.tar.gz.

File metadata

Download URL: genome-grist-0.5.tar.gz
Upload date: Nov 21, 2020
Size: 8.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.6.0 requests/2.24.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.7.6

File hashes

Hashes for genome-grist-0.5.tar.gz
Algorithm	Hash digest
SHA256	`ea2e2ba45f6e7e0cc649467847ef50d2c5ca77fc12cd2456a7f347b3f0c8f8ce`
MD5	`45bc72de42c584f579376280de7db489`
BLAKE2b-256	`0bed16d8f94c044aaf1566e8088db92f5db962f032bbab4b54078bcf4083b80b`
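
To check a downloaded copy of the source distribution against the SHA256 digest listed above, a short sketch like the following could be used. The helper names are hypothetical, and the file path in the usage note assumes the tarball is in the current directory.

```python
import hashlib

# SHA256 digest listed above for genome-grist-0.5.tar.gz.
EXPECTED_SHA256 = "ea2e2ba45f6e7e0cc649467847ef50d2c5ca77fc12cd2456a7f347b3f0c8f8ce"


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("genome-grist-0.5.tar.gz")` should return True for an intact download and False otherwise.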

