
tools for biological assembly graph neighborhood analysis

Project description

spacegraphcats


Explore large, annoying graphs using hierarchies of dominating sets - because in space, no one can hear you miao!

This is a collaboration between the Theory In Practice lab at University of Utah, the Lab for Data Intensive Biology at UC Davis, and Dr. Felix Reidl at Birkbeck, University of London. Initial development of spacegraphcats was generously supported by the Moore Foundation's Data Driven Discovery Initiative.


Documentation

This README file contains quickstart information. For use cases and other information, please see the spacegraphcats documentation at https://spacegraphcats.github.io/spacegraphcats.

Installation and execution quickstart

See the installation instructions and the run guide.

For help or support with this software, please file an issue on GitHub. Thank you!

Quickstart

Two quickstart examples are available: dory-example and twofoo-example. The latter includes a Snakemake Snakefile.

Notable dependencies

spacegraphcats uses code from BBHash, a C++ library for building minimal perfect hash functions (Guillaume Rizk, Antoine Limasset, Rayan Chikhi; see Limasset et al., 2017, arXiv), as wrapped by pybbhash.
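To illustrate what a minimal perfect hash function provides (n distinct keys mapped to n distinct integers 0..n-1 with no gaps), here is a toy pure-Python sketch of the level-based construction described for BBHash: at each level, the remaining keys are hashed into a bit array of size gamma times their count; keys that land alone in a slot keep it, and colliding keys are retried at the next level with a different hash. This is not pybbhash's API. The class name `ToyMPHF`, the default `gamma`, and the use of `hashlib.blake2b` with a level prefix as the per-level hash are assumptions made for illustration.

```python
import hashlib
from typing import Iterable, List


def _slot(key: str, level: int, size: int) -> int:
    """Hash `key` into [0, size), using a different hash at each level (illustrative choice)."""
    data = f"{level}:{key}".encode()
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little") % size


class ToyMPHF:
    """Toy minimal perfect hash: n distinct keys -> unique ints in [0, n).

    Keys outside the original set may map to an arbitrary index or raise KeyError.
    """

    def __init__(self, keys: Iterable[str], gamma: float = 2.0, max_levels: int = 64):
        remaining = list(keys)
        if len(set(remaining)) != len(remaining):
            raise ValueError("keys must be distinct")
        self.levels: List[List[bool]] = []
        while remaining:
            if len(self.levels) == max_levels:
                raise RuntimeError("construction did not converge; try a larger gamma")
            level = len(self.levels)
            size = max(1, int(gamma * len(remaining)))
            counts = [0] * size
            for key in remaining:
                counts[_slot(key, level, size)] += 1
            # Keep a slot only if exactly one key landed in it.
            bits = [c == 1 for c in counts]
            self.levels.append(bits)
            # Keys that collided are retried at the next level.
            remaining = [k for k in remaining if not bits[_slot(k, level, size)]]
        # Number of set bits in all earlier levels, used to turn positions into ranks.
        self._offsets: List[int] = []
        total = 0
        for bits in self.levels:
            self._offsets.append(total)
            total += sum(bits)
        self.n = total

    def __call__(self, key: str) -> int:
        for level, bits in enumerate(self.levels):
            pos = _slot(key, level, len(bits))
            if bits[pos]:
                # Value = rank of this set bit across all levels. This linear-time
                # rank is for clarity only; production code precomputes rank data.
                return self._offsets[level] + sum(bits[:pos])
        raise KeyError(key)


if __name__ == "__main__":
    kmers = ["ACGTA", "CGTAC", "GTACG", "TACGT"]
    mphf = ToyMPHF(kmers)
    print(sorted(mphf(k) for k in kmers))  # → [0, 1, 2, 3]
```

The bit arrays store no keys, which is why a minimal perfect hash is compact; the trade-off is that membership cannot be checked from the hash alone.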

spacegraphcats also uses functionality from khmer and sourmash.

Citation information

See the Genome Biology publication Exploring neighborhoods in large metagenome assembly graphs using spacegraphcats reveals hidden sequence diversity, Brown et al., 2020, doi: https://doi.org/10.1186/s13059-020-02066-4.

Pointers to interesting code

Interesting algorithms

The rdomset code for efficiently calculating a dominating set of a graph at a given radius R is in spacegraphcats/catlas/rdomset.py.
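As a point of reference for what a radius-R dominating set is (a vertex set D such that every vertex lies within R hops of some vertex in D), here is a minimal sketch using a simple greedy strategy: visit vertices in order, and whenever a vertex is not yet dominated, add it to D and mark its radius-R ball as dominated. This greedy approach is an assumption chosen for illustration and is not the algorithm in rdomset.py; the function names and the adjacency-dict graph format are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Iterable[Hashable]]


def ball(graph: Graph, source: Hashable, radius: int) -> Set[Hashable]:
    """Return all vertices within `radius` hops of `source` (breadth-first search)."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == radius:
            continue  # do not expand past the radius
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return seen


def greedy_rdomset(graph: Graph, radius: int) -> Set[Hashable]:
    """Pick undominated vertices in iteration order until every vertex is dominated."""
    dominated: Set[Hashable] = set()
    domset: Set[Hashable] = set()
    for v in graph:
        if v not in dominated:
            domset.add(v)
            dominated |= ball(graph, v, radius)
    return domset


if __name__ == "__main__":
    # Path graph 0-1-2-3-4-5-6.
    path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
    print(sorted(greedy_rdomset(path, 1)))  # → [0, 2, 4, 6]
```

The greedy result is always a valid R-dominating set, but not necessarily a minimum one: on this path, {1, 3, 5} also dominates every vertex at radius 1.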

The graph denoising code for removing low-abundance pendants from BCALM cDBGs is in the function contract_degree_two in cdbg/bcalm_to_gxt.py.
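A minimal sketch of the idea of removing low-abundance pendants, assuming a plain adjacency-set graph with one abundance value per node. The threshold, the function name `prune_low_abundance_pendants`, and the choice to repeat removal (deleting one low-abundance tip can turn its neighbor into a new low-abundance tip) are assumptions for illustration; this is not the contract_degree_two code.

```python
from typing import Dict, Set


def prune_low_abundance_pendants(
    adj: Dict[str, Set[str]], abundance: Dict[str, float], min_abundance: float
) -> Dict[str, Set[str]]:
    """Return a copy of `adj` with low-abundance degree-one vertices removed, repeatedly."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # do not mutate the caller's graph
    stack = [v for v in adj if len(adj[v]) == 1 and abundance[v] < min_abundance]
    while stack:
        v = stack.pop()
        # Skip entries that were already removed or are no longer low-abundance pendants.
        if v not in adj or len(adj[v]) != 1 or abundance[v] >= min_abundance:
            continue
        (u,) = adj.pop(v)  # the single neighbor of the pendant
        adj[u].discard(v)
        # Removing v may expose u as a new low-abundance pendant.
        if len(adj[u]) == 1 and abundance[u] < min_abundance:
            stack.append(u)
    return adj


if __name__ == "__main__":
    # Main path a-b-c (abundance 10) with a low-abundance two-node tip b-t1-t2.
    adj = {"a": {"b"}, "b": {"a", "c", "t1"}, "c": {"b"}, "t1": {"b", "t2"}, "t2": {"t1"}}
    abundance = {"a": 10, "b": 10, "c": 10, "t1": 1, "t2": 1}
    pruned = prune_low_abundance_pendants(adj, abundance, min_abundance=2)
    print(sorted(pruned))  # → ['a', 'b', 'c']
```

In this sketch, a component consisting only of low-abundance nodes shrinks to a single isolated vertex rather than disappearing entirely; a real denoiser may handle that case differently.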

Part of the indexPieces code for indexing cDBG nodes by dominating nodes is in cdbg/index_cdbg_by_kmer.py. The remainder is implemented in the search code described below.
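One way to picture indexing graph nodes by dominating nodes is a multi-source breadth-first search that assigns every node to the dominating node(s) closest to it, then inverts that assignment into a dominator-to-nodes index. The tie rule (keep every equally close dominator) and the function names are assumptions for illustration, not the indexPieces implementation.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Iterable[Hashable]]


def assign_to_closest_dominators(
    graph: Graph, dominators: Iterable[Hashable]
) -> Dict[Hashable, Set[Hashable]]:
    """Multi-source BFS: map each reachable node to all of its closest dominators."""
    dist: Dict[Hashable, int] = {}
    assigned: Dict[Hashable, Set[Hashable]] = {}
    queue = deque()
    for d in set(dominators):
        dist[d] = 0
        assigned[d] = {d}
        queue.append(d)
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                # First time reached: inherit this node's dominators.
                dist[nbr] = dist[node] + 1
                assigned[nbr] = set(assigned[node])
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                # Another shortest path: merge, so ties keep every closest dominator.
                assigned[nbr] |= assigned[node]
    return assigned


def index_by_dominator(assigned: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Invert node -> dominators into dominator -> nodes."""
    index: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for node, doms in assigned.items():
        for d in doms:
            index[d].add(node)
    return dict(index)


if __name__ == "__main__":
    # Path graph 0-1-2-3-4 with dominators at both ends.
    path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
    idx = index_by_dominator(assign_to_closest_dominators(path, [0, 4]))
    print({d: sorted(nodes) for d, nodes in sorted(idx.items())})  # → {0: [0, 1, 2], 4: [2, 3, 4]}
```

Because BFS dequeues nodes in nondecreasing distance order, every shortest-path predecessor of a node has contributed its dominators before that node is expanded. Nodes in components with no dominator are left out of the result.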

The search code for extracting query neighborhoods is in search/query_by_sequence.py; see especially the call to kmer_idx.count_cdbg_matches(...).
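The core idea of matching a query against cDBG nodes can be sketched with a plain dict that maps each canonical k-mer (the lexicographically smaller of a k-mer and its reverse complement) to the unitig containing it; counting how many query k-mers hit each node then shows which nodes the query touches. The names `build_kmer_index` and `count_node_matches`, the small k, and the exact-string k-mers are assumptions for illustration; this is not the implementation behind kmer_idx.count_cdbg_matches(...).

```python
from collections import Counter
from typing import Dict, Iterator

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every length-k substring of `seq`."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_index(unitigs: Dict[int, str], k: int) -> Dict[str, int]:
    """Map canonical k-mer -> unitig ID (in a cDBG each k-mer is in exactly one unitig)."""
    index: Dict[str, int] = {}
    for node_id, seq in unitigs.items():
        for km in kmers(seq, k):
            index[canonical(km)] = node_id
    return index


def count_node_matches(index: Dict[str, int], query: str, k: int) -> Counter:
    """Count, per unitig, how many query k-mers (either strand) it contains."""
    counts: Counter = Counter()
    for km in kmers(query, k):
        node = index.get(canonical(km))
        if node is not None:
            counts[node] += 1
    return counts


if __name__ == "__main__":
    index = build_kmer_index({0: "ACGTAC", 1: "TTTTGGGG"}, k=4)
    # The query is the reverse complement of unitig 1, so all 5 of its k-mers match.
    print(count_node_matches(index, "CCCCAAAA", 4))  # → Counter({1: 5})
```

Using canonical k-mers lets a query match regardless of which strand it was sequenced from.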

Interesting library functionality

Code for indexing large FASTQ/FASTA read files by cDBG unitig is available in cdbg/label_cdbg.py, and code for extracting the reads corresponding to individual unitigs from BGZF files is available in get_reads_by_cdbg in search/search_utils.py.
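The label-then-extract pattern can be sketched without BGZF: first record, for each unitig, the byte offsets of reads that share a k-mer with it, then seek directly to those offsets to pull the reads back out. This sketch assumes an uncompressed FASTA file with one header line and one sequence line per record, forward-strand exact k-mers, and a plain dict as the k-mer-to-unitig index; `label_reads` and `get_reads` are hypothetical names, not functions from cdbg/label_cdbg.py or search/search_utils.py.

```python
import os
import tempfile
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def label_reads(fasta_path: str, kmer_to_unitig: Dict[str, int], k: int) -> Dict[int, List[int]]:
    """Map each unitig ID to the byte offsets of reads sharing a k-mer with it."""
    offsets: Dict[int, List[int]] = defaultdict(list)
    with open(fasta_path, "rb") as fp:
        while True:
            offset = fp.tell()  # start of this record's header line
            header = fp.readline()
            if not header:
                break  # end of file
            seq = fp.readline().strip().decode()
            hits: Set[int] = set()
            for i in range(len(seq) - k + 1):
                unitig = kmer_to_unitig.get(seq[i:i + k])
                if unitig is not None:
                    hits.add(unitig)
            for unitig in hits:
                offsets[unitig].append(offset)
    return dict(offsets)


def get_reads(fasta_path: str, offsets: List[int]) -> List[Tuple[str, str]]:
    """Seek to each recorded offset and return (read name, sequence) pairs."""
    records = []
    with open(fasta_path, "rb") as fp:
        for offset in offsets:
            fp.seek(offset)
            name = fp.readline().strip().decode()[1:]  # drop the leading '>'
            seq = fp.readline().strip().decode()
            records.append((name, seq))
    return records


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", suffix=".fa", delete=False) as tmp:
        tmp.write(b">r1\nACGTACGG\n>r2\nTTTTGGGG\n>r3\nACGTTTTG\n")
    # Hypothetical k-mer -> unitig index.
    labels = label_reads(tmp.name, {"ACGT": 0, "TTTG": 1}, k=4)
    print(get_reads(tmp.name, labels[1]))  # → [('r2', 'TTTTGGGG'), ('r3', 'ACGTTTTG')]
    os.unlink(tmp.name)
```

Storing offsets instead of sequences keeps the label index small; with BGZF-compressed input, the same idea requires BGZF virtual offsets rather than plain byte offsets.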



Download files


Source Distribution

spacegraphcats-2.1.2.tar.gz (11.0 MB)

Uploaded Source

File details

Details for the file spacegraphcats-2.1.2.tar.gz.

File metadata

  • Download URL: spacegraphcats-2.1.2.tar.gz
  • Upload date:
  • Size: 11.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.13

File hashes

Hashes for spacegraphcats-2.1.2.tar.gz
Algorithm     Hash digest
SHA256        de5ecde0c39ea9e3e7316bc7c97a7795ab59b32436c018440843e097e6269359
MD5           1e7bb607d1137b8a5cd698d91a890cb9
BLAKE2b-256   599b08f3a7776f4e4e7dacace546578a10d36c4475b3cddc6796780fd38a4576
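To check a downloaded archive against the published SHA256 digest for spacegraphcats-2.1.2.tar.gz, a short hashlib sketch like the following can be used. The function names and the chunk size are illustrative choices, and the file path passed in is whatever location the tarball was saved to.

```python
import hashlib
from typing import BinaryIO

# SHA256 digest published for spacegraphcats-2.1.2.tar.gz.
EXPECTED_SHA256 = "de5ecde0c39ea9e3e7316bc7c97a7795ab59b32436c018440843e097e6269359"


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in fixed-size chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str) -> bool:
    """Return True if the file at `path` has the published SHA256 digest."""
    with open(path, "rb") as fp:
        return sha256_of_stream(fp) == EXPECTED_SHA256
```

For an intact download, `matches_published_digest("spacegraphcats-2.1.2.tar.gz")` should return True. Alternatively, pip's hash-checking mode (a `--hash=sha256:...` entry in a requirements file, installed with `--require-hashes`) performs the same check at install time.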

