
sourmash plugin for improved plotting/viz and cluster examination.


sourmash_plugin_betterplot

sourmash is a tool for biological sequence analysis and comparisons.

betterplot is a sourmash plugin that provides improved plotting/viz and cluster examination for sourmash-based sketch comparisons.

Why does this plugin exist?

sourmash compare and sourmash plot produce basic distance matrix plots that are useful for comparing and visualizing the relationships between dozens to hundreds of genomes. And this is one of the most popular use cases for sourmash!

But! The visualization can be improved well beyond the basic plot that sourmash plot produces, and there are many only slightly more complicated use cases for comparing, clustering, and visualizing many genomes!

This plugin will explore some of these use cases!

Specific goals:

  • provide a variety of plotting and exploration commands that can be used with sourmash tools;
  • provide both command-line functionality and functions that can be imported and used in Jupyter notebooks;
  • (maybe) explore other backends than matplotlib;

and who knows what else??

Installation

pip install sourmash_plugin_betterplot
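
To confirm that the install landed in the environment you expect (for example, the one your Jupyter kernel uses), a minimal sketch along these lines may help. It uses only the Python standard library; `installed_version` is a hypothetical helper name for illustration, not part of betterplot.

```python
from importlib import metadata


def installed_version(dist_name="sourmash_plugin_betterplot"):
    """Return the installed version string of dist_name, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report whether the plugin is visible to this Python interpreter.
version = installed_version()
if version is None:
    print("betterplot is not installed in this environment")
else:
    print(f"betterplot {version} is installed")
```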

Usage

See the examples below.

Examples

The command lines below are executable in the examples/ subdirectory of the repository after installing the plugin.

Basic 3 sketches example: plot2

Compare 3 sketches, and cluster.

This command:

sourmash compare sketches/{2,47,63}.sig.zip -o 3sketches.cmp \
    --labels-to 3sketches.cmp.labels_to.csv

sourmash scripts plot2 3sketches.cmp 3sketches.cmp.labels_to.csv \
    -o examples/plot2.3sketches.cmp.png

produces this plot:

basic 3-sketches example

3 sketches example with a cut line: plot2 --cut-point 1.2

Compare 3 sketches, cluster, and show a cut point.

This command:

sourmash compare sketches/{2,47,63}.sig.zip -o 3sketches.cmp \
    --labels-to 3sketches.cmp.labels_to.csv

sourmash scripts plot2 3sketches.cmp 3sketches.cmp.labels_to.csv \
    -o examples/plot2.cut.3sketches.cmp.png \
    --cut-point=1.2

produces this plot:

3-sketches example w/cut line
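
To get a feel for what a cut point does, here is a rough, standard-library-only sketch. It is not betterplot's code: it swaps in a plain single-linkage rule (merge any two items whose distance is at most the cut point, via union-find), and the labels `a`, `b`, `c` and their distances are made up. The distance scale on a plot2 dendrogram may differ from this toy one.

```python
def cut_single_linkage(labels, dist, cut_point):
    """Cluster labels by joining every pair at most cut_point apart.

    dist maps (label_a, label_b) tuples to a distance. Joining all such
    pairs transitively matches cutting a single-linkage dendrogram at
    height cut_point.
    """
    parent = {name: name for name in labels}

    def find(name):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), d in dist.items():
        if d <= cut_point:
            parent[find(a)] = find(b)

    clusters = {}
    for name in labels:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: a and b are close, c is far from both.
toy_labels = ["a", "b", "c"]
toy_dist = {("a", "b"): 0.1, ("a", "c"): 0.9, ("b", "c"): 0.8}

print(cut_single_linkage(toy_labels, toy_dist, 0.5))
print(cut_single_linkage(toy_labels, toy_dist, 1.0))
```

A low cut point leaves more, smaller clusters; a high one merges everything into a single cluster.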

Dendrogram of 10 sketches with a cut line + cluster extraction

Compare 10 sketches, cluster, and use a cut point to extract multiple clusters. Use --dendrogram-only to plot just the dendrogram.

This command:

sourmash compare sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
    -o 10sketches.cmp \
    --labels-to 10sketches.cmp.labels_to.csv

sourmash scripts plot2 10sketches.cmp 10sketches.cmp.labels_to.csv \
    -o plot2.cut.dendro.10sketches.cmp.png \
    --cut-point=1.35 --cluster-out --dendrogram-only

produces this plot:

10-sketches example w/cut line

and also writes a set of 6 clusters to the files 10sketches.cmp.*.csv.
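
If you want to pick those cluster files up in a notebook, something like the following sketch could work. It relies only on the filename pattern given above and makes no assumption about the columns inside each CSV; `read_cluster_files` is a hypothetical helper, not a betterplot function. Note that the `--labels-to` file matches the same pattern, so the sketch skips it.

```python
import csv
from pathlib import Path


def read_cluster_files(cmp_name, directory="."):
    """Map each cluster CSV filename to its rows (lists of strings)."""
    labels_file = f"{cmp_name}.labels_to.csv"
    clusters = {}
    for path in sorted(Path(directory).glob(f"{cmp_name}.*.csv")):
        if path.name == labels_file:
            continue  # the --labels-to file shares the same prefix and suffix
        with path.open(newline="") as fp:
            clusters[path.name] = list(csv.reader(fp))
    return clusters


# Prints one line per cluster file found in the current directory (if any).
for name, rows in read_cluster_files("10sketches.cmp").items():
    print(name, len(rows), "rows")
```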

Multidimensional Scaling (MDS) plot of 10-sketch comparison

Use MDS to display a comparison.

This command:

sourmash compare sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
    -o 10sketches.cmp \
    --labels-to 10sketches.cmp.labels_to.csv

sourmash scripts mds 10sketches.cmp 10sketches.cmp.labels_to.csv \
    -o mds.10sketches.cmp.png \
    -C 10sketches-categories.csv

produces this plot:

10-sketches plotted using MDS

Support

We suggest filing issues in the main sourmash issue tracker as that receives more attention!

Dev docs

betterplot is developed at https://github.com/sourmash-bio/sourmash_plugin_betterplot.

See environment.yml for the dependencies needed to develop betterplot.

Testing

Run:

make examples

to run the examples.

For now, the examples serve as the tests; eventually we will add unit tests.

Generating a release

Bump version number in pyproject.toml and push.

Make a new release on github.

Then pull, and:

python -m build

followed by twine upload dist/....


CTB May 2024

