
sourmash plugin for improved plotting/viz and cluster examination.


sourmash_plugin_betterplot

sourmash is a tool for biological sequence analysis and comparisons.

betterplot is a sourmash plugin that provides improved plotting/viz and cluster examination for sourmash-based sketch comparisons.

Why this plugin?

sourmash compare and sourmash plot produce basic distance matrix plots that are useful for comparing and visualizing the relationships between dozens to hundreds of genomes. And this is one of the most popular use cases for sourmash!

But! The visualization can be improved a lot beyond the basic viz that sourmash plot produces, and there are a lot of only slightly more complicated use cases for comparing, clustering, and visualizing many genomes!

This plugin will explore some of these use cases!

Specific goals:

  • provide a variety of plotting and exploration commands that can be used with sourmash tools;
  • provide both command-line functionality and functions that can be imported and used in Jupyter notebooks;
  • (maybe) explore plotting backends other than matplotlib;

and who knows what else??

Installation

pip install sourmash_plugin_betterplot

Usage

See the examples below.

Examples

The command lines below are executable in the examples/ subdirectory of the repository after installing the plugin.

Basic 3-sketch example: plot2

Compare 3 sketches, and cluster.

This command:

sourmash compare sketches/{2,47,63}.sig.zip -o 3sketches.cmp \
    --labels-to 3sketches.cmp.labels_to.csv

sourmash scripts plot2 3sketches.cmp 3sketches.cmp.labels_to.csv \
    -o examples/plot2.3sketches.cmp.png

produces this plot:

basic 3-sketches example
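The plot is built from an all-vs-all similarity matrix. By default, sourmash compare reports Jaccard similarity: the size of the intersection of two sketches' hash sets divided by the size of their union. The following minimal sketch computes that matrix exactly from toy sets of hash values. Real sketches are MinHash samples, so sourmash estimates these numbers rather than computing them exactly, and the hash sets and helper names here are made up for illustration.

```python
# Toy stand-in for `sourmash compare`: exact Jaccard similarity between
# sets of hash values. Real sourmash sketches are MinHash samples, so the
# real tool estimates these values rather than computing them exactly.


def jaccard(a: set[int], b: set[int]) -> float:
    """Return |A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similarity_matrix(sketches: dict[str, set[int]]) -> tuple[list[str], list[list[float]]]:
    """Build a symmetric all-vs-all similarity matrix plus its row labels."""
    labels = list(sketches)
    matrix = [
        [jaccard(sketches[x], sketches[y]) for y in labels]
        for x in labels
    ]
    return labels, matrix


if __name__ == "__main__":
    # Hypothetical hash sets; the labels echo the example sketch IDs only for readability.
    toy = {
        "2": {1, 2, 3, 4},
        "47": {3, 4, 5, 6},
        "63": {3, 4, 5, 7},
    }
    labels, matrix = similarity_matrix(toy)
    for label, row in zip(labels, matrix):
        print(label, [round(v, 2) for v in row])
```

The diagonal is always 1.0, because each sketch is identical to itself, and the matrix is symmetric. That symmetry is why only one triangle of a comparison plot carries new information.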

3-sketch example with a cut line: plot2 --cut-point 1.2

Compare 3 sketches, cluster, and show a cut point.

This command:

sourmash compare sketches/{2,47,63}.sig.zip -o 3sketches.cmp \
    --labels-to 3sketches.cmp.labels_to.csv

sourmash scripts plot2 3sketches.cmp 3sketches.cmp.labels_to.csv \
    -o examples/plot2.cut.3sketches.cmp.png \
    --cut-point=1.2

produces this plot:

3-sketches example w/cut line
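This README does not state how plot2 builds its dendrogram. The sketch below assumes single-linkage clustering over Euclidean distances between the rows of the similarity matrix. Our understanding is that sourmash plot does this when it passes the matrix to scipy's linkage, and it would explain why cut points above 1.0, such as 1.2, are meaningful. The function names are hypothetical. The sketch records the height of each merge; a cut line at height t leaves one cluster per merge that happens above t.

```python
import math


def row_distances(sim):
    """Euclidean distance between every pair of rows of a similarity matrix.

    Assumption: each row is treated as a feature vector, as we understand
    `sourmash plot` does; distances can therefore exceed 1.0.
    """
    n = len(sim)
    return [[math.dist(sim[i], sim[j]) for j in range(n)] for i in range(n)]


def single_linkage_heights(dist):
    """Naive agglomerative single-linkage clustering.

    Returns (cluster_a, cluster_b, height) tuples in merge order.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage: distance between the closest members of two clusters.
                d = min(dist[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] | clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges


def clusters_at_cut(merges, n, cut):
    """Number of clusters left when the dendrogram is cut at height `cut`."""
    # Each merge at or below the cut joins two clusters into one.
    return n - sum(1 for _, _, h in merges if h <= cut)


if __name__ == "__main__":
    # Toy similarity matrix; values are for illustration only.
    sim = [[1.0, 1 / 3, 1 / 3], [1 / 3, 1.0, 0.6], [1 / 3, 0.6, 1.0]]
    merges = single_linkage_heights(row_distances(sim))
    for a, b, h in merges:
        print(sorted(a), "+", sorted(b), f"at height {h:.3f}")
    for cut in (0.5, 0.7, 1.2):
        print(f"cut {cut}: {clusters_at_cut(merges, len(sim), cut)} cluster(s)")
```

With this toy matrix, every merge falls below 1.2, so a cut there leaves a single cluster. A lower cut splits the sketches apart.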

Dendrogram of 10 sketches with a cut line + cluster extraction

Compare 10 sketches, cluster, and use a cut point to extract multiple clusters. Use --dendrogram-only to plot just the dendrogram.

This command:

sourmash compare sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
    -o 10sketches.cmp \
    --labels-to 10sketches.cmp.labels_to.csv

sourmash scripts plot2 10sketches.cmp 10sketches.cmp.labels_to.csv \
    -o plot2.cut.dendro.10sketches.cmp.png \
    --cut-point=1.35 --cluster-out --dendrogram-only

produces this plot:

10-sketches example w/cut line

as well as a set of 6 clusters, written to 10sketches.cmp.*.csv.
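Extracting clusters at a cut point can be sketched under a single-linkage assumption, which this README does not confirm for plot2. Cutting a single-linkage dendrogram at height t yields exactly the connected components of the graph that links every pair of items within distance t, so a union-find over those pairs is enough. The `{prefix}.{n}.csv` file naming and the single `label` column are hypothetical, chosen only to echo the 10sketches.cmp.*.csv pattern. The plugin's actual files may contain different columns.

```python
import csv
import tempfile
from pathlib import Path


def flat_clusters(dist, cut):
    """Single-linkage flat clusters at height `cut`, as lists of item indices.

    Cutting a single-linkage dendrogram at `cut` gives the connected components
    of the graph joining every pair at distance <= cut, found here by union-find.
    """
    parent = list(range(len(dist)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(dist)):
        for j in range(i + 1, len(dist)):
            if dist[i][j] <= cut:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(dist)):
        groups.setdefault(find(i), []).append(i)
    # Deterministic order: clusters sorted by their smallest member index.
    return sorted(groups.values(), key=lambda g: g[0])


def write_cluster_csvs(groups, labels, prefix):
    """Write one CSV per cluster (hypothetical naming and 'label' column)."""
    paths = []
    for n, members in enumerate(groups, start=1):
        path = Path(f"{prefix}.{n}.csv")
        with path.open("w", newline="") as fp:
            writer = csv.writer(fp)
            writer.writerow(["label"])
            writer.writerows([labels[i]] for i in members)
        paths.append(path)
    return paths


if __name__ == "__main__":
    labels = ["a", "b", "c", "d"]  # toy labels
    dist = [  # toy symmetric distance matrix
        [0.0, 0.5, 2.0, 2.1],
        [0.5, 0.0, 1.9, 2.2],
        [2.0, 1.9, 0.0, 0.4],
        [2.1, 2.2, 0.4, 0.0],
    ]
    groups = flat_clusters(dist, cut=1.35)
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_cluster_csvs(groups, labels, f"{tmp}/toy.cmp"):
            print(path.name, path.read_text().split())
```

A lower cut point produces more, smaller clusters, and a cut above the tallest merge produces one cluster containing everything.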

Multidimensional Scaling (MDS) plot of 10-sketch comparison

Use MDS to display a comparison.

This command:

sourmash compare sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
    -o 10sketches.cmp \
    --labels-to 10sketches.cmp.labels_to.csv

sourmash scripts mds 10sketches.cmp 10sketches.cmp.labels_to.csv \
    -o mds.10sketches.cmp.png \
    -C 10sketches-categories.csv

produces this plot:

10-sketches plotted using MDS
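MDS places each sketch as a point in two dimensions so that distances between points approximate the pairwise distances between sketches. The minimal sketch below swaps in classical (Torgerson) MDS, computed with power iteration so that it needs only the standard library. This may not be the method that sourmash scripts mds uses internally; stress-minimizing metric MDS is a common alternative. Converting similarities to distances as 1 - similarity is also an assumption.

```python
import math
import random


def similarity_to_distance(sim):
    """Assumed conversion: distance = 1 - similarity (not confirmed for the plugin)."""
    return [[1.0 - s for s in row] for row in sim]


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical (Torgerson) MDS: coordinates whose distances approximate `dist`.

    Finds the top `dims` eigenpairs of the double-centered squared-distance
    matrix by power iteration with deflation. Best suited to near-Euclidean input.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals column means for symmetric input
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J, where J = I - (1/n) * ones.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # nothing left to extract in this direction
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector gives its eigenvalue.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))  # negative eigenvalues have no Euclidean embedding
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass converges to the next-largest eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


if __name__ == "__main__":
    # Toy similarity matrix; labels and values are for illustration only.
    toy_sim = [[1.0, 1 / 3, 1 / 3], [1 / 3, 1.0, 0.6], [1 / 3, 0.6, 1.0]]
    labels = ["2", "47", "63"]
    for label, (x, y) in zip(labels, classical_mds(similarity_to_distance(toy_sim))):
        print(f"{label}\t{x:+.3f}\t{y:+.3f}")
```

The axes of an MDS plot have no intrinsic meaning, and any rotation or reflection of the layout is equally valid. Only the relative distances between points matter.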

Support

We suggest filing issues in the main sourmash issue tracker, as that receives more attention!

Dev docs

betterplot is developed at https://github.com/sourmash-bio/sourmash_plugin_betterplot.

See environment.yml for the dependencies needed to develop betterplot.

Testing

Run:

make examples

to run the examples.

For now, the examples serve as the tests; eventually we will add unit tests.

Generating a release

Bump version number in pyproject.toml and push.

Make a new release on GitHub.

Then pull, and:

python -m build

followed by twine upload dist/....


CTB May 2024

Download files


Source Distribution

sourmash_plugin_betterplot-0.2.tar.gz (8.0 kB)

Uploaded Source

Built Distribution

sourmash_plugin_betterplot-0.2-py3-none-any.whl (8.3 kB)

Uploaded Python 3

File details

Details for the file sourmash_plugin_betterplot-0.2.tar.gz.


File hashes

Hashes for sourmash_plugin_betterplot-0.2.tar.gz
Algorithm Hash digest
SHA256 67ca7e0ab8fbb55cceade7900fb2a558d2733378c0e7649bc80006f80720b597
MD5 87c6613175d7fc3b4ce17ef69b2e98eb
BLAKE2b-256 272485d90af8247ccf9c835d2bd9a17263b8c6201eb80a056a52803ef34c4c66


File details

Details for the file sourmash_plugin_betterplot-0.2-py3-none-any.whl.


File hashes

Hashes for sourmash_plugin_betterplot-0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 f162b6c43382ea78ed0e1c86f4e190b83e21cf862b1cf29feaaa1e4933f9d810
MD5 c5dd90aac9abc82ec8635cb383c4b902
BLAKE2b-256 4befead4cf67144d83a7d01a8db56f9b23066deb76b81f54a29e83aeef620b91
