
sourmash plugin for improved plotting/viz and cluster examination.


sourmash_plugin_betterplot

sourmash is a tool for biological sequence analysis and comparisons.

betterplot is a sourmash plugin that provides improved plotting/viz and cluster examination for sourmash-based sketch comparisons.

Why does this plugin exist?

sourmash compare and sourmash plot produce basic distance matrix plots that are useful for comparing and visualizing the relationships between dozens to hundreds of genomes. And this is one of the most popular use cases for sourmash!

However, the visualization can be improved well beyond the basic plots that sourmash plot produces, and there are many only slightly more complicated use cases for comparing, clustering, and visualizing many genomes!

And this plugin exists to explore some of these use cases!

Specific goals:

  • provide a variety of plotting and exploration commands that can be used with sourmash tools;
  • provide both command-line functionality and functions that can be imported and used in Jupyter notebooks;
  • (maybe) explore other backends than matplotlib;

and who knows what else??

Installation

pip install sourmash_plugin_betterplot

Usage

See the examples below.

Examples

The command lines below are executable in the examples/ subdirectory of the repository after installing the plugin.

Basic 3 sketches example: plot2

Compare 3 sketches, and cluster.

This command:

sourmash compare sketches/{2,47,63}.sig.zip -o 3sketches.cmp \
    --labels-to 3sketches.cmp.labels_to.csv

sourmash scripts plot2 3sketches.cmp 3sketches.cmp.labels_to.csv \
    -o examples/plot2.3sketches.cmp.png

produces this plot:

basic 3-sketches example

3 sketches example with a cut line: plot2 --cut-point 1.2

Compare 3 sketches, cluster, and show a cut point.

This command:

sourmash compare sketches/{2,47,63}.sig.zip -o 3sketches.cmp \
    --labels-to 3sketches.cmp.labels_to.csv

sourmash scripts plot2 3sketches.cmp 3sketches.cmp.labels_to.csv \
    -o examples/plot2.cut.3sketches.cmp.png \
    --cut-point=1.2

produces this plot:

3-sketches example w/cut line
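A cut point acts like a horizontal line drawn across the dendrogram: every branch that joins below it becomes one cluster. The pure-Python sketch below illustrates that idea; it is not betterplot's implementation. It assumes single linkage, with each row of the similarity matrix treated as a feature vector and rows compared by Euclidean distance (the way sourmash plot hands its matrix to scipy's linkage). Under that assumption, distances between rows can exceed 1.0, which would be consistent with cut points such as 1.2 and 1.35 in these examples. The function name cut_single_linkage is hypothetical.

```python
import math


def cut_single_linkage(sim, threshold):
    """Return flat cluster ids (1..k) for a single-linkage dendrogram cut at `threshold`.

    Assumption: each row of the square similarity matrix `sim` is treated as a
    feature vector, and rows are compared by Euclidean distance.
    """
    n = len(sim)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # For single linkage, two items share a cluster after the cut exactly when
    # they are joined by a chain of links no longer than the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(sim[i], sim[j]) <= threshold:
                parent[find(i)] = find(j)

    # Renumber cluster roots as 1..k in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids) + 1) for i in range(n)]


if __name__ == "__main__":
    # Two near-identical sketches and one unrelated sketch (made-up values).
    sim = [[1.0, 0.9, 0.0],
           [0.9, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
    print(cut_single_linkage(sim, 1.2))  # [1, 1, 2]
```

Raising the threshold merges more branches into fewer clusters; lowering it splits them apart.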

Dendrogram of 10 sketches with a cut line + cluster extraction

Compare 10 sketches, cluster, and use a cut point to extract multiple clusters. Use --dendrogram-only to plot just the dendrogram.

This command:

sourmash compare sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
    -o 10sketches.cmp \
    --labels-to 10sketches.cmp.labels_to.csv

sourmash scripts plot2 10sketches.cmp 10sketches.cmp.labels_to.csv \
    -o plot2.cut.dendro.10sketches.cmp.png \
    --cut-point=1.35 --cluster-out --dendrogram-only

produces this plot:

10-sketches example w/cut line

as well as 6 clusters, written to files matching 10sketches.cmp.*.csv.
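With --cluster-out, each flat cluster from the cut is written to its own CSV file. The sketch below shows that grouping-and-writing step in plain Python, under stated assumptions: the {prefix}.{cluster_id}.csv naming and the single label column are hypothetical stand-ins rather than betterplot's actual output format, and write_clusters is a hypothetical helper.

```python
import csv
import os
import tempfile
from collections import defaultdict


def write_clusters(labels, cluster_ids, prefix):
    """Write one CSV per cluster; return the paths written, ordered by cluster id."""
    groups = defaultdict(list)
    for label, cid in zip(labels, cluster_ids):
        groups[cid].append(label)

    paths = []
    for cid in sorted(groups):
        path = f"{prefix}.{cid}.csv"  # hypothetical naming scheme
        with open(path, "w", newline="") as fp:
            writer = csv.writer(fp)
            writer.writerow(["label"])  # hypothetical column name
            writer.writerows([label] for label in groups[cid])
        paths.append(path)
    return paths


if __name__ == "__main__":
    outdir = tempfile.mkdtemp()
    written = write_clusters(["genome_a", "genome_b", "genome_c"], [1, 1, 2],
                             os.path.join(outdir, "10sketches.cmp"))
    print(written)
```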

Multidimensional Scaling (MDS) plot of 10-sketch comparison

Use MDS to display a comparison.

This command:

sourmash compare sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
    -o 10sketches.cmp \
    --labels-to 10sketches.cmp.labels_to.csv

sourmash scripts mds 10sketches.cmp 10sketches.cmp.labels_to.csv \
    -o mds.10sketches.cmp.png \
    -C 10sketches-categories.csv

produces this plot:

10-sketches plotted using MDS
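MDS places each sketch as a point in two dimensions so that distances between points approximate the distances in the comparison. As an illustration, here is classical (Torgerson) MDS in pure Python, using power iteration to find the top eigenvectors. This is a swapped-in technique for explanation only: the mds command may use a different MDS variant (for example an iterative, SMACOF-style fit), and turning sourmash similarities into distances first (e.g. as 1 - similarity) is an assumption. The function name classical_mds is hypothetical.

```python
import math


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def classical_mds(dist, dims=2, iters=500):
    """Embed n items in `dims` dimensions from a symmetric n x n distance matrix."""
    n = len(dist)
    # Double-center the squared distances: B = -1/2 * J D^2 J.
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    mean = [sum(row) / n for row in d2]  # row means equal column means (symmetric)
    grand = sum(mean) / n
    b = [[-0.5 * (d2[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]

    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration from a fixed, normalized start vector.
        v = [float(i + 1) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = _matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to extract
            v = [x / norm for x in w]

        lam = sum(x * y for x, y in zip(v, _matvec(b, v)))  # Rayleigh quotient
        if lam <= 1e-12:
            break  # dominant remaining eigenvalue is not positive; stop here
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


if __name__ == "__main__":
    # Corners of a 3 x 1 rectangle; MDS should recover the same shape
    # up to rotation and reflection.
    pts = [(0, 0), (3, 0), (0, 1), (3, 1)]
    dist = [[math.dist(p, q) for q in pts] for p in pts]
    for xy in classical_mds(dist):
        print([round(c, 3) for c in xy])
```

Classical MDS is exact when the distances really are Euclidean, as in the toy example; for sketch-derived distances it gives a best low-dimensional approximation.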

Multidimensional Scaling (MDS) plot of 10-sketch comparisons from pairwise output

Use MDS to display a sparse comparison created using the branchwater plugin's pairwise command. The output of pairwise differs from the sourmash compare output: pairwise produces a sparse CSV file that contains only the matches above a threshold, while sourmash compare produces a dense numpy matrix.

These commands:

sourmash sig cat sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
    -o 10sketches.sig.zip
sourmash scripts pairwise 10sketches.sig.zip -o 10sketches.pairwise.csv

sourmash scripts mds 10sketches.cmp \
    -o mds.10sketches.cmp.png \
    -C 10sketches-categories.csv

produce this plot:

10-sketches plotted using MDS2

Convert pairwise output to sourmash compare output and plot

These commands:

# build pairwise
sourmash sig cat sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
    -o 10sketches.sig.zip
sourmash scripts pairwise 10sketches.sig.zip -o 10sketches.pairwise.csv

# convert pairwise
sourmash scripts pairwise_to_compare 10sketches.pairwise.csv \
    -o 10sketches.pairwise.cmp --write-all \
    --labels-to 10sketches.pairwise.cmp.labels_to.csv
    
# plot!
sourmash scripts plot2 10sketches.pairwise.cmp \
    10sketches.pairwise.cmp.labels_to.csv \
    -o plot2.pairwise.10sketches.cmp.png

produce this plot:

10-sketches plotted from pairwise
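Conceptually, converting pairwise output into compare output means filling a dense, symmetric matrix from a sparse list of matches. The pure-Python sketch below shows one way to do that; it is not the pairwise_to_compare implementation. It assumes the pairwise CSV has query_name, match_name and jaccard columns (check the header of your own file), treats pairs missing from the CSV as similarity 0.0, and sets the diagonal to 1.0. The function name sparse_pairwise_to_dense is hypothetical.

```python
import csv
import io


def sparse_pairwise_to_dense(csv_text, value_col="jaccard"):
    """Return (labels, matrix) built from sparse pairwise CSV text.

    Assumed columns: query_name, match_name, and `value_col`.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    labels = sorted({r["query_name"] for r in rows} | {r["match_name"] for r in rows})
    index = {name: i for i, name in enumerate(labels)}
    n = len(labels)

    # Pairs absent from the sparse file fell below the threshold: use 0.0.
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        mat[i][i] = 1.0  # every sketch is identical to itself
    for r in rows:
        i, j = index[r["query_name"]], index[r["match_name"]]
        mat[i][j] = mat[j][i] = float(r[value_col])  # keep the matrix symmetric
    return labels, mat


if __name__ == "__main__":
    text = "query_name,match_name,jaccard\nA,B,0.5\nB,C,0.25\n"
    labels, mat = sparse_pairwise_to_dense(text)
    print(labels)
    for row in mat:
        print(row)
```

Note that in this sketch a sketch that matched nothing above the threshold never appears in the CSV rows, so it is also absent from the resulting matrix.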

Support

We suggest filing issues in the main sourmash issue tracker, as that receives more attention!

Dev docs

betterplot is developed at https://github.com/sourmash-bio/sourmash_plugin_betterplot.

See environment.yml for the dependencies needed to develop betterplot.

Testing

Run:

make examples

to run the examples.

For now, the examples serve as the tests; eventually we will add unit tests.

Generating a release

Bump version number in pyproject.toml and push.

Make a new release on GitHub.

Then pull, and:

python -m build

followed by twine upload dist/....


CTB May 2024



Download files


Source Distribution

sourmash_plugin_betterplot-0.3.tar.gz (8.8 kB)

Built Distribution

sourmash_plugin_betterplot-0.3-py3-none-any.whl (9.2 kB)

File details

Details for the file sourmash_plugin_betterplot-0.3.tar.gz.


File hashes

Hashes for sourmash_plugin_betterplot-0.3.tar.gz
Algorithm Hash digest
SHA256 521078676ce1573fa3cad2ab5cb84bfd88938fdebc471552bfdcf600bb7ab0fa
MD5 80d36525d657e453472c6db19424fe9b
BLAKE2b-256 f4cd1fec60273b2ed1ccecc5010e92f8159f01aa80ef3d4b604fc135ac9d11e9


File details

Details for the file sourmash_plugin_betterplot-0.3-py3-none-any.whl.


File hashes

Hashes for sourmash_plugin_betterplot-0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 a3a15b8aaf22118a745ccd3e59b1edba29a5177b355b755f658d4ab6e62c6b14
MD5 2be9bae8e505030519ef9e97cd980496
BLAKE2b-256 227a52752c421df7e0274bcc1f9a08ce6c4efa2c8ef804123bcec1da4deaf865
