sourmash plugin for improved plotting/viz and cluster examination.

sourmash_plugin_betterplot

sourmash is a tool for biological sequence analysis and comparisons.

betterplot is a sourmash plugin that provides improved plotting/viz and cluster examination for sourmash-based sketch comparisons.

Why does this plugin exist?

sourmash compare and sourmash plot produce basic distance matrix plots that are useful for comparing and visualizing the relationships between dozens to hundreds of genomes. And this is one of the most popular use cases for sourmash!

However, there is a lot of room to improve on the basic visualization that sourmash plot produces, and there are many only slightly more complicated use cases for comparing, clustering, and visualizing many genomes!

And this plugin exists to explore some of these use cases!

Specific goals:

  • provide a variety of plotting and exploration commands that can be used with sourmash tools;
  • provide both command-line functionality and functions that can be imported and used in Jupyter notebooks;
  • (maybe) explore plotting backends other than matplotlib;

and who knows what else??

Installation

pip install sourmash_plugin_betterplot

Usage

See the examples below.

Examples

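The command lines below can be run from the examples/ subdirectory of the repository once the plugin is installed. The mds2 and pairwise_to_compare examples also use the pairwise command from the branchwater plugin, so that plugin must be installed as well.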

plot2 - basic 3 sketches example

Compare 3 sketches, and cluster.

This command:

sourmash compare sketches/{2,47,63}.sig.zip -o 3sketches.cmp \
    --labels-to 3sketches.cmp.labels_to.csv

sourmash scripts plot2 3sketches.cmp 3sketches.cmp.labels_to.csv \
    -o plot2.3sketches.cmp.png

produces this plot:

basic 3-sketches example
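One goal of the plugin is to be usable from Jupyter notebooks, so here is a minimal sketch of reading the same compare output into Python. It assumes two things: the file written by sourmash compare -o is a NumPy array in .npy format, and the --labels-to CSV has a column named label. load_compare_output is a hypothetical helper for illustration, not part of the plugin's API.

```python
import csv

import numpy as np


def load_compare_output(matrix_path, labels_path, label_column="label"):
    """Load a sourmash compare matrix and its --labels-to CSV.

    Hypothetical helper. Assumes the matrix file is in NumPy .npy format and
    that the labels CSV has a column named `label_column` (assumed name).
    """
    # np.load detects the .npy format from the file header, not the extension.
    matrix = np.load(matrix_path)
    with open(labels_path, newline="") as fp:
        labels = [row[label_column] for row in csv.DictReader(fp)]
    # The matrix should be square, with one row and column per label.
    if matrix.shape != (len(labels), len(labels)):
        raise ValueError(
            f"matrix shape {matrix.shape} does not match {len(labels)} labels"
        )
    return matrix, labels
```

After running the commands above, load_compare_output("3sketches.cmp", "3sketches.cmp.labels_to.csv") would return a 3x3 array and its 3 labels, provided those assumptions hold.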

plot2 - 3 sketches example with a cut line: plot2 --cut-point 1.2

Compare 3 sketches, cluster, and show a cut point.

This command:

sourmash compare sketches/{2,47,63}.sig.zip -o 3sketches.cmp \
    --labels-to 3sketches.cmp.labels_to.csv

sourmash scripts plot2 3sketches.cmp 3sketches.cmp.labels_to.csv \
    -o plot2.cut.3sketches.cmp.png \
    --cut-point=1.2

produces this plot:

3-sketches example w/cut line

plot2 - dendrogram of 10 sketches with a cut line + cluster extraction

Compare 10 sketches, cluster them, and use a cut point to extract multiple clusters. Use --cluster-out to write the extracted clusters to CSV files, and --dendrogram-only to plot just the dendrogram.

This command:

sourmash compare sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
    -o 10sketches.cmp \
    --labels-to 10sketches.cmp.labels_to.csv

sourmash scripts plot2 10sketches.cmp 10sketches.cmp.labels_to.csv \
    -o plot2.cut.dendro.10sketches.cmp.png \
    --cut-point=1.35 --cluster-out --dendrogram-only

produces this plot:

10-sketches example w/cut line

as well as 6 clusters, written to 10sketches.cmp.*.csv.
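The cut point is a height on the dendrogram. Every subtree that joins below that height becomes one cluster. The following is a minimal pure-Python sketch of that idea, not plot2's actual code. It assumes single-linkage clustering on Euclidean distances between rows of the similarity matrix. That assumption would be consistent with cut points above 1, such as 1.2 and 1.35, but plot2 may use a different linkage method or distance. For single linkage, cutting at height t gives the same clusters as linking every pair of sketches at distance t or less and taking connected components. The union-find below computes exactly that. cut_single_linkage is a hypothetical name.

```python
import math


def row_distances(matrix):
    """Euclidean distance between every pair of rows (assumed distance)."""
    n = len(matrix)
    return [[math.dist(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


def cut_single_linkage(labels, matrix, cut_point):
    """Group labels into clusters by cutting a single-linkage tree at cut_point."""
    dist = row_distances(matrix)
    parent = list(range(len(labels)))

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair whose distance is within the cut point.
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if dist[i][j] <= cut_point:
                parent[find(i)] = find(j)

    # Collect clusters in order of first appearance in `labels`.
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(find(i), []).append(label)
    return list(clusters.values())
```

With a NumPy matrix, pass a plain list and the labels, e.g. cut_single_linkage(labels, matrix.tolist(), 1.35).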

mds - multidimensional scaling (MDS) plot of a 10-sketch comparison

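Use MDS to display a comparison. Each sketch becomes a point in the plot, placed so that the distances between points approximate how dissimilar the sketches are.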

These commands:

sourmash compare sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
    -o 10sketches.cmp \
    --labels-to 10sketches.cmp.labels_to.csv

sourmash scripts mds 10sketches.cmp 10sketches.cmp.labels_to.csv \
    -o mds.10sketches.cmp.png \
    -C 10sketches-categories.csv

produce this plot:

10-sketches plotted using MDS
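MDS looks for low-dimensional coordinates whose pairwise distances approximate a given set of dissimilarities. The sketch below implements classical (Torgerson) MDS with NumPy and converts similarities to distances as 1 - similarity. Both choices are assumptions for illustration. The mds command may use a different MDS variant or distance transform, and classical_mds is a hypothetical helper.

```python
import numpy as np


def classical_mds(similarity, n_components=2):
    """Embed items in `n_components` dimensions with classical MDS.

    Assumes a square, symmetric similarity matrix with values in [0, 1],
    converted to distances as 1 - similarity (an illustrative choice).
    """
    dist = 1.0 - np.asarray(similarity, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns ascending eigenvalues for symmetric B; keep the largest.
    eigvals, eigvecs = np.linalg.eigh(b)
    top = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale
```

Each row of the result is an (x, y) position for the corresponding label, which can be drawn with matplotlib's scatter.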

mds2 - multidimensional scaling (MDS) plot of a 10-sketch comparison from pairwise output

Use MDS to display a sparse comparison created with the branchwater plugin's pairwise command. The output of pairwise differs from the sourmash compare output: pairwise produces a sparse CSV file that contains just the matches above a threshold, while sourmash compare produces a dense NumPy matrix.

These commands:

sourmash sig cat sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
    -o 10sketches.sig.zip
sourmash scripts pairwise 10sketches.sig.zip -o 10sketches.pairwise.csv

sourmash scripts mds2 10sketches.pairwise.csv \
    -o mds.10sketches.cmp.png \
    -C 10sketches-categories.csv

produce this plot:

10-sketches plotted using MDS2
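The following minimal sketch makes the sparse/dense distinction concrete by turning pairwise-style CSV rows into a dense matrix. Several details are assumptions: the column names query_name, match_name and jaccard, and the choices to fill missing pairs with 0 and the diagonal with 1. sparse_to_dense is a hypothetical helper, not code from the plugin or from pairwise_to_compare.

```python
import csv


def sparse_to_dense(csv_lines, value_column="jaccard",
                    query_column="query_name", match_column="match_name"):
    """Build a dense, symmetric matrix from sparse pairwise CSV rows.

    Column names are assumptions about the pairwise output. Pairs absent from
    the CSV (below the reporting threshold) become 0.0; the diagonal is 1.0.
    Sketches that never appear in any row are not included at all.
    """
    rows = list(csv.DictReader(csv_lines))
    # Give each name an index in order of first appearance.
    index = {}
    for row in rows:
        for name in (row[query_column], row[match_column]):
            index.setdefault(name, len(index))
    labels = list(index)

    n = len(labels)
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for row in rows:
        i, j = index[row[query_column]], index[row[match_column]]
        # Treat comparisons as symmetric and fill both halves of the matrix.
        matrix[i][j] = matrix[j][i] = float(row[value_column])
    return labels, matrix
```

One consequence of the sparse format: a sketch with no matches above the threshold never appears in the CSV. A dense matrix rebuilt this way can therefore be missing rows unless the full list of sketch names is supplied separately.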

pairwise_to_compare - convert pairwise output to sourmash compare output and plot

These commands:

# build pairwise
sourmash sig cat sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
    -o 10sketches.sig.zip
sourmash scripts pairwise 10sketches.sig.zip -o 10sketches.pairwise.csv

# convert pairwise
sourmash scripts pairwise_to_compare 10sketches.pairwise.csv \
    -o 10sketches.pairwise.cmp --write-all \
    --labels-to 10sketches.pairwise.cmp.labels_to.csv
    
# plot!
sourmash scripts plot2 10sketches.pairwise.cmp \
    10sketches.pairwise.cmp.labels_to.csv \
    -o plot2.pairwise.10sketches.cmp.png

produce this plot:

10-sketches plotted from pairwise
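pairwise_to_compare produces the dense matrix-plus-labels files that plot2 reads. Below is a minimal sketch of just the writing step, assuming the compare matrix file is a NumPy .npy array. The one-column labels CSV is a simplification that may not match what sourmash compare --labels-to writes, and write_compare_style_output is a hypothetical helper.

```python
import csv

import numpy as np


def write_compare_style_output(matrix, labels, matrix_path, labels_path):
    """Save a dense matrix and its labels in a compare-like layout (assumed).

    Hypothetical helper; the labels CSV written here has only a `label`
    column, which may not match what sourmash compare --labels-to writes.
    """
    array = np.asarray(matrix, dtype=float)
    # Writing through a file object keeps the filename exactly as given;
    # passing a path string to np.save would append ".npy".
    with open(matrix_path, "wb") as fp:
        np.save(fp, array)
    with open(labels_path, "w", newline="") as fp:
        writer = csv.writer(fp)
        writer.writerow(["label"])
        writer.writerows([label] for label in labels)
```

The file object is the detail worth noting. numpy.save appends .npy when given a filename string that lacks that extension, so writing through an open file keeps names like 10sketches.pairwise.cmp unchanged.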

plot3 - seaborn clustermap with color categories

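seaborn's clustermap draws the comparison as a hierarchically clustered heatmap with dendrograms along its edges, and can show color-coded categories alongside the rows and columns.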

These commands:

sourmash compare sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
    -o 10sketches.cmp \
    --labels-to 10sketches.cmp.labels_to.csv

sourmash scripts plot3 10sketches.cmp 10sketches.cmp.labels_to.csv \
    -o plot3.10sketches.cmp.png -C 10sketches-categories.csv

produce this plot:

plot3 10 sketches
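seaborn's clustermap takes a row_colors argument, a list of colors aligned with the matrix rows. That is one way to draw category colors like these. The sketch below builds such a list from a categories CSV. This README does not describe the layout of 10sketches-categories.csv, so the label and category column names, the palette, and the category_colors helper are all assumptions. plot3 may do this differently.

```python
import csv

# Assumed palette of matplotlib color names; colors repeat after five categories.
PALETTE = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple"]


def category_colors(labels, category_lines, label_column="label",
                    category_column="category", missing="lightgrey"):
    """Return one color per label, using a label -> category CSV.

    Column names are assumptions. Labels with no category get `missing`.
    """
    category_of = {row[label_column]: row[category_column]
                   for row in csv.DictReader(category_lines)}
    # Give each category a palette color, in order of first appearance.
    color_of = {}
    for label in labels:
        category = category_of.get(label)
        if category is not None and category not in color_of:
            color_of[category] = PALETTE[len(color_of) % len(PALETTE)]
    return [color_of.get(category_of.get(label), missing) for label in labels]
```

The resulting list could then be passed as seaborn.clustermap(matrix, row_colors=colors).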

Support

We suggest filing issues in the main sourmash issue tracker, as it receives more attention!

Dev docs

betterplot is developed at https://github.com/sourmash-bio/sourmash_plugin_betterplot.

See environment.yml for the dependencies needed to develop betterplot.

Testing

Run:

make examples

to run the examples.

For now, the examples serve as the tests; eventually we will add unit tests.

Generating a release

Bump the version number in pyproject.toml and push.

Make a new release on GitHub.

Then pull, and:

python -m build

followed by twine upload dist/....


CTB May 2024

Download files

Source Distribution

sourmash_plugin_betterplot-0.3.1.tar.gz (9.3 kB)

Built Distribution

sourmash_plugin_betterplot-0.3.1-py3-none-any.whl

File hashes

Hashes for sourmash_plugin_betterplot-0.3.1.tar.gz

Algorithm     Hash digest
SHA256        f891ff5697a93d1201cf5b6df1b20082dada5c9e9242a25547287fd8b13a2be9
MD5           35f9d2efab34b9e57e655b16325d592d
BLAKE2b-256   e783392335eefecbce0d08b8323f5ffae46b9f48026fb85e5225694c33934b72

Hashes for sourmash_plugin_betterplot-0.3.1-py3-none-any.whl

Algorithm     Hash digest
SHA256        da6bb33e4c3d76f1016f82134fda7317b0173ce93decb537cb67eabedf277618
MD5           3589145f996b930a059dff43f5ae05d4
BLAKE2b-256   43a8c876b4e140dc66fa4b9a782dfd15291726d1791a94807531d77843967e44
