sourmash plugin to calculate common hashes across multiple sketches.
sourmash_plugin_commonhash
If you have sketched many samples and want to remove "rare" k-mers (those present in only one or a few samples), this plugin is for you! This procedure helps reduce noise in Jaccard comparisons between samples.
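To illustrate why dropping rare hashes can reduce noise, here is a toy sketch in plain Python. It is not the plugin's code: the integer "hashes" and the names sample_a, sample_b, sample_c, and jaccard are made up for illustration, and ordinary Python sets stand in for sourmash sketches.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of hash values."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy "sketches": hypothetical integer hashes, not real sourmash output.
sample_a = {1, 2, 3, 4, 101, 102}  # 101 and 102 occur only in sample_a
sample_b = {1, 2, 3, 4, 201, 202}  # 201 and 202 occur only in sample_b
sample_c = {1, 2, 3, 301}          # 301 occurs only in sample_c

# Hashes present in at least two of the three samples.
common = (sample_a & sample_b) | (sample_a & sample_c) | (sample_b & sample_c)

before = jaccard(sample_a, sample_b)
after = jaccard(sample_a & common, sample_b & common)
print(f"before: {before:.2f}, after: {after:.2f}")
```

In this toy case the hashes unique to each sample inflate the union, so removing them raises the Jaccard similarity between sample_a and sample_b.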
See sourmash#2383 for an extended discussion!
Thanks to Taylor Reiter and Jessica Lumian for all their work on this!
Installation
pip install sourmash_plugin_commonhash
Usage
sourmash scripts commonhash <multiple sketches> -o commonhashes.zip
commonhash will output one filtered sketch for each input sketch.
You can then use the various sourmash sig commands to union these sketches, extract individual ones, etc.
Example
sourmash scripts commonhash examples/*.sig.gz -o commonhash.zip
should yield:
...
Selecting k=31, DNA
Loaded 10587 hashes from 3 sketches in 3 files.
Of 10587 hashes, keeping 2529 that are in 2 or more samples.
Saved 3 signatures to 'commonhash.zip'
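The counts in this output can be understood with a minimal sketch of the filtering idea, written with plain Python sets. This is an assumption about the general approach, not the plugin's actual implementation: filter_common and min_samples are hypothetical names, the input hashes are invented, and it assumes the reported total counts distinct hashes across all inputs.

```python
from collections import Counter


def filter_common(sketches, min_samples=2):
    """Keep only hashes seen in at least `min_samples` of the input sets.

    Returns one filtered set per input, the number of distinct hashes
    seen overall, and the number of hashes kept.
    """
    counts = Counter()
    for hashes in sketches:
        counts.update(set(hashes))  # count each hash once per sample

    keep = {h for h, n in counts.items() if n >= min_samples}
    filtered = [set(hashes) & keep for hashes in sketches]
    return filtered, len(counts), len(keep)


# Hypothetical hash sets standing in for three sketches.
sketches = [{1, 2, 3, 10}, {2, 3, 4, 11}, {3, 4, 5, 12}]
min_samples = 2
filtered, total, kept = filter_common(sketches, min_samples)
print(f"Of {total} hashes, keeping {kept} that are in {min_samples} or more samples.")
```

As in the plugin's output, there is one filtered set for each input set, and each filtered set contains only hashes that were already in its own input.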
Support
We suggest filing issues in the main sourmash issue tracker as that receives more attention!
Dev docs
commonhash is developed at https://github.com/ctb/sourmash_plugin_commonhash.
Generating a release
Bump version number in pyproject.toml and push.
Make a new release on github.
Then pull, and:
python -m build
followed by:
twine upload dist/...
File details
Details for the file sourmash_plugin_commonhash-0.4.tar.gz.
File metadata
- Download URL: sourmash_plugin_commonhash-0.4.tar.gz
- Upload date:
- Size: 4.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5ee0f62692fb41824bcb3141fafd381dd172e94ef60ee2bb0214f6f16b6e7d2e
MD5 | 8edec3aa791e8ebf02ed8319c2b685de
BLAKE2b-256 | 2c89d46d65f6e4fdb68f81dea08dbd092c097765031f367bb1a86bdad3b497b9
File details
Details for the file sourmash_plugin_commonhash-0.4-py3-none-any.whl.
File metadata
- Download URL: sourmash_plugin_commonhash-0.4-py3-none-any.whl
- Upload date:
- Size: 5.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 96d0d983a3c3719961bb5b15f9f0d33eb6ad12c4ea3da34b1e949c05d1d58a0a
MD5 | 952ad9687f7a598defeb954aad7e9479
BLAKE2b-256 | 3bcd6f5df111fddd12eba899fd18acc940e951e26978fd510525286f0d94d07b