sourmash plugin to sketch many sequence files
sourmash_plugin_sketchall
Sketch many files at once with sourmash, using threads.
The `sketchall` plugin is a convenient way to:
- automatically discover & sketch many sequence files in a directory hierarchy.
- speed up sketching using multiple threads.
Installation
pip install sourmash_plugin_sketchall
Usage
The following command:
sourmash scripts sketchall examples -j 8
will use 8 threads to (attempt to) sketch all of the files underneath `examples`. Filenames ending in `.sig`, `.sig.gz`, `.zip`, and `.sqldb` will be ignored. Files that fail to sketch are reported, but the failures are otherwise ignored.
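The behavior described above (walk a directory tree, skip existing sketch files, process the rest on several threads, report failures and keep going) can be sketched in plain Python. This is only an illustration of the pattern, not the plugin's actual code: `discover`, `sketch_all`, and the `sketch_one` callback are hypothetical names, and the real sketching step is left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Extensions skipped during discovery, as listed in the text above.
IGNORED_SUFFIXES = (".sig", ".sig.gz", ".zip", ".sqldb")


def discover(root):
    """Return every regular file under root that is not an existing sketch."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and not p.name.endswith(IGNORED_SUFFIXES)
    )


def sketch_all(root, sketch_one, n_threads=8):
    """Apply sketch_one (hypothetical callback) to each file; report failures."""
    files = discover(root)
    failed = []
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = {pool.submit(sketch_one, path): path for path in files}
        for future, path in futures.items():
            try:
                future.result()
            except Exception as exc:
                # A bad file is reported but does not stop the run.
                print(f"failed: {path} ({exc})")
                failed.append(path)
    return files, failed
```

Calling `sketch_all("examples", my_sketch_function, n_threads=8)` would correspond roughly to the `-j 8` invocation, with `my_sketch_function` standing in for whatever does the real work.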
By default, `sketchall` will save signatures in place: sketches for `examples/10.fa.gz` are saved to `examples/10.fa.gz.zip`, and sketches for `examples/subdir/2.fa.gz` are saved to `examples/subdir/2.fa.gz.zip`.
With `-o/--output-directory`, `sketchall` will sketch into a new hierarchy of files; so, for example,
sourmash scripts sketchall examples -o sigs/
will save the sketch for `examples/subdir/2.fa.gz` to `sigs/subdir/2.fa.gz.zip`.
The default signature format for `sketchall` is `.zip`. This can be changed by using `--extension`:
sourmash scripts sketchall examples -o sigs/ --extension .sig.gz
will create `sigs/10.fa.gz.sig.gz` and `sigs/subdir/2.fa.gz.sig.gz`.
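The output naming rules can be summarized as a small path-mapping function. This is a minimal sketch under the assumption that the extension is always appended to the full input filename and that the output directory mirrors the input hierarchy; `output_path` and its parameters are hypothetical names, not part of the plugin's API.

```python
from pathlib import Path


def output_path(input_file, root, output_dir=None, extension=".zip"):
    """Map an input file to the path where its sketch would be saved."""
    input_file, root = Path(input_file), Path(root)
    if output_dir is None:
        # In place: the sketch sits next to the input file.
        base = input_file
    else:
        # Mirror the file's position below `root` under the output directory.
        base = Path(output_dir) / input_file.relative_to(root)
    # Append (rather than replace) the extension, e.g. 2.fa.gz -> 2.fa.gz.zip
    return base.with_name(base.name + extension)


print(output_path("examples/subdir/2.fa.gz", "examples"))
print(output_path("examples/subdir/2.fa.gz", "examples", "sigs/"))
print(output_path("examples/10.fa.gz", "examples", "sigs/", ".sig.gz"))
```

The first call models the in-place default, the second models `-o sigs/`, and the third models `-o sigs/ --extension .sig.gz`.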
The pattern for files to sketch can be set by using `--pattern`:
sourmash scripts sketchall examples --pattern "2.*.gz"
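To see which filenames a pattern like `2.*.gz` would select, the following sketch assumes shell-style (glob) matching against the filename. How the plugin itself interprets `--pattern` is not stated here, so treat this as an assumption; the filenames are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical filenames, for illustration only.
names = ["10.fa.gz", "2.fa.gz", "2.fa", "25.fq.gz"]

# Keep names that start with "2.", end with ".gz", with anything in between.
selected = [name for name in names if fnmatchcase(name, "2.*.gz")]
print(selected)
```

Under glob semantics only `2.fa.gz` is kept: `25.fq.gz` fails because the pattern requires a literal `.` right after the `2`.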
The parameter string used to sketch files can be changed with `-p/--param-string`:
sourmash scripts sketchall examples -p k=21 -o output.k21
After `sketchall`, `sourmash sig cat` can be used to collect all of the sketches into a single zip file, e.g.
sourmash sig cat sigs -o all-sigs.zip
Benchmarks and speedups
On a small collection of 64 genomes, using 4-8 threads more than doubles the speed of sketching; for larger files, speedups should approach linear scaling.
Wall time (s) | Threads | Efficiency |
---|---|---|
6.00 | 1 | 106% |
2.84 | 4 | 306% |
2.41 | 8 | 321% |
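The "more than doubles" claim can be checked from the wall times alone. The sketch below computes the speedup relative to one thread and the fraction of ideal linear scaling achieved. These are standard derived quantities, not the table's Efficiency column, which the text does not define.

```python
# Wall times (seconds) by thread count, copied from the table above.
wall_times = {1: 6.00, 4: 2.84, 8: 2.41}

baseline = wall_times[1]
rows = []
for threads, seconds in sorted(wall_times.items()):
    speedup = baseline / seconds    # how many times faster than 1 thread
    per_thread = speedup / threads  # fraction of ideal linear scaling
    rows.append((threads, speedup, per_thread))
    print(f"{threads} threads: {speedup:.2f}x speedup, {per_thread:.0%} of linear")
```

With these numbers, 4 threads give about a 2.11x speedup and 8 threads about 2.49x, so most of the gain on this small collection arrives by 4 threads.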
Support
We suggest filing issues in the main sourmash issue tracker as that receives more attention!
Dev docs
`sketchall` is developed at https://github.com/sourmash-bio/sourmash_plugin_sketchall.
Testing
Run:
pytest tests
Generating a release
Bump version number in `pyproject.toml` and push.
Make a new release on GitHub.
Then pull, and:
python -m build
followed by `twine upload dist/...`.