fast search and gather extensions for sourmash
pyo3_branchwater
tl;dr Do fast and low-memory search/gather of many sourmash sketches via a sourmash plugin.
Details
This repo contains a PyO3-based Python wrapper around the core branchwater code. Branchwater is a fast, low-memory and multithreaded application for searching very large collections of FracMinHash sketches as generated by sourmash.
For details, see the Rust code in src/ and the Python wrapper in python/.
The wrapper uses pyo3 for the Python-to-Rust wrapping.
This functionality can be used from within sourmash as a command-line plugin; see the quickstart below.
Documentation
There is a quickstart below, as well as more documentation here.
Quickstart for manysearch
To try this out, you'll need to install a branch of sourmash that contains sourmash#2438.
This quickstart demonstrates manysearch using the 64 genomes from Awad et al., 2017.
First, install this code.
Install this repo in developer mode:
pip install -e .
Second, download sketches.
The following commands will download sourmash sketches for those genomes and unpack them into the directory podar-ref/:
mkdir -p podar-ref
curl -JLO https://osf.io/4t6cq/download
unzip -u podar-reference-genomes-updated-sigs-2017.06.10.zip
Third, create lists of query and subject files.
manysearch takes in lists of signatures to search, so we need to create those files:
ls -1 podar-ref/{2,47,63}.* > query-list.txt
ls -1 podar-ref/* > podar-ref-list.txt
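If brace expansion is not available in your shell, the same lists can be built from Python. This is a minimal sketch using only the standard library; the helper name write_path_list is made up for illustration and is not part of this repo or of sourmash.

```python
import glob


def write_path_list(patterns, out_path):
    """Write one matching path per line to out_path and return the paths.

    Hypothetical helper: it mirrors `ls -1 PATTERN > FILE` for plain files.
    """
    paths = []
    for pattern in patterns:
        paths.extend(glob.glob(pattern))
    # De-duplicate and sort so the output is stable between runs.
    paths = sorted(set(paths))
    with open(out_path, "w") as fh:
        for path in paths:
            fh.write(path + "\n")
    return paths
```

Calling write_path_list(["podar-ref/2.*", "podar-ref/47.*", "podar-ref/63.*"], "query-list.txt") and write_path_list(["podar-ref/*"], "podar-ref-list.txt") should produce the same two files as the ls commands, though the sort order may differ slightly from what ls gives in your locale.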
Fourth: Execute!
Now run manysearch:
sourmash scripts manysearch query-list.txt podar-ref-list.txt -o results.csv
You will (hopefully ;) see a set of results in results.csv.
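To take a quick look at the output without assuming anything about its layout, something like the following may help. It is a sketch that only assumes results.csv is an ordinary CSV file with a header row; the column names are read from the file rather than hard-coded, since they are not listed here, and load_results is a made-up helper name.

```python
import csv


def load_results(csv_path):
    """Return (column_names, rows) for a CSV file that has a header row.

    Hypothetical helper; each row is a dict keyed by the header names.
    """
    with open(csv_path, newline="") as fh:
        reader = csv.DictReader(fh)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return columns, rows
```

For example, columns, rows = load_results("results.csv") followed by print(columns) and print(len(rows)) shows which fields manysearch wrote and how many matches were reported.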
Debugging help
If your file lists are not working properly, try running:
sourmash sig summarize query-list.txt
sourmash sig summarize podar-ref-list.txt
to make sure everything can be loaded.
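A cheaper first check is to confirm that every path in a list file actually exists, since a stale or mistyped path is a common cause of trouble. The sketch below uses only the standard library, and missing_paths is a made-up helper name. It checks existence only and does not replace sourmash sig summarize, which verifies that the sketches can be loaded.

```python
import os


def missing_paths(list_file):
    """Return the entries in list_file that are not existing files.

    Hypothetical helper; relative paths are resolved against the
    current working directory, and blank lines are skipped.
    """
    missing = []
    with open(list_file) as fh:
        for line in fh:
            path = line.strip()
            if path and not os.path.isfile(path):
                missing.append(path)
    return missing
```

Running missing_paths("query-list.txt") from the directory where you ran the ls commands should return an empty list if the file list is intact.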
Future thoughts
The speed and functions of this code will probably be brought into sourmash core in the future, most likely as part of sourmash#2230. However, in the meantime, this is a fun side project that makes use of sourmash plugins and Rust to provide some fast functionality that may be of use to some people, and it can serve as a testbed for future sourmash functionality.
Generating a release
Bump the version number in Cargo.toml and push.
Make a new release on GitHub.
Then pull, and:
python -m build
followed by:
twine upload dist/*.tar.gz
CTB Aug 2023
Hashes for pyo3_branchwater-0.4.1-cp38-cp38-macosx_11_0_arm64.whl
Algorithm | Hash digest
---|---
SHA256 | 38c59cd0561a5d5f9f00ae40ae2d9cadc0b0211579a83a1f1dd001d9625e2d97
MD5 | 8a6384b5182ca7a60ce14ecc0c85fea9
BLAKE2b-256 | 4b43c28dca67fa24598c78e9aab6e28c746b891c1c11bc1b184550a760dd3beb