fast search and gather extensions for sourmash

pyo3_branchwater

tl;dr Do fast and low-memory search/gather of many sourmash sketches via a sourmash plugin.

Details

This repo contains a PyO3-based Python wrapper around the core branchwater code. Branchwater is a fast, low-memory and multithreaded application for searching very large collections of FracMinHash sketches as generated by sourmash.

For details, see the Rust code in src/ and Python wrapper in src/python/.

Uses pyo3 for the Python-to-Rust wrapping.

This functionality can be used from within sourmash as a command-line plugin; see the quickstart below.

Documentation

There is a quickstart below, as well as more documentation here.

Quickstart for multisearch.

To try out branchwater, you'll need to install sourmash 4.8.3 or later.

This quickstart demonstrates multisearch using the 64 genomes from Awad et al., 2017.

1. Install necessary dependencies

You'll need Rust, Python, and maturin to build, and sourmash to run. See environment.yml for a list of conda packages, and the developer notes below for example command lines.

2. Install pyo3_branchwater.

Install this repo in developer mode:

pip install -e .

3. Download sketches.

The following command downloads sourmash sketches for the podar genomes into the file podar-ref.zip:

curl -L https://osf.io/4t6cq/download -o podar-ref.zip

4. Execute!

Now run multisearch to search all the sketches against each other:

sourmash scripts multisearch podar-ref.zip podar-ref.zip -o results.csv --cores 4
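If you would rather drive the search from a Python script, the command above can be wrapped with the standard library's subprocess module. This is only a minimal sketch: the function names build_multisearch_cmd and run_multisearch are hypothetical helpers introduced here for illustration, not part of pyo3_branchwater or sourmash, and the only flags used are the ones shown in the command line above.

```python
import subprocess


def build_multisearch_cmd(query, against, output, cores=4):
    """Build the argv list for the multisearch command shown in the quickstart.

    Hypothetical helper: it only assembles the same arguments used above
    (two sketch collections, -o for the CSV output, --cores for threads).
    """
    return [
        "sourmash", "scripts", "multisearch",
        str(query), str(against),
        "-o", str(output),
        "--cores", str(cores),
    ]


def run_multisearch(query, against, output, cores=4):
    """Run the command and raise CalledProcessError if it exits non-zero."""
    cmd = build_multisearch_cmd(query, against, output, cores)
    subprocess.run(cmd, check=True)
    return output
```

Calling run_multisearch("podar-ref.zip", "podar-ref.zip", "results.csv") should then be equivalent to the command line above, assuming sourmash and this plugin are installed in the active environment.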

You will (hopefully ;)) see a set of results in results.csv. These are comparisons of each query against all matching genomes.
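To take a quick look at the output without opening a spreadsheet, something like the following may help. It is a sketch that makes no assumption about which columns multisearch writes: it simply returns the header row and the first few data rows of whatever CSV it is given. The function name preview_csv is a hypothetical helper, not part of the plugin.

```python
import csv


def preview_csv(path, n=5):
    """Return (header, rows) where rows holds at most the first n data rows.

    Hypothetical helper for eyeballing results.csv; it does not interpret
    the columns, it only reads them.
    """
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])  # empty list if the file has no rows at all
        rows = [row for _, row in zip(range(n), reader)]
    return header, rows
```

For example, header, rows = preview_csv("results.csv") followed by print(header) shows which columns were written for your version of the plugin.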

Debugging help

If your collections aren't loading properly, try running sourmash sig summarize on them, like so:

sourmash sig summarize podar-ref.zip

This will make sure everything can be loaded properly.
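If sourmash sig summarize itself fails on a zip collection, it can be useful to first rule out a truncated or corrupt download. The sketch below uses only Python's standard zipfile module; check_zip is a hypothetical helper name. Note the limits of this check: it tests the zip container only and says nothing about whether the sketches inside are valid, so sourmash sig summarize remains the real test.

```python
import zipfile


def check_zip(path):
    """Return the member names of a zip file, or raise if it is damaged.

    Raises zipfile.BadZipFile if the file is not a zip archive at all, and
    ValueError if a member fails its CRC check.
    """
    with zipfile.ZipFile(path) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise ValueError(f"corrupt member in {path}: {bad}")
        return zf.namelist()
```

Running check_zip("podar-ref.zip") on an incomplete download would typically raise zipfile.BadZipFile, which points to re-running the curl command rather than to a problem with sourmash.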

Future thoughts

The speed and functions of this code will probably be brought into sourmash core in the future, most likely as part of sourmash#2230. However, in the meantime, this is a fun side project that makes use of sourmash plugins and Rust to provide some fast functionality that may be of use to some people, and it can serve as a testbed for future sourmash functionality.

Developer notes

Installing a development environment

You'll need sourmash, rust, and maturin.

A simple way to get up and running is to run:

mamba env create -n branchwater-dev -f environment.yml

in the top directory of the repo, and then activate the environment and install in editable mode:

mamba activate branchwater-dev
pip install -e .

Running the tests locally

Executing:

make test

will run the Python tests.

Generating a release

  1. Bump version number in Cargo.toml and run make to update Cargo.lock. Then commit and push to origin/main.

  2. Make a new release on GitHub with a matching version tag.

  3. Then pull, and:

make sdist
make upload_sdist

to create a new release on PyPI.

Building wheels

You can build a release wheel for your current platform with:

make wheel

and it will be placed under target/wheels/.


License

This software is under the AGPL license. Please see LICENSE.txt.

Authors

  • Luiz Irber
  • C. Titus Brown
  • Mohamed Abuelanin
  • N. Tessa Pierce-Ward


