
Fast, scalable SNP distance calculation from disk.

Find Neighbour 5

Spiritual successor to Find Neighbour 4 - continuing the work of David Wyllie.

SNP matrix generation with caching to disk to allow fast reloading.

Install

As this package is compiled on your machine during installation, ensure you have a working C++ compiler such as GCC installed. It has currently only been tested on Linux.

# Optional virtual environment
python -m virtualenv env
source env/bin/activate

# Install via pip
pip install fn5

Notes

This package provides Python bindings to the underlying FN5 library. Most, but not all, of the library's functionality is exposed through these bindings.

Read FASTA files

First, you'll need a reference genome and, optionally, a list of positions to mask within it. The mask can be used to exclude homoplastic or phylogenetic regions which aren't of epidemiological interest. The mask file should contain one genome position per line.

import fn5

reference = fn5.load_reference("<path to your reference>")
mask = fn5.load_mask("<path to your mask>")

# Alternatively ignore the mask
mask = set()
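To make the mask file format concrete, here is a minimal pure-Python sketch that reads one genome position per line into a set. The helper name `read_mask_positions` is hypothetical and is not part of fn5. Whether fn5 expects 0-based or 1-based positions is not stated here, so the example simply stores the integers as written. In practice you would use `fn5.load_mask` rather than this function.

```python
import tempfile
from pathlib import Path


def read_mask_positions(path):
    """Read a line-separated list of genome positions into a set of ints.

    Hypothetical helper for illustration only; it is not fn5's implementation.
    Blank lines and surrounding whitespace are ignored.
    """
    positions = set()
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line:
                positions.add(int(line))
    return positions


# Write a small example mask file and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    mask_path = Path(tmp_dir) / "mask.txt"
    mask_path.write_text("101\n102\n2500\n")
    example_mask = read_mask_positions(mask_path)
    print(sorted(example_mask))
```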

Samples can then be parsed from FASTA files:

sample1 = fn5.Sample("<path to sample1's FASTA>", reference, mask, "sample1")
sample2 = fn5.Sample("<path to sample2's FASTA>", reference, mask, "sample2")
...

Save samples

For the sake of efficiency, a sample can be written to (and subsequently loaded from) disk.

Note that the fn5.save method writes 5 files to the specified directory: <path>/<sample ID>.A, <path>/<sample ID>.C, ...

fn5.save("<some output directory>", sample1)

Load samples

Load pre-saved samples from disk. This should be significantly faster than re-parsing the FASTA files.

sample1 = fn5.load("<some output directory>/sample1")
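One way to combine saving and loading is a small parse-once cache: load the sample if it has already been saved, otherwise parse the FASTA and save it. The sketch below is an assumption-laden illustration, not part of fn5. `load_or_parse` is a hypothetical helper, and it treats the presence of `<sample ID>.A` as a sign that the sample was saved, which assumes the earlier save completed. The fn5 calls are passed in as callables so the caching logic can be read (and tested) on its own.

```python
from pathlib import Path


def load_or_parse(sample_id, fasta_path, cache_dir, parse, save, load):
    """Return a sample, preferring a previously saved copy on disk.

    Hypothetical helper. `parse`, `save` and `load` are supplied by the caller;
    with fn5 they could be wired up along these lines:
        parse = lambda path, sid: fn5.Sample(path, reference, mask, sid)
        save = fn5.save
        load = fn5.load
    """
    cache_dir = Path(cache_dir)
    # Assumption: if <sample ID>.A exists, the sample was saved completely.
    if (cache_dir / f"{sample_id}.A").exists():
        return load(str(cache_dir / sample_id))
    sample = parse(fasta_path, sample_id)
    save(str(cache_dir), sample)
    return sample
```

The first call for a given sample ID pays the FASTA parsing cost; later calls take the faster load path.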

Compute distances

Distances can be computed with arbitrary SNP cutoffs.

Single distance

If the returned distance equals your cutoff + 1, the two samples are further apart than the SNP cutoff.

sample1.dist(sample2, <some cutoff>)
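The "cutoff + 1" convention can be illustrated with a plain-Python capped distance over two equal-length sequences. This is only a sketch of the convention, not fn5's algorithm: the function name `capped_snp_distance` is hypothetical, positions are assumed to be 0-based, and any differing character counts as a SNP, with no special handling of ambiguous bases.

```python
def capped_snp_distance(seq_a, seq_b, cutoff, mask=frozenset()):
    """Count differing, unmasked positions, stopping once the cutoff is exceeded.

    Hypothetical illustration. Returns the true distance when it is <= cutoff,
    and cutoff + 1 as soon as the distance is known to exceed the cutoff.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    distance = 0
    for position, (base_a, base_b) in enumerate(zip(seq_a, seq_b)):
        if position in mask:
            continue  # masked positions never contribute to the distance
        if base_a != base_b:
            distance += 1
            if distance > cutoff:
                return cutoff + 1  # early exit: further apart than the cutoff
    return distance


def within_cutoff(distance, cutoff):
    """Interpret a capped distance: True when the pair is within the cutoff."""
    return distance <= cutoff
```

Stopping early is what makes a cutoff worthwhile: once a pair is known to exceed it, there is no need to examine the remaining positions.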

Distance matrix

By default, this method uses 4 threads and no cutoff. When a cutoff is given, any pair of samples missing from the returned distance list is further apart than that cutoff.

samples = [fn5.load("<some directory>/"+f) for f in <existing filepaths>]
fn5.compute(samples)

# With more/less threads
fn5.compute(samples, thread_count=12)
fn5.compute(samples, thread_count=1)

# With a SNP cutoff
fn5.compute(samples, cutoff=12)
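Because pairs beyond the cutoff are absent from the result, it can help to expand the sparse list into a square matrix with an explicit marker for missing pairs. The sketch below assumes each entry of the distance list is a tuple of `(sample ID, sample ID, distance)`. That shape is an assumption for illustration, so check what `fn5.compute` actually returns before relying on it. `to_matrix` is a hypothetical helper, not part of fn5.

```python
def to_matrix(sample_ids, distances, missing=None):
    """Expand a sparse list of pairwise distances into a symmetric matrix.

    Hypothetical helper. `distances` is assumed to be an iterable of
    (id_a, id_b, distance) tuples. Pairs not listed keep the `missing` marker,
    meaning they were further apart than the cutoff.
    """
    index = {sample_id: i for i, sample_id in enumerate(sample_ids)}
    size = len(sample_ids)
    matrix = [[missing] * size for _ in range(size)]
    for i in range(size):
        matrix[i][i] = 0  # a sample is at distance 0 from itself
    for id_a, id_b, distance in distances:
        i, j = index[id_a], index[id_b]
        matrix[i][j] = distance
        matrix[j][i] = distance
    return matrix


# Made-up distances for three samples; the sample1/sample3 pair is absent,
# so it stays marked as missing (beyond the cutoff).
example = to_matrix(
    ["sample1", "sample2", "sample3"],
    [("sample1", "sample2", 4), ("sample2", "sample3", 9)],
)
for row in example:
    print(row)
```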

Source Distribution

fn5-2.0.6.tar.gz (3.2 MB)

File hashes

Hashes for fn5-2.0.6.tar.gz
Algorithm Hash digest
SHA256 c329d3d52eeef7c23a1daeb78cae4638c39130e16b9c58fc3f7f4d97cb196b85
MD5 5aefd8cf4c0b1fa81f3b926b9db31775
BLAKE2b-256 c03a8752a1c12c13d359b5291842218e913b3347ca50ffbb7cf418047bc8da71
