Indel-aware consensus from aligned BAMs

Kindel: indel-aware consensus from aligned BAM

Kindel reconciles substitutions and CIGAR-described indels to produce a majority consensus from a SAM/BAM file. With the --realign option, Kindel can also recover consensus across short alignment gaps using soft-clipped sequence information: where it finds 'clip-dominant' regions of an alignment, it attempts to reassemble the consensus sequence from the unaligned sequence context. Kindel is primarily intended for small alignments, e.g. of virus genomes, and has been tested with BAMs created by the aligners BWA and Minimap2. If you encounter problems, please open an issue. Please also cite the JOSS article if you find Kindel useful.
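
To make the idea of a majority consensus concrete, here is a minimal Python sketch. It is not Kindel's implementation: the majority_consensus function, the per-site Counter layout, the '-' symbol for deletions and the "more than half of reads" rule for insertions are all assumptions chosen for illustration.

```python
from collections import Counter


def majority_consensus(site_counts, insertions=None, min_depth=1):
    """Build a consensus string from per-site observation counts.

    site_counts: list of Counter, one per reference position, mapping
                 'A', 'C', 'G', 'T' or '-' (deletion) to read counts.
    insertions:  dict mapping a position index to a Counter of inserted
                 strings observed immediately after that position.
    Both structures are hypothetical and exist only for this sketch.
    """
    insertions = insertions or {}
    consensus = []
    for pos, counts in enumerate(site_counts):
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            consensus.append("N")  # too little coverage to call this site
            continue
        # most_common(1) yields the single most frequent observation
        call, _ = counts.most_common(1)[0]
        if call != "-":  # a majority deletion contributes no base
            consensus.append(call)
        # Keep an insertion only if more than half of the reads carry it
        observed = insertions.get(pos)
        if observed:
            inserted, n = observed.most_common(1)[0]
            if n > depth / 2:
                consensus.append(inserted)
    return "".join(consensus)
```

In this sketch a site whose most frequent observation is a deletion is simply skipped, and a site with insufficient depth becomes N, which mirrors the documented behaviour of --min-depth only in spirit.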

Core functionality

[Figure: clip-dominant region]

Reassembly of clip-dominant regions (CDRs) with --realign

[Figure: clip-dominant region]

Features

  • Consensus of aligned substitutions, insertions and deletions
  • Gap closure (--realign) using overlapping soft-clipped alignment context
  • Tested with Illumina alignments from BWA, Minimap2 and Segemehl
  • Support for BAMs with multiple reference contigs or chromosomes
  • Crude frequency-based variant calling with kindel variants (no VCF output)
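
As an illustration of what frequency-based variant calling with absolute and relative thresholds can look like, the following Python sketch reports non-reference alleles that meet both a minimum read count and a minimum frequency. The call_variants function, its min_count and min_freq parameters, their default values and the output tuple layout are hypothetical and are not taken from Kindel's source.

```python
from collections import Counter


def call_variants(reference, site_counts, min_count=1, min_freq=0.01):
    """Report non-reference alleles that pass both thresholds.

    reference:   reference sequence as a string.
    site_counts: list of Counter, one per reference position, mapping
                 an observed allele to its read count.
    Returns (position, ref_base, allele, count, frequency) tuples with
    1-based positions. All names here are assumptions for this sketch.
    """
    variants = []
    for pos, (ref_base, counts) in enumerate(zip(reference, site_counts), start=1):
        depth = sum(counts.values())
        if depth == 0:
            continue  # nothing observed at this site
        for allele, n in sorted(counts.items()):
            freq = n / depth
            # Absolute threshold guards against noise at low depth,
            # relative threshold against noise at high depth
            if allele != ref_base and n >= min_count and freq >= min_freq:
                variants.append((pos, ref_base, allele, n, round(freq, 3)))
    return variants
```

Requiring both thresholds is a design choice of the sketch: a count threshold alone over-reports at high depth, while a frequency threshold alone over-reports at low depth.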

Limitations

  • Intended for use with small alignments of e.g. virus genomes. Expect slow performance with megabase genomes.
  • SAM/BAM files must contain an @SQ header line giving the length (LN) of each reference sequence.
  • Realignment mode (--realign) is able to close gaps of up to 2x read length given ample depth of coverage.
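
To check the header requirement before running Kindel, a small Python helper along these lines may be useful. It parses SAM header text and returns the length of each reference sequence declared in an @SQ line, using the SN (name) and LN (length) fields defined by the SAM specification. The reference_lengths function is a hypothetical helper, not part of Kindel.

```python
def reference_lengths(sam_header_text):
    """Return {reference name: length} parsed from @SQ header lines."""
    lengths = {}
    for line in sam_header_text.splitlines():
        columns = line.split("\t")
        if columns[0] != "@SQ":
            continue  # skip @HD, @PG, @RG and any other record types
        # Remaining columns are TAG:VALUE pairs, e.g. SN:ref1 and LN:9646
        fields = dict(c.split(":", 1) for c in columns[1:] if ":" in c)
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths
```

The header text can be obtained with samtools view -H alignment.bam. An empty result suggests that the file lacks the @SQ lines Kindel needs.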

Installation

# Requires Python 3.8+ and Samtools
pip install kindel

For a complete installation using a conda-compatible package manager:

conda create -y -n kindel python=3.13 samtools
conda activate kindel
pip install kindel

For a local development install:

pip install --editable '.[dev]'  # run from the repository root

Usage

Also see usage.ipynb

Command line

$ kindel consensus alignment.bam > cns.fa

Generate a consensus sequence from an aligned BAM, saving the consensus sequence to cns.fa

$ kindel consensus --realign alignment.bam > cns.fa

Generate a consensus sequence from an aligned BAM with realignment mode enabled, allowing closure of small gaps in the consensus sequence

$ kindel plot alignment.bam

Generate an interactive plot showing aligned depth alongside insertion, deletion and soft clipping frequency across the genome
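
When the consensus step needs to run from a script rather than a shell, one possible approach is to call the command line tool through Python's subprocess module. The sketch below assumes the kindel executable is on PATH and uses only flags listed by kindel consensus -h. The consensus_command and write_consensus helpers are hypothetical names, not part of Kindel.

```python
import subprocess


def consensus_command(bam_path, realign=False, min_depth=None, trim_ends=False):
    """Build the argument list for `kindel consensus` (hypothetical helper)."""
    cmd = ["kindel", "consensus"]
    if realign:
        cmd.append("--realign")
    if min_depth is not None:
        cmd += ["--min-depth", str(min_depth)]
    if trim_ends:
        cmd.append("--trim-ends")
    cmd.append(str(bam_path))
    return cmd


def write_consensus(bam_path, fasta_path, **options):
    """Run kindel and save its standard output (the consensus) to fasta_path."""
    with open(fasta_path, "w") as out:
        # check=True raises CalledProcessError if kindel exits with an error
        subprocess.run(consensus_command(bam_path, **options), stdout=out, check=True)
```

For example, write_consensus("alignment.bam", "cns.fa", realign=True) is intended to be equivalent to the second shell command above.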

$ kindel -h
usage: kindel [-h] {consensus,weights,features,variants,plot} ...

positional arguments:
  {consensus,weights,features,variants,plot,version}
    consensus           Infer consensus sequence(s) from alignment in SAM/BAM
                        format
    weights             Returns table of per-site nucleotide frequencies and
                        coverage
    features            Returns table of per-site nucleotide frequencies and
                        coverage including indels
    variants            Output variants exceeding specified absolute and
                        relative frequency thresholds
    plot                Plot sitewise soft clipping frequency across reference
                        and genome
    version             Show version

optional arguments:
  -h, --help            show this help message and exit

$ kindel consensus -h
usage: kindel consensus [-h] [-r] [--min-depth MIN_DEPTH]
                        [--min-overlap MIN_OVERLAP] [-c CLIP_DECAY_THRESHOLD]
                        [--mask-ends MASK_ENDS] [-t] [-u]
                        bam_path

Infer consensus sequence(s) from alignment in SAM/BAM format

positional arguments:
  bam_path              path to SAM/BAM file

optional arguments:
  -h, --help            show this help message and exit
  -r, --realign         attempt to reconstruct reference around soft-clip
                        boundaries (default: False)
  --min-depth MIN_DEPTH
                        substitute Ns at coverage depths beneath this value
                        (default: 1)
  --min-overlap MIN_OVERLAP
                        match length required to close soft-clipped gaps
                        (default: 7)
  -c CLIP_DECAY_THRESHOLD, --clip-decay-threshold CLIP_DECAY_THRESHOLD
                        read depth fraction at which to cease clip extension
                        (default: 0.1)
  --mask-ends MASK_ENDS
                        ignore clip dominant positions within n positions of
                        termini (default: 50)
  -t, --trim-ends       trim ambiguous nucleotides (Ns) from sequence ends
                        (default: False)
  -u, --uppercase       close gaps using uppercase alphabet (default: False)
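
The --min-overlap option sets the match length required to close a soft-clipped gap. As a rough intuition for what a minimum overlap means, the Python sketch below joins two sequences only when a suffix of the first exactly matches a prefix of the second over at least min_overlap characters. This is a simplified illustration under that assumption. Kindel's actual matching procedure may differ, and merge_by_overlap is a hypothetical function.

```python
def merge_by_overlap(left, right, min_overlap=7):
    """Join two sequences where a suffix of `left` equals a prefix of `right`.

    Returns the merged sequence, or None when no exact overlap of at
    least min_overlap characters exists. The longest overlap wins.
    """
    shortest_allowed = max(min_overlap, 1)  # an empty overlap is never accepted
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, shortest_allowed - 1, -1):
        if left[-size:] == right[:size]:
            # Append only the part of `right` that extends beyond the overlap
            return left + right[size:]
    return None
```

A larger minimum overlap lowers the chance of joining flanks on a spurious short match, at the cost of leaving more gaps open.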

Python API

from kindel import kindel

# Path to an aligned SAM/BAM file, as in the command line examples
bam_path = "alignment.bam"

kindel.bam_to_consensus(bam_path, realign=False, min_depth=2, min_overlap=7,
                        clip_decay_threshold=0.1, trim_ends=False, uppercase=False)

Issues

Please let me know if you run into problems by opening a GitHub issue, tweeting @beconstant or mailing me via b at bede dawt im. Ideally send me your BAM, or a subsample of it!

Contributing

If you would like to contribute to this project, please open an issue or contact the author directly using the details above. Please note that this project is released with a Contributor Code of Conduct, and by participating in this project you agree to abide by its terms.

Before opening a pull request, please:

  • Ensure tests pass in a local development build (see installation instructions) by executing pytest inside the package directory.
  • Increment the version number inside __init__.py according to SemVer.
  • Update documentation and/or tests if possible.
