Indel-aware consensus from aligned BAMs
Kindel: indel-aware consensus from aligned BAM
Kindel reconciles substitutions and CIGAR-described indels to produce a majority consensus from a SAM/BAM file. With the --realign option, Kindel can also recover consensus across short alignment gaps using soft-clipped sequence information. Where Kindel finds 'clip-dominant' regions of an alignment, in realignment mode it attempts to reassemble the consensus sequence using unaligned sequence context. Primarily intended for use with small alignments of e.g. virus genomes, it has been tested with BAMs created by the aligners BWA and Minimap2. If you encounter problems, please open an issue. Please also cite the JOSS article if you find this useful.
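To make the idea of an indel-aware majority consensus concrete, here is a minimal sketch. It is not Kindel's implementation: it assumes per-site counts have already been tallied from the alignment, and the `site_counts` and `insertions` structures, the `-` symbol for a deletion, the "more than half of depth" rule for insertions and the function name `majority_consensus` are all hypothetical choices made for illustration.

```python
from collections import Counter


def majority_consensus(site_counts, insertions=None):
    """Toy majority consensus (illustrative only, not Kindel's code).

    site_counts: list of Counter, one per reference position, mapping a base
                 or '-' (deletion) to the number of reads supporting it.
    insertions:  dict mapping a position index to a Counter of inserted
                 strings observed immediately after that position.
    """
    insertions = insertions or {}
    consensus = []
    for pos, counts in enumerate(site_counts):
        depth = sum(counts.values())
        if depth == 0:
            consensus.append("N")  # no coverage at this site
        else:
            allele, _ = counts.most_common(1)[0]
            if allele != "-":  # a majority deletion emits nothing
                consensus.append(allele)
        inserted = insertions.get(pos)
        if inserted:
            seq, support = inserted.most_common(1)[0]
            # Assumed rule: keep an insertion backed by over half the depth
            if support > depth / 2:
                consensus.append(seq)
    return "".join(consensus)


sites = [Counter(A=10), Counter(C=8, T=2), Counter({"-": 7, "G": 3}), Counter(T=10)]
ins = {0: Counter(GG=6, G=1)}
print(majority_consensus(sites, ins))  # AGGCT
```

In this toy input the third site is deleted in most reads and a two-base insertion follows the first site, so the consensus differs in length from the four-position reference.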
Core functionality
Reassembly of clip-dominant regions (CDRs) with --realign
Features
- Consensus of aligned substitutions, insertions and deletions
- Gap closure (--realign) using overlapping soft-clipped alignment context
- Tested with Illumina alignments from BWA, Minimap2 and Segemehl
- Support for BAMs with multiple reference contigs, chromosomes
- Crude frequency-based variant calling with kindel variants (no VCF output)
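As an illustration of what frequency-based variant calling with absolute and relative thresholds can look like, the following sketch reports non-majority alleles at each site. It is an assumption-laden toy rather than the logic behind kindel variants: the input structure, the parameter names `min_count` and `min_freq`, their defaults, and the choice to compare against the majority allele are all hypothetical.

```python
from collections import Counter


def minor_variants(site_counts, min_count=2, min_freq=0.1):
    """Report non-majority alleles passing both thresholds (illustrative only).

    Returns tuples of (1-based position, majority allele, variant allele,
    supporting read count, frequency).
    """
    calls = []
    for pos, counts in enumerate(site_counts, start=1):
        depth = sum(counts.values())
        if depth == 0:
            continue  # nothing to call without coverage
        major, _ = counts.most_common(1)[0]
        for allele, count in sorted(counts.items()):
            if allele == major:
                continue
            freq = count / depth
            # Both the absolute and the relative threshold must be met
            if count >= min_count and freq >= min_freq:
                calls.append((pos, major, allele, count, freq))
    return calls


sites = [Counter(A=98, G=2), Counter(C=80, T=20), Counter(G=5, A=1)]
print(minor_variants(sites))  # [(2, 'C', 'T', 20, 0.2)]
```

The first site fails the relative threshold (2% frequency) and the third fails the absolute threshold (a single read), so only the second site is reported.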
Limitations
- Intended for use with small alignments of e.g. virus genomes. Expect slow performance with megabase genomes.
- SAM/BAM files must contain an @SQ header line giving the length of each reference sequence.
- Realignment mode (--realign) can close gaps of up to 2x read length given ample depth of coverage.
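The header requirement above can be checked before running Kindel. The sketch below parses SAM header text and returns the reference lengths declared in @SQ lines; the function name `reference_lengths` is hypothetical, and the header text is assumed to have been obtained separately, for example with `samtools view -H alignment.bam`.

```python
def reference_lengths(header_text):
    """Return {reference name: length} from the @SQ lines of a SAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths


header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:ref1\tLN:9646\n@PG\tID:bwa\tPN:bwa\n"
print(reference_lengths(header))  # {'ref1': 9646}
```

An empty result suggests the file lacks the @SQ lines that Kindel needs.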
Installation
# Requires Python 3.8+ and Samtools
pip install kindel
For a complete installation using a conda-compatible package manager:
conda create -y -n kindel python=3.13 samtools
conda activate kindel
pip install kindel
For a local development install:
pip install --editable '.[dev]' # pip install kindel '.[dev]'
Usage
Also see usage.ipynb
Command line
$ kindel consensus alignment.bam > cns.fa
Generate a consensus sequence from an aligned BAM, saving the consensus sequence to cns.fa
$ kindel consensus --realign alignment.bam > cns.fa
Generate a consensus sequence from an aligned BAM with realignment mode enabled, allowing closure of small gaps in the consensus sequence
$ kindel plot alignment.bam
Generate an interactive plot showing aligned depth alongside insertion, deletion and soft clipping frequency across the genome
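For batch processing, the consensus commands above can be driven from Python. This is a minimal sketch that shells out to the kindel executable and redirects standard output to a FASTA file, mirroring the `> cns.fa` redirection; the helper names `consensus_command` and `write_consensus` are hypothetical, and only flags listed in the `kindel consensus -h` output are used.

```python
import subprocess


def consensus_command(bam_path, realign=False, min_depth=None, trim_ends=False):
    """Build the argument list for `kindel consensus` (hypothetical helper)."""
    cmd = ["kindel", "consensus"]
    if realign:
        cmd.append("--realign")
    if min_depth is not None:
        cmd += ["--min-depth", str(min_depth)]
    if trim_ends:
        cmd.append("--trim-ends")
    cmd.append(bam_path)
    return cmd


def write_consensus(bam_path, fasta_path, **options):
    """Run kindel and save its standard output to fasta_path."""
    with open(fasta_path, "w") as handle:
        # check=True raises CalledProcessError if kindel exits non-zero
        subprocess.run(
            consensus_command(bam_path, **options), stdout=handle, check=True
        )


print(consensus_command("alignment.bam", realign=True, min_depth=5))
```

Calling `write_consensus("alignment.bam", "cns.fa", realign=True)` would then be equivalent to the second command shown above, provided kindel is on the PATH.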
$ kindel -h
usage: kindel [-h] {consensus,weights,features,variants,plot} ...
positional arguments:
{consensus,weights,features,variants,plot,version}
consensus Infer consensus sequence(s) from alignment in SAM/BAM
format
weights Returns table of per-site nucleotide frequencies and
coverage
features Returns table of per-site nucleotide frequencies and
coverage including indels
variants Output variants exceeding specified absolute and
relative frequency thresholds
plot Plot sitewise soft clipping frequency across reference
and genome
version Show version
optional arguments:
-h, --help show this help message and exit
$ kindel consensus -h
usage: kindel consensus [-h] [-r] [--min-depth MIN_DEPTH]
[--min-overlap MIN_OVERLAP] [-c CLIP_DECAY_THRESHOLD]
[--mask-ends MASK_ENDS] [-t] [-u]
bam_path
Infer consensus sequence(s) from alignment in SAM/BAM format
positional arguments:
bam_path path to SAM/BAM file
optional arguments:
-h, --help show this help message and exit
-r, --realign attempt to reconstruct reference around soft-clip
boundaries (default: False)
--min-depth MIN_DEPTH
substitute Ns at coverage depths beneath this value
(default: 1)
--min-overlap MIN_OVERLAP
match length required to close soft-clipped gaps
(default: 7)
-c CLIP_DECAY_THRESHOLD, --clip-decay-threshold CLIP_DECAY_THRESHOLD
read depth fraction at which to cease clip extension
(default: 0.1)
--mask-ends MASK_ENDS
ignore clip dominant positions within n positions of
termini (default: 50)
-t, --trim-ends trim ambiguous nucleotides (Ns) from sequence ends
(default: False)
-u, --uppercase close gaps using uppercase alphabet (default: False)
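The effect of --min-depth and --trim-ends can be pictured with a small sketch. It is only an illustration of the behaviour described in the help text, not Kindel's code, and it assumes a simplified case with exactly one depth value per consensus character; the function names are hypothetical.

```python
def mask_low_depth(consensus, depths, min_depth=1):
    """Substitute N wherever coverage depth is beneath min_depth."""
    return "".join(
        base if depth >= min_depth else "N"
        for base, depth in zip(consensus, depths)
    )


def trim_ends(sequence):
    """Remove runs of N from both ends, leaving internal Ns untouched."""
    return sequence.strip("N")


masked = mask_low_depth("ACGTACGT", [0, 1, 5, 9, 9, 4, 1, 0], min_depth=2)
print(masked)             # NNGTACNN
print(trim_ends(masked))  # GTAC
```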
Python API
from kindel import kindel
kindel.bam_to_consensus(bam_path, realign=False, min_depth=2, min_overlap=7,
clip_decay_threshold=0.1, trim_ends=False, uppercase=False)
Issues
Please let me know if you run into problems by opening a GitHub issue, tweeting @beconstant or mailing me via b at bede dawt im. Ideally send me your BAM, or a subsample of it!
Contributing
If you would like to contribute to this project, please open an issue or contact the author directly using the details above. Please note that this project is released with a Contributor Code of Conduct, and by participating in this project you agree to abide by its terms.
Before opening a pull request, please:
- Ensure tests pass in a local development build (see installation instructions) by executing pytest inside the package directory.
- Increment the version number inside __init__.py according to SemVer.
- Update documentation and/or tests if possible.