Predict ancestral sequence of fungal repeat elements by correcting for RIP-like mutations in multi-sequence DNA alignments.
deRIP2
Predict progenitor sequence of fungal repeat families by correcting for RIP-like mutations (CpA --> TpA) and cytosine deamination (C --> T) events.
Mask RIP or deamination events from input alignment as ambiguous bases.
Algorithm overview
For each column in input alignment:
- Check if the proportion of gapped rows is greater than the max gap proportion (maxGaps). If true, a gap is added to the output sequence.
- Set invariant column values in output sequence.
- If at least (1 - maxSNPnoise) of the bases in the column are C/T or G/A (e.g. with maxSNPnoise = 0.4, at least 0.6 of positions in the column must be C/T or G/A):
  - If the reaminate option is set, revert T-->C or A-->G.
  - If reaminate is not set, count the positions in RIP dinucleotide context (C/TpA or TpG/A).
    - If the proportion of positions in the column in RIP-like context is >= the minRIPlike threshold, AND at least one substrate and one product motif (e.g. CpA and TpA) is present, perform RIP correction in the output sequence.
- For all remaining positions in the output sequence (not filled by gap, reaminate, or RIP correction), inherit the base from the input sequence with the fewest observed RIP events (or the greatest GC content if no RIP is detected or multiple sequences share the minimum RIP count).
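The per-column steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not deRIP2's own implementation: the names `call_column`, `base_at` and `RIP_PAIRS` are hypothetical, the gap test uses `>=` over all rows, the other proportions are taken over non-gap bases, and the dinucleotide partner is read from the directly adjacent alignment column. The function returns the base to write to the output sequence, or `None` when the position should be inherited from the least RIP'd sequence.

```python
from typing import List, Optional

# (substrate, product, offset to partner base, partner base)
# Forward strand: CpA -> TpA. Reverse strand: TpG -> TpA.
RIP_PAIRS = (("C", "T", 1, "A"), ("G", "A", -1, "T"))


def base_at(row: str, idx: int) -> str:
    """Upper-case base at idx, or a gap if idx falls outside the row."""
    return row[idx].upper() if 0 <= idx < len(row) else "-"


def call_column(rows: List[str], col: int, *, max_gaps: float,
                max_snp_noise: float, min_rip_like: float,
                reaminate: bool = False) -> Optional[str]:
    """Return the output base for one column, or None to defer."""
    column = [base_at(row, col) for row in rows]
    bases = [b for b in column if b != "-"]

    # Too many gapped rows (assumed >=): emit a gap.
    if not bases or column.count("-") / len(column) >= max_gaps:
        return "-"

    # Invariant column: copy the shared base.
    if len(set(bases)) == 1:
        return bases[0]

    for substrate, product, offset, partner in RIP_PAIRS:
        # The column must be mostly substrate/product bases.
        pair_fraction = sum(b in (substrate, product) for b in bases) / len(bases)
        if pair_fraction < 1 - max_snp_noise:
            continue
        if reaminate:
            return substrate  # revert T-->C or A-->G regardless of context
        # Bases of this pair that sit in the RIP dinucleotide context.
        in_context = [base_at(row, col) for row in rows
                      if base_at(row, col) in (substrate, product)
                      and base_at(row, col + offset) == partner]
        rip_fraction = len(in_context) / len(bases)
        if (rip_fraction >= min_rip_like
                and substrate in in_context and product in in_context):
            return substrate  # RIP correction
    return None  # inherit from the least RIP'd input sequence
```

For example, with a four-row toy alignment and the thresholds from the example usage in this document:

```python
rows = ["ACAG", "ATAG", "ATAG", "A-AG"]
calls = [call_column(rows, i, max_gaps=0.7, max_snp_noise=0.2, min_rip_like=0.5)
         for i in range(4)]
print(calls)  # ['A', 'C', 'A', 'G']
```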
Outputs:
- Corrected sequence as fasta.
- Optionally, an alignment with:
  - Corrected sequence appended.
  - Corrected positions masked as ambiguous bases.
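The masking of corrected positions can be illustrated with a short Python sketch. It is an assumption-laden example rather than deRIP2's code: `mask_rip_column` is a hypothetical helper, and while this document only shows the forward-strand case (CpA / TpA masked as YpA), the use of the IUPAC purine code `R` for the reverse-strand case (TpG / TpA) is an assumption made here by symmetry.

```python
from typing import List


def mask_rip_column(rows: List[str], col: int, strand: str = "forward") -> List[str]:
    """Mask substrate/product bases in RIP context at one column."""
    if strand == "forward":
        # C or T followed by A is masked as Y (IUPAC: C or T).
        pair, offset, partner, code = ("C", "T"), 1, "A", "Y"
    else:
        # G or A preceded by T is masked as R (IUPAC: A or G); assumed.
        pair, offset, partner, code = ("G", "A"), -1, "T", "R"

    masked = []
    for row in rows:
        neighbour = col + offset
        in_motif = (row[col].upper() in pair
                    and 0 <= neighbour < len(row)
                    and row[neighbour].upper() == partner)
        # Rows outside the motif context are left untouched.
        masked.append(row[:col] + code + row[col + 1:] if in_motif else row)
    return masked


print(mask_rip_column(["ACAG", "ATAG", "AGAG"], 1))
# ['AYAG', 'AYAG', 'AGAG']
```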
Options and Usage
Installation
Requires Python >= v3.8
Clone from this repository:
% git clone https://github.com/Adamtaranto/deRIP2.git && cd deRIP2 && pip install -e .
Install from PyPI.
% pip install derip2
Test installation.
# Print version number and exit.
% derip2 --version
derip2 0.0.4
# Get usage information
% derip2 --help
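The installation can also be checked from Python. The sketch below uses the standard-library `importlib.metadata` module (available from Python 3.8) and assumes the installed distribution is named `derip2`, as in the `pip install` command above; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "derip2") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())
```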
Example usage
For aligned sequences in 'mintest.fa':
- Any column with >= 70% gap positions will not be corrected, and a gap is inserted in the corrected sequence.
- At least 80% of bases in a column must be C/T or G/A.
- At least 50% of bases in a column must be in RIP dinucleotide context (C/T as CpA / TpA) for correction.
- Default: Inherit all remaining uncorrected positions from the least RIP'd sequence.
- Mask all substrate and product motifs from corrected columns as ambiguous bases (e.g. CpA to TpA --> YpA).
derip2 -i tests/data/mintest.fa --format fasta \
--maxGaps 0.7 \
--maxSNPnoise 0.2 \
--minRIPlike 0.5 \
--label derip_name \
--mask \
-d results \
--outAln masked_aligment_with_deRIP.fa --outAlnFormat fasta --outFasta derip_prediction.fa
Output:
- results/derip_prediction.fa
- results/masked_aligment_with_deRIP.fa
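When deRIP2 is run as one step of a larger Python workflow, the same command can be assembled and launched with `subprocess`. This is a sketch only: `build_derip2_command` and `run_derip2` are hypothetical helpers, and the flags and file names are exactly those from the example above. It assumes the `derip2` executable is on the `PATH`.

```python
import subprocess
from typing import List


def build_derip2_command(alignment: str, out_dir: str, *,
                         max_gaps: float = 0.7,
                         max_snp_noise: float = 0.2,
                         min_rip_like: float = 0.5) -> List[str]:
    """Assemble the example derip2 call as an argument list."""
    return [
        "derip2", "-i", alignment, "--format", "fasta",
        "--maxGaps", str(max_gaps),
        "--maxSNPnoise", str(max_snp_noise),
        "--minRIPlike", str(min_rip_like),
        "--label", "derip_name",
        "--mask",
        "-d", out_dir,
        "--outAln", "masked_aligment_with_deRIP.fa",
        "--outAlnFormat", "fasta",
        "--outFasta", "derip_prediction.fa",
    ]


def run_derip2(alignment: str, out_dir: str) -> None:
    """Run derip2 and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(build_derip2_command(alignment, out_dir), check=True)


if __name__ == "__main__":
    print(" ".join(build_derip2_command("tests/data/mintest.fa", "results")))
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.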
Issues
Submit feedback to the Issue Tracker
License
Software provided under MIT license.