Predict ancestral sequence of fungal repeat elements by correcting for RIP-like mutations in multi-sequence DNA alignments.

deRIP2

Predict progenitor sequence of fungal repeat families by correcting for RIP-like mutations (CpA --> TpA) and cytosine deamination (C --> T) events.

Mask RIP or deamination events from input alignment as ambiguous bases.

Algorithm overview

For each column in input alignment:

  • Check whether the proportion of gapped rows is greater than the max gap proportion (maxGaps). If true, a gap is added to the output sequence.
  • Set invariant column values in output sequence.
  • Check whether at least (1 - maxSNPnoise) of bases in the column are C/T or G/A (e.g. if maxSNPnoise = 0.4, then at least 0.6 of positions in the column must be C/T or G/A). If so:
    • If the reaminate option is set, revert T-->C or A-->G.
    • If reaminate is not set, count the positions in RIP dinucleotide context (C/TpA or TpG/A).
    • If the proportion of positions in the column in RIP-like context is >= the minRIPlike threshold, AND at least one substrate and one product motif (e.g. CpA and TpA) are present, perform RIP correction in the output sequence.
  • For all remaining positions in the output sequence (not filled by gap, reaminate, or RIP correction), inherit the base from the input sequence with the fewest observed RIP events (or the greatest GC content if no RIP is detected or multiple sequences share the minimum RIP count).
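
The per-column decision above can be sketched roughly as follows. This is an illustrative sketch only, not deRIP2's actual code: the function name `classify_column`, its return values, and the choice to measure proportions against non-gap bases are all assumptions, and the dinucleotide context is read from the adjacent alignment column without skipping gaps.

```python
from collections import Counter

GAP = "-"


def classify_column(alignment, col, max_gaps, max_snp_noise, min_rip_like,
                    reaminate=False):
    """Classify one alignment column (hypothetical helper, not deRIP2's API).

    Returns a (call, base) tuple where call is one of
    "gap", "invariant", "reaminate", "rip" or "inherit".
    """
    column = [seq[col].upper() for seq in alignment]

    # Step 1: too many gapped rows means a gap in the output sequence.
    if column.count(GAP) / len(column) > max_gaps:
        return ("gap", GAP)
    bases = [b for b in column if b != GAP]
    if not bases:
        return ("gap", GAP)

    # Step 2: invariant columns are copied straight to the output.
    if len(set(bases)) == 1:
        return ("invariant", bases[0])

    counts = Counter(bases)
    # (substrate, product, offset to partner base, partner base)
    # Forward strand: C/T followed by A. Reverse strand: G/A preceded by T.
    for substrate, product, offset, partner in (("C", "T", 1, "A"),
                                                ("G", "A", -1, "T")):
        # Assumption: both substrate and product must occur in the column.
        if not (counts[substrate] and counts[product]):
            continue
        # Step 3: the column must be mostly C/T (or G/A).
        if (counts[substrate] + counts[product]) / len(bases) < 1 - max_snp_noise:
            continue
        # Step 4: reaminate reverts T-->C or A-->G regardless of context.
        if reaminate:
            return ("reaminate", substrate)
        # Steps 5-6: count rows whose base sits in a RIP dinucleotide context.
        in_context = Counter()
        for seq in alignment:
            j = col + offset
            if 0 <= j < len(seq) and seq[j].upper() == partner:
                in_context[seq[col].upper()] += 1
        rip_like = in_context[substrate] + in_context[product]
        if (in_context[substrate] and in_context[product]
                and rip_like / len(bases) >= min_rip_like):
            return ("rip", substrate)

    # Step 7: the caller fills this position from the least RIP'd sequence.
    return ("inherit", None)
```

Columns returned as "inherit" would then be filled from the reference row chosen by RIP count or GC content, which this sketch leaves out.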

Outputs:

  • Corrected sequence as fasta.
  • Optionally, an alignment with:
    • The corrected sequence appended.
    • Corrected positions masked as ambiguous bases.
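
As a rough illustration of the masking step, the sketch below replaces C/T with the IUPAC code Y (or G/A with R) in one corrected column, but only in rows where the base sits in the RIP dinucleotide context. The helper name `mask_rip_column` and its arguments are hypothetical, and the context is again taken from the adjacent alignment column; deRIP2's own masking may differ in detail.

```python
def mask_rip_column(alignment, col, substrate):
    """Mask RIP substrate/product bases in one column (hypothetical helper).

    substrate "C": mask C/T followed by A as Y (CpA / TpA --> YpA).
    substrate "G": mask G/A preceded by T as R (TpG / TpA --> TpR).
    """
    if substrate == "C":
        pair, offset, partner, code = {"C", "T"}, 1, "A", "Y"
    elif substrate == "G":
        pair, offset, partner, code = {"G", "A"}, -1, "T", "R"
    else:
        raise ValueError("substrate must be 'C' or 'G'")

    masked = []
    for seq in alignment:
        j = col + offset
        in_context = 0 <= j < len(seq) and seq[j].upper() == partner
        if seq[col].upper() in pair and in_context:
            # Strings are immutable, so rebuild the row with the masked base.
            seq = seq[:col] + code + seq[col + 1:]
        masked.append(seq)
    return masked
```

Rows that carry an unrelated base in the column, such as a G in a C/T column, are left unchanged.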

Options and Usage

Installation

Requires Python >= v3.8

Clone from this repository:

% git clone https://github.com/Adamtaranto/deRIP2.git && cd deRIP2 && pip install -e .

Install from PyPI:

% pip install derip2

Test installation:

# Print version number and exit.
% derip2 --version

derip2 0.0.4


# Get usage information
% derip2 --help

Example usage

For aligned sequences in 'mintest.fa':

  • Any column with >= 70% gap positions will not be corrected, and a gap will be inserted in the corrected sequence.
  • Bases in a column must be >= 80% C/T or G/A.
  • At least 50% of bases in a column must be in RIP dinucleotide context (C/T as CpA / TpA) for correction.
  • Default: Inherit all remaining uncorrected positions from the least RIP'd sequence.
  • Mask all substrate and product motifs from corrected columns as ambiguous bases (i.e. CpA to TpA --> YpA).

derip2 -i tests/data/mintest.fa --format fasta \
--maxGaps 0.7 \
--maxSNPnoise 0.2 \
--minRIPlike 0.5 \
--label derip_name \
--mask \
-d results \
--outAln masked_aligment_with_deRIP.fa --outAlnFormat fasta --outFasta derip_prediction.fa

Output:

  • results/derip_prediction.fa
  • results/masked_aligment_with_deRIP.fa
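
To inspect these output files from Python, a minimal FASTA reader along the following lines may be enough. It is a generic standard-library sketch, not part of deRIP2, and the name `read_fasta` is hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {name: sequence} dict.

    The record name is the first whitespace-delimited token after '>'.
    """
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Example usage, assuming the command above has been run:
# with open("results/derip_prediction.fa") as handle:
#     for name, seq in read_fasta(handle).items():
#         print(name, len(seq))
```

An open file handle can be passed directly because it iterates line by line.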

Issues

Submit feedback to the Issue Tracker.

License

Software provided under MIT license.
