tSplit the TE-splitter
Extract terminal repeats from retrotransposons (LTRs) or DNA transposons (TIRs). Optionally, compose synthetic MITEs from complete DNA transposons.
Algorithm overview
tSplit attempts to identify terminal repeats in transposable elements by first aligning each element to itself using nucmer, and then applying a set of tuneable heuristics to select an alignment pair most likely to represent an LTR or TIR.
- Exclude all diagonal/self-matches
- If tsplit-LTR: Retain only alignment pairs on the same strand (direct repeats)
- If tsplit-TIR: Retain only alignment pairs on opposite strands (inverted repeats)
- Retain pairs for which the 5' match begins within x bases of the element start and the 3' match ends within x bases of the element end (x is set with -m/--maxdist)
- Exclude alignment pairs which overlap (potential SSRs)
- If multiple candidates remain, select the alignment pair with the largest internal segment (i.e. closest to the element ends)
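The filtering steps above can be sketched in a few lines of Python. This is a minimal illustration of the listed heuristics, not tSplit's actual implementation: the `AlignmentPair` class, the `select_terminal_repeat` function, and the 1-based inclusive coordinate convention are all assumptions made for the example, and the alignment step itself (nucmer) is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end); assumed convention


@dataclass
class AlignmentPair:
    """Hypothetical container for one self-alignment hit on an element."""
    first: Interval
    second: Interval
    same_strand: bool


def select_terminal_repeat(
    pairs: List[AlignmentPair],
    seq_len: int,
    mode: str = "LTR",
    maxdist: int = 10,
) -> Optional[Tuple[Interval, Interval]]:
    """Apply the listed heuristics and return (5' repeat, 3' repeat) or None."""
    want_same_strand = mode == "LTR"  # LTR: same strand, TIR: opposite strands
    candidates = []
    for pair in pairs:
        # Exclude diagonal/self-matches.
        if pair.same_strand and pair.first == pair.second:
            continue
        # Keep only the strand orientation that matches the mode.
        if pair.same_strand != want_same_strand:
            continue
        # Order the two intervals so that `left` is the 5' copy.
        left, right = sorted([pair.first, pair.second])
        # Both copies must sit within maxdist bases of the element ends.
        if left[0] - 1 > maxdist or seq_len - right[1] > maxdist:
            continue
        # Exclude overlapping pairs (potential SSRs).
        if left[1] >= right[0]:
            continue
        internal_len = right[0] - left[1] - 1
        candidates.append((internal_len, left, right))
    if not candidates:
        return None
    # Prefer the pair that leaves the largest internal segment.
    _, left, right = max(candidates, key=lambda c: c[0])
    return left, right
```

For a 1000 bp element with a same-strand hit between positions 1-100 and 901-1000, `select_terminal_repeat(pairs, 1000)` would return `((1, 100), (901, 1000))`, provided no other surviving pair leaves a larger internal segment.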
Options and usage
Installing tSplit
Requirements:
Installation options:
# Install from PyPI:
pip install tsplit
# Clone and install from this repository:
git clone https://github.com/Adamtaranto/TE-splitter.git && cd TE-splitter && pip install -e .
Example usage
tSplit contains two programs: tsplit-LTR and tsplit-TIR, for extracting long terminal repeats and terminal inverted repeats, respectively. Options are the same for each.
tsplit-LTR
Split each element in retroelements.fasta into internal and external segments. Split segments are written to LTR_split_TE-splitter_output.fasta with the suffix "_I" for internal or "_LTR" for external segments. With default settings, LTRs must be at least 10 bp long, share at least 80% identity, and occur within 10 bp of each end of the input element.
tsplit-LTR -i retroelements.fasta -p LTR_split
tsplit-TIR
Split each element in dna-transposons.fasta into internal and external (TIR) segments. Split segments are written to TIR_split_TE-splitter_output.fasta with the suffix "_I" for internal or "_TIR" for external segments. With default settings, TIRs must be at least 10 bp long, share at least 80% identity, and occur within 10 bp of each end of the input element. Because --makemites is set, synthetic MITEs are also constructed by concatenating the left and right TIRs, with the internal segment excised.
tsplit-TIR -i dna-transposons.fasta -p TIR_split --makemites
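To make the MITE construction concrete, the following sketch concatenates the left and right TIRs of a single element and drops the internal segment, as described above. The function name `make_synthetic_mite` and the 1-based inclusive coordinates are assumptions for illustration only; tSplit's own code may handle naming and coordinates differently.

```python
from typing import Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end); assumed convention


def make_synthetic_mite(seq: str, left: Interval, right: Interval) -> Tuple[str, str]:
    """Return (synthetic MITE, excised internal segment) for one element.

    `left` and `right` are the coordinates of the 5' and 3' TIRs.
    """
    (l_start, l_end), (r_start, r_end) = left, right
    if not (1 <= l_start <= l_end < r_start <= r_end <= len(seq)):
        raise ValueError("TIR coordinates must be ordered, non-overlapping and in range")
    left_tir = seq[l_start - 1:l_end]     # 5' terminal repeat
    right_tir = seq[r_start - 1:r_end]    # 3' terminal repeat, as it appears in the element
    internal = seq[l_end:r_start - 1]     # everything between the two TIRs
    return left_tir + right_tir, internal


# Toy element: 5 bp TIR, 8 bp internal region, 5 bp inverted copy of the TIR.
mite, internal = make_synthetic_mite("CAGTGAAAAAAAACACTG", (1, 5), (14, 18))
print(mite)      # CAGTGCACTG
print(internal)  # AAAAAAAA
```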
Standard options
Run tsplit-LTR --help or tsplit-TIR --help to view the programs' most commonly used options:
Usage: tsplit-[LTR or TIR] [-h] -i INFILE [-p PREFIX] [-d OUTDIR]
[--splitmode {all,split,internal,external,None}]
[--makemites] [--keeptemp] [-v] [-m MAXDIST]
[--minid MINID] [--minterm MINTERM] [--minseed MINSEED]
[--diagfactor DIAGFACTOR] [--method {blastn,nucmer}]
Help:
-h, --help Show this help message and exit.
Input:
-i, --infile Multifasta containing complete elements.
(Required)
Output:
-p, --prefix All output files begin with this string. (Default:[infile basename])
-d, --outdir Write output files to this directory. (Default: cwd)
--keeptemp If set do not remove temp directory on completion.
-v, --verbose If set, report progress.
Report settings:
--splitmode Options: {all,split,internal,external,None}
all = Report input sequence as well as internal and external segments.
split = Report internal and external segments after splitting.
internal = Report only internal segments.
external = Report only terminal repeat segments.
None = Only report synthetic MITEs (when --makemites is also set).
(Default: split)
--makemites Attempt to construct synthetic MITE sequences from TIRs by concatenating
5' and 3' TIRs. Available only in 'tsplit-TIR' mode
Alignment settings:
--method Select alignment tool. Note: blastn may perform better on very short high-identity TRs,
while nucmer is more robust to small indels.
Options: {blastn,nucmer}
(Default: nucmer)
--minid Minimum identity between terminal repeat pairs. As float.
(Default: 80.0)
--minterm Minimum length for a terminal repeat to be considered.
Equivalent to nucmer "--mincluster"
(Default: 10)
-m, --maxdist Terminal repeat candidates must be no more than this many bases from ends of an input element.
Note: Increase this value if you suspect that your element is nested within some flanking sequence.
(Default: 10)
--minseed Minimum length of a maximal exact match to be included in final match cluster.
Equivalent to nucmer "--minmatch".
(Default: 5)
--diagfactor Maximum diagonal difference factor for clustering of matches within nucmer,
i.e. diagonal difference / match separation.
Note: Increase value for greater tolerance of indels between terminal repeats.
(Default: 0.20)
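As a reading aid for the --splitmode values listed above, the sketch below maps each mode to the kinds of sequence it reports. The function `segments_to_report` and the labels it returns are hypothetical, and the sketch assumes that --makemites adds synthetic MITEs on top of whatever the chosen mode reports; only the per-mode descriptions come from the help text.

```python
from typing import List


def segments_to_report(splitmode: str = "split", makemites: bool = False) -> List[str]:
    """Return labels for the sequence types reported under a given --splitmode."""
    by_mode = {
        "all": ["input", "internal", "external"],
        "split": ["internal", "external"],
        "internal": ["internal"],
        "external": ["external"],
        "None": [],  # nothing unless --makemites is also set
    }
    if splitmode not in by_mode:
        raise ValueError(f"unknown splitmode: {splitmode!r}")
    reported = list(by_mode[splitmode])
    if makemites:
        # Assumption: MITEs are reported in addition to the mode's own output.
        reported.append("mite")
    return reported


print(segments_to_report())                         # ['internal', 'external']
print(segments_to_report("None", makemites=True))   # ['mite']
```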
License
Software provided under MIT license.