tSplit: the TE-splitter

Extract terminal repeats from retrotransposons (LTRs) or DNA transposons (TIRs). Optionally, compose synthetic MITEs from complete DNA transposons.

Algorithm overview

tSplit attempts to identify terminal repeats in transposable elements by first aligning each element to itself (using nucmer by default, or blastn if selected with --method), and then applying a set of tunable heuristics to select the alignment pair most likely to represent an LTR or TIR.

  1. Exclude all diagonal/self-matches.
  2. If tsplit-LTR: Retain only alignment pairs on the same strand (direct repeats).
  3. If tsplit-TIR: Retain only alignment pairs on opposite strands (inverted repeats).
  4. Retain pairs whose 5' match begins within x bases of the element start and whose 3' match ends within x bases of the element end (x is set with --maxdist).
  5. Exclude alignment pairs which overlap (potential SSRs).
  6. If multiple candidates remain, select the alignment pair with the largest internal segment (i.e. closest to the element ends).
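The filtering steps above can be sketched in a few lines of Python. This is a minimal illustration, not tSplit's actual code: the `Pair` record, the `pick_terminal_repeat` function, and the coordinate convention (1-based, inclusive, with the 5' segment always listed first) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pair:
    """Hypothetical record for one self-alignment hit (1-based, inclusive)."""
    start5: int        # start of the 5' (left) segment
    end5: int          # end of the 5' segment
    start3: int        # start of the 3' (right) segment
    end3: int          # end of the 3' segment
    same_strand: bool  # True = direct repeat, False = inverted repeat


def pick_terminal_repeat(pairs: List[Pair], seq_len: int,
                         mode: str = "LTR", maxdist: int = 10) -> Optional[Pair]:
    """Apply the six heuristics and return the best candidate, or None."""
    want_same_strand = (mode == "LTR")
    candidates = []
    for p in pairs:
        # 1. Drop the diagonal: the element aligned to itself.
        if p.start5 == p.start3 and p.end5 == p.end3:
            continue
        # 2/3. LTR mode keeps same-strand pairs, TIR mode keeps opposite-strand pairs.
        if p.same_strand != want_same_strand:
            continue
        # 4. Both segments must sit within maxdist bases of the element ends.
        if (p.start5 - 1) > maxdist or (seq_len - p.end3) > maxdist:
            continue
        # 5. Overlapping segments are treated as potential SSRs.
        if p.end5 >= p.start3:
            continue
        candidates.append(p)
    if not candidates:
        return None
    # 6. Prefer the pair that leaves the largest internal segment.
    return max(candidates, key=lambda p: p.start3 - p.end5 - 1)


# Example: a 1000 bp element with several self-alignment hits.
hits = [
    Pair(1, 1000, 1, 1000, True),    # diagonal, dropped by step 1
    Pair(1, 100, 901, 1000, True),   # direct repeat at both ends
    Pair(5, 50, 940, 995, True),     # shorter direct repeat, larger internal segment
    Pair(1, 100, 901, 1000, False),  # inverted repeat at both ends
    Pair(200, 300, 600, 700, True),  # too far from the ends, dropped by step 4
]
best_ltr = pick_terminal_repeat(hits, seq_len=1000, mode="LTR")
best_tir = pick_terminal_repeat(hits, seq_len=1000, mode="TIR")
```

In this sketch the identity and minimum-length thresholds (--minid, --minterm) are assumed to have been applied already by the aligner, so only positional filters are shown.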

Options and usage

Installing tSplit

Requirements:

Installation options:

# Install from PyPI:
pip install tsplit

# Clone and install from this repository:
git clone https://github.com/Adamtaranto/TE-splitter.git && cd TE-splitter && pip install -e .

Example usage

tSplit contains two programs: tsplit-LTR and tsplit-TIR, for extracting long terminal repeats and terminal inverted repeats, respectively. Options are the same for each.

tsplit-LTR

For each element in retroelements.fasta, split the sequence into internal and external segments. Split segments will be written to LTR_split_TE-splitter_output.fasta with the suffix "_I" for internal or "_LTR" for external segments. With default settings, LTRs must be at least 10 bp long, share at least 80% identity, and occur within 10 bp of each end of the input element.

tsplit-LTR -i retroelements.fasta -p LTR_split

tsplit-TIR

For each element in dna-transposons.fasta, split the sequence into internal and external (TIR) segments. Split segments will be written to TIR_split_TE-splitter_output.fasta with the suffix "_I" for internal or "_TIR" for external segments. With default settings, TIRs must be at least 10 bp long, share at least 80% identity, and occur within 10 bp of each end of the input element. Additionally, synthetic MITEs will be constructed by concatenating the left and right TIRs, with internal segments excised.

tsplit-TIR -i dna-transposons.fasta -p TIR_split --makemites
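To make the splitting and MITE construction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not tSplit's implementation: the function names `split_element`, `make_mite`, and `revcomp` are hypothetical, coordinates are 1-based, and the TIRs are assumed to extend all the way to the element ends.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def split_element(seq: str, left_end: int, right_start: int):
    """Split seq into (left TIR, internal segment, right TIR).

    left_end    : 1-based position of the last base of the left TIR.
    right_start : 1-based position of the first base of the right TIR.
    Assumes both TIRs reach the ends of the element.
    """
    left = seq[:left_end]
    internal = seq[left_end:right_start - 1]
    right = seq[right_start - 1:]
    return left, internal, right


def make_mite(seq: str, left_end: int, right_start: int) -> str:
    """Join the left and right TIRs, dropping the internal segment."""
    left, _internal, right = split_element(seq, left_end, right_start)
    return left + right


# Toy element: an 8 bp TIR, a 10 bp internal segment, and the inverted TIR copy.
tir = "CAGTGGTC"
element = tir + "AAAAATTTTT" + revcomp(tir)
left, internal, right = split_element(element, left_end=8, right_start=19)
mite = make_mite(element, left_end=8, right_start=19)
```

Here `left` and `right` correspond to the external segments and `internal` to the "_I" segment described above; the synthetic MITE is simply the two external segments joined end to end.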

Standard options

Run tsplit-LTR --help or tsplit-TIR --help to view the programs' most commonly used options:

Usage: tsplit-[LTR or TIR] [-h] -i INFILE [-p PREFIX] [-d OUTDIR]
                        [--splitmode {all,split,internal,external,None}]
                        [--makemites] [--keeptemp] [-v] [-m MAXDIST]
                        [--minid MINID] [--minterm MINTERM] [--minseed MINSEED]
                        [--diagfactor DIAGFACTOR] [--method {blastn,nucmer}]

Help:
  -h, --help         Show this help message and exit.

Input:
  -i, --infile       Multifasta containing complete elements. 
                       (Required)  

Output:
  -p, --prefix       All output files begin with this string.  (Default:[infile basename])  
  -d, --outdir       Write output files to this directory. (Default: cwd)  
  --keeptemp         If set do not remove temp directory on completion.
  -v, --verbose      If set, report progress.

Report settings:
  --splitmode        Options: {all,split,internal,external,None} 
                       all = Report input sequence as well as internal and external segments.  
                       split = Report internal and external segments after splitting.  
                       internal = Report only internal segments.  
                       external = Report only terminal repeat segments.  
                       None = Only report synthetic MITEs (when --makemites is also set).  
                       (Default: split)  
  --makemites        Attempt to construct synthetic MITE sequences from TIRs by concatenating 
                       5' and 3' TIRs. Available only in 'tsplit-TIR' mode.

Alignment settings:
  --method          Select alignment tool. Note: blastn may perform better on very short, high-identity terminal repeats,
                      while nucmer is more robust to small indels.
                      Options: {blastn,nucmer} 
                      (Default: nucmer)
  --minid           Minimum identity between terminal repeat pairs. As float. 
                      (Default: 80.0)  
  --minterm         Minimum length for a terminal repeat to be considered.  
                      Equivalent to nucmer "--mincluster" 
                      (Default: 10)  
  -m, --maxdist     Terminal repeat candidates must be no more than this many bases from ends of an input element. 
                      Note: Increase this value if you suspect that your element is nested within some flanking sequence. 
                      (Default: 10)
  --minseed         Minimum length of a maximal exact match to be included in final match cluster. 
                      Equivalent to nucmer "--minmatch". 
                      (Default: 5)
  --diagfactor      Maximum diagonal difference factor for clustering of matches within nucmer, 
                      i.e. diagonal difference / match separation 
                      (Default: 0.20) 
                      Note: Increase value for greater tolerance of indels between terminal repeats.
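
As a worked illustration of the ratio that --diagfactor describes (diagonal difference / match separation), the sketch below checks whether two exact matches would fall under a given threshold. It is a simplified model and not nucmer's clustering code: the `Match` tuple, the `within_diagfactor` function, and the choice to measure separation as the gap between the two matches on the first sequence are all assumptions, and nucmer applies further clustering parameters that are not modelled here.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """Hypothetical exact match: start on sequence A, start on sequence B, length."""
    a_start: int
    b_start: int
    length: int


def within_diagfactor(first: Match, second: Match, diagfactor: float = 0.20) -> bool:
    """Return True if the diagonal shift between two matches, relative to the
    gap separating them, does not exceed diagfactor."""
    # A match's diagonal is the offset between its two start coordinates.
    diag_first = first.a_start - first.b_start
    diag_second = second.a_start - second.b_start
    diag_diff = abs(diag_second - diag_first)
    # Separation: gap between the end of the first match and the start of the
    # second on sequence A (clamped to 1 to avoid division by zero).
    separation = max(1, second.a_start - (first.a_start + first.length))
    return diag_diff / separation <= diagfactor


# Two 20 bp matches separated by a 10 bp gap.
anchor = Match(a_start=1, b_start=901, length=20)
shift_2bp = Match(a_start=31, b_start=933, length=20)  # 2 bp indel: ratio 2/10
shift_3bp = Match(a_start=31, b_start=934, length=20)  # 3 bp indel: ratio 3/10

accepted_default = within_diagfactor(anchor, shift_2bp)
rejected_default = within_diagfactor(anchor, shift_3bp)
accepted_relaxed = within_diagfactor(anchor, shift_3bp, diagfactor=0.30)
```

Under this model, a 3 bp indel across a 10 bp gap exceeds the 0.20 default but passes at 0.30, which is the sense in which raising the value tolerates more indels between terminal repeats.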

License

Software provided under MIT license.
