
Find alignment signatures characteristic of transposon insertion sites.


Tinscan: TE-Insertion-Scanner

Scan whole genome alignments for transposon insertion signatures.


Algorithm overview

  1. Perform a gapped, chained whole-genome alignment of query genome B onto target genome A.
  2. Where two aligned segments are contiguous in B (or separated by no more than --qGap), and
  3. are separated in A by an insertion whose length falls in the range --minInsert:--maxInsert, and
  4. at least one flanking alignment in A meets the --minIdent identity threshold and differs in identity from its mate by no more than --maxIdentDiff %,
  5. log the flanks and the candidate insertion.
  6. Attempt to infer target site duplications (TSDs) from the internal overlap of the flanking alignments in genome B.
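Steps 2 to 6 can be sketched in Python as a test on one pair of flanking alignments. This is a minimal illustration, not tinscan's implementation: the `Hit` record, its field names, the 1-based inclusive coordinates, and the assumption that both flanks lie on the same strand and are sorted by position in A are all hypothetical. The keyword arguments simply mirror the --qGap, --minInsert, --maxInsert, --minIdent and --maxIdentDiff options.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Hit:
    """Hypothetical alignment block; 1-based inclusive coordinates, same strand."""
    a_start: int  # start on target genome A
    a_end: int    # end on target genome A
    b_start: int  # start on query genome B
    b_end: int    # end on query genome B
    ident: float  # percent identity of the block


def candidate_insertion(left: Hit, right: Hit, q_gap: int, min_insert: int,
                        max_insert: int, min_ident: float,
                        max_ident_diff: float) -> Optional[Tuple[int, int]]:
    """Return the (start, end) of a candidate insertion in A, or None."""
    # Step 2: flanks must be contiguous in B, or separated by at most q_gap bases.
    # A negative gap means the flanks overlap in B (see tsd_from_overlap).
    b_gap = right.b_start - left.b_end - 1
    if b_gap > q_gap:
        return None
    # Step 3: the unaligned span between the flanks in A must be insert-sized.
    ins_start, ins_end = left.a_end + 1, right.a_start - 1
    if not min_insert <= ins_end - ins_start + 1 <= max_insert:
        return None
    # Step 4: at least one flank passes min_ident, and their identities are similar.
    if max(left.ident, right.ident) < min_ident:
        return None
    if abs(left.ident - right.ident) > max_ident_diff:
        return None
    # Step 5: the caller would log both flanks and this interval.
    return ins_start, ins_end


def tsd_from_overlap(left: Hit, right: Hit) -> Optional[Tuple[int, int]]:
    """Step 6: the overlap of the two flanks in B marks a putative TSD."""
    start, end = right.b_start, left.b_end
    return (start, end) if start <= end else None


left = Hit(a_start=1, a_end=1000, b_start=1, b_end=1000, ident=95.0)
right = Hit(a_start=6006, a_end=7000, b_start=996, b_end=1990, ident=93.0)
print(candidate_insertion(left, right, q_gap=10, min_insert=100,
                          max_insert=50000, min_ident=80, max_ident_diff=20))
# (1001, 6005)
print(tsd_from_overlap(left, right))
# (996, 1000)
```

Here the two flanks overlap by 5 bp in B, so the sketch reports an insertion at A:1001-6005 and a putative 5 bp TSD at B:996-1000: in A the target site appears twice (once at the end of each flank), while in B it appears only once.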

Options and usage

Installing Tinscan

Requirements:

  • LASTZ genome alignment tool from the Miller Lab, Penn State.
  • Biopython
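Before running the pipeline it can help to confirm that both requirements are visible from the active environment. The Python sketch below is an illustrative check and not part of tinscan; it assumes the LASTZ executable is named `lastz` and is on `PATH`, and that Biopython is importable under its package name `Bio`.

```python
import importlib.util
import shutil


def check_requirements() -> dict:
    """Report whether LASTZ and Biopython appear to be available."""
    return {
        # Assumes the LASTZ binary is called 'lastz' and is on PATH.
        "lastz": shutil.which("lastz") is not None,
        # Biopython installs the top-level package 'Bio'.
        "biopython": importlib.util.find_spec("Bio") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```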

You can set up a conda environment with the required dependencies using the YAML files in this repo:

On ARM64 (Apple Silicon) Macs, create an Intel (osx-64) environment instead:

# For ARM64 Macs only
conda env create -f env_osx64.yml
conda activate tinscan-osx64

For all other operating systems, use environment.yml:

conda env create -f environment.yml
conda activate tinscan

With the conda env active you can now install tinscan.

  1. Install from PyPI:
pip install tinscan
  2. Or, for the latest development version, clone this repository and install it in editable mode with the test extras:
git clone https://github.com/Adamtaranto/TE-insertion-scanner.git && cd TE-insertion-scanner && pip install -e ".[tests]"

Example usage

Find insertion events in genome A (target) relative to genome B (query).

Prepare Input Genomes

Split genomes A and B into two separate directories, with one scaffold per file. Check that sequence names are unique within each genome.

tinscan-prep --adir data/A_target_split --bdir data/B_query_split \
-A data/A_target_genome.fasta -B data/B_query_genome.fasta

Output: data/A_target_split/*.fa data/B_query_split/*.fa
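For reference, the splitting step can be approximated with a short, stdlib-only Python sketch that writes one record per file and stops at the first duplicate name. This is not tinscan-prep's implementation; the function names, the use of the first header word as the file name, and the 60-column line wrapping are illustrative assumptions.

```python
from pathlib import Path
from typing import Dict, Iterator, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) records; the name is the first word of the header."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                fields = line[1:].split()
                name, chunks = (fields[0] if fields else ""), []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_fasta(fasta: Path, outdir: Path, width: int = 60) -> Dict[str, Path]:
    """Write each record to <outdir>/<name>.fa; raise on duplicate names."""
    outdir.mkdir(parents=True, exist_ok=True)
    written: Dict[str, Path] = {}
    for name, seq in read_fasta(fasta):
        # Names must be unique within a genome, otherwise output files would collide.
        if name in written:
            raise ValueError(f"duplicate sequence name in {fasta}: {name}")
        out = outdir / f"{name}.fa"
        with open(out, "w") as fh:
            fh.write(f">{name}\n")
            for i in range(0, len(seq), width):
                fh.write(seq[i:i + width] + "\n")
        written[name] = out
    return written
```

Files written before a duplicate is detected are left in place, so a real tool might prefer to check all names up front.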

Align Genomes

Align each scaffold from genome B onto each scaffold of genome A, reporting alignments with >= 60% identity and length >= 100 bp.

tinscan-align --adir data/A_target_split --bdir data/B_query_split \
--outdir A_Inserts --outfile A_Inserts_vs_B.tab \
--minIdt 60 --minLen 100 --hspthresh 3000

Output: A_Inserts/A_Inserts_vs_B.tab
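Without --pairs, every A scaffold is aligned against every B scaffold, so the number of alignment jobs grows as (number of A files) × (number of B files). A hypothetical Python sketch of that job list, assuming the per-scaffold `.fa` files from the preparation step; it is not how tinscan-align schedules its work.

```python
from itertools import product
from pathlib import Path
from typing import List, Tuple


def all_vs_all(adir: Path, bdir: Path, suffix: str = ".fa") -> List[Tuple[Path, Path]]:
    """List every (A target, B query) scaffold-file pair to be aligned."""
    a_files = sorted(adir.glob(f"*{suffix}"))
    b_files = sorted(bdir.glob(f"*{suffix}"))
    # One alignment job per combination: len(a_files) * len(b_files) in total.
    return list(product(a_files, b_files))
```

For two fragmented assemblies with thousands of scaffolds each, this product quickly reaches millions of jobs, which is why restricting comparisons to known homologous pairs can save a lot of time.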

Note: When homologous chromosome pairs between the two assemblies are known, alignment can be restricted to those pairwise comparisons with the --pairs option.

Comparisons are specified in a tab-delimited text file, where column 1 contains sequence names from genome A and column 2 contains sequence names from genome B.

In the example Chromosome_pairs.txt below, chromosome A2 is assembled as two scaffolds (B2 and B3) in genome B, so it is paired with both.

#Chromosome_pairs.txt
A1    B1
A2    B2
A2    B3
A3    B4
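A pairs file like this can be read into a lookup from each genome A sequence to its genome B partners. The Python sketch below is illustrative only; whether tinscan-align tolerates `#` comment lines or blank lines is not stated here, so the sketch simply skips them.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def read_pairs(handle: TextIO) -> Dict[str, List[str]]:
    """Map each genome A sequence name to the genome B names it is paired with."""
    pairs: Dict[str, List[str]] = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and '#' comment lines such as '#Chromosome_pairs.txt'.
        if not row or row[0].startswith("#"):
            continue
        a_name, b_name = row[0].strip(), row[1].strip()
        pairs[a_name].append(b_name)
    return dict(pairs)


example = "#Chromosome_pairs.txt\nA1\tB1\nA2\tB2\nA2\tB3\nA3\tB4\n"
print(read_pairs(io.StringIO(example)))
# {'A1': ['B1'], 'A2': ['B2', 'B3'], 'A3': ['B4']}
```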

Find Insertions

Scan the alignments for insertion events and report them as GFF3 annotations on genome A.

tinscan-find --infile A_Inserts/A_Inserts_vs_B.tab \
--outdir A_Inserts --gffOut A_Inserts_vs_B_l100_id80.gff3 \
--maxInsert 50000 --minIdent 80 --maxIdentDiff 20

Output: A_Inserts/A_Inserts_vs_B_l100_id80.gff3
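GFF3 records have nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes) with 1-based inclusive coordinates. The sketch below formats one candidate insertion that way; the `tinscan` source label, the `insertion` feature type, and the `tsd` attribute are assumptions for illustration and may not match what tinscan-find actually writes.

```python
from typing import Dict, Optional


def gff3_line(seqid: str, start: int, end: int, feature_id: str,
              attrs: Optional[Dict[str, str]] = None,
              source: str = "tinscan", ftype: str = "insertion") -> str:
    """Format one GFF3 feature line for a candidate insertion on genome A."""
    attributes = {"ID": feature_id, **(attrs or {})}
    # Values must not contain ';', '=', '&', ',' or tabs unless percent-encoded.
    col9 = ";".join(f"{key}={value}" for key, value in attributes.items())
    # Score, strand and phase are left undefined ('.').
    columns = [seqid, source, ftype, str(start), str(end), ".", ".", ".", col9]
    return "\t".join(columns)


print("##gff-version 3")
print(gff3_line("A2", 1001, 6005, "ins_1", {"tsd": "ACGTA"}))
```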

License

Software provided under the MIT license.

Download files


Source Distribution

tinscan-0.2.1.tar.gz (13.0 kB)

Uploaded Source

Built Distribution

tinscan-0.2.1-py3-none-any.whl (11.9 kB)

Uploaded Python 3

File details

Details for the file tinscan-0.2.1.tar.gz.

File metadata

  • Download URL: tinscan-0.2.1.tar.gz
  • Size: 13.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for tinscan-0.2.1.tar.gz
Algorithm Hash digest
SHA256 e3ca443aeff3bc0f94050e5d46c0f79db65f20e7f13ab440b5963a8a4ed06c48
MD5 6ed27f371be44ae89f229ac9bcd73aec
BLAKE2b-256 134f2255ff0821b389832c10710b9ed0e02e6dcc55a5ab9332acaed289c4dd5b


Provenance

The following attestation bundles were made for tinscan-0.2.1.tar.gz:

Publisher: publish.yml on Adamtaranto/TE-insertion-scanner


File details

Details for the file tinscan-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: tinscan-0.2.1-py3-none-any.whl
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for tinscan-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f61315f63ff9f66ba109e97cfe7a2f94d4085038ee6906821146d982c0b6d6f2
MD5 c0560201f9ed5bac043f824a594dd7b8
BLAKE2b-256 b964941ae1b6423a6fbb15316a01778c656e78470e45191eee8bedcb55bf282d


Provenance

The following attestation bundles were made for tinscan-0.2.1-py3-none-any.whl:

Publisher: publish.yml on Adamtaranto/TE-insertion-scanner

