Find alignment signatures characteristic of transposon insertion sites.
Tinscan: TE-Insertion-Scanner
Scan whole genome alignments for transposon insertion signatures.
Algorithm overview
- Perform gapped and chained whole genome alignment of query genome B onto target genome A.
- Identify pairs of aligned segments that are contiguous in B (or separated by no more than --qGap), and
- are separated in A by an insertion whose length is in the range --minInsert to --maxInsert, and
- where at least one flanking alignment in A satisfies the threshold --minIdent and differs in identity from its mate by no more than --maxIdentDiff %.
- Log the flanks and the candidate insertion.
- Attempt to infer target site duplications (TSDs) from the internal overlap of the flanking alignments in the B genome.
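The criteria above can be sketched in Python as follows. This is an illustrative sketch only, not Tinscan's actual implementation: the `Flank` record, the function names, the 0-based half-open coordinates, and all default threshold values are assumptions made for the example. It also assumes both flanks lie on the same strand and are ordered left to right.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Flank:
    """Hypothetical record for one aligned segment (0-based, half-open)."""
    a_start: int     # start in target genome A
    a_end: int       # end in target genome A
    b_start: int     # start in query genome B
    b_end: int       # end in query genome B
    identity: float  # percent identity of the alignment


def candidate_insertion(
    left: Flank,
    right: Flank,
    q_gap: int = 50,               # assumed default, mirrors --qGap
    min_insert: int = 100,         # assumed default, mirrors --minInsert
    max_insert: int = 20000,       # assumed default, mirrors --maxInsert
    min_ident: float = 80.0,       # assumed default, mirrors --minIdent
    max_ident_diff: float = 20.0,  # assumed default, mirrors --maxIdentDiff
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) of the candidate insertion in A, or None."""
    # Flanks must be contiguous in B; a negative gap means they overlap.
    if right.b_start - left.b_end > q_gap:
        return None
    # Flanks must be separated in A by an insertion of acceptable length.
    insert_len = right.a_start - left.a_end
    if not (min_insert <= insert_len <= max_insert):
        return None
    # At least one flank must reach the identity threshold.
    if max(left.identity, right.identity) < min_ident:
        return None
    # The two flanks must not differ too much in identity.
    if abs(left.identity - right.identity) > max_ident_diff:
        return None
    return (left.a_end, right.a_start)


def tsd_length(left: Flank, right: Flank) -> int:
    """Overlap of the two flanks in B, used here as a proxy for TSD length."""
    return max(0, left.b_end - right.b_start)


left = Flank(a_start=0, a_end=1000, b_start=0, b_end=1000, identity=95.0)
right = Flank(a_start=6000, a_end=7000, b_start=995, b_end=1995, identity=90.0)
insertion = candidate_insertion(left, right)
overlap = tsd_length(left, right)
```

In this example the flanks overlap by 5 bp in B and are 5000 bp apart in A, so the pair passes every check and the overlap is reported as a possible TSD.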
Options and usage
Installing Tinscan
Requirements:
- LASTZ genome alignment tool from the Miller Lab, Penn State.
- Biopython
You can set up a conda environment with the required dependencies using the YAML files in this repo:
On ARM64 (Apple Silicon) Macs, create an emulated Intel (x86-64) environment:
# For ARM64 Macs only
conda env create -f env_osx64.yml
conda activate tinscan-osx64
For all other operating systems, use environment.yml:
conda env create -f environment.yml
conda activate tinscan
With the conda env active you can now install tinscan.
- Install from PyPI.
pip install tinscan
- For the latest development version, clone and install from this repository.
git clone https://github.com/Adamtaranto/TE-insertion-scanner.git && cd TE-insertion-scanner && pip install -e ".[tests]"
Example usage
Find insertion events in genome A (target) relative to genome B (query).
Prepare Input Genomes
Split the A and B genomes into two directories, each containing one scaffold per file. Check that sequence names are unique within each genome.
tinscan-prep --adir data/A_target_split --bdir data/B_query_split \
-A data/A_target_genome.fasta -B data/B_query_genome.fasta
Output: data/A_target_split/*.fa data/B_query_split/*.fa
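Before splitting, it can help to confirm that sequence names are unique within a genome. The following is a minimal standard-library sketch, independent of tinscan-prep; the function names are hypothetical, and it assumes the sequence name is the first whitespace-delimited word of each FASTA header.

```python
from collections import Counter
from typing import Iterable, List


def fasta_names(lines: Iterable[str]) -> List[str]:
    """Return the first word of every FASTA header line, in file order."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names


def duplicated_names(names: Iterable[str]) -> List[str]:
    """Return the sorted list of names that occur more than once."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Small in-memory example; for a real genome, pass an open file handle,
# e.g. open("data/A_target_genome.fasta").
example = [">chr1 first copy\n", "ACGT\n", ">chr2\n", "GGCC\n", ">chr1\n", "TTAA\n"]
dupes = duplicated_names(fasta_names(example))
```

An empty result means every name in the file is unique.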
Align Genomes
Align each scaffold from genome B onto each genome A scaffold. Report alignments with >= 60% identity and length >= 100bp.
tinscan-align --adir data/A_target_split --bdir data/B_query_split \
--outdir A_Inserts --outfile A_Inserts_vs_B.tab \
--minIdt 60 --minLen 100 --hspthresh 3000
Output: A_Inserts/A_Inserts_vs_B.tab
Note: Alignment tasks can be limited to a specified set of pairwise comparisons
where appropriate (i.e. when homologous chromosome pairs are known between
assemblies) using the option --pairs.
Comparisons are specified with a tab-delimited text file, where column 1 contains sequence names from genome A, and column 2 contains sequences from genome B.
In the example Chromosome_pairs.txt, Chr A2 has been assembled as two scaffolds (B2, B3) in genome B.
#Chromosome_pairs.txt
A1 B1
A2 B2
A2 B3
A3 B4
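As an illustration of the file layout, the sketch below reads a pairs file into a mapping from each genome A sequence to its genome B partners. It is not part of Tinscan; the function name is hypothetical, and treating lines that start with `#` as comments is an assumption made for this example.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def read_pairs(handle: TextIO) -> Dict[str, List[str]]:
    """Map each genome A sequence name to the genome B names paired with it."""
    pairs = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and (by assumption) '#' comment lines.
        if not row or row[0].startswith("#"):
            continue
        if len(row) < 2:
            raise ValueError(f"expected two tab-separated columns, got: {row!r}")
        pairs[row[0].strip()].append(row[1].strip())
    return dict(pairs)


# The example file from above, written with explicit tab separators.
example = "#Chromosome_pairs.txt\nA1\tB1\nA2\tB2\nA2\tB3\nA3\tB4\n"
pairs = read_pairs(io.StringIO(example))
```

Here `pairs["A2"]` holds both `B2` and `B3`, reflecting the chromosome that was assembled as two scaffolds in genome B.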
Find Insertions
Scan alignments for insertion events and report them as a GFF3 annotation of genome A.
tinscan-find --infile A_Inserts/A_Inserts_vs_B.tab \
--outdir A_Inserts --gffOut A_Inserts_vs_B_l100_id80.gff3 \
--maxInsert 50000 --minIdent 80 --maxIdentDiff 20
Output: A_Inserts/A_Inserts_vs_B_l100_id80.gff3
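The resulting file can be inspected with any GFF3-aware tool. As a rough sketch, the snippet below reads standard nine-column GFF3 records and reports each feature's length. The function name is hypothetical, and the record in the example (source, feature type, coordinates, attributes) is invented for illustration; it is not actual Tinscan output.

```python
import io
from typing import Iterator, TextIO, Tuple


def feature_lengths(handle: TextIO) -> Iterator[Tuple[str, str, int, int, int]]:
    """Yield (seqid, type, start, end, length) for each GFF3 feature line."""
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines, directives and comments.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a standard GFF3 feature line
        seqid, ftype = cols[0], cols[2]
        start, end = int(cols[3]), int(cols[4])
        # GFF3 coordinates are 1-based and inclusive.
        yield seqid, ftype, start, end, end - start + 1


# Invented record, for illustration only.
example = "##gff-version 3\nA1\texample\tregion\t1001\t6000\t.\t+\t.\tID=demo_1\n"
features = list(feature_lengths(io.StringIO(example)))
```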
License
Software provided under MIT license.