# mmsplice

[![pypi](https://img.shields.io/pypi/v/mmsplice.svg)](https://pypi-hypernode.com/pypi/mmsplice)
[![travis](https://img.shields.io/travis/s6juncheng/mmsplice.svg)](https://travis-ci.org/s6juncheng/mmsplice)

Predict splicing variant effect from VCF

* Free software: MIT license


## Usage example
------

### Preparation
------

#### 1. Prepare annotation (gtf) file
A standard human gene annotation file in GTF format can be downloaded from Ensembl or GENCODE.
`MMSplice` can work directly with those files; however, some filtering is highly recommended.

- Filter for protein coding genes.
- Filter out duplicated exons. The same exon can be annotated multiple times if it appears in multiple transcripts.
This will cause duplicated predictions.

We provide a filtered version [here](https://raw.githubusercontent.com/gagneurlab/MMSplice_paper/master/data/shared/Homo_sapiens.GRCh37.75.chr.uniq_exon.gtf.gz).
Note that this version has chromosome names in the format `chr*`. You may need to remove the `chr` prefix to match the chromosome names in your fasta file.

#### 2. Prepare variant (VCF) file
A correctly formatted VCF file will work with `MMSplice`; however, the following steps will make the predictions less prone to false positives:

- Quality filtering. Low quality variants lead to unreliable predictions.
- Avoid presenting multiple variants in one line by splitting them into multiple lines. Example code:
```bash
bcftools norm -m-both -o out.vcf in.vcf.gz
```
- Left-normalization. For instance, GGCA-->GG is not left-normalized while GCA-->G is. For details on the unified representation of genetic variants, see [Tan et al.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481842/)
```bash
bcftools norm -f reference.fasta -o out.vcf in.vcf
```
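
To make the GGCA-->GG example concrete, here is a minimal Python sketch of the allele trimming that normalization involves. The function names are hypothetical, and the sketch only removes shared bases from the two alleles. It does not shift indels to the left against the reference sequence, so it is not a substitute for `bcftools norm -f`.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Returns the (possibly shifted) 1-based position and the trimmed alleles.
    """
    # Trim shared trailing bases first.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim shared leading bases; each removed base moves the position.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def split_multiallelic(pos, ref, alt_field):
    """Turn one record with comma-separated ALT alleles into one tuple per allele."""
    return [trim_alleles(pos, ref, alt) for alt in alt_field.split(",")]


print(trim_alleles(100, "GGCA", "GG"))  # (101, 'GCA', 'G')
```

The printed result matches the example in the list above: the redundant leading `G` is dropped and the position moves by one.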

#### 3. Prepare reference genome (fasta) file
A human reference fasta file can be downloaded from Ensembl or GENCODE. Make sure its chromosome names match those in the GTF annotation file you use.
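
A quick way to check the chromosome names before running predictions is to compare the names in both files. The helpers below are a hypothetical sketch using only the standard library; they assume the fasta sequence name is the first word of each `>` header line and that the GTF chromosome is the first tab-separated column.

```python
def fasta_chromosomes(lines):
    """Collect sequence names from fasta header lines ('>name description')."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def gtf_chromosomes(lines):
    """Collect chromosome names from the first column of GTF records."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def missing_from_fasta(gtf_lines, fasta_lines):
    """Return GTF chromosome names that have no sequence in the fasta file."""
    return sorted(gtf_chromosomes(gtf_lines) - fasta_chromosomes(fasta_lines))


def strip_chr_prefix(gtf_line):
    """Drop a leading 'chr' from the chromosome column of one GTF line."""
    if gtf_line.startswith("#") or not gtf_line.startswith("chr"):
        return gtf_line
    return gtf_line[3:]
```

If `missing_from_fasta` returns names such as `chr17` while the fasta file uses `17`, applying `strip_chr_prefix` to every GTF line is one way to bring the two files in line.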


### Example code
------

Check [notebooks/example.ipynb](https://github.com/gagneurlab/MMSplice/blob/master/notebooks/example.ipynb)

```python
# Import
from mmsplice.vcf_dataloader import SplicingVCFDataloader
from mmsplice import MMSplice, predict_all_table
from mmsplice.utils import max_varEff

# Example files
gtf = 'tests/data/test.gtf'
vcf = 'tests/data/test.vcf.gz'
fasta = 'tests/data/hg19.nochr.chr17.fa'
gtfIntervalTree = '../tests/data/test.pkl'  # pickled exon interval tree

# Dataloader to load variants from the VCF
dl = SplicingVCFDataloader(gtf,
                           fasta,
                           vcf,
                           out_file=gtfIntervalTree,  # where the pickled GTF interval tree is saved
                           split_seq=False)

# Specify model
model = MMSplice(
    exon_cut_l=0,
    exon_cut_r=0,
    acceptor_intron_cut=6,
    donor_intron_cut=6,
    acceptor_intron_len=50,
    acceptor_exon_len=3,
    donor_exon_len=5,
    donor_intron_len=13)

# Do prediction
predictions = predict_all_table(model, dl, batch_size=1024, split_seq=False, assembly=False)

# Summarize with maximum effect size
predictionsMax = max_varEff(predictions)
```

=======
History
=======

0.1.0 (2018-07-17)
------------------

* First release on PyPI.

