
Predict splicing variant effect from VCF


mmsplice


Paper: Cheng et al. https://doi.org/10.1101/438986

MMSplice

Installation


External dependencies:

pip install cyvcf2 cython

Conda installation is recommended:

conda install cyvcf2 cython -y
pip install mmsplice

Run MMSplice Online

You can run MMSplice online with the following Google Colab notebooks:

Preparation


1. Prepare annotation (gtf) file

Standard human gene annotation files in GTF format can be downloaded from Ensembl or GENCODE. MMSplice can work directly with those files; however, some filtering is highly recommended.

  • Filter for protein coding genes.
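The filtering step above can be sketched in plain Python. This is a minimal illustration, not part of MMSplice: the function name `filter_protein_coding` is hypothetical, and it assumes an uncompressed GTF whose attribute column carries `gene_biotype` (Ensembl) or `gene_type` (GENCODE).

```python
import re

# Assumption: the biotype is stored in the attribute column as
# gene_biotype "..." (Ensembl) or gene_type "..." (GENCODE).
_BIOTYPE = re.compile(r'gene_(?:biotype|type) "([^"]+)"')


def filter_protein_coding(gtf_in, gtf_out):
    """Copy comment lines and protein-coding records from gtf_in to gtf_out.

    Returns the number of records kept.
    """
    kept = 0
    with open(gtf_in) as src, open(gtf_out, "w") as dst:
        for line in src:
            if line.startswith("#"):
                # Preserve header/comment lines unchanged.
                dst.write(line)
                continue
            match = _BIOTYPE.search(line)
            if match and match.group(1) == "protein_coding":
                dst.write(line)
                kept += 1
    return kept


# Usage (hypothetical file names):
# n = filter_protein_coding("annotation.gtf", "annotation.protein_coding.gtf")
```

Records without a recognizable biotype attribute are dropped by this sketch, so check the kept count against your expectations before using the filtered file.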

2. Prepare variant (VCF) file

Any correctly formatted VCF file will work with MMSplice; however, the following steps will make it less prone to false positives:

  • Quality filtering. Low-quality variants lead to unreliable predictions.
  • Avoid presenting multiple variants in one line by splitting them into multiple lines. Example code:
    bcftools norm -m-both -o out.vcf in.vcf.gz
    
  • Left-normalization. For instance, GGCA-->GG is not left-normalized while GCA-->G is. For details on the unified representation of genetic variants, see Tan et al.
    bcftools norm -f reference.fasta -o out.vcf in.vcf
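To make the normalization step concrete, here is a minimal Python sketch of the trimming and left-shifting procedure described by Tan et al. for a single biallelic variant. It is only an illustration of what `bcftools norm` does for you; the function name `left_normalize` and the toy sequence are hypothetical, and the chromosome is assumed to be available as an in-memory string.

```python
def left_normalize(pos, ref, alt, chrom_seq):
    """Return (pos, ref, alt) trimmed and shifted as far left as possible.

    pos is 1-based; chrom_seq is the full chromosome sequence as a string.
    """
    if not ref or not alt or ref == alt:
        raise ValueError("ref and alt must be non-empty and different")

    # Step 1: drop shared trailing bases. When an allele would become
    # empty, first extend both alleles by one reference base to the left.
    while ref[-1] == alt[-1]:
        if min(len(ref), len(alt)) == 1:
            if pos == 1:
                break  # cannot extend past the start of the chromosome
            pos -= 1
            base = chrom_seq[pos - 1]
            ref, alt = base + ref, base + alt
        ref, alt = ref[:-1], alt[:-1]

    # Step 2: drop shared leading bases while both alleles keep >= 1 base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1

    return pos, ref, alt


# Toy sequence with a CA repeat (hypothetical example data).
toy = "GGGCACACACAGGG"
print(left_normalize(2, "GGCA", "GG", toy))  # the GGCA-->GG case from the text
print(left_normalize(8, "CAC", "C", toy))    # same deletion written further right
```

Both calls describe the same two-base deletion, and both end up as the same record, which is the point of normalizing before scoring.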
    

3. Prepare reference genome (fasta) file

The human reference fasta file can be downloaded from Ensembl/GENCODE. Make sure the chromosome names match those in the GTF annotation file you use.
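A quick way to check the chromosome naming before running predictions is to compare the sequence names in the fasta headers with the first column of the GTF. The sketch below is an illustration under simple assumptions (uncompressed files, hypothetical helper names); it is not the validation that MMSplice performs internally.

```python
def fasta_chroms(fasta_path):
    """Collect sequence names (first word after '>') from a fasta file."""
    chroms = set()
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    chroms.add(parts[0])
    return chroms


def gtf_chroms(gtf_path):
    """Collect chromosome names from the first column of a GTF file."""
    chroms = set()
    with open(gtf_path) as fh:
        for line in fh:
            if line.strip() and not line.startswith("#"):
                chroms.add(line.split("\t", 1)[0])
    return chroms


def missing_from_fasta(gtf_path, fasta_path):
    """Return GTF chromosome names that have no sequence in the fasta."""
    return gtf_chroms(gtf_path) - fasta_chroms(fasta_path)


# Usage (hypothetical file names):
# print(missing_from_fasta("annotation.gtf", "genome.fa"))
```

A non-empty result typically points to a `chr17` versus `17` style mismatch between the two files.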

Example code


Check notebooks/example.ipynb

To score variants (including indels), we suggest primarily using the deltaLogitPSI predictions, which are the default output. The differential splicing efficiency (dse) model was trained from MMSplice modules and exonic variants from MaPSy, thus only the predictions for exonic variants are calibrated.

# Import
from mmsplice.vcf_dataloader import SplicingVCFDataloader
from mmsplice import MMSplice, predict_save, predict_all_table
from mmsplice.utils import max_varEff

# example files
gtf = 'tests/data/test.gtf'
vcf = 'tests/data/test.vcf.gz'
fasta = 'tests/data/hg19.nochr.chr17.fa'
csv = 'pred.csv'

# dataloader to load variants from vcf
dl = SplicingVCFDataloader(gtf, fasta, vcf)

# Specify model
model = MMSplice()

# predict and save to csv file
predict_save(model, dl, csv, pathogenicity=True, splicing_efficiency=True)

# Or predict and return as df
predictions = predict_all_table(model, dl, pathogenicity=True, splicing_efficiency=True)

# Summarize with maximum effect size
predictionsMax = max_varEff(predictions)
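Because deltaLogitPSI is a difference on the logit scale, the same score corresponds to different changes in PSI depending on the reference inclusion level of the exon. The sketch below illustrates this conversion; the helper `delta_psi` is hypothetical, and the reference PSI is an assumption you have to supply yourself (for example from RNA-seq), since it is not part of the prediction table shown here.

```python
import math


def logit(p):
    """Log-odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def delta_psi(delta_logit_psi, psi_ref):
    """Convert a delta logit PSI score to a change in PSI.

    psi_ref is the user-supplied reference PSI of the exon (0 < psi_ref < 1).
    """
    psi_alt = sigmoid(logit(psi_ref) + delta_logit_psi)
    return psi_alt - psi_ref


# The same score has a larger absolute effect for an exon with
# intermediate inclusion than for one that is almost always included.
for psi_ref in (0.5, 0.99):
    print(psi_ref, round(delta_psi(-2.0, psi_ref), 4))
```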

VEP Plugin

The VEP plugin wraps the prediction function from the mmsplice Python package. Please check the documentation of the VEP plugin under VEP_plugin/README.md.

History

1.0.0 (2019-07-23)

  • Dependencies fixed #16
  • Validate gtf, fasta, vcf chrom annotation #15
  • Ship mmsplice with prebuilt exon set. #12
  • Faster variant overlapping with pyranges #11
  • Batch prediction with masking update in exon module

0.1.0 (2018-07-17)

  • First release on PyPI.

