Predict splicing variant effect from VCF
Project description
mmsplice
Paper: Cheng et al. https://doi.org/10.1101/438986
Installation
External dependencies:
pip install cyvcf2 cython
Conda installation is recommended:
conda install cyvcf2 cython -y
pip install mmsplice
Run MMSplice Online
You can run MMSplice online with the following Google Colab notebooks:
Preparation
1. Prepare annotation (gtf) file
A standard human gene annotation file in GTF format can be downloaded from Ensembl or GENCODE. MMSplice can work directly with those files; however, some filtering is highly recommended.
- Filter for protein coding genes.
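The protein-coding filter can be sketched in plain Python. This is a minimal sketch and not part of MMSplice: it assumes the GTF attribute column carries either gene_biotype "protein_coding" (Ensembl style) or gene_type "protein_coding" (GENCODE style), and the function name filter_protein_coding is hypothetical.

```python
import re

# Matches gene_biotype (Ensembl) or gene_type (GENCODE) set to protein_coding.
# Assumption: the attribute column uses the usual `key "value";` layout.
_PROTEIN_CODING = re.compile(r'\bgene_(?:biotype|type) "protein_coding"')


def filter_protein_coding(lines):
    """Yield header lines and records annotated as protein-coding genes."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep GTF header/comment lines untouched
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; attributes are the last.
        if len(fields) == 9 and _PROTEIN_CODING.search(fields[8]):
            yield line


# Tiny made-up example: one protein-coding exon and one lincRNA exon.
example = [
    "#!genome-build GRCh37\n",
    '17\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n',
    '17\tensembl\texon\t300\t400\t.\t-\t.\tgene_id "G2"; gene_biotype "lincRNA";\n',
]
kept = list(filter_protein_coding(example))
```

To filter a real file, pass an open file handle as lines and write the yielded lines to a new GTF file.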
2. Prepare variant (VCF) file
A correctly formatted VCF file will work with MMSplice; however, the following steps will make it less prone to false positives:
- Quality filtering. Low-quality variants lead to unreliable predictions.
- Avoid presenting multiple variants on one line by splitting them across multiple lines. Example command:
bcftools norm -m-both -o out.vcf in.vcf.gz
- Left-normalization. For instance, GGCA-->GG is not left-normalized while GCA-->G is. For details on the unified representation of genetic variants, see Tan et al.
bcftools norm -f reference.fasta -o out.vcf in.vcf
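To illustrate what the two bcftools steps above do to a record, here is a minimal Python sketch. It is an illustration only: the function names split_alts and trim_alleles and the position 100 are hypothetical, and the sketch only trims shared bases. It does not shift indels leftwards through repeats, which requires the reference sequence and is what bcftools norm -f handles.

```python
def split_alts(ref, alt_field):
    """Split a comma-separated ALT field into one (ref, alt) pair per allele."""
    return [(ref, alt) for alt in alt_field.split(",")]


def trim_alleles(pos, ref, alt):
    """Trim shared bases so the variant is represented parsimoniously.

    Assumption: trimming only; no left-alignment against the reference.
    """
    # Drop shared trailing bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases, moving the position right by one each time.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# A made-up multi-allelic record at position 100: REF=GGCA, ALT=GG,GGCT
pairs = split_alts("GGCA", "GG,GGCT")
trimmed = [trim_alleles(100, ref, alt) for ref, alt in pairs]
```

The first allele reproduces the example from the list above: GGCA-->GG becomes GCA-->G, with the position moved one base to the right.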
3. Prepare reference genome (fasta) file
A human reference FASTA file can be downloaded from Ensembl or GENCODE. Make sure its chromosome names match those in the GTF annotation file you use (for example, "17" versus "chr17").
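A quick way to check that chromosome names agree is to compare the contig names in both files. The following is a minimal sketch using only the standard library; the helper names are hypothetical and it is not part of MMSplice.

```python
def fasta_contigs(lines):
    """Collect contig names: the first word after '>' on FASTA header lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def gtf_contigs(lines):
    """Collect chromosome names from the first column of GTF records."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def missing_contigs(gtf_lines, fasta_lines):
    """Return GTF chromosome names that have no matching FASTA contig."""
    return sorted(gtf_contigs(gtf_lines) - fasta_contigs(fasta_lines))


# Made-up inputs: the FASTA uses "17", one GTF uses "17", the other "chr17".
fasta_example = [">17 dna:chromosome\n", "ACGT\n"]
gtf_matching = ['17\tensembl\texon\t1\t4\t.\t+\t.\tgene_id "G1";\n']
gtf_mismatch = ['chr17\tensembl\texon\t1\t4\t.\t+\t.\tgene_id "G1";\n']

ok = missing_contigs(gtf_matching, fasta_example)
bad = missing_contigs(gtf_mismatch, fasta_example)
```

An empty result means every chromosome in the annotation is present in the reference; any name listed points to a naming mismatch to resolve before running predictions.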
Example code
Check notebooks/example.ipynb
To score variants (including indels), we suggest primarily using the deltaLogitPSI predictions, which are the default output. The differential splicing efficiency (dse) model was trained from MMSplice modules and exonic variants from MaPSy; thus, only its predictions for exonic variants are calibrated.
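To make the deltaLogitPSI score more concrete, the sketch below assumes it is the difference between the logit of PSI with the alternative allele and the logit of PSI with the reference allele, as the name suggests. Under that assumption, a predicted score can be combined with a known reference PSI to estimate PSI with the variant. The function names are hypothetical and not part of the mmsplice API.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("PSI must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def psi_with_variant(ref_psi, delta_logit_psi):
    """Estimate PSI of the alternative allele.

    Assumption: delta_logit_psi = logit(alt_psi) - logit(ref_psi).
    """
    return sigmoid(logit(ref_psi) + delta_logit_psi)


# Made-up numbers: a score of 0 leaves PSI unchanged, a negative score lowers it.
unchanged = psi_with_variant(0.5, 0.0)
reduced = psi_with_variant(0.9, -2.0)
```

Working on the logit scale means the same score implies a smaller absolute PSI change for exons whose reference PSI is already close to 0 or 1.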
# Import
from mmsplice.vcf_dataloader import SplicingVCFDataloader
from mmsplice import MMSplice, predict_save, predict_all_table
from mmsplice.utils import max_varEff
# example files
gtf = 'tests/data/test.gtf'
vcf = 'tests/data/test.vcf.gz'
fasta = 'tests/data/hg19.nochr.chr17.fa'
csv = 'pred.csv'
# dataloader to load variants from vcf
dl = SplicingVCFDataloader(gtf, fasta, vcf)
# Specify model
model = MMSplice()
# predict and save to csv file
predict_save(model, dl, csv, pathogenicity=True, splicing_efficiency=True)
# Or predict and return as df
predictions = predict_all_table(model, dl, pathogenicity=True, splicing_efficiency=True)
# Summarize with maximum effect size
predictionsMax = max_varEff(predictions)
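The idea behind summarizing by maximum effect size can be illustrated without the package: a variant may overlap several exons, and one row per variant is kept, namely the one with the largest absolute score. The sketch below is an assumption about the idea, not the implementation of max_varEff; the column names ID and delta_logit_psi, the function name max_abs_effect, and the rows are made up for illustration.

```python
def max_abs_effect(rows, key="ID", score="delta_logit_psi"):
    """Keep, for each variant, the row with the largest absolute score."""
    best = {}
    for row in rows:
        current = best.get(row[key])
        if current is None or abs(row[score]) > abs(current[score]):
            best[row[key]] = row
    return list(best.values())


# Hypothetical predictions: the first variant overlaps two exons.
rows = [
    {"ID": "17:100:A>T", "exon": "e1", "delta_logit_psi": -0.2},
    {"ID": "17:100:A>T", "exon": "e2", "delta_logit_psi": -1.5},
    {"ID": "17:200:G>C", "exon": "e3", "delta_logit_psi": 0.4},
]
summary = max_abs_effect(rows)
```

Using the absolute value keeps strong negative effects (exon skipping) as well as strong positive ones.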
VEP Plugin
The VEP plugin wraps the prediction function from the mmsplice Python package. Please check the documentation of the VEP plugin under VEP_plugin/README.md.
History
1.0.0 (2019-07-23)
- Dependencies fixed #16
- Validate gtf, fasta, vcf chrom annotation #15
- Ship mmsplice with prebuilt exon set. #12
- Faster variant overlapping with pyranges #11
- Batch prediction with masking update in exon module
0.1.0 (2018-07-17)
- First release on PyPI.