Load NumPy arrays from a VCF (Variant Call Format) file.
Installation
Installation requires numpy and cython:
$ pip install vcfnp
…or:
$ git clone --recursive git://github.com/alimanfoo/vcfnp.git
$ cd vcfnp
$ python setup.py build_ext --inplace
Usage
import vcfnp
import numpy as np
import matplotlib.pyplot as plt

filename = '/path/to/my.vcf'

# load data from fixed fields (except INFO)
v = vcfnp.variants(filename).view(np.recarray)

# print some simple variant metrics
print('found %s variants (%s SNPs)' % (v.size, np.count_nonzero(v.is_snp)))
print('QUAL mean (std): %s (%s)' % (np.mean(v.QUAL), np.std(v.QUAL)))

# load data from INFO field
i = vcfnp.info(filename).view(np.recarray)

# plot a histogram of variant depth
fig = plt.figure(1)
ax = fig.add_subplot(111)
ax.hist(i.DP)
ax.set_title('DP histogram')
ax.set_xlabel('DP')
plt.show()

# load data from sample columns
c = vcfnp.calldata(filename).view(np.recarray)
c = vcfnp.view2d(c)

# print some simple genotype metrics
count_phased = np.count_nonzero(c.is_phased)
count_variant = np.count_nonzero(np.any(c.genotype > 0, axis=2))
count_missing = np.count_nonzero(~c.is_called)
print('calls (phased, variant, missing): %s (%s, %s, %s)'
      % (c.flatten().size, count_phased, count_variant, count_missing))

# plot a histogram of genotype quality
fig = plt.figure(2)
ax = fig.add_subplot(111)
ax.hist(c.GQ.flatten())
ax.set_title('GQ histogram')
ax.set_xlabel('GQ')
plt.show()
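The genotype metrics in the usage example reduce over the last axis of a three-dimensional array. The sketch below reproduces that reduction on a small hand-built array, so it runs without vcfnp or a VCF file. The array layout (variants × samples × ploidy) and the use of -1 for a missing allele are assumptions made for illustration, not something confirmed by the text above.

```python
import numpy as np

# Hypothetical genotype calls: 3 variants x 2 samples x ploidy 2.
# 0 = reference allele, 1 or more = alternate allele,
# -1 = missing allele (assumed encoding, for illustration only).
genotype = np.array([
    [[0, 0], [0, 1]],
    [[1, 1], [-1, -1]],
    [[0, 0], [0, 0]],
], dtype='i1')

# A call counts as "variant" if any of its alleles is non-reference.
# Reducing over axis=2 collapses the ploidy dimension, leaving one
# boolean per (variant, sample) pair.
is_variant = np.any(genotype > 0, axis=2)

# A call counts as "missing" here if any allele is negative.
is_missing = np.any(genotype < 0, axis=2)

count_variant = np.count_nonzero(is_variant)
count_missing = np.count_nonzero(is_missing)
print('calls (variant, missing): %s (%s, %s)'
      % (is_variant.size, count_variant, count_missing))
```

With real data, the same `np.any(..., axis=2)` expression is applied to `c.genotype` after `vcfnp.view2d`, as in the usage example.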
Acknowledgments
Based on Erik Garrison’s vcflib.
Project details
Source Distribution
vcfnp-0.14.tar.gz (412.2 kB)
File metadata
- Download URL: vcfnp-0.14.tar.gz
- Size: 412.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | f23e6630b7df94da1ea44ab57a1d40dcd54847068fd757fd5065afdd71ba9254
MD5 | b500cda21331b9f257ba1e6a11bc1756
BLAKE2b-256 | 89eadb6cf24750fcaf83b8b6bf24047cee219a3c331d7f7a2fb6812485a43da4
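A downloaded source archive can be checked against the SHA256 digest listed for vcfnp-0.14.tar.gz. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `verify_download` are hypothetical and are not part of vcfnp.

```python
import hashlib

# SHA256 digest listed for vcfnp-0.14.tar.gz in the file hashes.
EXPECTED_SHA256 = (
    'f23e6630b7df94da1ea44ab57a1d40dcd54847068fd757fd5065afdd71ba9254'
)


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`.

    The file is read in chunks so that large archives do not need
    to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches `expected`."""
    return sha256_of_file(path) == expected.lower()
```

For example, `verify_download('vcfnp-0.14.tar.gz')` returns `True` only when the local file's digest matches the listed one.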