Resource for working with microhaplotype data from the ALFRED database (https://alfred.med.yale.edu).
MicroHapDB
Daniel Standage, 2018
https://github.com/bioforensics/microhapdb
MicroHapDB is a package designed for scientists and researchers interested in microhaplotype analysis. It is a distribution and convenience mechanism and does not implement any analytics itself. MicroHapDB is designed to work with microhap data from any source, although all data currently comes from the Allele Frequency Database (ALFRED)[1] at the Yale University School of Medicine.
Installation
To install:
pip3 install microhapdb
To make sure the package installed correctly:
pip3 install pytest
pytest --pyargs microhapdb --doctest-modules
MicroHapDB requires Python version 3.
Usage
MicroHapDB provides three ways to access microhaplotype data:
- a command-line interface
- a Python API
- a collection of tab-delimited text files
Command-line interface
Invoke microhapdb --help for a description of the command-line configuration options and several usage examples.
Python API
Programmatic access to microhap data within Python is as simple as invoking import microhapdb and querying the following tables.
microhapdb.frequencies
microhapdb.loci
microhapdb.populations
microhapdb.variants
Each is a Pandas[2] dataframe object, supporting convenient and efficient listing, subsetting, and query capabilities. There are also two auxiliary tables: one that contains a mapping of all variants to their corresponding microhap loci, and another table cross-referencing external IDs/labels/names with internal MicroHapDB identifiers.
microhapdb.variantmap
microhapdb.idmap
The helper function microhapdb.id_xref is also useful for retrieving data using any valid identifier.
The following example demonstrates how data across the different tables can be cross-referenced.
>>> import microhapdb
>>> microhapdb.id_xref('mh02KK-136')
ID Reference Chrom Start End Source
182 MHDBL000183 GRCh38 chr2 227227673 227227743 ALFRED
>>> pops = microhapdb.populations.query('Name.str.contains("Amer")')
>>> pops
ID Name Source
40 MHDBP000041 African Americans ALFRED
67 MHDBP000068 African Americans ALFRED
91 MHDBP000092 European Americans ALFRED
>>> f = microhapdb.frequencies
>>> f[(f.Locus == "MHDBL000183") & (f.Population.isin(pops.ID))]
Locus Population Allele Frequency
75117 MHDBL000183 MHDBP000041 G,T,C 0.172
75118 MHDBL000183 MHDBP000041 G,T,A 0.103
75119 MHDBL000183 MHDBP000041 G,C,C 0.029
75120 MHDBL000183 MHDBP000041 G,C,A 0.000
75121 MHDBL000183 MHDBP000041 T,T,C 0.293
75122 MHDBL000183 MHDBP000041 T,T,A 0.063
75123 MHDBL000183 MHDBP000041 T,C,C 0.132
75124 MHDBL000183 MHDBP000041 T,C,A 0.207
75333 MHDBL000183 MHDBP000068 G,T,C 0.156
75334 MHDBL000183 MHDBP000068 G,T,A 0.148
75335 MHDBL000183 MHDBP000068 G,C,C 0.016
75336 MHDBL000183 MHDBP000068 G,C,A 0.000
75337 MHDBL000183 MHDBP000068 T,T,C 0.336
75338 MHDBL000183 MHDBP000068 T,T,A 0.049
75339 MHDBL000183 MHDBP000068 T,C,C 0.156
75340 MHDBL000183 MHDBP000068 T,C,A 0.139
75525 MHDBL000183 MHDBP000092 G,T,C 0.384
75526 MHDBL000183 MHDBP000092 G,T,A 0.202
75527 MHDBL000183 MHDBP000092 G,C,C 0.000
75528 MHDBL000183 MHDBP000092 G,C,A 0.000
75529 MHDBL000183 MHDBP000092 T,T,C 0.197
75530 MHDBL000183 MHDBP000092 T,T,A 0.000
75531 MHDBL000183 MHDBP000092 T,C,C 0.071
75532 MHDBL000183 MHDBP000092 T,C,A 0.146
See the Pandas documentation for more details on dataframe access and query methods.
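As a further illustration of cross-referencing, the sketch below attaches population names to frequency records and keeps the most frequent allele per population. It is a minimal example that assumes the column names shown in the output above. To stay runnable without the package installed, it builds small stand-in dataframes from a few of the rows printed above; with MicroHapDB installed, microhapdb.populations and microhapdb.frequencies would take their place.

```python
import pandas as pd

# Stand-in tables copied from rows in the example output above. With the
# package installed, microhapdb.populations and microhapdb.frequencies would
# be used instead (assumption: they carry these same column names).
populations = pd.DataFrame({
    "ID": ["MHDBP000041", "MHDBP000068", "MHDBP000092"],
    "Name": ["African Americans", "African Americans", "European Americans"],
    "Source": ["ALFRED", "ALFRED", "ALFRED"],
})
frequencies = pd.DataFrame({
    "Locus": ["MHDBL000183"] * 6,
    "Population": [
        "MHDBP000041", "MHDBP000041",
        "MHDBP000068", "MHDBP000068",
        "MHDBP000092", "MHDBP000092",
    ],
    "Allele": ["G,T,C", "T,T,C", "G,T,C", "T,T,C", "G,T,C", "T,T,C"],
    "Frequency": [0.172, 0.293, 0.156, 0.336, 0.384, 0.197],
})

# Attach the population name to every frequency record.
merged = frequencies.merge(populations, left_on="Population", right_on="ID")

# For each population, keep the row holding the highest allele frequency.
top = merged.loc[merged.groupby("Population")["Frequency"].idxmax()]
top = top[["Population", "Name", "Allele", "Frequency"]].reset_index(drop=True)
print(top)
```

Because only two alleles per population are included here, the result reflects this subset rather than the full table.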
Tab-delimited text files
The data behind MicroHapDB is contained in 6 tab-delimited text files.
If you'd prefer not to use MicroHapDB's command-line interface or Python API, it should be trivial to load these files directly into R, Julia, or the data science environment of your choice.
Invoke microhapdb files on the command line to see the location of the installed .tsv files.
- locus.tsv: microhaplotype loci
- variant.tsv: variants associated with each microhap locus
- allele.tsv: allele frequencies for 148 loci across 84 populations
- population.tsv: summary of the populations studied
- variantmap.tsv: shows which variants are associated with which loci
- idmap.tsv: mapping of all IDs/names/labels to internal MicroHapDB IDs
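For readers who want to read these files without the package's API, the following sketch uses only Python's standard library. The helper load_tsv is hypothetical and not part of MicroHapDB. The column names in the demonstration file are an assumption based on the id_xref output shown earlier; the installed locus.tsv may use a different header.

```python
import csv
import os
import tempfile


def load_tsv(path):
    """Read a tab-delimited file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Demonstration on a temporary stand-in file. The header is assumed from the
# id_xref output shown earlier and may differ from the real locus.tsv.
header = ["ID", "Reference", "Chrom", "Start", "End", "Source"]
row = ["MHDBL000183", "GRCh38", "chr2", "227227673", "227227743", "ALFRED"]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "locus.tsv")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerow(row)
    loci = load_tsv(path)

# All values arrive as strings, so convert coordinates before doing arithmetic.
length = int(loci[0]["End"]) - int(loci[0]["Start"])
print(loci[0]["ID"], loci[0]["Chrom"], length)
```

In practice, the path passed to load_tsv would be one of the locations reported by microhapdb files.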
Citation
If you use this package, please cite our work.
Standage DS (2018) MicroHapDB: programmatic access to published microhaplotype data. GitHub repository, https://github.com/bioforensics/microhapdb.
[1] Rajeevan H, Soundararajan U, Kidd JR, Pakstis AJ, Kidd KK (2012) ALFRED: an allele frequency resource for research and teaching. Nucleic Acids Research, 40(D1): D1010-D1015. doi:10.1093/nar/gkr924.
[2] McKinney W (2010) Data structures for statistical computing in Python. Proceedings of the 9th Python in Science Conference, 51-56.