Discover genotype-phenotype correlations with GA4GH phenopackets
GPSEA is a Python library for discovery of genotype-phenotype associations.
An example of a simple genotype-phenotype association analysis
```python
# Load HPO
import hpotk
store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo()
# Load a cohort of phenopackets
from gpsea.data import get_toy_cohort
cohort = get_toy_cohort()
# Analyze genotype-phenotype associations
from gpsea.analysis import configure_cohort_analysis
from gpsea.analysis.predicate import PatientCategories
from gpsea.model import VariantEffect
cohort_analysis = configure_cohort_analysis(cohort, hpo)
frameshift = cohort_analysis.compare_by_variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id='NM_1234.5')
frameshift.summarize(hpo, category=PatientCategories.YES)
```
The final `summarize` call provides a pandas data frame with the genotype-phenotype correlations:
FRAMESHIFT_VARIANT on NM_1234.5:

| HPO term | No: Count | No: Percent | Yes: Count | Yes: Percent | p value | Corrected p value |
|---|---|---|---|---|---|---|
| Arachnodactyly [HP:0001166] | 1/10 | 10% | 13/16 | 81% | 0.000781 | 0.020299 |
| Abnormality of the musculature [HP:0003011] | 6/6 | 100% | 11/11 | 100% | 1.000000 | 1.000000 |
| Abnormal nervous system physiology [HP:0012638] | 9/9 | 100% | 15/15 | 100% | 1.000000 | 1.000000 |
| ... | ... | ... | ... | ... | ... | ... |
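Each `Count` cell reads as n/N: n patients with the phenotype out of N patients in that genotype group, so every row reduces to a 2×2 contingency table (genotype group × phenotype present/absent). This README does not name the statistical test GPSEA applies. As a sketch, a two-sided Fisher exact test written with only the standard library gives the uncorrected p values shown in the first two rows (0.000781 for Arachnodactyly, 1.0 for Abnormality of the musculature). The helpers `parse_count` and `fisher_exact_two_sided` are hypothetical illustrations, not part of the GPSEA API, and the `Corrected p value` column is not reproduced here.

```python
from math import comb


def parse_count(cell: str) -> tuple[int, int]:
    """Hypothetical helper: split a summary cell such as '13/16' into (present, absent)."""
    present, total = (int(part) for part in cell.split("/"))
    return present, total - present


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are genotype groups (No, Yes); columns are phenotype present/absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c                        # patients with the phenotype in either group
    total = comb(row1 + row2, col1)     # ways to distribute the 'present' labels

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x 'present' patients in the first row
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = table_prob(a)
    # Feasible values of the top-left cell when all margins are fixed
    candidates = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table no more likely than the observed one (small tolerance for float ties)
    return sum(p for p in map(table_prob, candidates) if p <= observed * (1 + 1e-7))


# Arachnodactyly row: No group 1/10, Yes group 13/16
a, b = parse_count("1/10")
c, d = parse_count("13/16")
print(f"{fisher_exact_two_sided(a, b, c, d):.6f}")  # prints 0.000781
```

In practice, `scipy.stats.fisher_exact` performs the same two-sided test and is the more robust choice; the hand-written version only makes the arithmetic behind a single row explicit.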
Documentation
Check out the User guide and the API reference for more info:
- Stable documentation (last release on `main` branch)
- Latest documentation (bleeding edge, latest commit on `develop` branch)
Download files
Source distribution: gpsea-0.2.0.tar.gz (140.0 kB)
Built distribution: gpsea-0.2.0-py3-none-any.whl (167.3 kB)