Fast semantic search and comparison

PumpkinPy - Semantic similarity implemented in Python

About

PumpkinPy uses IC-ordered bitmaps for fast ranking of genes and diseases: phenotypes are sorted by descending frequency and one-hot encoded. This is useful for larger ontologies such as uPheno and for large workloads such as ranking all mouse genes against a set of input HPO terms. This approach was first used in OWLTools and OWLSim-v3.
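
The bitmap idea can be illustrated with a minimal sketch. It uses plain Python integers as bitsets rather than pyroaring, and the term IDs, counts, and helper names (`encode`, `popcount`, `jaccard`) are made up for illustration; it is not PumpkinPy's actual implementation.

```python
from typing import Dict, Iterable

# Hypothetical annotation counts: how many diseases carry each phenotype term.
term_counts: Dict[str, int] = {"HP:A": 50, "HP:B": 20, "HP:C": 5, "HP:D": 1}

# Sort terms by descending frequency (i.e. ascending information content),
# so the most common terms receive the lowest bit positions.
ordered = sorted(term_counts, key=lambda t: term_counts[t], reverse=True)
bit_index = {term: i for i, term in enumerate(ordered)}


def encode(profile: Iterable[str]) -> int:
    """One-hot encode a phenotype profile as an integer bitset."""
    bits = 0
    for term in profile:
        bits |= 1 << bit_index[term]
    return bits


def popcount(bits: int) -> int:
    """Count the set bits (works on Python 3.8+)."""
    return bin(bits).count("1")


def jaccard(a: int, b: int) -> float:
    """Jaccard similarity computed with bitwise AND/OR."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


query = encode(["HP:A", "HP:C"])
diseases = {
    "D1": encode(["HP:A", "HP:B", "HP:C"]),
    "D2": encode(["HP:D"]),
}

# Rank the toy diseases by similarity to the query profile.
ranked = sorted(diseases, key=lambda d: jaccard(query, diseases[d]), reverse=True)
```

Because set intersection and union reduce to bitwise operations, comparing one query against many encoded profiles stays cheap; a compressed bitmap library such as pyroaring plays the same role for much larger term sets.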

The goal of this project was to build an implementation of the PhenoDigm algorithm in Python. There are also implementations of common distance and similarity measures (Euclidean, cosine, Jiang-Conrath, Resnik, Jaccard).
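
To make the measures concrete, here is a small sketch of term-level Jaccard, Resnik, and a PhenoDigm-style pairwise score on a toy ontology. It assumes the pairwise PhenoDigm score is the geometric mean of the IC of the most informative common ancestor and the Jaccard index of the two ancestor sets, which is how the PhenoDigm reference is commonly summarized. The ontology, IC values, and function names are hypothetical, and the aggregation of pairwise scores into a whole-profile score is left out.

```python
import math
from typing import Dict, Set

# Toy ontology: each term maps to its ancestor closure (including itself).
closure: Dict[str, Set[str]] = {
    "root": {"root"},
    "A": {"A", "root"},
    "B": {"B", "A", "root"},
    "C": {"C", "A", "root"},
}

# Hypothetical information content per term (the root carries none).
ic: Dict[str, float] = {"root": 0.0, "A": 1.0, "B": 3.0, "C": 2.0}


def jaccard(t1: str, t2: str) -> float:
    """Shared ancestors divided by all ancestors of either term."""
    a, b = closure[t1], closure[t2]
    return len(a & b) / len(a | b)


def resnik(t1: str, t2: str) -> float:
    """IC of the most informative common ancestor."""
    common = closure[t1] & closure[t2]
    return max(ic[t] for t in common)


def phenodigm_pair(t1: str, t2: str) -> float:
    """Geometric mean of the Resnik and Jaccard scores (assumed form)."""
    return math.sqrt(resnik(t1, t2) * jaccard(t1, t2))


score_bc = phenodigm_pair("B", "C")  # siblings under "A"
score_bb = phenodigm_pair("B", "B")  # identical terms
```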

Disclaimer: This is a side project that needs more documentation and testing.

Getting Started

Requires Python 3.8+; python3-dev is needed to install pyroaring.

Installing from PyPI

pip install pumpkin_py

Building locally

To build locally, first install Poetry:

https://python-poetry.org/docs/#installation

Then run make:

make

Usage

Get a list of implemented similarity measures

from pumpkin_py import get_methods
get_methods()
['jaccard', 'cosine', 'phenodigm', 'symmetric_phenodigm', 'resnik', 'symmetric_resnik', 'ic_cosine', 'sim_gic']

Load closures and annotations

import gzip
from pathlib import Path

from pumpkin_py import build_ic_graph_from_closures, flat_to_annotations, search

closures = Path('.') / 'data' / 'hpo' / 'hp-closures.tsv.gz'
annotations = Path('.') / 'data' / 'hpo' / 'phenotype-annotations.tsv.gz'

root = "HP:0000118"

with gzip.open(annotations, 'rt') as annot_file:
    annot_map = flat_to_annotations(annot_file)

with gzip.open(closures, 'rt') as closure_file:
    graph = build_ic_graph_from_closures(closure_file, root, annot_map)
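
For intuition about what an IC graph holds, the sketch below derives information content from a closure table and an annotation map. The two-column `term<TAB>ancestor` layout, the toy data, and all variable names are assumptions made for illustration; the real file formats and the internals of `build_ic_graph_from_closures` may differ.

```python
import io
import math
from collections import defaultdict

# Hypothetical closure table: one "term<TAB>ancestor" pair per line.
closure_tsv = io.StringIO("HP:B\tHP:A\nHP:C\tHP:A\n")

# Hypothetical annotations: entity -> directly annotated terms.
annotations = {
    "D1": {"HP:B"},
    "D2": {"HP:C"},
    "D3": {"HP:B"},
    "D4": {"HP:A"},
}

# Build term -> set of ancestors from the closure table.
ancestors = defaultdict(set)
for line in closure_tsv:
    term, ancestor = line.rstrip("\n").split("\t")
    ancestors[term].add(ancestor)

# Count annotated entities per term, propagating each annotation to ancestors.
counts = defaultdict(int)
for terms in annotations.values():
    implied = set()
    for term in terms:
        implied.add(term)
        implied |= ancestors.get(term, set())
    for term in implied:
        counts[term] += 1

# IC(t) = log2(N / n_t): rarer terms carry more information.
total = len(annotations)
ic = {term: math.log2(total / n) for term, n in counts.items()}
```

In this toy data every entity is annotated to `HP:A` directly or through a descendant, so its IC is zero, while the rarer `HP:C` scores highest.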

Search for the best matching disease given a phenotype profile

import pprint
from pumpkin_py import search

profile_a = (
    "HP:0000403,HP:0000518,HP:0000565,HP:0000767,"
    "HP:0000872,HP:0001257,HP:0001263,HP:0001290,"
    "HP:0001629,HP:0002019,HP:0002072".split(',')
)

search_results = search(profile_a, annot_map, graph, 'phenodigm')

pprint.pprint(search_results.results[0:5])
[SimMatch(id='ORPHA:94125', rank=1, score=72.67599348696685),
 SimMatch(id='ORPHA:79137', rank=2, score=71.57368233248252),
 SimMatch(id='OMIM:619352', rank=3, score=70.98305459477629),
 SimMatch(id='OMIM:618624', rank=4, score=70.94596234638497),
 SimMatch(id='OMIM:617106', rank=5, score=70.83097366257857)]
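
The shape of the result list can be mimicked with a short, self-contained sketch. `Match` and `rank_scores` are hypothetical stand-ins that only mirror the `id`, `rank`, and `score` fields printed above, and the scores are invented; how PumpkinPy itself handles ties is not covered here (this sketch simply assigns consecutive ranks).

```python
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    # Hypothetical stand-in with the same fields as the printed results.
    id: str
    rank: int
    score: float


def rank_scores(scores: Dict[str, float]) -> List[Match]:
    """Sort entities by descending score and assign 1-based ranks."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        Match(id=entity, rank=position, score=score)
        for position, (entity, score) in enumerate(ordered, start=1)
    ]


# Invented scores for three made-up identifiers.
results = rank_scores({"OMIM:1": 10.0, "ORPHA:2": 72.5, "OMIM:3": 41.0})
top = results[:2]  # slice the best matches, as in the example above
```
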

Example scripts for fetching Monarch annotations and closures

Uses ROBOT and SPARQL to generate closures and class labels

Annotation data is fetched from the latest Monarch release

  • Requires >Java 8

cd data/monarch/ && make

PhenoDigm Reference: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3649640/
Exomiser: https://github.com/exomiser/Exomiser
OWLTools: https://github.com/owlcollab/owltools
OWLSim-v3: https://github.com/monarch-initiative/owlsim-v3
