
Integrated registry of biological databases and nomenclatures


Bioregistry


A community-driven integrative meta-registry of biological databases, ontologies, and other resources.
More information here.

The Bioregistry can be accessed, searched, and queried through its associated website at https://bioregistry.io.

📥 Download

The underlying data of the Bioregistry can be downloaded directly from here. Several exports to YAML, TSV, and RDF can be downloaded via https://bioregistry.io/download.

The manually curated portions of these data are available under the CC0 1.0 Universal License.

🙏 Contributing

If you'd like to request a new prefix, please fill out this issue template. It will automatically generate a pull request! Here's a list of all of the open requests for new prefixes.

There are a few other issue templates for certain updates (e.g., update regex, merge two prefixes, etc.) that you can check here. For any updates that don't have a corresponding template, feel free to leave a freeform issue for us!

If you want to make a direct contribution, feel free to make edits directly to the bioregistry.json file through the GitHub interface.

Things that would be helpful:

  1. For all entries, add a ["wikidata"]["database"] entry. Many ontologies and databases don't have a property in Wikidata because the process of adding a new property is very cautious. However, anyone can add a database as a normal Wikidata item with a Q prefix. One example is UniPathway, whose Wikidata database item is Q85719315. If there's no database item on Wikidata, you can even make one! Note: don't confuse this with a paper describing the resource (for UniPathway, Q35631060). If there is a paper, you can add it under the ["wikidata"]["paper"] key.
  2. Add a ["homepage"] entry to any entry that doesn't have an external reference.
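To find entries that still need these fields, a short script can scan the registry data. This is a minimal sketch that assumes bioregistry.json is a single JSON object mapping each prefix to its entry dictionary, with optional "homepage" and "wikidata" keys as described above. The sample data, prefixes, and the missing_curation() helper are hypothetical, and a missing "homepage" key is only a rough proxy for "no external reference".

```python
import json

# Hypothetical excerpt, assuming bioregistry.json maps each prefix to an entry dict.
SAMPLE = """
{
  "example.a": {"wikidata": {"paper": "Q_PLACEHOLDER_PAPER"}},
  "example.b": {"homepage": "https://example.org/b"},
  "example.c": {"wikidata": {"database": "Q_PLACEHOLDER_DB"}, "homepage": "https://example.org/c"}
}
"""


def missing_curation(registry):
    """Return (prefixes without ["wikidata"]["database"], prefixes without ["homepage"])."""
    no_database = sorted(
        prefix
        for prefix, entry in registry.items()
        if "database" not in entry.get("wikidata", {})
    )
    no_homepage = sorted(
        prefix for prefix, entry in registry.items() if "homepage" not in entry
    )
    return no_database, no_homepage


print(missing_curation(json.loads(SAMPLE)))
# (['example.a', 'example.b'], ['example.a'])
```

To run it on real data, pass the result of json.load() on your local copy of the registry file instead of the sample string.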

A full list of curation to-dos is automatically generated as a web page here. This page also has a more in-depth tutorial on how to contribute.

🧹 Maintenance

🫀 Health Report


The Bioregistry runs some automated tests weekly to check that various metadata haven't gone stale. For example, it checks that the homepages are still available and that each provider URL is still able to resolve. The tests fail if even a single piece of metadata is out of place, so don't be frightened that this badge is almost always red.
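A homepage check of this kind could look roughly like the following. This is a hypothetical sketch, not the Bioregistry's actual test code: check_homepages() and its injectable fetch argument are illustrative names, and any HTTP status of 400 or above is treated as a failure.

```python
import urllib.request


def default_fetch(url, timeout=10):
    """Return the HTTP status code of a GET request (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def check_homepages(homepages, fetch=default_fetch):
    """Map each prefix whose homepage fails to resolve to a short problem description."""
    failures = {}
    for prefix, url in homepages.items():
        try:
            status = fetch(url)
        except (OSError, ValueError) as error:
            # urllib's URLError and HTTPError are OSError subclasses;
            # ValueError covers malformed URLs.
            failures[prefix] = repr(error)
            continue
        if status >= 400:
            failures[prefix] = f"HTTP {status}"
    return failures
```

Mirroring the "a single failure fails the run" behavior, a test would end with assert not check_homepages(...). Passing a fake fetch, such as {"https://ok.example": 200}.__getitem__, lets the logic be exercised offline.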

♻️ Update

The database is automatically updated daily thanks to scheduled workflows in GitHub Actions. The workflow's configuration can be found here and the last run can be seen here. Further, a changelog can be recapitulated from the commits of the GitHub Actions bot.

If you want to manually update the database after installing in development mode, run the following:

$ bioregistry update

🚀 Installation

The Bioregistry can be installed from PyPI with:

$ pip install bioregistry

It can be installed in development mode for local curation with:

$ git clone https://github.com/biopragmatics/bioregistry.git
$ cd bioregistry
$ pip install --editable .

💪 Usage

Normalizing Prefixes

The Bioregistry can be used to normalize prefixes across MIRIAM and all the (very plentiful) variants that pop up in ontologies in OBO Foundry and the OLS with the normalize_prefix() function.

import bioregistry

# This works for synonym prefixes, like:
assert 'ncbitaxon' == bioregistry.normalize_prefix('taxonomy')

# This works for common mistaken prefixes, like:
assert 'pubchem.compound' == bioregistry.normalize_prefix('pubchem')

# This works for prefixes that are often written many ways, like:
assert 'eccode' == bioregistry.normalize_prefix('ec-code')
assert 'eccode' == bioregistry.normalize_prefix('EC_CODE')

# If a prefix is not registered, it gives back `None`
assert bioregistry.normalize_prefix('not a real key') is None
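Conceptually, this kind of normalization folds case and punctuation and then looks the result up among known prefixes and synonyms. The sketch below is a toy illustration under that assumption, with a tiny hypothetical synonym table and a hypothetical toy_normalize_prefix() helper; it is not the library's implementation, whose lookup data is far more extensive.

```python
import re

# Hypothetical, hand-written synonym table (the real registry is much larger).
SYNONYMS = {
    "ncbitaxon": "ncbitaxon",
    "taxonomy": "ncbitaxon",
    "pubchem": "pubchem.compound",
    "pubchem.compound": "pubchem.compound",
    "eccode": "eccode",
}


def toy_normalize_prefix(prefix):
    """Lowercase, drop '-', '_' and spaces, then look up the canonical prefix (or None)."""
    key = re.sub(r"[-_ ]", "", prefix.lower())
    return SYNONYMS.get(key)


print(toy_normalize_prefix("EC_CODE"))  # eccode
print(toy_normalize_prefix("not a real key"))  # None
```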

Normalizing CURIEs

The Bioregistry supports converting a CURIE to a canonical CURIE by normalizing the prefix and removing redundant namespaces embedded in LUIs with the normalize_curie() function.

from bioregistry import normalize_curie

# Idempotent to canonical CURIEs
assert 'chebi:1234' == normalize_curie('chebi:1234')

# Normalize common mistaken prefixes
assert 'pubchem.compound:1234' == normalize_curie('pubchem:1234')

# Normalize mixed case prefixes
assert 'fbbt:1234' == normalize_curie('FBbt:1234')

# Remove the redundant prefix and normalize
assert 'go:1234' == normalize_curie('GO:GO:1234')
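A toy version of the same idea splits on the first colon, normalizes the prefix, and strips a redundant copy of the namespace from the start of the local identifier. The synonym table and toy_normalize_curie() are hypothetical and only cover the examples above; the real normalize_curie() may handle more cases.

```python
# Hypothetical synonym table mapping lowercase prefixes to canonical ones.
SYNONYMS = {
    "chebi": "chebi",
    "go": "go",
    "fbbt": "fbbt",
    "pubchem": "pubchem.compound",
    "pubchem.compound": "pubchem.compound",
}


def toy_normalize_curie(curie):
    """Normalize the prefix and drop a redundant namespace from the local identifier."""
    prefix, sep, identifier = curie.partition(":")
    canonical = SYNONYMS.get(prefix.lower())
    if not sep or canonical is None:
        return None
    # e.g. 'GO:GO:1234' -> prefix 'GO', identifier 'GO:1234' -> identifier '1234'
    for namespace in (canonical, prefix.lower()):
        if identifier.lower().startswith(namespace + ":"):
            identifier = identifier[len(namespace) + 1:]
            break
    return f"{canonical}:{identifier}"


print(toy_normalize_curie("GO:GO:1234"))  # go:1234
```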

Parsing IRIs

The Bioregistry can parse CURIEs from IRIs thanks to its large registry of provider URL strings and additional programmatic logic implemented in Python. It can parse OBO Library PURLs, IRIs from the OLS and Identifiers.org, IRIs from the Bioregistry website, and IRIs from any other well-formed provider registered in the Bioregistry. The parse_iri() function returns the prefix and local identifier as a pre-parsed tuple, while the curie_from_iri() function combines them into a canonical CURIE string.

from bioregistry import curie_from_iri, parse_iri

# First-party IRI
assert ('chebi', '24867') == parse_iri('https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:24867')
assert 'chebi:24867' == curie_from_iri('https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:24867')

# OBO Library PURL
assert ('chebi', '24867') == parse_iri('http://purl.obolibrary.org/obo/CHEBI_24867')
assert 'chebi:24867' == curie_from_iri('http://purl.obolibrary.org/obo/CHEBI_24867')

# OLS IRI
assert ('chebi', '24867') == parse_iri('https://www.ebi.ac.uk/ols/ontologies/chebi/terms?iri=http://purl.obolibrary.org/obo/CHEBI_24867')
assert 'chebi:24867' == curie_from_iri('https://www.ebi.ac.uk/ols/ontologies/chebi/terms?iri=http://purl.obolibrary.org/obo/CHEBI_24867')

# Identifiers.org IRIs (with varying usage of HTTP(S) and colon/slash separators)
assert ('chebi', '24867') == parse_iri('https://identifiers.org/CHEBI:24867')
assert ('chebi', '24867') == parse_iri('http://identifiers.org/CHEBI:24867')
assert ('chebi', '24867') == parse_iri('https://identifiers.org/CHEBI/24867')
assert ('chebi', '24867') == parse_iri('http://identifiers.org/CHEBI/24867')

# Bioregistry IRI
assert ('chebi', '24867') == parse_iri('https://bioregistry.io/chebi:24867')
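One way to think about IRI parsing is longest-prefix matching: keep a table from URI prefixes to registry prefixes, pick the longest one that the IRI starts with, and treat the remainder as the local identifier. The table and toy_parse_iri() below are a small hypothetical excerpt; the real parse_iri() also has provider-specific logic (for example, for OLS query strings) that this sketch ignores.

```python
# Hypothetical excerpt of URI prefix -> registry prefix mappings.
URI_PREFIXES = {
    "http://purl.obolibrary.org/obo/CHEBI_": "chebi",
    "https://identifiers.org/CHEBI:": "chebi",
    "https://identifiers.org/CHEBI/": "chebi",
    "https://bioregistry.io/chebi:": "chebi",
    "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:": "chebi",
}


def toy_parse_iri(iri):
    """Return (prefix, identifier) via the longest matching URI prefix, or (None, None)."""
    matches = [uri_prefix for uri_prefix in URI_PREFIXES if iri.startswith(uri_prefix)]
    if not matches:
        return None, None
    best = max(matches, key=len)
    return URI_PREFIXES[best], iri[len(best):]


print(toy_parse_iri("http://purl.obolibrary.org/obo/CHEBI_24867"))  # ('chebi', '24867')
```

Choosing the longest match matters once one registered URI prefix is itself a prefix of another, which happens often with shared hosts.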

Generating IRIs

import bioregistry

# Bioregistry IRI
assert bioregistry.get_bioregistry_iri('chebi', '24867') == 'https://bioregistry.io/chebi:24867'

# Default Provider
assert bioregistry.get_default_iri('chebi', '24867') == 'https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:24867'

# OBO Library
assert bioregistry.get_obofoundry_iri('chebi', '24867') == 'http://purl.obolibrary.org/obo/CHEBI_24867'

# OLS IRI
assert bioregistry.get_ols_iri('chebi', '24867') == \
        'https://www.ebi.ac.uk/ols/ontologies/chebi/terms?iri=http://purl.obolibrary.org/obo/CHEBI_24867'

# Bioportal IRI
assert bioregistry.get_bioportal_iri('chebi', '24867') == \
        'https://bioportal.bioontology.org/ontologies/CHEBI/?p=classes&conceptid=http://purl.obolibrary.org/obo/CHEBI_24867'

# Identifiers.org IRI
assert bioregistry.get_identifiers_org_iri('chebi', '24867') == 'https://identifiers.org/CHEBI:24867'

# Name-to-Thing IRI
assert bioregistry.get_n2t_iri('chebi', '24867') == 'https://n2t.net/chebi:24867'

If you're not sure which to choose, use bioregistry.get_link and it will pick the best one for you. Each of these functions can also return None if there isn't a provider available or if the prefix can't be mapped to the corresponding resource.
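Most of these generators amount to substituting a local identifier into a provider's URL template, with a fallback order when no provider is requested. The sketch assumes templates use $1 as the identifier placeholder; the template table, the PRIORITY order, and toy_get_iri() are hypothetical and do not reflect how get_link actually ranks providers.

```python
# Hypothetical per-prefix URL templates; "$1" marks where the identifier goes.
TEMPLATES = {
    "chebi": {
        "default": "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:$1",
        "obofoundry": "http://purl.obolibrary.org/obo/CHEBI_$1",
        "identifiers.org": "https://identifiers.org/CHEBI:$1",
        "n2t": "https://n2t.net/chebi:$1",
    },
}

# Hypothetical fallback order used when no provider is given.
PRIORITY = ["default", "obofoundry", "identifiers.org", "n2t"]


def toy_get_iri(prefix, identifier, provider=None):
    """Fill the requested (or first available) template, or return None."""
    templates = TEMPLATES.get(prefix, {})
    for name in [provider] if provider else PRIORITY:
        template = templates.get(name)
        if template is not None:
            return template.replace("$1", identifier)
    return None


print(toy_get_iri("chebi", "24867", "obofoundry"))  # http://purl.obolibrary.org/obo/CHEBI_24867
```

Returning None when nothing matches mirrors the behavior described above for missing providers.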

Getting Metadata

The pattern for an entry in the Bioregistry can be looked up quickly with get_pattern(), if one exists. It prefers the custom-curated pattern, then the MIRIAM pattern, then the Wikidata pattern.

import bioregistry

assert '^GO:\\d{7}$' == bioregistry.get_pattern('go')
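The precedence rule and a pattern check can be sketched with the standard re module. The per-source pattern values in the example and the helper names pick_pattern() and matches_pattern() are hypothetical; only the GO pattern string comes from the example above. Note that this GO pattern includes "GO:", so it validates the full "GO:" plus seven digits form rather than a bare number.

```python
import re

PRECEDENCE = ("custom", "miriam", "wikidata")


def pick_pattern(patterns_by_source):
    """Return the highest-precedence pattern that is present, or None."""
    for source in PRECEDENCE:
        pattern = patterns_by_source.get(source)
        if pattern:
            return pattern
    return None


def matches_pattern(identifier, pattern):
    """True if the whole identifier matches the regular expression."""
    return re.fullmatch(pattern, identifier) is not None


# Hypothetical entry with no custom pattern, so the MIRIAM one wins.
example = {"custom": None, "miriam": r"^\d{7}$", "wikidata": r"^\d+$"}
print(pick_pattern(example))  # ^\d{7}$
print(matches_pattern("GO:0000001", r"^GO:\d{7}$"))  # True
```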

Entries in the Bioregistry can be checked for deprecation with the is_deprecated() function. MIRIAM and OBO Foundry often disagree; OBO Foundry takes precedence because it appears to be updated more often.

import bioregistry

assert bioregistry.is_deprecated('nmr')
assert not bioregistry.is_deprecated('efo')
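The precedence described above, where OBO Foundry overrides MIRIAM whenever it has an opinion, can be expressed as a small fallback chain. This is a hypothetical sketch: toy_is_deprecated() takes each source's flag directly (None meaning "no information"), and the real is_deprecated() may consult other data as well.

```python
from typing import Optional


def toy_is_deprecated(obofoundry: Optional[bool], miriam: Optional[bool]) -> bool:
    """OBO Foundry's flag wins when present; otherwise use MIRIAM's; default to False."""
    if obofoundry is not None:
        return obofoundry
    if miriam is not None:
        return miriam
    return False


print(toy_is_deprecated(obofoundry=False, miriam=True))  # False
```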

Entries in the Bioregistry can be looked up with the get_resource() function.

import bioregistry

entry = bioregistry.get_resource('taxonomy')
# there are lots of mysteries to discover in this dictionary!

The full Bioregistry can be read in a Python project using:

import bioregistry

registry = bioregistry.read_registry()

🕸️ Resolver App

After installing with the [web] extras, run the resolver CLI with

$ bioregistry web

to run a web app that functions like Identifiers.org, but backed by the Bioregistry. A public instance of this app is hosted by the INDRA Lab at https://bioregistry.io.
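A resolver in the style of Identifiers.org boils down to parsing the CURIE from the request path, looking up a URL template, and answering with an HTTP redirect. The following is a minimal standard-library WSGI sketch of that idea with a hypothetical one-entry template table and a hypothetical resolver_app() function; the actual bioregistry web app is a separate and much fuller implementation.

```python
# Hypothetical template table; "$1" marks where the identifier goes.
TEMPLATES = {"chebi": "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:$1"}


def resolver_app(environ, start_response):
    """Redirect /<prefix>:<identifier> to the provider URL, or return 404."""
    curie = environ.get("PATH_INFO", "").lstrip("/")
    prefix, sep, identifier = curie.partition(":")
    template = TEMPLATES.get(prefix.lower())
    if not sep or not identifier or template is None:
        body = f"Could not resolve {curie!r}".encode("utf-8")
        start_response(
            "404 Not Found",
            [("Content-Type", "text/plain; charset=utf-8"), ("Content-Length", str(len(body)))],
        )
        return [body]
    start_response("302 Found", [("Location", template.replace("$1", identifier))])
    return [b""]
```

To try it locally, serve it with wsgiref.simple_server.make_server("127.0.0.1", 8000, resolver_app).serve_forever() and visit http://127.0.0.1:8000/chebi:24867.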

👋 Attribution

⚖️ License

The code in this repository is licensed under the MIT License.

📖 Citation

Hopefully there will be a paper describing this resource on bioRxiv sometime in 2021! Until then, you can use the Zenodo BibTeX or CSL.

🎁 Support

The Bioregistry was developed by the INDRA Lab, a part of the Laboratory of Systems Pharmacology and the Harvard Program in Therapeutic Science (HiTS) at Harvard Medical School.

💰 Funding

The development of the Bioregistry is funded by the DARPA Young Faculty Award W911NF2010255 (PI: Benjamin M. Gyori).


Download files


Source Distribution

bioregistry-0.3.12.tar.gz (2.6 MB view details)

Uploaded Source

Built Distribution

bioregistry-0.3.12-py3-none-any.whl (2.3 MB view details)

Uploaded Python 3

File details

Details for the file bioregistry-0.3.12.tar.gz.

File metadata

  • Download URL: bioregistry-0.3.12.tar.gz
  • Upload date:
  • Size: 2.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for bioregistry-0.3.12.tar.gz
Algorithm Hash digest
SHA256 533ac75492b98b0a357bcb7bc1306e0dc3c238ad578aa347e41c746d5b272ec7
MD5 b2eed9b053964a4f73844a7ab1c2563d
BLAKE2b-256 ce75c3778293ca220dbacb7dd83f6a0655672e85a469f64c8970e84718f9131f



File details

Details for the file bioregistry-0.3.12-py3-none-any.whl.

File metadata

  • Download URL: bioregistry-0.3.12-py3-none-any.whl
  • Upload date:
  • Size: 2.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for bioregistry-0.3.12-py3-none-any.whl
Algorithm Hash digest
SHA256 81b55f3ff19a5084487506146328f06ff547d80949e825f1509553bee18556a0
MD5 8a1a44b1b975d4d2b5093c0c889658c6
BLAKE2b-256 f78dc7692d8578814f8cfa3c24a75f088e17a5c5bb5fbb1b7427742ea09afaf1


