
Biolexica


biolexica helps generate and apply coherent biomedical lexica. It takes care of the following:

  1. Getting names and synonyms from a diverse set of inputs (ontologies, databases, and custom sources) using pyobo, bioontologies, biosynonyms, and more.
  2. Merging equivalent terms using semra, to take full advantage of different synonyms for the same term across sources.
  3. Generating a lexical index and performing named entity recognition (NER) with Gilda (see the sketch after this list).
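
For orientation, the following is a minimal sketch of the kind of pipeline biolexica automates, written directly against pyobo and Gilda. It is illustrative only: biolexica's own configuration layer wraps these steps (and adds the merging step via semra), so none of the names below are biolexica's API.

import gilda
import pyobo
from gilda.process import normalize

# Step 1: get identifier-name pairs from a single ontology via pyobo
terms = [
    gilda.Term(
        norm_text=normalize(name),
        text=name,
        db="doid",
        id=identifier,
        entry_name=name,
        status="name",
        source="doid",
    )
    for identifier, name in pyobo.get_id_name_mapping("doid").items()
]

# Step 3: build a lexical index and ground a string against it
grounder = gilda.make_grounder(terms)
scored_matches = grounder.ground("Alzheimer's disease")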

Importantly, we pre-define lexica for several entity types in the lexica/ folder that can be readily used with Gilda, including:

  1. Cells and cell lines
  2. Diseases, conditions, and other phenotypes
  3. Anatomical terms, tissues, organ systems, etc.

Getting Started

Load a pre-defined grounder like this:

>>> import biolexica
>>> grounder = biolexica.load_grounder("phenotype")
>>> grounder.get_best_match("Alzheimer's disease")
Match(reference=Reference(prefix='doid', identifier='10652'), name="Alzheimer's disease", score=0.7778)
>>> grounder.annotate("Clinical trials for reducing Aβ levels in Alzheimer's disease have been controversial.")
[Annotation(text="Alzheimer's disease", start=42, end=61, match=Match(reference=Reference(prefix='doid', identifier='10652'), name="Alzheimer's disease", score=0.7339))]

Note: Biolexica constructs an extended version of gilda.Grounder that adds convenience functions and a simpler match data model encoded with Pydantic.
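
Because matches are Pydantic models, they are straightforward to serialize and post-process. A minimal sketch, assuming Pydantic v2 (on Pydantic v1, use .dict() instead of .model_dump()):

match = grounder.get_best_match("Alzheimer's disease")
if match is not None:
    record = match.model_dump()  # plain dict, ready for JSON serialization
    curie = f"{match.reference.prefix}:{match.reference.identifier}"  # -> 'doid:10652'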

Search PubMed for abstracts and annotate them using a given grounder with:

import biolexica
from biolexica.literature import annotate_abstracts_from_search

grounder = biolexica.load_grounder("phenotype")
pubmed_query = "alzheimer's disease"
annotations = annotate_abstracts_from_search(pubmed_query, grounder=grounder, limit=30)
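
The results can then be aggregated downstream. The sketch below tallies the most frequent groundings; it assumes each result pairs an article with an .annotations list of Annotation objects like those returned by grounder.annotate(), which may not match the actual return type exactly.

from collections import Counter

counter = Counter()
for article in annotations:
    # assumption: each result exposes an .annotations list of Annotation objects
    for annotation in article.annotations:
        reference = annotation.match.reference
        counter[f"{reference.prefix}:{reference.identifier}"] += 1

print(counter.most_common(5))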

🚀 Installation

The most recent release can be installed from PyPI with:

pip install biolexica

The most recent code and data can be installed directly from GitHub with:

pip install git+https://github.com/biopragmatics/biolexica.git

👐 Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

👋 Attribution

⚖️ License

The code in this package is licensed under the MIT License.

🍪 Cookiecutter

This package was created with @audreyfeldroy's cookiecutter package using @cthoyt's cookiecutter-snekpack template.

🛠️ For Developers

See developer instructions

The final section of the README is for those who want to get involved by making a code contribution.

Development Installation

To install in development mode, use the following:

git clone https://github.com/biopragmatics/biolexica.git
cd biolexica
pip install -e .

🥼 Testing

After cloning the repository and installing tox with pip install tox, the unit tests in the tests/ folder can be run reproducibly with:

tox

Additionally, these tests are automatically re-run with each commit in a GitHub Action.

📖 Building the Documentation

The documentation can be built locally using the following:

git clone https://github.com/biopragmatics/biolexica.git
cd biolexica
tox -e docs
open docs/build/html/index.html

The documentation build automatically installs the package as well as the docs extra specified in setup.cfg. Sphinx plugins like texext can be added there. Additionally, they need to be added to the extensions list in docs/source/conf.py.
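
For example, registering a plugin means updating both places. An illustrative excerpt of docs/source/conf.py (the pre-existing extension listed here is an assumption):

# docs/source/conf.py (excerpt)
extensions = [
    "sphinx.ext.autodoc",  # assumed to already be present
    "texext",  # new plugin; also list it under the docs extra in setup.cfg
]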

The documentation can be deployed to ReadTheDocs using this guide. The .readthedocs.yml YAML file contains all the configuration you'll need. You can also set up continuous integration on GitHub to check not only that Sphinx can build the documentation in an isolated environment (i.e., with tox -e docs-test) but also that ReadTheDocs can build it too.

📦 Making a Release

After installing the package in development mode and installing tox with pip install tox, the commands for making a new release are contained within the finish environment in tox.ini. Run the following from the shell:

tox -e finish

This script does the following:

  1. Uses Bump2Version to switch the version number in setup.cfg, src/biolexica/version.py, and docs/source/conf.py to no longer have the -dev suffix
  2. Packages the code in both a tar archive and a wheel using build
  3. Uploads to PyPI using twine. Be sure to have a .pypirc file configured to avoid the need for manual input at this step
  4. Pushes to GitHub. You'll need to make a release from the commit where the version was bumped.
  5. Bumps the version to the next patch. If you made big changes and want to bump the version by minor, you can use tox -e bumpversion -- minor afterwards.
