
VICC normalization routines for therapeutics


Thera-Py

DOI: 10.1093/jamiaopen/ooad093

Services and guidelines for normalizing drug (and non-drug therapy) terms.

If you use Thera-Py in scientific works, please cite the following article:

Matthew Cannon, James Stevenson, Kori Kuzma, Susanna Kiwala, Jeremy L Warner, Obi L Griffith, Malachi Griffith, Alex H Wagner, Normalization of drug and therapeutic concepts with Thera-Py, JAMIA Open, Volume 6, Issue 4, December 2023, ooad093, https://doi.org/10.1093/jamiaopen/ooad093

Developer instructions

The following sections include instructions specifically for developers.

Installation

For a development install, we recommend using Pipenv. See the Pipenv docs for directions on installing Pipenv in your compute environment.

Once Pipenv is installed, run the following from the project root directory:

pipenv sync

Deploying DynamoDB Locally

We use Amazon DynamoDB for data storage. To deploy locally, follow these instructions.

Initialize development environment

Code style is managed by Ruff and checked prior to commit.

We use pre-commit to run conformance tests.

This ensures:

  • Style correctness
  • No large files
  • AWS credentials are present
  • Private key is present

Pre-commit must be installed before your first commit. Use the following command:

pre-commit install

Running tests

Unit tests are provided via pytest.

pipenv run pytest

By default, tests will employ an existing DynamoDB database. For test environments where this is unavailable (e.g. in CI), the THERAPY_TEST environment variable can be set to initialize a local DynamoDB instance with miniature versions of input data files before tests are executed.

export THERAPY_TEST=true
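
When tests are launched from a script rather than an interactive shell, the same variable can be set on a copy of the environment. The following is a minimal sketch; the helper names make_test_env and run_tests are hypothetical and not part of Thera-Py.

```python
import os
import subprocess


def make_test_env(base=None):
    """Return a copy of the environment with THERAPY_TEST enabled."""
    env = dict(os.environ if base is None else base)
    env["THERAPY_TEST"] = "true"
    return env


def run_tests():
    """Run the test suite through pipenv and return its exit code."""
    result = subprocess.run(
        ["pipenv", "run", "pytest"], env=make_test_env(), check=False
    )
    return result.returncode
```

Copying the environment keeps the variable scoped to the test process instead of changing the calling shell.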

Sometimes, sources will update their data, and our test fixtures and data will become incorrect. The tests/scripts/ subdirectory includes scripts to rebuild data files, although most fixtures will need to be updated manually.

Updating the database

Before you use the CLI to update the database, run the following in a separate terminal to start DynamoDB on port 8000:

java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb

To change the port, add -port followed by the port number.
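
Before running the CLI, it can help to confirm that something is actually listening on the DynamoDB port. The sketch below uses only the Python standard library; the function name port_is_open is hypothetical, and a successful connection shows only that the port is open, not that the listener is DynamoDB.

```python
import socket


def port_is_open(host="localhost", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For example, port_is_open(port=8001) checks an instance started with -port 8001.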

Setting Environment Variables

RxNorm requires a UMLS license, which you can register for here. You must set the UMLS_API_KEY environment variable to your API key. This can be found in the UTS 'My Profile' area after signing in.

export UMLS_API_KEY=12345-6789-abcdefg-hijklmnop  # make sure to replace with your key!

HemOnc.org data requires a Harvard Dataverse API key. After creating a user account on the Harvard Dataverse website, you can follow these instructions to generate a key. Once you have a key, set the following environment variable:

export DATAVERSE_API_KEY=12345-6789-abcdefgh-hijklmnop  # make sure to replace with your key!
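
A quick check that both keys are set can save a failed update run. This is a minimal sketch; the helper missing_api_keys is hypothetical, and it only tests that the variables are non-empty, not that the keys are valid.

```python
import os

# Environment variables named in this section.
REQUIRED_KEYS = ("UMLS_API_KEY", "DATAVERSE_API_KEY")


def missing_api_keys(environ=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

An empty list means both variables are present in the current environment.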

Update source(s)

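The Therapy Normalizer currently aggregates therapy data from several sources, including ChEMBL, ChemIDplus, DrugBank, Guide to PHARMACOLOGY, HemOnc.org, NCIt, RxNorm, and Wikidata.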

To update one or more sources, set --normalizer to the names of the sources you wish to update, separated by spaces. For example, the following command updates ChEMBL and Wikidata:

python3 -m therapy.cli --normalizer="chembl wikidata"

You can update all sources at once with the --update_all flag:

python3 -m therapy.cli --update_all
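
If updates are driven from a script, the two invocations above can be assembled as argument lists for subprocess. This is a sketch only; build_update_command is a hypothetical helper, and it uses just the --normalizer and --update_all flags shown above.

```python
def build_update_command(sources=None):
    """Build the CLI argument list for updating some or all sources."""
    cmd = ["python3", "-m", "therapy.cli"]
    if sources:
        # One argument with space-separated names, matching the
        # shell form --normalizer="chembl wikidata".
        cmd.append("--normalizer=" + " ".join(sources))
    else:
        cmd.append("--update_all")
    return cmd


# Usage (requires a running database):
# import subprocess
# subprocess.run(build_update_command(["chembl", "wikidata"]), check=True)
```

Passing a list to subprocess avoids shell quoting, so the space-separated source names stay in a single argument.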

Thera-Py can retrieve all required data itself, using the wags-tails library. By default, data will be housed under ~/.local/share/wags_tails/ in a format like the following:

~/.local/share/wags_tails
├── chembl
│   └── chembl_27.db
├── chemidplus
│   └── chemidplus_20200327.xml
├── drugbank
│   └── drugbank_5.1.8.csv
├── guidetopharmacology
│   ├── guidetopharmacology_ligand_id_mapping_2021.3.tsv
│   └── guidetopharmacology_ligands_2021.3.tsv
├── hemonc
│   ├── hemonc_concepts_20210225.csv
│   ├── hemonc_rels_20210225.csv
│   └── hemonc_synonyms_20210225.csv
├── ncit
│   └── ncit_20.09d.owl
├── rxnorm
│   ├── rxnorm_drug_forms_20210104.yaml
│   └── rxnorm_20210104.RRF
└── wikidata
    └── wikidata_20210425.json
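
To see which data files are present locally, the layout above can be walked with pathlib. This is a minimal sketch that assumes the default location shown above; list_source_files is a hypothetical helper and does not use the wags-tails API.

```python
from pathlib import Path


def list_source_files(data_root=None):
    """Map each source directory name to a sorted list of its file names."""
    if data_root is None:
        root = Path.home() / ".local" / "share" / "wags_tails"
    else:
        root = Path(data_root)
    if not root.is_dir():
        return {}
    return {
        source_dir.name: sorted(
            f.name for f in source_dir.iterdir() if f.is_file()
        )
        for source_dir in sorted(root.iterdir())
        if source_dir.is_dir()
    }
```

Calling list_source_files() with no arguments inspects the default directory and returns an empty dict if it does not exist.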

Updates to the HemOnc source depend on the Disease Normalizer service. If the Disease Normalizer database appears to be empty or incomplete, updates to HemOnc will also trigger a refresh of the Disease Normalizer database. See its README for additional data requirements.

Create Merged Concept Groups

The /normalize endpoint relies on merged concept groups. The --update_merged flag generates these groups:

python3 -m therapy.cli --update_merged

Specifying the database URL endpoint

The default URL endpoint is http://localhost:8000. There are two different ways to specify the database URL endpoint.

The first way is to set the --db_url flag to the URL endpoint.

python3 -m therapy.cli --update_all --db_url="http://localhost:8001"

The second way is to set the environment variable THERAPY_NORM_DB_URL to the URL endpoint.

export THERAPY_NORM_DB_URL="http://localhost:8001"
python3 -m therapy.cli --update_all
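
The two options and the default can be pictured as a simple fallback chain. The sketch below is illustrative only: resolve_db_url is a hypothetical helper, and the precedence it uses (flag value first, then environment variable, then default) is an assumption that is not confirmed here.

```python
import os

DEFAULT_DB_URL = "http://localhost:8000"


def resolve_db_url(cli_value=None, environ=None):
    """Pick a database URL: explicit value, then env var, then default.

    The ordering is an assumption made for illustration.
    """
    env = os.environ if environ is None else environ
    if cli_value:
        return cli_value
    return env.get("THERAPY_NORM_DB_URL") or DEFAULT_DB_URL
```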

Starting the therapy normalization service

From the project root, run the following:

uvicorn therapy.main:app --reload

Next, view the OpenAPI docs on your local machine:

http://127.0.0.1:8000/therapy
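
Once the service is running, queries can be built against the base URL above. The sketch below only constructs a URL; the /normalize path placement under /therapy and the q query parameter are assumptions that should be checked against the OpenAPI docs, and normalize_url is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8000/therapy"


def normalize_url(term, base_url=BASE_URL):
    """Build a query URL for the normalize endpoint.

    The path and the parameter name "q" are assumed, not confirmed.
    """
    query = urlencode({"q": term})
    return f"{base_url.rstrip('/')}/normalize?{query}"
```

The resulting string can be opened in a browser or passed to any HTTP client while the service is running.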

FAQ

A data import method raised a SourceFormatException instance. How do I proceed?

Thera-Py will automatically try to acquire the latest version of data for each source, but sources sometimes alter the structure of their data (e.g. adding or removing CSV columns). If you encounter a SourceFormatException while importing data, please notify us by creating a new issue if one doesn't already exist, and we will attempt to resolve it.

In the meantime, you can force Thera-Py to use an older data release. Remove the incompatible version from the source data folder, manually download an older version of the data and place it there following the structure described above, then call the CLI with the --use_existing argument.
