VICC normalization routine for variations
Variation Normalization
Services and guidelines for normalizing variation terms into VRS (v1.1.1) and VRSATILE (latest) compatible representations.
Public OpenAPI endpoint: https://normalize.cancervariants.org/variation
About
Variation Normalization works in four main steps: tokenization, classification, validation, and translation. During tokenization, we split the input string on whitespace and parse each piece to determine its token type. During classification, we compare the sequence of tokens against the token order each classification can have. During validation, we run checks such as ensuring that the reference nucleotide or amino acid matches the expected value and that the position exists on the given transcript. During translation, we return a VRS Allele object.
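The four steps can be illustrated with a minimal, self-contained sketch. This is not the package's implementation: the `Token` class, the regular expressions, the classification table, and the `toy_lookup` reference check are all hypothetical names invented for illustration, and the final step returns a plain dict rather than a real VRS Allele.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Token:
    """Hypothetical token type: the raw text plus the kind we assigned to it."""
    text: str
    kind: str


# One-letter amino acid change such as V600E: reference, position, alternate.
PROTEIN_SUB = re.compile(r"^([A-Z])(\d+)([A-Z])$")
# Very loose gene symbol pattern; a real service would consult a gene database.
GENE_SYMBOL = re.compile(r"^[A-Z][A-Z0-9-]*$")

# Classification step: each classification is defined by an ordered tuple of token kinds.
CLASSIFICATIONS: Dict[Tuple[str, ...], str] = {
    ("gene_symbol", "protein_substitution"): "protein_substitution",
}

# Toy stand-in for a sequence lookup (assumption: the real check reads sequence data).
TOY_REFERENCE = {("BRAF", 600): "V"}


def toy_lookup(gene: str, position: int) -> Optional[str]:
    return TOY_REFERENCE.get((gene, position))


def tokenize(query: str) -> List[Token]:
    """Split on whitespace and assign a kind to each piece."""
    tokens = []
    for part in query.split():
        if PROTEIN_SUB.match(part):
            kind = "protein_substitution"
        elif GENE_SYMBOL.match(part):
            kind = "gene_symbol"
        else:
            kind = "unknown"
        tokens.append(Token(part, kind))
    return tokens


def classify(tokens: List[Token]) -> Optional[str]:
    """Return the classification whose token order matches, if any."""
    return CLASSIFICATIONS.get(tuple(t.kind for t in tokens))


def validate(tokens: List[Token], lookup: Callable[[str, int], Optional[str]]) -> bool:
    """Check that the stated reference residue matches the expected one."""
    gene, change = tokens
    ref, pos, _alt = PROTEIN_SUB.match(change.text).groups()
    return lookup(gene.text, int(pos)) == ref


def translate(tokens: List[Token]) -> dict:
    """Build a plain dict; the real service returns a VRS Allele object instead."""
    gene, change = tokens
    ref, pos, alt = PROTEIN_SUB.match(change.text).groups()
    return {"gene": gene.text, "position": int(pos), "ref": ref, "alt": alt}


def normalize_text(query: str, lookup: Callable[[str, int], Optional[str]]) -> Optional[dict]:
    """Run the four steps in order; return None if any step fails."""
    tokens = tokenize(query)
    if classify(tokens) is None or not validate(tokens, lookup):
        return None
    return translate(tokens)
```

Calling `normalize_text("BRAF V600E", toy_lookup)` passes all four steps, while a query whose reference residue does not match the toy lookup is rejected at validation.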
Variation Normalization is limited to substitution, deletion, insertion, and deletion-insertion variants located on p., c., and g. coordinates. We also support HGVS representations and text representations (e.g., BRAF V600E). We are working towards adding more types of variants, coordinates, and representations.
Endpoints
/toVRS
The /toVRS endpoint returns a list of valid Alleles.
/normalize
The /normalize endpoint returns a Variation Descriptor containing the MANE Transcript, if one is found.
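As a rough sketch, either endpoint could be queried from Python with only the standard library. Two things here are assumptions rather than facts stated in this README: that the endpoints live under the `/variation` base path shown above, and that the variation text is passed in a query parameter named `q`. Check the OpenAPI docs for the actual parameter names. The helper names `build_url` and `fetch` are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base URL taken from the public OpenAPI endpoint listed in this README.
BASE_URL = "https://normalize.cancervariants.org/variation"


def build_url(endpoint: str, query: str, base_url: str = BASE_URL) -> str:
    """Build a request URL for an endpoint such as "toVRS" or "normalize".

    Assumption: the variation text goes in a query parameter named "q".
    """
    params = urlencode({"q": query}, quote_via=quote)
    return f"{base_url}/{endpoint.strip('/')}?{params}"


def fetch(endpoint: str, query: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON response body."""
    with urlopen(build_url(endpoint, query, base_url), timeout=30) as response:
        return json.load(response)
```

For example, `fetch("normalize", "BRAF V600E")` would send the request to the public service, and passing `base_url="http://127.0.0.1:8000/variation"` would target a locally running instance.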
Backend Services
Variation Normalization relies on some local data caches which you will need to set up. It uses pipenv to manage its environment, which you will also need to install.
Installation
Installing with pip
pip install variation-normalizer
Variation Normalization relies on seqrepo, which you must download yourself.
From the root directory:
pipenv shell
pipenv lock
pipenv sync
cd variation
pip install seqrepo
mkdir -p data/seqrepo
seqrepo -r data/seqrepo pull -i 2021-01-29
sudo chmod -R u+w data/seqrepo
cd data/seqrepo
seqrepo_date_dir=$(ls -d */)
sudo mv $seqrepo_date_dir latest
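The last three shell commands rename the single downloaded snapshot directory to `latest`. For environments where a script is more convenient, the same step could be sketched in Python as follows. The function name `promote_snapshot` is hypothetical, and the sketch assumes the process already has write permission on the directory (the shell version uses `sudo`).

```python
from pathlib import Path
from typing import Optional


def promote_snapshot(seqrepo_root: str) -> Optional[Path]:
    """Rename the only snapshot directory under seqrepo_root to "latest".

    Returns the path to "latest", or None if the root is missing or does not
    contain exactly one snapshot directory.
    """
    root = Path(seqrepo_root)
    if not root.is_dir():
        return None
    latest = root / "latest"
    if latest.exists():
        # Already set up; nothing to rename.
        return latest
    snapshots = [p for p in root.iterdir() if p.is_dir()]
    if len(snapshots) != 1:
        # Ambiguous or empty; leave the directory untouched.
        return None
    snapshots[0].rename(latest)
    return latest
```

Run from the `variation` directory, `promote_snapshot("data/seqrepo")` would mirror the `mv` step above.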
Variation Normalizer also uses uta.
The following commands will likely need modification appropriate for the installation environment.
- Install PostgreSQL
- Create user and database.
createuser -U postgres uta_admin
createuser -U postgres anonymous
createdb -U postgres -O uta_admin uta
- To install locally, from the variation/data directory:
export UTA_VERSION=uta_20210129.pgd.gz
curl -O http://dl.biocommons.org/uta/$UTA_VERSION
gzip -cdq ${UTA_VERSION} | grep -v "^REFRESH MATERIALIZED VIEW" | psql -h localhost -U uta_admin --echo-errors --single-transaction -v ON_ERROR_STOP=1 -d uta -p 5433
To connect to the UTA database, you can use the default URL (postgresql://uta_admin@localhost:5433/uta/uta_20210129). If you use the default URL, you must either set the password with the environment variable UTA_PASSWORD or pass it as the db_pwd parameter of the UTA class.
If you do not wish to use the default, you must set the environment variable UTA_DB_URL, which has the format driver://user:pass@host/database/schema.
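To make the URL format concrete, here is a small sketch that splits a `driver://user:pass@host/database/schema` URL into its parts and applies the two environment variables described above. It is an illustration, not the package's UTA class: `UtaConnectionInfo`, `parse_uta_db_url`, and `resolve_uta_settings` are hypothetical names, and falling back to UTA_PASSWORD whenever the URL carries no password is an assumption.

```python
import os
from typing import Mapping, NamedTuple, Optional
from urllib.parse import urlsplit

# Default URL as given in this README.
DEFAULT_UTA_DB_URL = "postgresql://uta_admin@localhost:5433/uta/uta_20210129"


class UtaConnectionInfo(NamedTuple):
    driver: str
    user: Optional[str]
    password: Optional[str]
    host: Optional[str]
    port: Optional[int]
    database: str
    schema: str


def parse_uta_db_url(url: str) -> UtaConnectionInfo:
    """Split driver://user:pass@host/database/schema into named fields."""
    parts = urlsplit(url)
    path = parts.path.strip("/").split("/")
    if len(path) != 2 or not all(path):
        raise ValueError("expected a path of the form /database/schema")
    database, schema = path
    return UtaConnectionInfo(
        parts.scheme, parts.username, parts.password,
        parts.hostname, parts.port, database, schema,
    )


def resolve_uta_settings(env: Optional[Mapping[str, str]] = None) -> UtaConnectionInfo:
    """Use UTA_DB_URL if set, else the default URL plus UTA_PASSWORD."""
    env = os.environ if env is None else env
    info = parse_uta_db_url(env.get("UTA_DB_URL", DEFAULT_UTA_DB_URL))
    if info.password is None:
        # Assumption: fall back to UTA_PASSWORD when the URL has no password.
        password = env.get("UTA_PASSWORD")
        if password is None:
            raise ValueError("set UTA_PASSWORD or include a password in UTA_DB_URL")
        info = info._replace(password=password)
    return info
```

Passing a plain dict as `env` makes the function easy to try without touching the real environment, for example `resolve_uta_settings({"UTA_PASSWORD": "example"})`.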
Data
Variation Normalization uses Ensembl BioMart to retrieve variation/data/transcript_mappings.tsv. We currently use Human Genes (GRCh38.p13) for the dataset, with the following attributes: Gene stable ID, Gene stable ID version, Transcript stable ID, Transcript stable ID version, Protein stable ID, Protein stable ID version, RefSeq match transcript (MANE Select), Gene name.
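For orientation, a BioMart export with these attributes can be read with the standard `csv` module. The sketch below assumes the file is tab-separated with a header row whose column names match the attribute names listed above; the function name `load_mane_transcripts` is hypothetical, and the rows in `SAMPLE_TSV` are made-up placeholders, not real Ensembl or RefSeq identifiers.

```python
import csv
import io
from typing import Dict, TextIO

# Placeholder rows for illustration only; these identifiers are not real.
SAMPLE_TSV = (
    "Gene stable ID\tGene stable ID version\tTranscript stable ID\t"
    "Transcript stable ID version\tProtein stable ID\tProtein stable ID version\t"
    "RefSeq match transcript (MANE Select)\tGene name\n"
    "ENSG_PLACEHOLDER1\tENSG_PLACEHOLDER1.1\tENST_PLACEHOLDER1\t"
    "ENST_PLACEHOLDER1.1\tENSP_PLACEHOLDER1\tENSP_PLACEHOLDER1.1\t"
    "NM_PLACEHOLDER1.1\tGENE1\n"
    "ENSG_PLACEHOLDER2\tENSG_PLACEHOLDER2.1\tENST_PLACEHOLDER2\t"
    "ENST_PLACEHOLDER2.1\tENSP_PLACEHOLDER2\tENSP_PLACEHOLDER2.1\t"
    "\tGENE2\n"
)


def load_mane_transcripts(handle: TextIO) -> Dict[str, str]:
    """Map gene name to MANE Select RefSeq transcript, skipping rows without one."""
    reader = csv.DictReader(handle, delimiter="\t")
    mapping: Dict[str, str] = {}
    for row in reader:
        gene = (row.get("Gene name") or "").strip()
        mane = (row.get("RefSeq match transcript (MANE Select)") or "").strip()
        if gene and mane:
            mapping[gene] = mane
    return mapping


# Usage with the in-memory placeholder data; a real file would be opened with
# open("variation/data/transcript_mappings.tsv", newline="").
mane_by_gene = load_mane_transcripts(io.StringIO(SAMPLE_TSV))
```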
Setting up Gene Normalizer
Variation Normalization relies on data from [Gene Normalization](https://github.com/cancervariants/gene-normalization). You must have Gene Normalization's DynamoDB running for the application to work.
To set up, follow the instructions from the README.
Init coding style tests
Code style is managed by flake8 and checked prior to commit.
We use pre-commit to run conformance tests.
This runs the following checks:
- Check code style
- Check for added large files
- Detect AWS Credentials
- Detect Private Key
Before your first commit, run:
pre-commit install
Testing
From the root directory of the repository:
pytest tests/
Starting the Variation Normalization Service Locally
gene-normalizer's DynamoDB must be running. Then run the following:
uvicorn variation.main:app --reload
Next, view the OpenAPI docs on your local machine: http://127.0.0.1:8000/variation