Service for querying the biocommons.uta database
uta-tools
Service for querying the UTA database
Installation
pip
pip install uta-tools
Development
Clone the repo:
git clone https://github.com/cancervariants/uta-tools
cd uta-tools
Install Pipenv if necessary.
Install backend dependencies and enter Pipenv environment:
pipenv shell
pipenv lock && pipenv sync
UTA Database Installation
uta-tools uses a local installation of the UTA database. For other ways to install, visit biocommons.uta.
Local Installation
The following commands will likely need to be adapted to your installation environment.
- Install PostgreSQL
- Create user and database.
$ createuser -U postgres uta_admin
$ createuser -U postgres anonymous
$ createdb -U postgres -O uta_admin uta
- To install locally, from the uta_tools/data directory:
export UTA_VERSION=uta_20210129.pgd.gz
curl -O http://dl.biocommons.org/uta/$UTA_VERSION
gzip -cdq ${UTA_VERSION} | grep -v "^REFRESH MATERIALIZED VIEW" | psql -h localhost -U uta_admin --echo-errors --single-transaction -v ON_ERROR_STOP=1 -d uta -p 5433
Connecting to the database
To connect to the UTA database, you can use the default URL (postgresql://uta_admin@localhost:5433/uta/uta_20210129). If you use the default URL, you must set the password either through the environment variable UTA_PASSWORD or through the db_pwd parameter of the UTA class.
If you do not wish to use the default, you must set the environment variable UTA_DB_URL, which has the format driver://user:pass@host/database/schema.
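The connection rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the code used inside uta-tools: the helper names resolve_db_url and split_db_url are hypothetical, and the assumption is that the password is placed after the user name in the default URL.

```python
import os
from urllib.parse import quote, urlparse

# Default URL as documented above (no password included).
DEFAULT_URL = "postgresql://uta_admin@localhost:5433/uta/uta_20210129"


def resolve_db_url(db_pwd=None, env=None):
    """Hypothetical helper: pick the URL the way the text describes."""
    env = os.environ if env is None else env
    url = env.get("UTA_DB_URL")
    if url:
        # A custom URL takes precedence over the default.
        return url
    password = db_pwd or env.get("UTA_PASSWORD")
    if not password:
        raise ValueError("Set UTA_PASSWORD or pass db_pwd to use the default URL")
    # Assumption: the password goes right after the user name.
    return DEFAULT_URL.replace(
        "uta_admin@", f"uta_admin:{quote(password, safe='')}@", 1
    )


def split_db_url(url):
    """Hypothetical helper: split driver://user:pass@host/database/schema."""
    parsed = urlparse(url)
    database, _, schema = parsed.path.lstrip("/").partition("/")
    return {
        "driver": parsed.scheme,
        "user": parsed.username,
        "password": parsed.password,
        "host": parsed.hostname,
        "port": parsed.port,
        "database": database,
        "schema": schema,
    }
```

For example, split_db_url(resolve_db_url(db_pwd="secret", env={})) separates the database name (uta) from the schema (uta_20210129), which is a quick way to check a custom UTA_DB_URL before using it.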
Data Downloads
SeqRepo
uta-tools relies on seqrepo, which you must download yourself.
From the root directory:
pip install seqrepo
sudo mkdir /usr/local/share/seqrepo
sudo chown $USER /usr/local/share/seqrepo
seqrepo pull -i 2021-01-29
transcript_mappings.tsv
uta-tools uses Ensembl BioMart to retrieve uta_tools/data/transcript_mappings.tsv. We currently use Human Genes (GRCh38.p13) for the dataset, with the following attributes: Gene stable ID, Gene stable ID version, Transcript stable ID, Transcript stable ID version, Protein stable ID, Protein stable ID version, RefSeq match transcript (MANE Select), Gene name.
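To check that a BioMart export has the expected attributes, a small reader like the one below may help. It is a sketch that assumes the TSV header row uses the attribute names listed above verbatim. The function name read_transcript_mappings is hypothetical, and the sample row contains placeholder values, not real identifiers.

```python
import csv
import io

# Attribute names listed above; assumed to appear verbatim as TSV headers.
COLUMNS = [
    "Gene stable ID",
    "Gene stable ID version",
    "Transcript stable ID",
    "Transcript stable ID version",
    "Protein stable ID",
    "Protein stable ID version",
    "RefSeq match transcript (MANE Select)",
    "Gene name",
]


def read_transcript_mappings(handle):
    """Hypothetical helper: read the TSV and verify all columns exist."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return list(reader)


# Placeholder row for illustration only; these are not real identifiers.
placeholder_row = [
    "GENE_ID", "GENE_ID.1", "TX_ID", "TX_ID.1",
    "PROT_ID", "PROT_ID.1", "REFSEQ_TX", "GENE_NAME",
]
sample = "\t".join(COLUMNS) + "\n" + "\t".join(placeholder_row) + "\n"
rows = read_transcript_mappings(io.StringIO(sample))
```

With a real file, pass open("uta_tools/data/transcript_mappings.tsv", newline="") instead of the in-memory sample.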
LRG_RefSeqGene
uta-tools fetches the latest version of LRG_RefSeqGene. This file can be found here.
MANE Summary Data
uta-tools fetches the latest version of MANE.GRCh38.*.summary.txt.gz. This file can be found here.
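The wildcard in MANE.GRCh38.*.summary.txt.gz stands for the release version. As a rough sketch of what picking the latest file from a directory listing could look like, the snippet below assumes the wildcard has the form v<major>.<minor> (for example v1.0). The function name latest_mane_summary is hypothetical, and this is not necessarily how uta-tools selects the file.

```python
import re

# Assumption: the wildcard part is a version such as "v0.95" or "v1.0".
MANE_RE = re.compile(r"^MANE\.GRCh38\.v(\d+(?:\.\d+)*)\.summary\.txt\.gz$")


def latest_mane_summary(filenames):
    """Hypothetical helper: return the matching name with the highest version."""
    best = None
    for name in filenames:
        match = MANE_RE.match(name)
        if not match:
            continue  # skip files that are not MANE summary files
        # Compare versions numerically, component by component.
        version = tuple(int(part) for part in match.group(1).split("."))
        if best is None or version > best[0]:
            best = (version, name)
    return best[1] if best else None
```

Given a listing that contains both MANE.GRCh38.v0.95.summary.txt.gz and MANE.GRCh38.v1.0.summary.txt.gz, the function returns the v1.0 file name. It returns None when nothing matches.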
Starting the UTA Tools Service Locally
To start the service, run the following:
uvicorn uta_tools.main:app --reload
Next, view the FastAPI service on your local machine: http://127.0.0.1:8000/uta_tools
Init coding style tests
Code style is managed by flake8 and checked prior to commit.
We use pre-commit to run conformance tests.
The hooks do the following:
- Check code style
- Check for added large files
- Detect AWS Credentials
- Detect Private Key
Before your first commit, run:
pre-commit install
Testing
From the root directory of the repository:
pytest