Service for querying the biocommons.uta database
uta-tools
Service for querying the UTA database
Installation
pip
pip install uta-tools
Development
Clone the repo:
git clone https://github.com/cancervariants/uta-tools
cd uta-tools
Install Pipenv if necessary.
Install backend dependencies and enter Pipenv environment:
pipenv shell
pipenv lock && pipenv sync
UTA Database Installation
uta-tools uses a local installation of the UTA database. For other ways to install, visit biocommons.uta.
Local Installation
The following commands will likely need modification appropriate for the installation environment.
- Install PostgreSQL
- Create user and database.
$ createuser -U postgres uta_admin
$ createuser -U postgres anonymous
$ createdb -U postgres -O uta_admin uta
- To install locally, from the uta_tools/data directory:
export UTA_VERSION=uta_20210129.pgd.gz
curl -O http://dl.biocommons.org/uta/$UTA_VERSION
gzip -cdq ${UTA_VERSION} | grep -v "^REFRESH MATERIALIZED VIEW" | psql -h localhost -U uta_admin --echo-errors --single-transaction -v ON_ERROR_STOP=1 -d uta -p 5433
Connecting to the database
To connect to the UTA database, you can use the default URL (postgresql://uta_admin@localhost:5433/uta/uta_20210129). If you use the default URL, you must either set the password using the environment variable UTA_PASSWORD or pass the db_pwd parameter to the UTA class.
If you do not wish to use the default, you must set the environment variable UTA_DB_URL, which has the format driver://user:pass@host/database/schema.
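The URL format above can be illustrated with a minimal sketch that uses only the Python standard library. It is not the uta-tools implementation: the helper names parse_uta_db_url and resolve_connection are hypothetical, and the fallback order (UTA_DB_URL first, then the default URL with UTA_PASSWORD) is an assumption based on the description above.

```python
import os
from urllib.parse import urlparse

# Default URL quoted in this README.
DEFAULT_URL = "postgresql://uta_admin@localhost:5433/uta/uta_20210129"


def parse_uta_db_url(url):
    """Split driver://user:pass@host/database/schema into its parts.

    Hypothetical helper for illustration; not part of uta-tools.
    """
    parsed = urlparse(url)
    path_parts = parsed.path.strip("/").split("/")
    if len(path_parts) != 2:
        raise ValueError("expected driver://user:pass@host/database/schema")
    database, schema = path_parts
    return {
        "driver": parsed.scheme,
        "user": parsed.username,
        "password": parsed.password,  # None when the URL has no password
        "host": parsed.hostname,
        "port": parsed.port,  # None when the URL has no port
        "database": database,
        "schema": schema,
    }


def resolve_connection(env=None):
    """Pick UTA_DB_URL if set, else the default URL plus UTA_PASSWORD.

    The precedence shown here is an assumption, not confirmed behavior.
    """
    env = os.environ if env is None else env
    args = parse_uta_db_url(env.get("UTA_DB_URL", DEFAULT_URL))
    if args["password"] is None:
        args["password"] = env.get("UTA_PASSWORD")
    return args
```

Calling resolve_connection() with no arguments reads the real environment. Passing a plain dict instead makes the lookup easy to try out without touching your shell.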
Data Downloads
SeqRepo
uta-tools relies on seqrepo, which you must download yourself.
From the root directory:
pip install seqrepo
sudo mkdir /usr/local/share/seqrepo
sudo chown $USER /usr/local/share/seqrepo
seqrepo pull -i 2021-01-29
transcript_mappings.tsv
uta-tools uses Ensembl BioMart to retrieve uta_tools/data/transcript_mappings.tsv. We currently use Human Genes (GRCh38.p13) for the dataset, with the following attributes: Gene stable ID, Gene stable ID version, Transcript stable ID, Transcript stable ID version, Protein stable ID, Protein stable ID version, RefSeq match transcript (MANE Select), Gene name.
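As a rough sketch of how such a file could be read, the snippet below indexes the rows by transcript ID using the standard csv module. It assumes the TSV header row uses the BioMart attribute names listed above. The function name load_transcript_mappings is hypothetical, and the sample row contains placeholder values, not real identifiers.

```python
import csv
import io


def load_transcript_mappings(handle):
    """Index tab-separated rows by their "Transcript stable ID" column.

    Hypothetical helper; assumes the header matches the BioMart attribute names.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    mappings = {}
    for row in reader:
        transcript_id = row["Transcript stable ID"]
        if transcript_id:  # skip rows with no transcript ID
            mappings[transcript_id] = row
    return mappings


# Placeholder data for illustration only (not real Ensembl identifiers).
sample = io.StringIO(
    "Gene stable ID\tTranscript stable ID\tGene name\n"
    "ENSG_EXAMPLE\tENST_EXAMPLE\tGENE1\n"
)
mappings = load_transcript_mappings(sample)
print(mappings["ENST_EXAMPLE"]["Gene name"])
```

With a real file, the same function could be called on open("uta_tools/data/transcript_mappings.tsv", newline="").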
LRG_RefSeqGene
uta-tools fetches the latest version of LRG_RefSeqGene. This file can be found here.
MANE Summary Data
uta-tools fetches the latest version of MANE.GRCh38.*.summary.txt.gz. This file can be found here.
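To make the wildcard in that filename concrete, here is one possible way to pick the newest file from a list of names. This is an illustrative sketch, not how uta-tools does it. The function latest_mane_summary is hypothetical, and it assumes the wildcard portion is a version string such as v1.0.

```python
import fnmatch
import re

PATTERN = "MANE.GRCh38.*.summary.txt.gz"


def latest_mane_summary(filenames):
    """Return the matching filename with the highest version, or None.

    Hypothetical helper; assumes versions look like "v<major>.<minor>".
    """

    def version_key(name):
        match = re.search(r"\.v(\d+(?:\.\d+)*)\.summary", name)
        if match is None:
            return ()  # unversioned names sort lowest
        return tuple(int(part) for part in match.group(1).split("."))

    candidates = [n for n in filenames if fnmatch.fnmatchcase(n, PATTERN)]
    if not candidates:
        return None
    return max(candidates, key=version_key)
```

Comparing versions as integer tuples rather than strings keeps, for example, v0.95 ordered below v1.0 and v1.10 above v1.9.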
Init coding style tests
Code style is managed by flake8 and checked prior to commit.
We use pre-commit to run conformance tests.
This runs the following checks:
- Check code style
- Check for added large files
- Detect AWS Credentials
- Detect Private Key
Before first commit run:
pre-commit install
Testing
From the root directory of the repository:
pytest