
Map MaveDB scoresets to VRS objects


dcd-map: Map MaveDB data to computable and interoperable variant objects


This library implements a novel method for mapping MaveDB scoreset data to GA4GH Variation Representation Specification (VRS) objects, enhancing interoperability for genomic medicine applications. See Arbesfeld et al. (2023) for a preprint edition of the mapping manuscript, or download the resulting mappings directly.

Installation

Install from PyPI:

python3 -m pip install dcd-mapping

Also ensure the following data dependencies are available:

  • Universal Transcript Archive (UTA): see README for setup instructions. Users with Docker available locally can use the provided Docker image; otherwise, start a recent PostgreSQL instance (version 14 or later) and load the available database dump.
  • SeqRepo: see README for setup instructions. The SeqRepo data directory must be writable; see the specific instructions here for more.
  • Gene Normalizer: see documentation for data setup instructions.
  • blat: must be available on the local PATH and executable by the user. Otherwise, its location can be set manually with the BLAT_BIN_PATH environment variable. See the UCSC Genome Browser FAQ for download instructions. For our experiments, we placed the binary in the same directory as these notebooks.
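
Before running a mapping, it can help to confirm that blat is discoverable. The following is a minimal sketch, not part of dcd-mapping itself: the helper name find_blat is hypothetical, and it assumes BLAT_BIN_PATH holds the full path to the binary, which may differ from how the library interprets the variable.

```python
import os
import shutil
from typing import Optional


def find_blat() -> Optional[str]:
    """Return a path to an executable blat binary, or None if none is found.

    Hypothetical helper for illustration; dcd-mapping may resolve blat differently.
    """
    # Assumption: BLAT_BIN_PATH, when set, is the full path to the blat binary.
    override = os.environ.get("BLAT_BIN_PATH")
    if override:
        if os.path.isfile(override) and os.access(override, os.X_OK):
            return override
        return None
    # Otherwise search the directories listed in PATH for an executable named "blat".
    return shutil.which("blat")


if __name__ == "__main__":
    path = find_blat()
    if path:
        print(f"blat found at {path}")
    else:
        print("blat not found; add it to PATH or set BLAT_BIN_PATH")
```

When the override is set but does not point at an executable file, the sketch returns None rather than falling back to PATH, so a mistyped BLAT_BIN_PATH is not silently ignored.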

Usage

Use the dcd-map command with a scoreset URN, e.g.:

$ dcd-map urn:mavedb:00000083-c-1

Output is saved in the format <URN>_mapping_results_<ISO datetime>.json in the directory specified by the environment variable MAVEDB_STORAGE_DIR, or ~/.local/share/dcd-mapping by default.
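
The naming scheme above makes it straightforward to locate results programmatically. The sketch below resolves the output directory and picks the most recently modified result file for a URN. The function names storage_dir and latest_result are hypothetical and not part of the dcd-mapping API. Files are ordered by modification time because the exact timestamp format in the file name is not assumed here.

```python
import glob
import os
from pathlib import Path
from typing import Optional


def storage_dir() -> Path:
    """Resolve the output directory: MAVEDB_STORAGE_DIR if set, else the default."""
    env = os.environ.get("MAVEDB_STORAGE_DIR")
    return Path(env) if env else Path.home() / ".local" / "share" / "dcd-mapping"


def latest_result(urn: str, directory: Optional[Path] = None) -> Optional[Path]:
    """Return the newest <URN>_mapping_results_*.json file, or None if absent."""
    directory = directory or storage_dir()
    # Escape the URN so any glob metacharacters in it are matched literally.
    pattern = f"{glob.escape(urn)}_mapping_results_*.json"
    matches = sorted(directory.glob(pattern), key=lambda p: p.stat().st_mtime)
    return matches[-1] if matches else None


if __name__ == "__main__":
    result = latest_result("urn:mavedb:00000083-c-1")
    print(result if result else "no mapping results found")
```

The returned path can then be opened with the standard json module; the structure of the file's contents is not described here.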

Notebooks

Notebooks for manuscript data analysis and figure generation are provided within notebooks/analysis. See notebooks/analysis/README.md for more information.

Development

Clone the repo:

git clone https://github.com/ave-dcd/dcd_mapping
cd dcd_mapping

Create and activate a virtual environment:

python3 -m virtualenv venv
source venv/bin/activate

Install as editable and with developer dependencies:

python3 -m pip install -e '.[dev,tests]'

Add pre-commit hooks:

pre-commit install

Run tests with pytest:

pytest


