Use orthogonal data to determine what ontologies should be used for mapping strings

Project description

When mapping input strings from column/field X in some datasource to terms from OBO foundry ontologies, use the values in column/field Y to determine which ontology to map to.
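The idea can be illustrated with a minimal sketch. This is not the scoped-mapping API: the lookup table, the function names, the field names and the choice of UBERON for host-associated samples are all assumptions made up for the illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical lookup: value seen in field Y -> ontology to use for field X.
SCOPE_TO_ONTOLOGY: Dict[str, str] = {
    "soil metagenome": "ENVO",
    "Homo sapiens": "UBERON",
}


def choose_ontology(scope_value: str,
                    table: Dict[str, str] = SCOPE_TO_ONTOLOGY) -> Optional[str]:
    """Return the ontology prefix for a scope value, or None if it is unscoped."""
    return table.get(scope_value.strip())


def plan_mappings(rows: Iterable[Dict[str, str]],
                  x_field: str,
                  y_field: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each input string from x_field with the ontology chosen from y_field."""
    return [(row[x_field], choose_ontology(row[y_field])) for row in rows]


# Toy rows; the field names are illustrative only.
rows = [
    {"env_medium": "forest soil", "taxon": "soil metagenome"},
    {"env_medium": "blood", "taxon": "Homo sapiens"},
    {"env_medium": "agar plate", "taxon": "Escherichia coli"},
]
plan = plan_mappings(rows, x_field="env_medium", y_field="taxon")
```

Rows whose scope value is not in the table come back with None, which a caller could treat as "do not map" or route to a default ontology.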

Note that the project name is written with a hyphen on GitHub (scoped-mapping) and with an underscore on PyPI (scoped_mapping).

Currently tested on a 32 GB MacBook Pro running macOS Catalina. Requires riot from Apache Jena. The make all target uses Homebrew to install Jena, but does not install Homebrew itself. This will probably run on other *nix systems, but will require a system-dependent installation of Jena.

Installation

python3.9 -m venv sm_venv
source sm_venv/bin/activate
pip install -r requirements.txt
pip install -i https://test.pypi.org/simple/ scoped-mapping

Sample code

See Jupyter Notebooks

Scoping mappings based on subsets of NCBItaxon

First download semantic-sql and some of its dependencies, then build an SQLite database with the NCBItaxon content. Building requires lots of disk space, RAM and patience, but it is well worth it when it comes to query time:

make all

If a dataset has taxon values, one can use them to subset or scope how other values in the dataset should be mapped. For example, the NCBI Biosample metadata collection has MIxS triads (broad, narrow and medium) that could be mapped to ENVO terms in many cases. But ENVO might not be appropriate for cultured samples or for samples that were taken from a multicellular organism. One way to check for those cases is to look for transitive subclasses in NCBItaxon. There are numerous ways to do that, but they are all generally computationally expensive.

Here, we use rdftab and relation-graph (via semantic-sql) to infer those transitive subClassOf relationships and load them into an SQLite database. Building this database requires lots of RAM and roughly 10 GB of disk space, but after that the querying is fast and convenient.
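To make "transitive subClassOf" concrete, here is a small sketch using Python's built-in sqlite3 module. It does not use the database built by make all: the subclass_of table, its columns and the handful of edges are assumptions for illustration, and the closure is computed at query time with a recursive query rather than precomputed by relation-graph as described above.

```python
import sqlite3

# Toy in-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subclass_of (subject TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO subclass_of VALUES (?, ?)",
    [
        # Direct edges only; intermediate ranks are skipped for brevity.
        ("NCBITaxon:9606", "NCBITaxon:9605"),   # Homo sapiens -> Homo
        ("NCBITaxon:9605", "NCBITaxon:33208"),  # Homo -> Metazoa
        ("NCBITaxon:562", "NCBITaxon:2"),       # Escherichia coli -> Bacteria
    ],
)


def descendants(connection: sqlite3.Connection, ancestor: str) -> set:
    """Return every term that is a direct or transitive subclass of ancestor."""
    sql = """
        WITH RECURSIVE sub(term) AS (
            SELECT subject FROM subclass_of WHERE object = ?
            UNION
            SELECT s.subject FROM subclass_of AS s JOIN sub ON s.object = sub.term
        )
        SELECT term FROM sub
    """
    return {row[0] for row in connection.execute(sql, (ancestor,))}


animals = descendants(conn, "NCBITaxon:33208")


def envo_is_plausible(taxon_id: str) -> bool:
    """Illustrative rule: skip ENVO when the sample taxon is an animal."""
    return taxon_id not in animals
```

On the full NCBItaxon hierarchy a recursive query like this is slow, which is the motivation for materializing the inferred relationships once and querying them afterwards.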

Building

Once:

pip install build twine

Every time:

git add ...
git commit -m ...
git push
git tag ...
pip install --use-feature=in-tree-build .

Ready to deploy?

python -m build --sdist --wheel .
ls -l dist/

Remove all artifacts from all builds in dist/ except for the latest:

twine upload --repository pypitest dist/*
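The clean-up of dist/ before the upload can be scripted. The following is a sketch only: prune_dist and is_for_version are hypothetical helpers, not part of this project, and the filename matching assumes the usual name-version.tar.gz and name-version-tags.whl patterns.

```python
from pathlib import Path
from typing import List


def is_for_version(filename: str, version: str) -> bool:
    """True if an sdist or wheel filename belongs to the given version."""
    return filename.endswith(f"-{version}.tar.gz") or f"-{version}-" in filename


def prune_dist(dist_dir: str, keep_version: str, dry_run: bool = True) -> List[Path]:
    """List (and, unless dry_run, delete) files in dist_dir from other versions."""
    stale = sorted(
        path
        for path in Path(dist_dir).iterdir()
        if path.is_file() and not is_for_version(path.name, keep_version)
    )
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

Calling prune_dist("dist", "0.9.1") only reports what would be removed; pass dry_run=False to actually delete the stale files.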


Download files

Source Distribution

scoped_mapping-0.9.1.tar.gz (295.3 kB)

Uploaded Source

Built Distribution

scoped_mapping-0.9.1-py3-none-any.whl (8.9 kB)

Uploaded Python 3

File details

Details for the file scoped_mapping-0.9.1.tar.gz.

File metadata

  • Download URL: scoped_mapping-0.9.1.tar.gz
  • Upload date:
  • Size: 295.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.9.5

File hashes

Hashes for scoped_mapping-0.9.1.tar.gz
Algorithm    Hash digest
SHA256       2df85696d39ace758f725851aeca40449d53d9fb1e7e4706ab81798e9b46f43e
MD5          6b2ca10329c0b9243e5d4aa209e4f710
BLAKE2b-256  2df3bd926c7052456527c276930f5e55fa281313282a90ebb49e2e2ac4ab3af8

File details

Details for the file scoped_mapping-0.9.1-py3-none-any.whl.

File metadata

  • Download URL: scoped_mapping-0.9.1-py3-none-any.whl
  • Upload date:
  • Size: 8.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.9.5

File hashes

Hashes for scoped_mapping-0.9.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       0310ebab747bb02fdbc2831dabfc4b78b4f1f4198b7866853b4da8ea0d99fa46
MD5          d1d3bdb38522d886788f0bc4fe4ab3de
BLAKE2b-256  a2543f1c56de7ea3193cf915a7b800cdb7f8873805cc79103e4715f39e8b49b0
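
A downloaded file can be checked against the SHA256 digests listed above with Python's standard hashlib module. This is a generic sketch; the helper names and the local file path in the usage note are assumptions.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published("scoped_mapping-0.9.1-py3-none-any.whl", "0310ebab747bb02fdbc2831dabfc4b78b4f1f4198b7866853b4da8ea0d99fa46") should return True for an intact copy of the wheel saved under that name in the current directory.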
