Use orthogonal data to determine what ontologies should be used for mapping strings
Project description
When mapping input strings from column/field X of a data source to terms from OBO Foundry ontologies, use the values in column/field Y to determine which ontology to map to.
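The idea can be sketched in a few lines of Python. This is a minimal illustration, not the package's API: the lookup table, the function names (choose_ontology, plan_mappings), the field names and the ontology choices are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical scope-to-ontology table; keys and ontology choices are illustrative only.
SCOPE_TO_ONTOLOGY: Dict[str, str] = {
    "environmental": "ENVO",
    "host-associated": "UBERON",
}


def choose_ontology(scope_value: str, default: str = "ENVO") -> str:
    """Return the ontology to map against, given a scoping value from field Y."""
    return SCOPE_TO_ONTOLOGY.get(scope_value.strip().lower(), default)


def plan_mappings(
    rows: List[Dict[str, str]], text_field: str, scope_field: str
) -> List[Tuple[str, str]]:
    """Pair each input string (field X) with the ontology selected from field Y."""
    return [(row[text_field], choose_ontology(row[scope_field])) for row in rows]


# Made-up rows: env_medium plays the role of X, sample_type the role of Y.
rows = [
    {"env_medium": "sea water", "sample_type": "environmental"},
    {"env_medium": "blood", "sample_type": "Host-associated"},
]
plan = plan_mappings(rows, text_field="env_medium", scope_field="sample_type")
```

Each tuple in plan holds the string to be mapped and the ontology it would be sent to; the actual term lookup is out of scope for this sketch.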
Note that the project name is written with a hyphen on GitHub (scoped-mapping) and with an underscore in the PyPI distribution files (scoped_mapping).
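Both spellings refer to the same project as far as pip is concerned, because package indexes compare normalized names (PEP 503). The helper below is a small sketch of that normalization rule; the function name is our own.

```python
import re


def normalize_project_name(name: str) -> str:
    """Apply PEP 503 normalization: runs of '-', '_' and '.' become one '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


same_project = normalize_project_name("scoped_mapping") == normalize_project_name("scoped-mapping")
```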
Currently tested on a 32 GB MacBook Pro running macOS Catalina. Requires the riot command-line tool from Apache Jena. make all uses Homebrew to install Jena, but does not install Homebrew itself. The code will probably run on other *nix systems, but those will need a system-dependent installation of Jena.
Installation
python3.9 -m venv sm_venv
source sm_venv/bin/activate
pip install -r requirements.txt
pip install -i https://test.pypi.org/simple/ scoped-mapping
Sample code
See Jupyter Notebooks
Scoping mappings based on subsets of NCBItaxon
First, download semantic-sql and some of its dependencies, then build an SQLite database with the NCBItaxon content. Building requires lots of disk space, RAM and patience, but it is well worth it when it comes to query time:
make all
If a dataset has taxon values, one can use them to subset or scope how other values in the dataset should be mapped. For example, the NCBI Biosample metadata collection has MIxS triads (broad, narrow and medium) that could be mapped to ENVO terms in many cases. But ENVO might not be appropriate for cultured samples or for samples that were taken from a multicellular organism. One way to check for those cases is to look for transitive subclasses in NCBItaxon. There are numerous ways to do that, but they are all generally computationally expensive.
Here, we use rdftab and relation-graph (via semantic-sql) to infer those transitive subClassOf relationships and load them into an SQLite database. Building this database requires lots of RAM and roughly 10 GB of disk space, but after that the querying is fast and convenient.
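Once the inferred relationships are in SQLite, checking whether a sample's taxon falls under some higher-level taxon becomes a single indexed lookup. The sketch below uses a toy in-memory database to show the shape of such a query. The table name entailed_subclass, its columns and the helper functions are assumptions made for illustration; the tables actually produced by semantic-sql may be named and structured differently.

```python
import sqlite3

METAZOA = "NCBITaxon:33208"

conn = sqlite3.connect(":memory:")
# Hypothetical table of already-inferred (transitive) subClassOf pairs.
conn.execute("CREATE TABLE entailed_subclass (subject TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO entailed_subclass VALUES (?, ?)",
    [
        ("NCBITaxon:9606", "NCBITaxon:33208"),     # Homo sapiens -> Metazoa
        ("NCBITaxon:9606", "NCBITaxon:2759"),      # Homo sapiens -> Eukaryota
        ("NCBITaxon:408172", "NCBITaxon:408169"),  # marine metagenome -> metagenomes
    ],
)
conn.execute("CREATE INDEX idx_subject_object ON entailed_subclass (subject, object)")


def is_under(conn: sqlite3.Connection, taxon_id: str, ancestor_id: str) -> bool:
    """True if taxon_id is recorded as a (transitive) subclass of ancestor_id."""
    row = conn.execute(
        "SELECT 1 FROM entailed_subclass WHERE subject = ? AND object = ? LIMIT 1",
        (taxon_id, ancestor_id),
    ).fetchone()
    return row is not None


def envo_is_plausible(conn: sqlite3.Connection, taxon_id: str) -> bool:
    """Illustrative rule only: do not scope samples from multicellular animals to ENVO."""
    return not is_under(conn, taxon_id, METAZOA)


human_is_animal = is_under(conn, "NCBITaxon:9606", METAZOA)
marine_ok = envo_is_plausible(conn, "NCBITaxon:408172")
```

Because the transitive closure is precomputed, no recursion is needed at query time, which is where the up-front build cost pays off.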
Building
Once:
pip install build twine
Every time:
git add ...
git commit -m ...
git push
git tag ...
pip install --use-feature=in-tree-build .
Ready to deploy?
python -m build --sdist --wheel .
ls -l dist/
Remove all build artifacts from dist/ except those from the latest build, then upload:
twine upload --repository pypitest dist/*