curies

Idiomatic conversion between URIs and compact URIs (CURIEs).

from curies import Converter

converter = Converter.from_prefix_map({
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "GO": "http://purl.obolibrary.org/obo/GO_",
    # ... and so on
    "OBO": "http://purl.obolibrary.org/obo/",
})

>>> converter.compress("http://purl.obolibrary.org/obo/CHEBI_1")
'CHEBI:1'

>>> converter.expand("CHEBI:1")
'http://purl.obolibrary.org/obo/CHEBI_1'

# Unparsable
>>> assert converter.compress("http://example.com/missing:0000000") is None
>>> assert converter.expand("missing:0000000") is None

When some URI prefixes are partially overlapping (e.g., http://purl.obolibrary.org/obo/GO_ for GO and http://purl.obolibrary.org/obo/ for OBO), the longest URI prefix will always be matched. For example, compressing http://purl.obolibrary.org/obo/GO_0032571 will return GO:0032571 instead of OBO:GO_0032571.
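The longest-match rule can be illustrated without the library. The following is a minimal sketch in plain Python, not the package's actual implementation; the helper name `compress_longest` is hypothetical and exists only for this illustration.

```python
from typing import Dict, Optional

# Overlapping URI prefixes from the example above (CURIE prefix -> URI prefix)
PREFIX_MAP = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "OBO": "http://purl.obolibrary.org/obo/",
}


def compress_longest(uri: str, prefix_map: Dict[str, str]) -> Optional[str]:
    """Compress a URI using the longest matching URI prefix (hypothetical helper)."""
    # Collect every (URI prefix, CURIE prefix) pair whose URI prefix matches
    matches = [
        (uri_prefix, prefix)
        for prefix, uri_prefix in prefix_map.items()
        if uri.startswith(uri_prefix)
    ]
    if not matches:
        return None
    # The longest URI prefix wins, so GO_ beats the more general OBO prefix
    uri_prefix, prefix = max(matches, key=lambda pair: len(pair[0]))
    return f"{prefix}:{uri[len(uri_prefix):]}"


print(compress_longest("http://purl.obolibrary.org/obo/GO_0032571", PREFIX_MAP))
# prints: GO:0032571
```

A linear scan like this is only meant to show the rule; it is not a statement about how the converter performs the lookup internally.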

Full documentation is available at curies.readthedocs.io.

Standardization

The curies.Converter data structure supports prefix synonyms and URI prefix synonyms. The following example demonstrates using these synonyms to standardize prefixes, CURIEs, and URIs. Note that below, the colloquial prefix gomf, sometimes used to represent the subspace of the Gene Ontology (GO) corresponding to molecular functions, is upgraded to the preferred prefix, GO.

from curies import Converter, Record

converter = Converter([
    Record(
        prefix="GO",
        prefix_synonyms=["gomf", "gocc", "gobp", "go", ...],
        uri_prefix="http://purl.obolibrary.org/obo/GO_",
        uri_prefix_synonyms=[
            "http://amigo.geneontology.org/amigo/term/GO:",
            "https://identifiers.org/GO:",
            ...
        ],
    ),
    # And so on
    ...
])

>>> converter.standardize_prefix("gomf")
'GO'
>>> converter.standardize_curie('gomf:0032571')
'GO:0032571'
>>> converter.standardize_uri('http://amigo.geneontology.org/amigo/term/GO:0032571')
'http://purl.obolibrary.org/obo/GO_0032571'

Note: non-standard URIs can still be parsed with converter.parse_uri() and compressed into CURIEs with converter.compress().
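To make the idea of synonyms concrete, here is a rough sketch in plain Python of how a record's synonyms could be indexed and used for standardization. It is an illustration under assumptions, not the library's code: the dictionary layout, the helper names (`build_indexes`, `standardize_curie_sketch`, `standardize_uri_sketch`), and the choice to return None for unknown input are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One record mirroring the GO example above; the elided "..." entries are left out
RECORD = {
    "prefix": "GO",
    "prefix_synonyms": ["gomf", "gocc", "gobp", "go"],
    "uri_prefix": "http://purl.obolibrary.org/obo/GO_",
    "uri_prefix_synonyms": [
        "http://amigo.geneontology.org/amigo/term/GO:",
        "https://identifiers.org/GO:",
    ],
}


def build_indexes(records: List[dict]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Map every prefix and URI prefix (including synonyms) to its preferred form."""
    prefix_index: Dict[str, str] = {}
    uri_prefix_index: Dict[str, str] = {}
    for record in records:
        for prefix in [record["prefix"], *record["prefix_synonyms"]]:
            prefix_index[prefix] = record["prefix"]
        for uri_prefix in [record["uri_prefix"], *record["uri_prefix_synonyms"]]:
            uri_prefix_index[uri_prefix] = record["uri_prefix"]
    return prefix_index, uri_prefix_index


def standardize_curie_sketch(curie: str, prefix_index: Dict[str, str]) -> Optional[str]:
    """Swap a synonym prefix for the preferred prefix (None if unknown, by assumption)."""
    prefix, sep, identifier = curie.partition(":")
    preferred = prefix_index.get(prefix) if sep else None
    return f"{preferred}:{identifier}" if preferred else None


def standardize_uri_sketch(uri: str, uri_prefix_index: Dict[str, str]) -> Optional[str]:
    """Swap a synonym URI prefix for the preferred URI prefix."""
    # Try longer URI prefixes first so overlapping prefixes resolve consistently
    for uri_prefix in sorted(uri_prefix_index, key=len, reverse=True):
        if uri.startswith(uri_prefix):
            return uri_prefix_index[uri_prefix] + uri[len(uri_prefix):]
    return None


PREFIX_INDEX, URI_PREFIX_INDEX = build_indexes([RECORD])
print(standardize_curie_sketch("gomf:0032571", PREFIX_INDEX))
# prints: GO:0032571
print(standardize_uri_sketch("http://amigo.geneontology.org/amigo/term/GO:0032571", URI_PREFIX_INDEX))
# prints: http://purl.obolibrary.org/obo/GO_0032571
```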

Loading Prefix Maps

All loader functions work on local file paths, remote URLs, and pre-loaded data structures. For example, a converter can be instantiated from a web-based resource in JSON-LD format:

from curies import Converter

url = "https://raw.githubusercontent.com/biopragmatics/bioregistry/main/exports/contexts/semweb.context.jsonld"
converter = Converter.from_jsonld(url)
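For orientation, a JSON-LD context document conventionally keeps its prefix definitions under the "@context" key. The sketch below uses only the standard library to pull a simple prefix map out of such a document; the sample document and the helper name `extract_prefix_map` are illustrative assumptions, and this is not the loader the package itself uses.

```python
import json
from typing import Dict

# A small, made-up JSON-LD document for illustration
JSONLD_TEXT = """
{
  "@context": {
    "@version": 1.1,
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "GO": "http://purl.obolibrary.org/obo/GO_"
  }
}
"""


def extract_prefix_map(document: dict) -> Dict[str, str]:
    """Keep only plain `prefix -> URI prefix` string entries from the context."""
    context = document.get("@context", {})
    return {
        key: value
        for key, value in context.items()
        # Skip JSON-LD keywords such as "@version" and any non-string definitions
        if not key.startswith("@") and isinstance(value, str)
    }


prefix_map = extract_prefix_map(json.loads(JSONLD_TEXT))
print(prefix_map)
```

A dictionary of this shape is the same kind of input that Converter.from_prefix_map accepts in the first example.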

Several converters can be instantiated from pre-defined web-based resources:

import curies

# Uses the Bioregistry, an integrative, comprehensive registry
bioregistry_converter = curies.get_bioregistry_converter()

# Uses the OBO Foundry, a registry of ontologies
obo_converter = curies.get_obo_converter()

# Uses the Monarch Initiative's project-specific context
monarch_converter = curies.get_monarch_converter()

Bulk Operations

Apply in bulk to a pandas.DataFrame with Converter.pd_expand and Converter.pd_compress:

import curies
import pandas as pd

df = pd.read_csv(...)
obo_converter = curies.get_obo_converter()
obo_converter.pd_compress(df, column=0)
obo_converter.pd_expand(df, column=0)

Apply in bulk to a delimited text file with Converter.file_expand and Converter.file_compress (the separator defaults to a tab):

import curies

path = ...
obo_converter = curies.get_obo_converter()
# modifies file in place
obo_converter.file_compress(path, column=0)
# modifies file in place
obo_converter.file_expand(path, column=0)
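Conceptually, an in-place file operation reads the rows, rewrites one column, and writes the rows back to the same path. The sketch below shows that idea with the standard library only. It is not the library's implementation: the helper `compress_column_in_place`, the toy `toy_compress` function, and the choices to ignore headers and to leave unparsable values unchanged are assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Optional


def compress_column_in_place(
    path: Path,
    column: int,
    compress: Callable[[str], Optional[str]],
    sep: str = "\t",
) -> None:
    """Rewrite one column of a delimited file in place (hypothetical helper)."""
    with open(path, newline="") as file:
        rows = list(csv.reader(file, delimiter=sep))
    for row in rows:
        if len(row) > column:
            # Assumption: keep the original value when it cannot be compressed
            row[column] = compress(row[column]) or row[column]
    with open(path, "w", newline="") as file:
        csv.writer(file, delimiter=sep, lineterminator="\n").writerows(rows)


def toy_compress(uri: str) -> Optional[str]:
    """Stand-in for a converter's compress function, covering a single prefix."""
    uri_prefix = "http://purl.obolibrary.org/obo/CHEBI_"
    if uri.startswith(uri_prefix):
        return "CHEBI:" + uri[len(uri_prefix):]
    return None


with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "example.tsv"
    path.write_text(
        "http://purl.obolibrary.org/obo/CHEBI_1\tfirst row\n"
        "http://example.com/x\tsecond row\n"
    )
    compress_column_in_place(path, column=0, compress=toy_compress)
    RESULT = path.read_text()

print(RESULT)
```

Because the file is overwritten, working on a copy is a sensible precaution when trying out the real file_compress and file_expand methods.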

CLI Usage

This package comes with a built-in CLI for running a resolver web application or an IRI mapper web application:

# Run a resolver
python -m curies resolver --host 0.0.0.0 --port 8764 bioregistry 

# Run a mapper
python -m curies mapper --host 0.0.0.0 --port 8764 bioregistry 

The positional argument can be one of the following:

  1. A pre-defined prefix map to get from the web (bioregistry, go, obo, monarch, prefixcommons)
  2. A local file path or URL to a prefix map, extended prefix map, or one of several formats. Requires specifying a --format.

The web framework can be set to Flask (default) or FastAPI with --framework. The server can be set to Werkzeug (default) or Uvicorn with --server. This functionality is also available programmatically; see the docs for more information.

🧑‍🤝‍🧑 Related

Other packages that convert between CURIEs and URIs:

🚀 Installation

The most recent release can be installed from PyPI with:

$ pip install curies

👐 Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

👋 Attribution

🙏 Acknowledgements

This package builds heavily on the trie data structure implemented in pytrie.

⚖️ License

The code in this package is licensed under the MIT License.

🍪 Cookiecutter

This package was created with @audreyfeldroy's cookiecutter package using @cthoyt's cookiecutter-snekpack template.

🛠️ For Developers

The final section of the README is for those who want to get involved by making a code contribution.

Development Installation

To install in development mode, use the following:

$ git clone https://github.com/cthoyt/curies.git
$ cd curies
$ pip install -e .

🥼 Testing

After cloning the repository and installing tox with pip install tox, the unit tests in the tests/ folder can be run reproducibly with:

$ tox

Additionally, these tests are automatically re-run with each commit in a GitHub Action.

📖 Building the Documentation

The documentation can be built locally using the following:

$ git clone https://github.com/cthoyt/curies.git
$ cd curies
$ tox -e docs
$ open docs/build/html/index.html

The documentation build automatically installs the package as well as the docs extra specified in setup.cfg. Sphinx plugins like texext can be added there. They also need to be added to the extensions list in docs/source/conf.py.

📦 Making a Release

After installing the package in development mode and installing tox with pip install tox, the commands for making a new release are contained within the finish environment in tox.ini. Run the following from the shell:

$ tox -e finish

This script does the following:

  1. Uses Bump2Version to remove the -dev suffix from the version number in setup.cfg, src/curies/version.py, and docs/source/conf.py
  2. Packages the code as both a tar archive and a wheel using build
  3. Uploads to PyPI using twine. Be sure to have a .pypirc file configured to avoid the need for manual input at this step
  4. Pushes to GitHub. You'll need to make a release to go with the commit where the version was bumped.
  5. Bumps the version to the next patch. If you made big changes and want to bump the version by minor, you can run tox -e bumpversion minor afterwards.
