toolkit that assists with generating LinkML schemas from existing serializations like JSON-schema
LinkML Schema Automator
This is a toolkit that assists with:
- Bootstrapping LinkML models from instance data
  - TSVs and spreadsheets
  - SQLite databases
  - RDF instance graphs
- Bootstrapping a LinkML model from a different schema representation (i.e. the opposite of a linkml.generator)
  - OWL (RDFS-like subset)
  - TODO: JSON-Schema, XSD, ShEx, SHACL, SQL DDL, FHIR, Python dataclasses/pydantic, etc.
- Using automated methods to enhance a model
  - Using text mining and concept annotator APIs to enrich semantic enums
  - TODO: querying SPARQL endpoints to retrieve additional metadata
These can be composed together. For example, run tsvs2linkml followed by annotate-enums.
The toolkit is still experimental. It is intended as an aid to schema creation rather than as a formal conversion tool.
Installation
linkml-model-enrichment and its components require Python 3.9 or greater.
chmod 755 environment.sh
. environment.sh
pip install -r requirements.txt
pip install -e .
Command Line Usage
Annotating Enums
This toolkit allows automated annotation of LinkML enums, mapping text strings to ontology terms.
The command line tool annotate-enums takes a LinkML schema containing enums and fills in the meaning slot of each permissible value.
See the annotators folder for docs
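To make the idea of filling in meaning slots concrete, here is a minimal sketch in plain Python. It is not the toolkit's implementation: `TERM_LOOKUP` and `annotate_enum` are hypothetical names, and the hard-coded dictionary merely stands in for a real concept annotator service.

```python
# Hypothetical lookup table standing in for a real concept annotator API.
TERM_LOOKUP = {
    "blood": "UBERON:0000178",
    "liver": "UBERON:0002107",
}


def annotate_enum(enum: dict, lookup: dict) -> dict:
    """Return a copy of the enum with `meaning` filled in where a match is found."""
    annotated = {}
    for text, value in enum["permissible_values"].items():
        value = dict(value or {})
        # Keep any meaning that is already present; only fill gaps.
        if "meaning" not in value:
            match = lookup.get(text.strip().lower())
            if match is not None:
                value["meaning"] = match
        annotated[text] = value
    return {**enum, "permissible_values": annotated}


# An enum as it might appear after loading a schema's YAML into a dict.
specimen_type = {
    "permissible_values": {"Blood": None, "Liver": {}, "Unknown tissue": None}
}
result = annotate_enum(specimen_type, TERM_LOOKUP)
print(result)
```

Values with no match are left without a meaning, so they can be reviewed and curated by hand.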
Converting TSVs
The tsv2linkml command infers a single-class schema from a TSV datafile.
$ tsv2linkml --help
Usage: tsv2linkml [OPTIONS] TSVFILE
Infer a model from a TSV
Options:
-o, --output TEXT Output file
-c, --class_name TEXT Core class name in schema
-n, --schema_name TEXT Schema name
-s, --sep TEXT separator
-E, --enum-columns TEXT column that is forced to be an enum
--robot / --no-robot set if the TSV is a ROBOT template
--help Show this message and exit.
Example:
tsv2linkml tests/resources/biobank-specimens.tsv
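As a rough illustration of what inferring a single-class schema from a TSV can involve, the sketch below reads a table and guesses a range for each column. It is an assumption-laden approximation, not the algorithm tsv2linkml uses: the function names `infer_range` and `infer_single_class_schema` are hypothetical, and only integer, float and string ranges are considered.

```python
import csv
import io


def infer_range(values):
    """Pick the narrowest type name that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for type_name, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_single_class_schema(tsv_text, class_name, sep="\t"):
    """Build a dict shaped like a LinkML schema with one class, one attribute per column."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter=sep))
    columns = list(rows[0].keys()) if rows else []
    attributes = {
        col: {"range": infer_range([(row.get(col) or "") for row in rows])}
        for col in columns
    }
    return {"classes": {class_name: {"attributes": attributes}}}


# Made-up sample data for illustration only.
tsv = "id\tage\tweight\ttissue\nS1\t34\t70.5\tblood\nS2\t51\t82\tliver\n"
schema = infer_single_class_schema(tsv, "Specimen")
print(schema)
```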
The tsvs2linkml command infers a multi-class schema from multiple TSV datafiles.
$ tsvs2linkml --help
Usage: tsvs2linkml [OPTIONS] [TSVFILES]...
Infer a model from multiple TSVs
Options:
-o, --output TEXT Output file
-n, --schema_name TEXT Schema name
-s, --sep TEXT separator
-E, --enum-columns TEXT column(s) that is forced to be an enum
--enum-mask-columns TEXT column(s) that are excluded from being enums
--max-enum-size INTEGER do not create an enum if more than max distinct
members
--enum-threshold FLOAT if the number of distinct values / rows is less
than this, do not make an enum
--robot / --no-robot set if the TSV is a ROBOT template
--help Show this message and exit.
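The --max-enum-size and --enum-threshold options are described in terms of two quantities per column: the number of distinct values, and the ratio of distinct values to rows. The sketch below computes both so that they can be inspected before choosing option values. It does not reproduce the tool's decision logic, and `profile_columns` is a hypothetical helper, not part of the toolkit.

```python
import csv
import io


def profile_columns(tsv_text, sep="\t"):
    """Report, per column, the distinct-value count and the distinct/rows ratio."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter=sep))
    profile = {}
    for col in (list(rows[0].keys()) if rows else []):
        # Empty cells are ignored when counting distinct values.
        distinct = {row[col] for row in rows if row.get(col)}
        profile[col] = {
            "distinct": len(distinct),
            "ratio": len(distinct) / len(rows),
        }
    return profile


# Made-up sample data for illustration only.
tsv = "id\ttissue\nS1\tblood\nS2\tblood\nS3\tliver\nS4\tblood\n"
profile = profile_columns(tsv)
for column, stats in profile.items():
    print(column, stats)
```

In this sample, the identifier column has one distinct value per row, while the tissue column repeats a small vocabulary, which is the kind of column one would typically expect to become an enum.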
Converting OWL
$ owl2linkml --help
Usage: owl2linkml [OPTIONS] OWLFILE
Infer a model from OWL Ontology
Note: input must be in functional syntax
Options:
-n, --name TEXT Schema name
--help Show this message and exit.
Example:
owl2linkml -n prov tests/resources/prov.ofn > prov.yaml
Note that this works best on schema-style ontologies such as PROV.
It is NOT recommended for terminological-style ontologies such as those in OBO.
Converting RDF instance graphs
$ rdf2linkml --help
Usage: rdf2linkml [OPTIONS] RDFFILE
Infer a model from RDF instance data
Options:
-d, --dir TEXT [required]
--help Show this message and exit.
Converting JSON Instance Data
$ jsondata2linkml --help
Usage: jsondata2linkml [OPTIONS] INPUT
Infer a model from JSON instance data
Options:
--container-class-name TEXT name of root class
-f, --format TEXT json or yaml (or json.gz or yaml.gz)
--omit-null / --no-omit-null if true, ignore null values
--help Show this message and exit.
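The effect of --omit-null can be pictured with a small sketch that infers slots from one JSON record. This is only an illustration under stated assumptions: `scalar_range` and `infer_slots` are hypothetical names, nested objects are not handled, and the real tool may infer ranges differently.

```python
import json


def scalar_range(value):
    """Map a Python scalar to a type name; bool is checked before int on purpose."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    # Fallback for strings and anything this sketch does not handle (e.g. nested objects).
    return "string"


def infer_slots(record, omit_null=True):
    """Infer one slot per key; null values are skipped when omit_null is true."""
    slots = {}
    for key, value in record.items():
        if value is None:
            if not omit_null:
                slots[key] = {}  # slot is known to exist, but its range is not
            continue
        if isinstance(value, list):
            first = value[0] if value else ""
            slots[key] = {"range": scalar_range(first), "multivalued": True}
        else:
            slots[key] = {"range": scalar_range(value)}
    return slots


# Made-up record for illustration only.
record = json.loads(
    '{"id": "S1", "age": 34, "consented": true, "aliases": ["a", "b"], "notes": null}'
)
slots = infer_slots(record)
slots_with_nulls = infer_slots(record, omit_null=False)
print(slots)
```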
Converting JSON-Schema
$ jsonschema2linkml --help
Usage: jsonschema2linkml [OPTIONS] INPUT
Infer a model from JSON Schema
Options:
-n, --name TEXT ID of schema [required]
-f, --format TEXT JSON Schema format - yaml or json
-o, --output TEXT output path
--help Show this message and exit.
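For orientation, the following sketch shows one plausible way a flat JSON Schema object could be mapped onto a LinkML-style class. It is not the converter behind jsonschema2linkml: `TYPE_MAP` and `jsonschema_to_linkml` are hypothetical, only top-level properties are handled, and the fallback class name `Root` is an assumption.

```python
# Assumed correspondence between JSON Schema types and LinkML built-in types.
TYPE_MAP = {
    "string": "string",
    "integer": "integer",
    "number": "float",
    "boolean": "boolean",
}


def jsonschema_to_linkml(js: dict, schema_name: str) -> dict:
    """Translate a flat JSON Schema object into a dict shaped like a LinkML schema."""
    class_name = js.get("title", "Root")
    required = set(js.get("required", []))
    attributes = {}
    for name, prop in js.get("properties", {}).items():
        slot = {}
        if prop.get("type") == "array":
            # Arrays become multivalued slots; the item type supplies the range.
            slot["multivalued"] = True
            prop = prop.get("items", {})
        slot["range"] = TYPE_MAP.get(prop.get("type"), "string")
        if name in required:
            slot["required"] = True
        attributes[name] = slot
    return {"name": schema_name, "classes": {class_name: {"attributes": attributes}}}


# Made-up JSON Schema for illustration only.
json_schema = {
    "title": "Specimen",
    "type": "object",
    "required": ["id"],
    "properties": {
        "id": {"type": "string"},
        "weight": {"type": "number"},
        "aliases": {"type": "array", "items": {"type": "string"}},
    },
}
linkml_like = jsonschema_to_linkml(json_schema, "biobank")
print(linkml_like)
```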