CHEKA
A profile hierarchy-based RDF graph validation tool written in Python
This tool validates a data graph against a set of SHACL shape graphs that it extracts from a hierarchy of Profiles (Standards/Specifications and/or profiles of them). It uses the data graph's claims of conformance to a Profile to collate all the validating SHACL files within that Profile's hierarchy of other Profiles and Standards, and then validates the data against them.
Cheka uses Profiles Vocabulary (PROF) descriptions of Profiles and Standards. It traverses both up a Profile hierarchy (following prof:isProfileOf properties) and across from prof:Profile objects to the prof:ResourceDescriptor objects that describe the constraints implemented for them. These constraints are currently limited to Shapes Constraint Language (SHACL) files, and their descriptors must be given the role role:validation (via prof:hasRole) to be recognised by Cheka. SHACL validation itself is performed by the pySHACL Python SHACL validator.
Installation
- Ensure Python 3 is available on your system
- Clone this repo
- Install the requirements in requirements.txt, e.g.
~$ pip3 install -r requirements.txt
- Run the scripts as described in Use below
Use
Input requirements
To use Cheka, you must supply it with both a data graph (an RDF file) to be validated and a profiles hierarchy (another RDF file). It then uses one of several selectable strategies to validate objects within the data graph, using validating resources it locates via the profiles hierarchy.
A few other optional flags control additional functions.
The command line arguments (Python & BASH) are:
Flag | Input values | Requirement | Notes |
---|---|---|---|
-d / --data | an RDF file's path | mandatory | Can be in most RDF formats with conventional file endings (e.g. .ttl for Turtle, .jsonld for JSON-LD) |
-p / --profiles | a profiles file's path | mandatory | As above. The profiles description must be formulated according to PROF |
-s / --strategy | 'shacl' or 'profile' | optional, 'shacl' default | Which strategy to use. See Strategies below |
-u / --profile-uri | the URI of a profile in the profiles hierarchy | mandatory for the 'profile' strategy | If the 'profile' strategy is selected, a profile URI must be given. The data is then validated using validators within that profile's hierarchy only |
-r / --get-remotes | none | optional, default False | If set, Cheka will pull in profile and validating SHACL artifacts referenced, but not described, in the profiles hierarchy, i.e. remote profiles online |
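The flags above map naturally onto Python's argparse module. The following is a minimal sketch of such a parser, not the actual contents of cli.py; the function names build_parser and parse_args, and the rule that -u must accompany the 'profile' strategy, are assumptions read off the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented Cheka command line flags."""
    parser = argparse.ArgumentParser(description="Profile hierarchy-based RDF validation (sketch)")
    parser.add_argument("-d", "--data", required=True, help="path to the RDF data graph file")
    parser.add_argument("-p", "--profiles", required=True, help="path to the PROF profiles hierarchy file")
    parser.add_argument("-s", "--strategy", choices=["shacl", "profile"], default="shacl",
                        help="validation strategy")
    parser.add_argument("-u", "--profile-uri", help="URI of the profile to validate against")
    # -r takes no value: its presence switches remote fetching on
    parser.add_argument("-r", "--get-remotes", action="store_true",
                        help="fetch remote profiles/validators referenced in the hierarchy")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # -u is only mandatory when the 'profile' strategy is chosen
    if args.strategy == "profile" and args.profile_uri is None:
        parser.error("--profile-uri is required when --strategy is 'profile'")
    return args
```

For example, parse_args(["-d", "data.ttl", "-p", "profiles.ttl"]) yields the 'shacl' strategy with get_remotes False, while selecting -s profile without -u exits with a usage error.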
Data graph
This must be an RDF file with the part(s) to be validated indicating their conformance to a profile as per the Profiles Vocabulary.
Typically this will look like this:
```turtle
@prefix dct: <http://purl.org/dc/terms/> .

<Object_X>
    a <Class_Y> ;
    dct:conformsTo <Profile_Z> ;
    # ... other properties of <Object_X>
.
```
This says that <Object_X> is meant to conform to <Profile_Z>.
See the tests/ folder for example data graphs.
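To show how these claims can drive validation, here is a minimal sketch that groups data-graph objects by the profile they claim to conform to. It models triples as plain (subject, predicate, object) string tuples rather than an rdflib Graph, and the function name conformance_claims is hypothetical; it is not part of Cheka's API.

```python
from collections import defaultdict

DCT_CONFORMS_TO = "http://purl.org/dc/terms/conformsTo"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def conformance_claims(triples):
    """Map each claimed profile URI to the sorted list of objects claiming conformance to it."""
    claims = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == DCT_CONFORMS_TO:
            claims[obj].add(subject)
    return {profile: sorted(objects) for profile, objects in claims.items()}


# illustrative data: Object_W is a hypothetical second object
DATA = [
    ("Object_X", RDF_TYPE, "Class_Y"),
    ("Object_X", DCT_CONFORMS_TO, "Profile_Z"),
    ("Object_W", DCT_CONFORMS_TO, "Profile_Z"),
]

print(conformance_claims(DATA))  # {'Profile_Z': ['Object_W', 'Object_X']}
```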
Profiles hierarchy
This must also be an RDF file. It must contain a hierarchy of prof:Profile objects (including dct:Standard objects) that are related to one another via the prof:isProfileOf property, each of which has a validating resource linked to it via a prof:ResourceDescriptor, like this:
```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix role: <http://www.w3.org/ns/dx/prof/role/> .

<Standard_A>
    a dct:Standard ;
    prof:hasResource [
        a prof:ResourceDescriptor ;
        prof:hasRole role:validation ;
        prof:hasArtifact <File_or_Uri_J> ;
    ]
.

<Profile_B>
    a prof:Profile ;
    prof:isProfileOf <Standard_A> ;
    prof:hasResource <Resource_Descriptor_P> ;
.

<Resource_Descriptor_P>
    a prof:ResourceDescriptor ;
    prof:hasRole role:validation ;
    prof:hasArtifact <File_or_Uri_K> ;
.

<Profile_C>
    a prof:Profile ;
    prof:isProfileOf <Profile_B> ;
    prof:hasResource [
        a prof:ResourceDescriptor ;
        prof:hasRole role:validation ;
        prof:hasArtifact <File_or_Uri_L> ;
    ] ;
.
```
This says <Profile_C> is a profile of <Profile_B> which is, in turn, a profile of <Standard_A>. The standard and the two profiles have the resources <File_or_Uri_J>, <File_or_Uri_K> and <File_or_Uri_L> respectively, which are marked as validators by the prof:ResourceDescriptor instances (with prof:hasRole role:validation) that associate them with their standard/profiles.
See the tests/ folder for example profiles graphs.
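The "across" step (profile to resource descriptor to artifact) can be sketched over the example hierarchy. This is a minimal illustration, not Cheka's implementation: triples are plain string tuples, blank nodes are written as "_:b0" and so on, rdf:type triples are omitted, and the extra role:guidance descriptor with artifact Guide_M is a hypothetical addition showing that only role:validation resources are kept.

```python
PROF = "http://www.w3.org/ns/dx/prof/"
ROLE = "http://www.w3.org/ns/dx/prof/role/"

# the example hierarchy above, as (subject, predicate, object) tuples
HIERARCHY = [
    ("Standard_A", PROF + "hasResource", "_:b0"),
    ("_:b0", PROF + "hasRole", ROLE + "validation"),
    ("_:b0", PROF + "hasArtifact", "File_or_Uri_J"),
    ("Profile_B", PROF + "isProfileOf", "Standard_A"),
    ("Profile_B", PROF + "hasResource", "Resource_Descriptor_P"),
    ("Resource_Descriptor_P", PROF + "hasRole", ROLE + "validation"),
    ("Resource_Descriptor_P", PROF + "hasArtifact", "File_or_Uri_K"),
    ("Profile_C", PROF + "isProfileOf", "Profile_B"),
    ("Profile_C", PROF + "hasResource", "_:b1"),
    ("_:b1", PROF + "hasRole", ROLE + "validation"),
    ("_:b1", PROF + "hasArtifact", "File_or_Uri_L"),
    # hypothetical non-validation resource, which must be ignored
    ("Profile_C", PROF + "hasResource", "_:b2"),
    ("_:b2", PROF + "hasRole", ROLE + "guidance"),
    ("_:b2", PROF + "hasArtifact", "Guide_M"),
]


def validators_by_profile(triples):
    """Return {profile: [artifacts]} for descriptors with prof:hasRole role:validation."""
    validating = {s for s, p, o in triples if p == PROF + "hasRole" and o == ROLE + "validation"}
    artifacts = {}
    for s, p, o in triples:
        if p == PROF + "hasArtifact":
            artifacts.setdefault(s, []).append(o)
    result = {}
    for s, p, o in triples:
        # follow prof:hasResource "across" from a profile to its validating descriptors
        if p == PROF + "hasResource" and o in validating:
            result.setdefault(s, []).extend(artifacts.get(o, []))
    return result


def parent_profiles(triples):
    """Return {profile: [profiles it is a profile of]} from prof:isProfileOf triples."""
    parents = {}
    for s, p, o in triples:
        if p == PROF + "isProfileOf":
            parents.setdefault(s, []).append(o)
    return parents


print(validators_by_profile(HIERARCHY))
# {'Standard_A': ['File_or_Uri_J'], 'Profile_B': ['File_or_Uri_K'], 'Profile_C': ['File_or_Uri_L']}
```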
Strategies
The following different strategies may be selected for use.
Name | Description |
---|---|
shacl | Standard SHACL validation: all the SHACL validators from all the profiles found in the profiles hierarchy are used to validate the given data using the SHACL validators' targeting (usually per class) |
profile | Validates given data using the validators found linked to a profile and all the profiles in that profile's hierarchy. This is the "main" Cheka strategy, as opposed to shacl which is "normal" SHACL validation |
claims | Not implemented yet, likely February 2021 |
shacl is the default strategy.
Note that the strategy is set using the -s flag. When using Cheka as a Python module, a different strategy may be applied per call to Cheka.validate().
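The difference between the two implemented strategies is which validators are selected. The sketch below is a hypothetical illustration, not Cheka's code: it takes a {profile: [validators]} map and a {profile: [parent profiles]} map (both assumed, with Profile_D and File_or_Uri_M invented as a sibling of Profile_B), walks up prof:isProfileOf links for the 'profile' strategy, and returns a sorted list purely for deterministic output.

```python
def profile_chain(profile, parents):
    """Walk up prof:isProfileOf links from profile, returning it and all its ancestors."""
    seen, stack, chain = set(), [profile], []
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles and shared ancestors
        seen.add(current)
        chain.append(current)
        stack.extend(parents.get(current, []))
    return chain


def select_validators(strategy, validators, parents, profile_uri=None):
    """Pick validator artifacts according to the named strategy (hypothetical helper)."""
    if strategy == "shacl":
        # every validator from every profile in the hierarchy
        return sorted({v for vs in validators.values() for v in vs})
    if strategy == "profile":
        if profile_uri is None:
            raise ValueError("the 'profile' strategy requires a profile_uri")
        chain = profile_chain(profile_uri, parents)
        return sorted({v for p in chain for v in validators.get(p, [])})
    if strategy == "claims":
        raise NotImplementedError("the 'claims' strategy is not implemented")
    raise ValueError(f"unknown strategy: {strategy!r}")


VALIDATORS = {
    "Standard_A": ["File_or_Uri_J"],
    "Profile_B": ["File_or_Uri_K"],
    "Profile_C": ["File_or_Uri_L"],
    "Profile_D": ["File_or_Uri_M"],  # hypothetical sibling profile
}
PARENTS = {"Profile_B": ["Standard_A"], "Profile_C": ["Profile_B"], "Profile_D": ["Standard_A"]}

print(select_validators("profile", VALIDATORS, PARENTS, "Profile_C"))
# ['File_or_Uri_J', 'File_or_Uri_K', 'File_or_Uri_L']
```

Validating against Profile_C therefore ignores File_or_Uri_M, whereas the shacl strategy would apply it too.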
Running
Cheka uses the profiles graph to find all the SHACL validators it needs to validate a data graph. It returns a pySHACL result with one additional element, the URI of the profile used for validation: [conforms, results_graph, results_text, profile_uri], where conforms is either True or False.
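A caller will typically unpack this four-element result. The sketch below relies only on the element order stated above; summarise is a hypothetical helper, and the stand-in results in the usage are plain Python values rather than real pySHACL output (in which results_graph is an rdflib Graph).

```python
def summarise(result):
    """Turn a [conforms, results_graph, results_text, profile_uri] result into a short report."""
    conforms, _results_graph, results_text, profile_uri = result
    if conforms:
        return f"Valid against {profile_uri}"
    # on failure, include pySHACL's human-readable report text
    return f"Invalid against {profile_uri}:\n{results_text}"


print(summarise([True, None, "Conforms: True", "http://example.org/profile/Profile_C"]))
# Valid against http://example.org/profile/Profile_C
```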
As a Python module
A Python program can import Cheka (import cheka) after installing it (pip install cheka). Cheka can then be called in code like this:
```python
import cheka

c = cheka.Cheka("data.ttl", "profiles_hierarchy.ttl")

# tell Cheka to pull in profiles/validators
# referenced but not defined in profiles_hierarchy.ttl
c.get_remote_profiles = True

# a simple validation: basic, default, SHACL-only (no use of profiles)
c.validate()

# profile-based validation, starting with the profile Profile_C
conforms, results_graph, results_text, profile_uri = c.validate(
    strategy="profile",
    profile_uri="http://example.org/profile/Profile_C"
)
print(conforms, profile_uri)
```
As a Python command line utility
~$ python3 cli.py -d DATA-GRAPH-FILE -p PROFILES-GRAPH-FILE
(and potentially other optional args)
If you make the cli.py script executable (sudo chmod a+x cli.py), you can run it like this:
~$ ./cli.py -d DATA-GRAPH-FILE -p PROFILES-GRAPH-FILE
As a BASH script
The file cheka in the bin/ directory is a BASH shell script that calls cli.py. Make it executable (sudo chmod a+x cheka) and you can then run it like this:
~$ ./cheka -d DATA-GRAPH-FILE -p PROFILES-GRAPH-FILE
(and potentially other optional args)
As a Windows executable
Coming soon.
Testing
Tests are included in the tests/ directory. They use pytest and can be run from the command line. They have no dependencies other than pytest and Cheka itself.
Tests are annotated with what they are testing.
Test profile hierarchy
The profiles and validators used for the tests in this code are combined in the file test-profile.hierarchy.ttl. This hierarchy can be used in other applications as an example of a profile hierarchy.
License
This code is licensed using the GPL v3 licence. See the LICENSE file for the deed.
See Citation below for attribution.
Citation
To cite this software, please use the following BibTeX:
@software{10.5281/zenodo.3676330,
author = {{Nicholas J. Car}},
title = {Cheka: A profile hierarchy-based RDF graph validation tool written in Python},
version = {0.5},
date = {2020},
publisher = "SURROUND Australia Pty. Ltd.",
doi = {10.5281/zenodo.3676330},
url = {https://doi.org/10.5281/zenodo.3676330}
}
Or the following RDF:
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sdo: <https://schema.org/> .
@prefix wiki: <https://www.wikidata.org/wiki/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<https://doi.org/10.5281/zenodo.3676330>
a sdo:SoftwareSourceCode , owl:NamedIndividual ;
sdo:codeRepository <https://github.com/surroundaustralia/cheka> ;
dcterms:type wiki:Q7397 ; # "software"
dcterms:creator "Nicholas J. Car" ;
dcterms:date "2020"^^xsd:gYear ;
dcterms:title "Cheka: A profile hierarchy-based RDF graph validation tool written in Python" ;
sdo:version "0.5" ;
dcterms:publisher [
a sdo:Organization ;
sdo:name "SURROUND Pty Ltd" ;
sdo:url <https://surroundaustralia.com> ;
]
.
Contacts
publisher:
SURROUND Australia Pty. Ltd.
https://surroundaustralia.com
creator:
Dr Nicholas J. Car
Data Systems Architect
SURROUND Australia Pty. Ltd.
nicholas.car@surroundaustralia.com