This tool validates a data graph against a set of SHACL shape graphs that it extracts from a hierarchy of Profiles (Standards/Specifications and/or profiles of them).

CHEKA

A profile hierarchy-based RDF graph validation tool written in Python

  1. Installation
  2. Use
  3. Testing
  4. License
  5. Citation
  6. Contacts

This tool validates a data graph against a set of SHACL shape graphs that it extracts from a hierarchy of Profiles (Standards/Specifications and/or profiles of them). It uses the data graph's conformance claims to a Profile to collate, and then apply, all the SHACL validator files in the hierarchy of Profiles and Standards that the Profile profiles.

Cheka uses Profiles Vocabulary (PROF) descriptions of Profiles and Standards. It traverses up a Profile hierarchy (following prof:isProfileOf properties) and across from each prof:Profile to the prof:ResourceDescriptor objects that describe the constraints implemented for it. These constraints are currently limited to Shapes Constraint Language (SHACL) files, and a resource must have the role role:validation (stated with prof:hasRole) to be recognised by Cheka. The pySHACL Python SHACL validator is used to perform SHACL validation.

Installation

  1. Ensure Python 3 is available on your system
  2. Clone this repo
  3. Install requirements in requirements.txt, e.g. ~$ pip3 install -r requirements.txt
  4. Execute scripts as per Use below

Use

Input requirements

To use Cheka, you must supply it with both the data to be validated (an RDF graph) and a profiles hierarchy (another RDF graph). It then uses the selected strategy to validate objects within the data, using the validating resources it locates through the profiles hierarchy.

You may supply it with a couple of other flags too for other functions.

The command line arguments (Python & BASH) are:

| Flag | Input values | Requirement | Notes |
|---|---|---|---|
| -d / --data | an RDF file's path | mandatory | Can be in most RDF formats with conventional file endings (e.g. .ttl for Turtle, .jsonld for JSON-LD) |
| -p / --profiles | a profile file's path | mandatory | As above. Profiles description must be formulated according to PROF |
| -s / --strategy | 'shacl' or 'profile' | optional, 'shacl' default | Which strategy to use. See Strategies description below |
| -u / --profile-uri | the URI of a profile in the profile hierarchy | sometimes mandatory | If strategy 'profile' is selected, a profile URI must be given. The data is then validated using validators within that profile's hierarchy only |
| -r / --get-remotes | none | optional, default False | If set, Cheka will pull in profile and validating SHACL artifacts referenced, but not described, in the profiles hierarchy, i.e. remote profiles online |
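
To make the relationship between the flags concrete, here is a minimal argparse sketch that mirrors the table above. It is not Cheka's actual cli.py: the function names build_parser and parse_args are hypothetical, and the only rule it enforces beyond the table's defaults is that the 'profile' strategy needs a profile URI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="cheka")
    parser.add_argument("-d", "--data", required=True,
                        help="path to the RDF data file")
    parser.add_argument("-p", "--profiles", required=True,
                        help="path to the PROF profiles hierarchy file")
    parser.add_argument("-s", "--strategy", choices=["shacl", "profile"],
                        default="shacl", help="validation strategy")
    parser.add_argument("-u", "--profile-uri",
                        help="URI of a profile in the profiles hierarchy")
    parser.add_argument("-r", "--get-remotes", action="store_true",
                        help="fetch profiles/validators referenced but not described")
    return parser


def parse_args(argv):
    """Parse argv and require a profile URI when the strategy is 'profile'."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.strategy == "profile" and args.profile_uri is None:
        # parser.error prints a usage message and exits with status 2
        parser.error("-u/--profile-uri is required when the strategy is 'profile'")
    return args


if __name__ == "__main__":
    example = parse_args([
        "-d", "data.ttl",
        "-p", "profiles_hierarchy.ttl",
        "-s", "profile",
        "-u", "http://example.org/profile/Profile_C",
    ])
    print(example.strategy, example.profile_uri, example.get_remotes)
```

Modelling -r as a store_true switch matches the table's "none" input value: the flag takes no argument and defaults to False.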

Data graph

This must be an RDF file with the part(s) to be validated indicating their conformance to a profile as per the Profiles Vocabulary.

Typically this will look like this:

@prefix dct: <http://purl.org/dc/terms/> .

<Object_X>
    a <Class_Y> ;
    dct:conformsTo <Profile_Z> ;
    ...

This says that <Object_X> is meant to conform to <Profile_Z>.
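
As a rough illustration of what a conformance claim looks like to a program, the sketch below scans a list of triples for dct:conformsTo statements. It represents triples as plain Python tuples of strings to stay dependency-free; this is an assumption made for the example and not how Cheka reads RDF. The function name conformance_claims and the http://example.org/ URIs are hypothetical.

```python
DCT_CONFORMS_TO = "http://purl.org/dc/terms/conformsTo"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def conformance_claims(triples):
    """Map each subject to the set of profiles it claims conformance to."""
    claims = {}
    for subject, predicate, obj in triples:
        if predicate == DCT_CONFORMS_TO:
            claims.setdefault(subject, set()).add(obj)
    return claims


if __name__ == "__main__":
    # the Turtle example above, written out as (subject, predicate, object) tuples
    data = [
        ("http://example.org/Object_X", RDF_TYPE, "http://example.org/Class_Y"),
        ("http://example.org/Object_X", DCT_CONFORMS_TO, "http://example.org/Profile_Z"),
    ]
    for subject, profiles in conformance_claims(data).items():
        print(subject, "claims conformance to", sorted(profiles))
```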

See the tests/ folder for example data graphs.

Profiles hierarchy

This must also be an RDF file. It contains a hierarchy of prof:Profile objects (including dct:Standard objects) that are related to one another via the prof:isProfileOf property. Each profile or standard indicates its validating resource through a prof:ResourceDescriptor, like this:

@prefix dct: <http://purl.org/dc/terms/> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix role:  <http://www.w3.org/ns/dx/prof/role/> .


<Standard_A>
    a dct:Standard ;
    prof:hasResource [
        a prof:ResourceDescriptor ;
        prof:hasRole role:validation ;
        prof:hasArtifact <File_or_Uri_J> ;
    ]
.

<Profile_B>
    a prof:Profile ;
    prof:isProfileOf <Standard_A> ;
    prof:hasResource <Resource_Descriptor_P> ;
.   

<Resource_Descriptor_P>
    a prof:ResourceDescriptor ;
    prof:hasRole role:validation ;
    prof:hasArtifact <File_or_Uri_K> ;
.

<Profile_C>
    a prof:Profile ; 
    prof:isProfileOf <Profile_B> ;    
    prof:hasResource [
        a prof:ResourceDescriptor ;
        prof:hasRole role:validation ;
        prof:hasArtifact <File_or_Uri_L> ;
    ] ;
.

This says <Profile_C> is a profile of <Profile_B> which is, in turn, a profile of <Standard_A>. The standard and the two profiles have the resources <File_or_Uri_J>, <File_or_Uri_K> and <File_or_Uri_L> respectively, which are indicated to be validators by the prof:ResourceDescriptor instances that associate them with their standard/profiles.
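
The traversal this hierarchy implies can be sketched in a few lines. The example below is not Cheka's implementation: it models the graph as plain (subject, predicate, object) string tuples, uses short names in place of full URIs, and the function name validators_for and the labels rd_A and rd_C (standing in for the blank nodes) are hypothetical. It walks up prof:isProfileOf and collects the artifacts of every resource descriptor with the role role:validation.

```python
from collections import deque

PROF = "http://www.w3.org/ns/dx/prof/"
HAS_RESOURCE = PROF + "hasResource"
HAS_ROLE = PROF + "hasRole"
HAS_ARTIFACT = PROF + "hasArtifact"
IS_PROFILE_OF = PROF + "isProfileOf"
ROLE_VALIDATION = PROF + "role/validation"


def validators_for(profile, triples):
    """Collect validator artifacts for a profile and everything it profiles."""
    # index objects by (subject, predicate) for quick lookups
    index = {}
    for s, p, o in triples:
        index.setdefault((s, p), []).append(o)

    found, seen, queue = [], set(), deque([profile])
    while queue:
        current = queue.popleft()
        if current in seen:  # guard against cycles and diamond-shaped hierarchies
            continue
        seen.add(current)
        for descriptor in index.get((current, HAS_RESOURCE), []):
            if ROLE_VALIDATION in index.get((descriptor, HAS_ROLE), []):
                found.extend(index.get((descriptor, HAS_ARTIFACT), []))
        queue.extend(index.get((current, IS_PROFILE_OF), []))
    return found


# the hierarchy from the Turtle example above
HIERARCHY = [
    ("Standard_A", HAS_RESOURCE, "rd_A"),
    ("rd_A", HAS_ROLE, ROLE_VALIDATION),
    ("rd_A", HAS_ARTIFACT, "File_or_Uri_J"),
    ("Profile_B", IS_PROFILE_OF, "Standard_A"),
    ("Profile_B", HAS_RESOURCE, "Resource_Descriptor_P"),
    ("Resource_Descriptor_P", HAS_ROLE, ROLE_VALIDATION),
    ("Resource_Descriptor_P", HAS_ARTIFACT, "File_or_Uri_K"),
    ("Profile_C", IS_PROFILE_OF, "Profile_B"),
    ("Profile_C", HAS_RESOURCE, "rd_C"),
    ("rd_C", HAS_ROLE, ROLE_VALIDATION),
    ("rd_C", HAS_ARTIFACT, "File_or_Uri_L"),
]

if __name__ == "__main__":
    print(validators_for("Profile_C", HIERARCHY))
    # -> ['File_or_Uri_L', 'File_or_Uri_K', 'File_or_Uri_J']
```

Starting from <Profile_C> yields all three validators, while starting from <Profile_B> would yield only those of <Profile_B> and <Standard_A>.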

See the tests/ folder for example profiles graphs.

Strategies

The following different strategies may be selected for use.

| Name | Description |
|---|---|
| shacl | Standard SHACL validation: all the SHACL validators from all the profiles found in the profiles hierarchy are used to validate the given data using the SHACL validators' targeting (usually per class) |
| profile | Validates given data using the validators found linked to a profile and all the profiles in that profile's hierarchy. This is the "main" Cheka strategy, as opposed to shacl which is "normal" SHACL validation |
| claims | Not implemented yet, likely February 2021 |

shacl is the default strategy

Note that the strategy is applied using the -s flag. When using Cheka as a Python module, a different strategy may be applied per call to Cheka.validate().
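
The difference between the two implemented strategies comes down to which validators are selected. The sketch below illustrates that under simplifying assumptions: the hierarchy is given as two plain dictionaries rather than an RDF graph, and the names select_validators, ancestors, Profile_D and the .ttl file names are hypothetical. It is not Cheka's own code.

```python
def ancestors(profile, is_profile_of):
    """Return the profile plus everything reachable by following is_profile_of."""
    ordered, stack = [], [profile]
    while stack:
        current = stack.pop()
        if current not in ordered:
            ordered.append(current)
            stack.extend(is_profile_of.get(current, []))
    return ordered


def select_validators(strategy, validators, is_profile_of, profile_uri=None):
    """Pick the validator files a strategy would use."""
    if strategy == "shacl":
        # every validator anywhere in the profiles hierarchy
        return sorted({f for files in validators.values() for f in files})
    if strategy == "profile":
        if profile_uri is None:
            raise ValueError("strategy 'profile' needs a profile_uri")
        # only validators in the given profile's own hierarchy
        chain = ancestors(profile_uri, is_profile_of)
        return sorted({f for p in chain for f in validators.get(p, [])})
    raise ValueError(f"unknown strategy: {strategy!r}")


# Profile_D is an extra, hypothetical sibling added to show the difference
IS_PROFILE_OF = {
    "Profile_B": ["Standard_A"],
    "Profile_C": ["Profile_B"],
    "Profile_D": ["Standard_A"],
}
VALIDATORS = {
    "Standard_A": ["J.ttl"],
    "Profile_B": ["K.ttl"],
    "Profile_C": ["L.ttl"],
    "Profile_D": ["M.ttl"],
}

if __name__ == "__main__":
    print(select_validators("shacl", VALIDATORS, IS_PROFILE_OF))
    # -> ['J.ttl', 'K.ttl', 'L.ttl', 'M.ttl']
    print(select_validators("profile", VALIDATORS, IS_PROFILE_OF, "Profile_C"))
    # -> ['J.ttl', 'K.ttl', 'L.ttl']
```

With the shacl strategy the validator of the unrelated Profile_D is included; with the profile strategy starting at Profile_C it is not.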

Running

Cheka uses the profiles graph to find all the SHACL validators it needs to validate a data graph. It returns a pySHACL result with an additional element - the URI of the profile used for validation: [conforms, results_graph, results_text, profile_uri]. conforms is either True or False.
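
A caller would typically unpack the four-element result described above. The helper below is a small sketch of that, using a stand-in result list rather than a real call to Cheka; the function name summarise is hypothetical, and treating a missing profile URI as "no single profile" is an assumption, since the text does not say what the fourth element holds under the shacl strategy.

```python
def summarise(result):
    """Turn [conforms, results_graph, results_text, profile_uri] into one line."""
    conforms, results_graph, results_text, profile_uri = result
    status = "conforms" if conforms else "does not conform"
    # assumption: a falsy profile_uri means validation was not tied to one profile
    target = profile_uri if profile_uri else "no single profile"
    return f"Data {status} (profile used: {target})"


if __name__ == "__main__":
    # stand-in values; a real results_graph would be an RDF graph object
    fake_result = [
        False,
        None,
        "Validation Report\nConforms: False\n",
        "http://example.org/profile/Profile_C",
    ]
    print(summarise(fake_result))
    # -> Data does not conform (profile used: http://example.org/profile/Profile_C)
```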

As a Python module

A Python program can import Cheka (import cheka) after installing it (pip install cheka). Then Cheka can be called in code like this:

import cheka

c = cheka.Cheka("data.ttl", "profiles_hierarchy.ttl")

# to tell Cheka to pull in profiles/validators 
# referenced but not defined in the profiles_hierarchy.ttl
c.get_remote_profiles = True  

# a simple validation - basic, default, shacl-only (no use of profiles)
c.validate()

# profile-based validation, starting with the profile Profile_C
c.validate(
    strategy="profile", 
    profile_uri="http://example.org/profile/Profile_C"
)

As a Python command line utility

~$ python3 cli.py -d DATA-GRAPH-FILE -p PROFILES-GRAPH-FILE

(and potentially other optional args)

If you make the cli.py script executable (sudo chmod a+x cli.py) then you can run it like this:

~$ ./cli.py -d DATA-GRAPH-FILE -p PROFILES-GRAPH-FILE

As a BASH script

The file cheka in the bin/ directory is a BASH shell script that calls cli.py. Make it executable (sudo chmod a+x cheka) then you can run it like this:

~$ ./cheka -d DATA-GRAPH-FILE -p PROFILES-GRAPH-FILE

(and potentially other optional args)

As a Windows executable

coming!

Testing

Tests are included in the tests/ directory. They use pytest and can be run from the command line. They have no dependencies other than pytest and Cheka itself.

Tests are annotated with what they are testing.

Test profile hierarchy

The profiles and validators used for the tests in this code are combined in the file test-profile.hierarchy.ttl. This hierarchy can be used in other applications as an example of a profile hierarchy.

License

This code is licensed using the GPL v3 licence. See the LICENSE file for the deed.

See Citation below for attribution.

Citation

To cite this software, please use the following BibTeX:

@software{10.5281/zenodo.3676330,
  author = {{Nicholas J. Car}},
  title = {Cheka: A profile hierarchy-based RDF graph validation tool written in Python},
  version = {0.5},
  date = {2020},
  publisher = "SURROUND Australia Pty. Ltd.",
  doi = {10.5281/zenodo.3676330},
  url = {https://doi.org/10.5281/zenodo.3676330}
}

Or the following RDF:

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sdo: <https://schema.org/> .
@prefix wiki: <https://www.wikidata.org/wiki/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://doi.org/10.5281/zenodo.3676330>
    a sdo:SoftwareSourceCode , owl:NamedIndividual ;
    sdo:codeRepository <https://github.com/surroundaustralia/cheka> ;
    dcterms:type wiki:Q7397 ; # "software"
    dcterms:creator "Nicholas J. Car" ;
    dcterms:date "2020"^^xsd:gYear ;
    dcterms:title "Cheka: A profile hierarchy-based RDF graph validation tool written in Python" ;
    sdo:version "0.5" ;
    dcterms:publisher [
        a sdo:Organization ;
        sdo:name "SURROUND Pty Ltd" ;
        sdo:url <https://surroundaustralia.com> ;
    ]
.

Contacts

publisher:

SURROUND Australia Pty. Ltd.
https://surroundaustralia.com

creator:
Dr Nicholas J. Car
Data Systems Architect
SURROUND Australia Pty. Ltd.
nicholas.car@surroundaustralia.com
