
JSON schema and validation code for HEPData submissions

Project description



Installation

If you can, install LibYAML (a C library for parsing and emitting YAML) on your machine. This allows PyYAML to use its CLoader for faster loading of YAML files. The difference is negligible for small files, but markedly better for larger documents.
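To check whether your PyYAML installation picked up LibYAML, a quick probe like the following works (a minimal sketch; `CSafeLoader` is only importable when PyYAML was compiled against LibYAML):

```python
# Prefer the C-backed loader (requires LibYAML at PyYAML build time),
# falling back to the pure-Python loader otherwise.
try:
    from yaml import CSafeLoader as Loader
    backend = "libyaml (CSafeLoader)"
except ImportError:
    try:
        from yaml import SafeLoader as Loader
        backend = "pure Python (SafeLoader)"
    except ImportError:
        Loader = None
        backend = "PyYAML not installed"

print(backend)
```

The resulting `Loader` can be passed to `yaml.load(stream, Loader=Loader)` for the faster parsing path when it is available.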

Via pip:

pip install hepdata-validator

Via GitHub (for developers):

git clone https://github.com/HEPData/hepdata-validator
cd hepdata-validator
pip install --upgrade -e .[tests]
pytest testsuite

Usage

To validate submission files, instantiate a SubmissionFileValidator object:

from hepdata_validator.submission_file_validator import SubmissionFileValidator

submission_file_validator = SubmissionFileValidator()
submission_file_path = 'submission.yaml'

# the validate method takes a string representing the file path
is_valid_submission_file = submission_file_validator.validate(file_path=submission_file_path)

# if there are any error messages, they are retrievable through this call
submission_file_validator.get_messages()

# the error messages can be printed
submission_file_validator.print_errors(submission_file_path)

To validate data files, instantiate a DataFileValidator object:

from hepdata_validator.data_file_validator import DataFileValidator

data_file_validator = DataFileValidator()

# the validate method takes a string representing the file path
data_file_validator.validate(file_path='data.yaml')

# if there are any error messages, they are retrievable through this call
data_file_validator.get_messages()

# the error messages can be printed
data_file_validator.print_errors('data.yaml')

Optionally, if you have already loaded the YAML document, you can pass it in via the data argument. You must still pass the file_path, since it is used as the key in the error-message lookup map.

from hepdata_validator.data_file_validator import DataFileValidator
import yaml

# use a context manager so the file handle is closed after loading
with open('data.yaml', 'r') as stream:
    file_contents = yaml.safe_load(stream)
data_file_validator = DataFileValidator()

data_file_validator.validate(file_path='data.yaml', data=file_contents)

data_file_validator.get_messages('data.yaml')

data_file_validator.print_errors('data.yaml')

For the analogous case of the SubmissionFileValidator:

from hepdata_validator.submission_file_validator import SubmissionFileValidator
import yaml
submission_file_path = 'submission.yaml'

# convert the generator returned by yaml.safe_load_all into a list
with open(submission_file_path, 'r') as stream:
    docs = list(yaml.safe_load_all(stream))

submission_file_validator = SubmissionFileValidator()
is_valid_submission_file = submission_file_validator.validate(file_path=submission_file_path, data=docs)
submission_file_validator.print_errors(submission_file_path)

An example offline validation script uses the hepdata_validator package to validate the submission.yaml file and all YAML data files of a HEPData submission.
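Such a script can be sketched as follows. This is an illustrative sketch, not the linked script itself; it assumes each YAML document in submission.yaml names its data table via a data_file key, as in the HEPData submission format:

```python
import os


def validate_submission_dir(directory):
    """Validate submission.yaml plus every data file it references.

    Illustrative sketch: assumes hepdata_validator and PyYAML are
    installed, and that each document in submission.yaml references
    its table via a 'data_file' key.
    """
    import yaml
    from hepdata_validator.submission_file_validator import SubmissionFileValidator
    from hepdata_validator.data_file_validator import DataFileValidator

    submission_path = os.path.join(directory, 'submission.yaml')

    # validate the submission file itself
    submission_validator = SubmissionFileValidator()
    is_valid = submission_validator.validate(file_path=submission_path)
    submission_validator.print_errors(submission_path)

    # validate each data file referenced by the submission
    data_validator = DataFileValidator()
    with open(submission_path, 'r') as stream:
        for doc in yaml.safe_load_all(stream):
            if doc and 'data_file' in doc:
                data_path = os.path.join(directory, doc['data_file'])
                is_valid = data_validator.validate(file_path=data_path) and is_valid
                data_validator.print_errors(data_path)

    return is_valid
```

Calling `validate_submission_dir('my_submission')` would then print any errors and return whether the whole submission validated.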

Schema Versions

There are multiple versions of the native HEPData JSON schemas. In most cases you should use the latest version (the default). If you need a different version, pass the keyword argument schema_version when initialising the validator:

submission_file_validator = SubmissionFileValidator(schema_version='0.1.0')
data_file_validator = DataFileValidator(schema_version='0.1.0')

Remote Schemas

When using remotely defined schemas, versions depend on the organization providing those schemas, and it is their responsibility to offer a way of keeping track of different schema versions.

The JsonSchemaResolver object resolves $ref in the JSON schema. The HTTPSchemaDownloader object retrieves schemas from a remote location, and optionally saves them in the local file system, following the structure: schemas_remote/<org>/<project>/<version>/<schema_name>. An example may be:

from hepdata_validator.data_file_validator import DataFileValidator
data_validator = DataFileValidator()

# Split remote schema path and schema name
schema_path = 'https://scikit-hep.org/pyhf/schemas/1.0.0/'
schema_name = 'workspace.json'

# Create JsonSchemaResolver object to resolve $ref in JSON schema
from hepdata_validator.schema_resolver import JsonSchemaResolver
pyhf_resolver = JsonSchemaResolver(schema_path)

# Create HTTPSchemaDownloader object to validate against remote schema
from hepdata_validator.schema_downloader import HTTPSchemaDownloader
pyhf_downloader = HTTPSchemaDownloader(pyhf_resolver, schema_path)

# Retrieve and save the remote schema in the local path
pyhf_type = pyhf_downloader.get_schema_type(schema_name)
pyhf_spec = pyhf_downloader.get_schema_spec(schema_name)
pyhf_downloader.save_locally(schema_name, pyhf_spec)

# Load the custom schema as a custom type
import os
pyhf_path = os.path.join(pyhf_downloader.schemas_path, schema_name)
data_validator.load_custom_schema(pyhf_type, pyhf_path)

# Validate a specific schema instance
data_validator.validate(file_path='pyhf_workspace.json', file_type=pyhf_type)

The native HEPData JSON schemas are provided as part of the hepdata-validator package, so it is not necessary to download them. However, for testing purposes, the same mechanism as above could in principle be used with:

schema_path = 'https://hepdata.net/submission/schemas/1.0.1/'
schema_name = 'data_schema.json'

and passing a HEPData YAML data file as the file_path argument of the validate method.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hepdata_validator-0.2.3.tar.gz (14.2 kB, source)

Built Distribution

hepdata_validator-0.2.3-py2.py3-none-any.whl (29.4 kB, Python 2 / Python 3)

File details

Details for the file hepdata_validator-0.2.3.tar.gz.

File metadata

  • Download URL: hepdata_validator-0.2.3.tar.gz
  • Upload date:
  • Size: 14.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/44.1.1 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/2.7.15

File hashes

Hashes for hepdata_validator-0.2.3.tar.gz

  • SHA256: 314e75eae7d4a134bfc8291440259839d82aabefdd720f237c0bf8ea5c9be4dc
  • MD5: 1a250d00b901c3fb58a3bd6677c95869
  • BLAKE2b-256: 4149ac1f4e8687817db01e9449b1991f261946ed22fba7fa46b40b23c76439e2

See more details on using hashes here.
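A downloaded file can be checked against one of the published digests using only the standard library. A generic sketch, not specific to this package:

```python
import hashlib


def file_digest(path, algorithm='sha256', chunk_size=8192):
    """Compute the hex digest of a file for comparison with a published hash.

    Reads in chunks so large archives do not need to fit in memory.
    """
    h = hashlib.new(algorithm)
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()
```

For example, `file_digest('hepdata_validator-0.2.3.tar.gz')` should match the SHA256 value listed above if the download is intact.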

File details

Details for the file hepdata_validator-0.2.3-py2.py3-none-any.whl.

File metadata

  • Download URL: hepdata_validator-0.2.3-py2.py3-none-any.whl
  • Upload date:
  • Size: 29.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/44.1.1 requests-toolbelt/0.9.1 tqdm/4.50.0 CPython/2.7.15

File hashes

Hashes for hepdata_validator-0.2.3-py2.py3-none-any.whl

  • SHA256: e7e39e92f68536d6749319eb3cec5aec34338237fc01e0fdbf92ea17f87113cf
  • MD5: 4f99e296b1520f6dc69db0514fd31149
  • BLAKE2b-256: c08a33c69f10e8def5dcfed3ed4e7f1a912cefba14a688241a1ed4a860fea274

See more details on using hashes here.
