
Command-line program that scans the NMDC MongoDB database for referential integrity violations

Project description

refscan

refscan is a command-line tool that scans the NMDC MongoDB database for referential integrity violations.

%% This is the source code of a Mermaid diagram, which GitHub will render as a diagram.
%% Note: PyPI does not render Mermaid diagrams, and instead displays their source code.
%%       Reference: https://github.com/pypi/warehouse/issues/13083
graph LR
    schema[LinkML<br>schema]
    database[(MongoDB<br>database)]
    script[["refscan.py"]]
    violations["List of<br>violations"]
    references["List of<br>references"]:::dashed_border
    schema --> script
    database --> script
    script -.-> references
    script --> violations
    
    classDef dashed_border stroke-dasharray: 5 5
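The following minimal Python sketch shows what a referential integrity violation looks like. It works on in-memory dictionaries rather than a MongoDB database and is not refscan's implementation. Several details are assumptions made for illustration:

- the `id` field;
- the `find_violations` function;
- its `reference_fields` parameter, which a schema-aware tool could instead derive from the LinkML schema;
- the sample collections and ids.

Here, a violation is a reference whose target id does not belong to any document in the database.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for MongoDB collections: each collection
# name maps to a list of documents (plain dicts).
Collections = Dict[str, List[dict]]


def find_violations(
    collections: Collections,
    reference_fields: Dict[str, List[str]],
) -> List[Tuple[str, str, str, str]]:
    """Return (collection, document id, field, missing id) for each dangling reference.

    `reference_fields` maps a collection name to the fields whose values are
    ids of other documents (an assumption made for this sketch).
    """
    # Gather every id that exists anywhere in the (hypothetical) database.
    known_ids = {doc["id"] for docs in collections.values() for doc in docs}

    violations = []
    for collection_name, fields in reference_fields.items():
        for doc in collections.get(collection_name, []):
            for field in fields:
                value = doc.get(field)
                if value is None:
                    continue
                # A field may hold a single id or a list of ids.
                targets = value if isinstance(value, list) else [value]
                for target in targets:
                    if target not in known_ids:
                        violations.append((collection_name, doc["id"], field, target))
    return violations


# Made-up sample data: nmdc:bsm-2 refers to a study that does not exist.
collections = {
    "study_set": [{"id": "nmdc:sty-1", "type": "nmdc:Study"}],
    "biosample_set": [
        {"id": "nmdc:bsm-1", "type": "nmdc:Biosample", "associated_studies": ["nmdc:sty-1"]},
        {"id": "nmdc:bsm-2", "type": "nmdc:Biosample", "associated_studies": ["nmdc:sty-9"]},
    ],
}
print(find_violations(collections, {"biosample_set": ["associated_studies"]}))
# [('biosample_set', 'nmdc:bsm-2', 'associated_studies', 'nmdc:sty-9')]
```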

Assumptions

refscan was designed under some assumptions about the schema and database, including:

  1. Each source document (i.e., a document containing references) has a field named type, whose value (a string) is the class_uri of the schema class of which the document is an instance. For example, the type field of each document in the study_set collection has the value "nmdc:Study".
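Here is one reason such an assumption can matter. The `type` value tells a scanner which schema class a document belongs to, and therefore which of the document's fields may hold references. The following Python sketch illustrates that lookup under stated assumptions. The `REFERENCE_SLOTS_BY_CLASS_URI` table and its entries are hypothetical and are not read from the real NMDC schema. The `reference_slots_for` helper is also made up for illustration.

```python
from typing import Dict, List

# Hypothetical mapping from a schema class_uri to the names of that class's
# slots whose values refer to other documents. A real tool would derive this
# from the LinkML schema; these entries are made up for illustration.
REFERENCE_SLOTS_BY_CLASS_URI: Dict[str, List[str]] = {
    "nmdc:Study": ["part_of"],
    "nmdc:Biosample": ["associated_studies"],
}


def reference_slots_for(document: dict) -> List[str]:
    """Return the reference-bearing slot names for a source document.

    Relies on the assumption above: the document's `type` field holds the
    class_uri of the class the document is an instance of.
    """
    class_uri = document.get("type")
    if not isinstance(class_uri, str):
        raise ValueError(f"document {document.get('id')!r} has no string `type` field")
    # Unknown classes are treated as having no reference slots.
    return REFERENCE_SLOTS_BY_CLASS_URI.get(class_uri, [])


print(reference_slots_for({"id": "nmdc:sty-1", "type": "nmdc:Study"}))  # ['part_of']
```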

Development status

refscan is in early development; its author recommends reviewing its code before using it for anything.

Tips

refscan requires the user to specify the path to a schema in YAML format. If you have curl installed, you can download a YAML file from GitHub by running the following command (after replacing the {...} placeholders and customizing the path):

# Download the raw content of https://github.com/{user_or_org}/{repo}/blob/{branch}/path/to/schema.yaml
curl -o schema.yaml https://raw.githubusercontent.com/{user_or_org}/{repo}/{branch}/path/to/schema.yaml

For example:

# Download the raw content of https://github.com/microbiomedata/berkeley-schema-fy24/blob/main/nmdc_schema/nmdc_materialized_patterns.yaml
curl -o schema.yaml https://raw.githubusercontent.com/microbiomedata/berkeley-schema-fy24/main/nmdc_schema/nmdc_materialized_patterns.yaml
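The comments in the commands above show how a GitHub "blob" URL maps to its raw.githubusercontent.com counterpart. The following Python sketch performs that rewrite. The `to_raw_url` function is hypothetical. It assumes the branch name contains no `/`; a branch such as `feature/x` would be split incorrectly, because the URL alone does not say where the branch name ends.

```python
from urllib.parse import urlparse


def to_raw_url(blob_url: str) -> str:
    """Rewrite https://github.com/{owner}/{repo}/blob/{branch}/{path}
    as https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}.

    Assumes the branch name contains no "/".
    """
    parsed = urlparse(blob_url)
    parts = parsed.path.strip("/").split("/")
    # Expect at least: owner, repo, "blob", branch, and one path segment.
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner, repo, _blob, branch, *path = parts
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, branch, *path])


print(to_raw_url(
    "https://github.com/microbiomedata/berkeley-schema-fy24"
    "/blob/main/nmdc_schema/nmdc_materialized_patterns.yaml"
))
# https://raw.githubusercontent.com/microbiomedata/berkeley-schema-fy24/main/nmdc_schema/nmdc_materialized_patterns.yaml
```

You can then pass the resulting URL to `curl -o schema.yaml ...`.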



Download files


Source Distribution

refscan-0.1.0.tar.gz (11.6 kB)

Uploaded Source

Built Distribution

refscan-0.1.0-py3-none-any.whl (13.9 kB)

Uploaded Python 3

File details

Details for the file refscan-0.1.0.tar.gz.

File metadata

  • Download URL: refscan-0.1.0.tar.gz
  • Upload date:
  • Size: 11.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for refscan-0.1.0.tar.gz
Algorithm Hash digest
SHA256 44c9f5b7b94be478993615ce692492ecf2a4bb5014d00f02a58497bcf39c0a00
MD5 fdd845c7a4e80603dfb228e80f869743
BLAKE2b-256 3718a7e5fe819de847755ae864d4e3840ea68801ac2d1d14df2e7c6d00a33cc8


File details

Details for the file refscan-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: refscan-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 13.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for refscan-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2de5b20b9d01a3f8ac8ff81795333f19147178f06e1de6c0ca4fa0f825ca4553
MD5 674b426f790fba248c26013370c08485
BLAKE2b-256 303211676fcb3ba39cfce676193c0952873d11b38ff148c5be4666a932090f37
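Before installing a downloaded file, you can compare its SHA256 digest with the value published above. The following minimal Python sketch uses only the standard library. The function names are hypothetical, and the digests are copied from the hash listings above.

```python
import hashlib
from pathlib import Path
from typing import Iterable

# SHA256 digests copied from the "File hashes" listings above.
EXPECTED_SHA256 = {
    "refscan-0.1.0.tar.gz": "44c9f5b7b94be478993615ce692492ecf2a4bb5014d00f02a58497bcf39c0a00",
    "refscan-0.1.0-py3-none-any.whl": "2de5b20b9d01a3f8ac8ff81795333f19147178f06e1de6c0ca4fa0f825ca4553",
}


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Return the hex SHA256 digest of the concatenation of `chunks`."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    with path.open("rb") as f:
        return sha256_hex(iter(lambda: f.read(chunk_size), b""))


def matches_published(filename: str, hex_digest: str) -> bool:
    """True if `hex_digest` equals the published SHA256 for `filename`."""
    expected = EXPECTED_SHA256.get(filename)
    return expected is not None and hex_digest.lower() == expected


def verify(path: Path) -> bool:
    """Check a downloaded distribution file against its published SHA256."""
    return matches_published(path.name, sha256_of_file(path))
```

For example, `verify(Path("refscan-0.1.0.tar.gz"))` returns True only if the local file's digest matches the published one. pip also offers a hash-checking mode, which you can enable with the `--hash` requirements option or the `--require-hashes` flag, to enforce digests at install time.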

