Skip to main content

pluggable command-line tool for validating the formatting and orthography of text files

Project description

text-validator

pluggable command-line tool for validating the formatting and orthography of text files

You config your validator plugins with a TOML file like:

["text_validator.plugins.whitespace"]
CHECK_CRLF = true
CHECK_TABS = true
CHECK_TRAILING_WHITESPACE = true
CHECK_NO_EOF_NEWLINE = true

["text_validator.plugins.unicode"]
CONFIRM_UTF_8_NFC = true

["text_validator.plugins.ref_line_format"]
REF_REGEX = "(\\d+|EP|SB)\\.\\d+(\\.\\d+)?$"  # example from AF

["text_validator.plugins.characters"]
REPLACE_CHARS = [
    # bad character, suggested replacement
    ["\u02BC", "\u2019"],
    ["\u1FBF", "\u2019"],
    ["\u037E", "\u003B"],
    ["\u0387", "\u00B7"],
    ["\u0374", "\u02B9"],
    ["\u03D5", "\u03C6"],
    ["\u03D1", "\u03B8"],
]

and they'll validate the texts you give it:

tests/test_0001.txt:1:line ends with CRLF
tests/test_0001.txt:2:line ends with CRLF
tests/test_0002.txt:1:no newline at end of file
tests/test_0003.txt:1:line contains a tab
tests/test_0004.txt:1:trailing whitespace
tests/test_0006.txt:1:not NFC
tests/test_0007.txt:2:BLANK LINE
tests/test_0008.txt:1:BAD WHITESPACE
tests/test_0008.txt:2:BAD WHITESPACE
tests/test_0009.txt:4:BAD REFERENCE FORM
tests/test_0009.txt:5:BAD REFERENCE FORM
tests/test_0010.txt:2:29:bad U+02BC; consider replacing with U+2019
tests/test_0010.txt:3:29:bad U+1FBF; consider replacing with U+2019

To install:

pip install text-validator

Then you can either run from the command line:

validate-text tests/config_004.toml tests/test_0007.txt tests/test_0008.txt tests/test_0009.txt

or programmatically from Python, either with the helper function validate:

from text_validator.main import validate

validate("tests/config_003.toml", ["tests/test_0005.txt", "tests/test_0006.txt"])

or by working directly with a Suite instance:

from text_validator.base import Suite

suite = Suite()
suite.load_toml("tests/config_002.toml")
suite.validate_files(["tests/test_0005.txt", "tests/test_0006.txt"])

Also see:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

text-validator-0.2.tar.gz (5.0 kB view details)

Uploaded Source

Built Distribution

text_validator-0.2-py3-none-any.whl (6.8 kB view details)

Uploaded Python 3

File details

Details for the file text-validator-0.2.tar.gz.

File metadata

  • Download URL: text-validator-0.2.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.39.0 CPython/3.8.0

File hashes

Hashes for text-validator-0.2.tar.gz
Algorithm Hash digest
SHA256 ac3d87a5bbe2fab5ab3fd37c75ef51605d6f18e7f5bd976154d5beaef47ea9df
MD5 987d20468cfc984c88c500eb51d8c3e3
BLAKE2b-256 db44a7a5f988243eb23e676d9bfcba686455e23f8b0659a33e083b69972e7579

See more details on using hashes here.

File details

Details for the file text_validator-0.2-py3-none-any.whl.

File metadata

  • Download URL: text_validator-0.2-py3-none-any.whl
  • Upload date:
  • Size: 6.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.39.0 CPython/3.8.0

File hashes

Hashes for text_validator-0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 afb37ced4f2d6bdc3dac95203edc72f4c1b055dbf788e9269cfde3d2dfb5e32d
MD5 32605a960e82376aa9bb5b68a7a6faa3
BLAKE2b-256 b1ce1d35c0e3efc4b12cfeaeab2acb510c763350012c69af4159cd05d016a0be

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page