text-validator

pluggable command-line tool for validating the formatting and orthography of text files

You configure the validator plugins with a TOML file like:

["text_validator.plugins.whitespace"]
CHECK_CRLF = true
CHECK_TABS = true
CHECK_TRAILING_WHITESPACE = true
CHECK_NO_EOF_NEWLINE = true

["text_validator.plugins.unicode"]
CONFIRM_UTF_8_NFC = true

["text_validator.plugins.ref_line_format"]
REF_REGEX = "(\\d+|EP|SB)\\.\\d+(\\.\\d+)?$"  # example from AF

["text_validator.plugins.characters"]
REPLACE_CHARS = [
    # bad character, suggested replacement
    ["\u02BC", "\u2019"],
    ["\u1FBF", "\u2019"],
    ["\u037E", "\u003B"],
    ["\u0387", "\u00B7"],
    ["\u0374", "\u02B9"],
    ["\u03D5", "\u03C6"],
    ["\u03D1", "\u03B8"],
]

and they'll validate the texts you give them:

tests/test_0001.txt:1:line ends with CRLF
tests/test_0001.txt:2:line ends with CRLF
tests/test_0002.txt:1:no newline at end of file
tests/test_0003.txt:1:line contains a tab
tests/test_0004.txt:1:trailing whitespace
tests/test_0006.txt:1:not NFC
tests/test_0007.txt:2:BLANK LINE
tests/test_0008.txt:1:BAD WHITESPACE
tests/test_0008.txt:2:BAD WHITESPACE
tests/test_0009.txt:4:BAD REFERENCE FORM
tests/test_0009.txt:5:BAD REFERENCE FORM
tests/test_0010.txt:2:29:bad U+02BC; consider replacing with U+2019
tests/test_0010.txt:3:29:bad U+1FBF; consider replacing with U+2019
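
To make the settings above more concrete, here is a minimal sketch of the kind of per-line checks they suggest. It is not the package's implementation or plugin API: `check_line` and `is_valid_ref` are hypothetical names, the 1-based column number is an assumption, and so is matching the whole reference string against `REF_REGEX`.

```python
import re
import unicodedata

# Values copied from the sample config above (only two REPLACE_CHARS pairs shown).
REF_REGEX = re.compile(r"(\d+|EP|SB)\.\d+(\.\d+)?$")
REPLACE_CHARS = [("\u02BC", "\u2019"), ("\u1FBF", "\u2019")]


def is_valid_ref(ref):
    """Assumption: the whole reference string must match REF_REGEX."""
    return REF_REGEX.fullmatch(ref) is not None


def check_line(path, lineno, line):
    """Return report strings for one line (hypothetical helper, not the package API)."""
    problems = []
    if line.endswith("\r\n"):
        problems.append(f"{path}:{lineno}:line ends with CRLF")
    body = line.rstrip("\r\n")  # drop the line terminator before the other checks
    if "\t" in body:
        problems.append(f"{path}:{lineno}:line contains a tab")
    if body != body.rstrip():
        problems.append(f"{path}:{lineno}:trailing whitespace")
    if unicodedata.normalize("NFC", body) != body:
        problems.append(f"{path}:{lineno}:not NFC")
    for bad, good in REPLACE_CHARS:
        for col, ch in enumerate(body, start=1):  # assumption: columns count from 1
            if ch == bad:
                problems.append(
                    f"{path}:{lineno}:{col}:bad U+{ord(bad):04X}; "
                    f"consider replacing with U+{ord(good):04X}"
                )
    return problems


if __name__ == "__main__":
    for report in check_line("example.txt", 1, "don\u02BCt \r\n"):
        print(report)
```

The sketch keeps every check independent, so one line can produce several reports, as the sample output does for files with more than one problem.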

You can run the validator either from the command line:

validate-text tests/config_004.toml tests/test_0007.txt tests/test_0008.txt tests/test_0009.txt

or programmatically from Python, either with the helper function validate:

from text_validator.main import validate

validate("tests/config_003.toml", ["tests/test_0005.txt", "tests/test_0006.txt"])

or by working directly with a Suite instance:

from text_validator.base import Suite

suite = Suite()
suite.load_toml("tests/config_002.toml")
suite.validate_files(["tests/test_0005.txt", "tests/test_0006.txt"])
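
If the report lines are captured for use by another tool (an editor or a CI step, say), they can be split back into fields. The sketch below assumes the two shapes seen in the sample output, `path:line:message` and `path:line:column:message`. `Report` and `parse_report` are hypothetical names and not part of text-validator.

```python
import re
from typing import NamedTuple, Optional


class Report(NamedTuple):
    path: str
    line: int
    column: Optional[int]  # None when the report has no column field
    message: str


# path:line:message or path:line:column:message
_REPORT = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?:(?P<col>\d+):)?(?P<msg>.*)$"
)


def parse_report(text):
    """Parse one report line; return None if it does not fit either shape."""
    m = _REPORT.match(text)
    if m is None:
        return None
    col = m.group("col")
    return Report(
        path=m.group("path"),
        line=int(m.group("line")),
        column=int(col) if col is not None else None,
        message=m.group("msg"),
    )


if __name__ == "__main__":
    print(parse_report("tests/test_0001.txt:1:line ends with CRLF"))
    print(parse_report("tests/test_0010.txt:2:29:bad U+02BC; consider replacing with U+2019"))
```

This pattern has two known limits: a path containing a colon (such as a Windows drive letter) will not parse, and a message that itself starts with digits followed by a colon would be misread as a column.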
