Collate textual sources with relaxed spelling.

Project description

Collates textual sources with relaxed spelling. Uses Gotoh’s variant of the Needleman-Wunsch sequence alignment algorithm.
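For readers unfamiliar with the algorithm named above, here is a minimal sketch of Gotoh-style global alignment scoring with affine gap penalties. It is not the library's implementation: the function names `gotoh_score` and `match_score`, the gap parameters, and their default values are all assumptions chosen for illustration. A gap of length k costs `gap_open + (k - 1) * gap_extend`, so one long gap is cheaper than several short ones.

```python
from typing import Callable, Sequence

NEG_INF = float("-inf")


def match_score(x, y) -> float:
    """Hypothetical similarity: +1 for equal items, -1 otherwise."""
    return 1.0 if x == y else -1.0


def gotoh_score(
    a: Sequence,
    b: Sequence,
    similarity: Callable = match_score,
    gap_open: float = -1.0,    # assumed cost of the first position of a gap
    gap_extend: float = -0.5,  # assumed cost of each further gap position
) -> float:
    """Return the best global alignment score of a and b (score only)."""
    n, m = len(a), len(b)
    # Three tables, one per "state" of the last alignment column:
    # M: a[i-1] is aligned with b[j-1]
    # X: a[i-1] is aligned with a gap
    # Y: b[j-1] is aligned with a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = similarity(a[i - 1], b[j - 1])
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gap_open; continuing one pays gap_extend.
            X[i][j] = max(
                M[i - 1][j] + gap_open,
                X[i - 1][j] + gap_extend,
                Y[i - 1][j] + gap_open,
            )
            Y[i][j] = max(
                M[i][j - 1] + gap_open,
                Y[i][j - 1] + gap_extend,
                X[i][j - 1] + gap_open,
            )

    return max(M[n][m], X[n][m], Y[n][m])
```

The sketch returns only the score. Recovering the aligned columns would additionally require a traceback through the three tables, which is omitted here.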

$ pip install super-collator
>>> from super_collator.strategy import CommonNgramsStrategy
>>> from super_collator.token import SingleToken
>>> from super_collator.super_collator import align, to_table

>>> a = "Lorem ipsum dollar amat adipiscing elit"
>>> b = "qui dolorem ipsum quia dolor sit amet consectetur adipisci velit"
>>>
>>> a = [SingleToken(s) for s in a.split()]
>>> b = [SingleToken(s) for s in b.split()]
>>>
>>> c, score = align(a, b, CommonNgramsStrategy(2))
>>> print(to_table(c))  # doctest: +NORMALIZE_WHITESPACE
-   Lorem   ipsum -    dollar -   amat -           adipiscing elit
qui dolorem ipsum quia dolor  sit amet consectetur adipisci   velit
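The example above pairs words such as "dollar" and "dolor" even though they are spelled differently. One plausible way to score such pairs is to count shared character n-grams. The sketch below is an illustration only, not the code behind `CommonNgramsStrategy`: the helper names `ngrams` and `ngram_similarity` are hypothetical, and the Dice-style formula is an assumption.

```python
from collections import Counter


def ngrams(word: str, n: int = 2) -> Counter:
    """Count the character n-grams of word (empty if word is shorter than n)."""
    return Counter(word[i:i + n] for i in range(len(word) - n + 1))


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice-style similarity in [0, 1] based on shared n-grams (assumed formula)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    common = sum((grams_a & grams_b).values())
    return 2.0 * common / total


if __name__ == "__main__":
    for left, right in [("ipsum", "ipsum"), ("dollar", "dolor"), ("amat", "amet")]:
        print(left, right, round(ngram_similarity(left, right), 3))
```

With a score like this, identical words reach 1.0, near-matches land somewhere in between, and unrelated words score 0.0, which is the kind of graded similarity an alignment over relaxed spellings needs.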

Documentation: https://cceh.github.io/super-collator/

PyPI: https://pypi-hypernode.com/project/super-collator/

Download files

Source Distribution

super_collator-0.0.2.tar.gz (3.0 MB, Source)

Built Distribution

super_collator-0.0.2-py3-none-any.whl (20.0 kB, Python 3)

File details

Details for the file super_collator-0.0.2.tar.gz.

File metadata

  • Download URL: super_collator-0.0.2.tar.gz
  • Size: 3.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.7

File hashes

Hashes for super_collator-0.0.2.tar.gz

Algorithm    Hash digest
SHA256       ebab6189dfc57bb13a73d06e628dce048f3a9213bb0ed7ecc22732655e22510b
MD5          bc48ba9ad19e105c6196430481b3fe1b
BLAKE2b-256  68d18b29dc7cc1189851976b1b78edea980a8691fcb0fb5238752a404ff5cb95

File details

Details for the file super_collator-0.0.2-py3-none-any.whl.

File hashes

Hashes for super_collator-0.0.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       8aa8b3507fee47b0ac0c080beff824c0ebe3a50d3219e8759ecc2d32418fc485
MD5          bfeb6aaf1a7a2435ad5ef2a5237b3937
BLAKE2b-256  e1834d7f9d0ea3e99cdd47203f3d20108435c39bea5c3071089c18f9716768a4
