Collate textual sources with relaxed spelling.

Project description

Collates textual sources with relaxed spelling. Uses Gotoh’s variant of the Needleman-Wunsch sequence alignment algorithm.
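
Gotoh's variant extends Needleman-Wunsch with affine gap penalties, so that opening a gap and extending an existing gap can be priced differently. The following is a minimal sketch of that recurrence that computes only the alignment score. It is not this package's implementation: the function name `gotoh_score` and the parameters `gap_open` and `gap_extend` are hypothetical, and a gap of length k is assumed to cost `gap_open + (k - 1) * gap_extend`.

```python
from math import inf


def gotoh_score(a, b, similarity, gap_open=-1.0, gap_extend=-0.5):
    """Best global alignment score of sequences a and b with affine gaps.

    Illustrative only; all names and defaults here are assumptions.
    """
    n, m = len(a), len(b)
    # M[i][j]: best score of a[:i] vs b[:j] ending with a[i-1] aligned to b[j-1]
    # X[i][j]: best score ending with a[i-1] aligned to a gap
    # Y[i][j]: best score ending with b[j-1] aligned to a gap
    M = [[-inf] * (m + 1) for _ in range(n + 1)]
    X = [[-inf] * (m + 1) for _ in range(n + 1)]
    Y = [[-inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0

    # Borders: one single gap that grows by gap_extend per element.
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = similarity(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]
            )
            # Open a new gap after a match or the other gap kind,
            # or extend a gap of the same kind.
            X[i][j] = max(
                M[i - 1][j] + gap_open,
                Y[i - 1][j] + gap_open,
                X[i - 1][j] + gap_extend,
            )
            Y[i][j] = max(
                M[i][j - 1] + gap_open,
                X[i][j - 1] + gap_open,
                Y[i][j - 1] + gap_extend,
            )

    return max(M[n][m], X[n][m], Y[n][m])


def exact(x, y):
    """Toy similarity: +1 for equal items, -1 otherwise."""
    return 1 if x == y else -1
```

With the toy `exact` similarity, aligning "ad" against "abcd" needs one gap of length two. Under the affine scheme that gap is cheaper than two separately opened gaps, which is what tends to keep inserted passages together in one run.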

$ pip install super-collator

>>> from super_collator.aligner import Aligner
>>> from super_collator.ngrams import NGrams
>>> from super_collator.super_collator import to_table

>>> aligner = Aligner(-0.5, -0.5, -0.5)
>>> a = "Lorem ipsum dollar amat adipiscing elit"
>>> b = "qui dolorem ipsum quia dolor sit amet consectetur adipisci velit"
>>>
>>> a = [NGrams(s).load(s, 3) for s in a.split()]
>>> b = [NGrams(s).load(s, 3) for s in b.split()]
>>>
>>> a, b, score = aligner.align(a, b, NGrams.similarity, lambda: NGrams("-"))
>>> print(to_table(list(map(str, a)), list(map(str, b))))  # doctest: +NORMALIZE_WHITESPACE
-   Lorem   ipsum -    dollar -   amat -           adipiscing elit
qui dolorem ipsum quia dolor  sit amet consectetur adipisci   velit
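
In the table above, "dollar" lines up with "dolor" and "amat" with "amet" even though the spellings differ. One common way to get that kind of tolerance is to compare the sets of character trigrams of two words. The sketch below uses Jaccard overlap of padded trigrams. It is an assumption made for illustration, not a description of what `NGrams.similarity` actually computes, and the helper names `trigrams` and `trigram_similarity` are hypothetical.

```python
def trigrams(word, n=3):
    """Return the set of character n-grams of word, padded with spaces."""
    pad = " " * (n - 1)
    padded = pad + word.lower() + pad
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def trigram_similarity(a, b):
    """Jaccard overlap of the trigram sets of a and b, in the range 0..1."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


# Similar spellings share several trigrams, unrelated words share none.
close = trigram_similarity("dollar", "dolor")
far = trigram_similarity("dollar", "sit")
```

A score like this can be passed as the similarity function to an aligner, so that near matches are rewarded in proportion to how much of the spelling they share.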

Documentation: https://cceh.github.io/super-collator/

PyPI: https://pypi-hypernode.com/project/super-collator/

Download files

Source Distribution

super_collator-0.0.3.tar.gz (3.0 MB)

Uploaded Source

Built Distribution

super_collator-0.0.3-py3-none-any.whl (18.3 kB)

Uploaded Python 3

File details

Details for the file super_collator-0.0.3.tar.gz.

File metadata

  • Download URL: super_collator-0.0.3.tar.gz
  • Upload date:
  • Size: 3.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.8

File hashes

Hashes for super_collator-0.0.3.tar.gz
Algorithm    Hash digest
SHA256       8e3fa0b535036bdc5686a1914da3e6c8882b6b9750dfcefc504e12bdf61b4e45
MD5          d9212cfef5379a8c1a35c2b2ffc7dc7f
BLAKE2b-256  6b1604ecfbfd82a0cf31b8469d1661348a17e48972bfd8dcd785856b5f917959

File details

Details for the file super_collator-0.0.3-py3-none-any.whl.

File hashes

Hashes for super_collator-0.0.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       301719724a4908059576151c8fb2ccf4961e02b11a360ef938c0217c8536ddb3
MD5          faf747effb41a256ce54b8750e990d87
BLAKE2b-256  7bd9406191e50d9fbffb738b1870e151eb0f48998077f013ad707126c5cdde06
