Collate textual sources with relaxed spelling.

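Collates textual sources with relaxed spelling: words are matched by similarity rather than exact equality (the example below compares them as trigrams), so spelling variants can still line up. Uses Gotoh’s variant of the Needleman-Wunsch sequence alignment algorithm, which scores gaps with affine penalties.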
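
As a rough illustration of the alignment idea named above, the following standalone sketch computes a global alignment score with affine gap penalties in the style of Gotoh. It is not the library's implementation: the function name `gotoh_score` and the parameters `gap_open` and `gap_extend` are assumptions made for this example, and the library's own `Aligner` may define its scores differently.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
NEG_INF = float("-inf")


def gotoh_score(
    a: Sequence[T],
    b: Sequence[T],
    similarity: Callable[[T, T], float],
    gap_open: float = -1.0,    # hypothetical: score of the first gap in a run
    gap_extend: float = -0.5,  # hypothetical: score of each further gap in the run
) -> float:
    """Return the best global alignment score of a and b with affine gaps."""
    n, m = len(a), len(b)
    # Three tables, one per state of the last aligned column:
    #   M: a[i-1] is aligned with b[j-1]
    #   X: a[i-1] is aligned with a gap
    #   Y: b[j-1] is aligned with a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Match or mismatch: continue from the best of the three states.
            best_prev = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            M[i][j] = best_prev + similarity(a[i - 1], b[j - 1])
            # Gap opposite a[i-1]: extend a running gap or open a new one.
            X[i][j] = max(
                M[i - 1][j] + gap_open,
                X[i - 1][j] + gap_extend,
                Y[i - 1][j] + gap_open,
            )
            # Gap opposite b[j-1]: extend a running gap or open a new one.
            Y[i][j] = max(
                M[i][j - 1] + gap_open,
                Y[i][j - 1] + gap_extend,
                X[i][j - 1] + gap_open,
            )

    return max(M[n][m], X[n][m], Y[n][m])


def exact(x: str, y: str) -> float:
    """Toy similarity: +1 for identical tokens, -1 otherwise."""
    return 1.0 if x == y else -1.0


# Two matches (+2) and one gap run of length 2 (-1.0 - 0.5).
print(gotoh_score("a b c d".split(), "a d".split(), exact))  # 0.5
```

With a linear gap cost of -1 per gap the same pair would score 0.0; the affine scheme makes one long gap cheaper than several short ones, which suits texts where a whole phrase is missing from one source.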

$ pip install super-collator

>>> from super_collator.aligner import Aligner
>>> from super_collator.ngrams import NGrams
>>> from super_collator.super_collator import to_table

>>> aligner = Aligner(-0.5, -0.5, -0.5)
>>> a = "Lorem ipsum dollar amat adipiscing elit"
>>> b = "qui dolorem ipsum quia dolor sit amet consectetur adipisci velit"
>>>
>>> a = [NGrams(s).load(s, 3) for s in a.split()]
>>> b = [NGrams(s).load(s, 3) for s in b.split()]
>>>
>>> a, b, score = aligner.align(a, b, NGrams.similarity, lambda: NGrams("-"))
>>> print(to_table(list(map(str, a)), list(map(str, b))))  # doctest: +NORMALIZE_WHITESPACE
-   Lorem   ipsum -    dollar -   amat -           adipiscing elit
qui dolorem ipsum quia dolor  sit amet consectetur adipisci   velit
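
The example above loads each word as trigrams and hands `NGrams.similarity` to the aligner, which is what lets "amat" line up with "amet" and "dollar" with "dolor". The exact formula behind `NGrams.similarity` is not shown here, so the sketch below only assumes one plausible choice, a Sørensen-Dice coefficient over padded character trigrams. The helpers `ngrams` and `dice_similarity` are hypothetical names for this illustration and are not part of super_collator.

```python
def ngrams(word: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of word, padded with spaces."""
    # Padding (an assumption of this sketch) lets short words and word
    # boundaries contribute n-grams too.
    padded = " " * (n - 1) + word.lower() + " " * (n - 1)
    return {padded[i : i + n] for i in range(len(padded) - n + 1)}


def dice_similarity(a: str, b: str, n: int = 3) -> float:
    """Sørensen-Dice coefficient of the n-gram sets of a and b, in [0, 1]."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# Spelling variants share boundary trigrams and score well above
# unrelated words.
print(dice_similarity("amat", "amet"))  # 0.5
print(dice_similarity("amat", "amet") > dice_similarity("amat", "sit"))  # True
```

A graded score like this, rather than a yes/no equality test, is what allows an aligner to prefer pairing near-identical words over inserting gaps.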

Documentation: https://cceh.github.io/super-collator/

PyPI: https://pypi-hypernode.com/project/super-collator/
