Collate textual sources with relaxed spelling.
Project description
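Collates textual sources whose spelling varies: tokens are aligned by similarity rather than by exact equality. Alignment uses Gotoh’s variant of the Needleman-Wunsch sequence alignment algorithm, which scores gaps with affine penalties.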
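To illustrate the alignment technique named above, here is a minimal sketch of Gotoh-style scoring for global alignment with affine gap penalties. It is not the library's `Aligner` implementation: the function name `gotoh_score`, its parameters, and the default penalties are hypothetical and chosen only for illustration. It returns the best score only, without the traceback that would produce the aligned sequences.

```python
from typing import Callable, Sequence

NEG_INF = float("-inf")


def gotoh_score(
    a: Sequence[str],
    b: Sequence[str],
    similarity: Callable[[str, str], float],
    gap_open: float = -1.0,    # hypothetical default: score for the first gap position
    gap_extend: float = -0.5,  # hypothetical default: score for each further gap position
) -> float:
    """Best global alignment score; a gap of length k scores
    gap_open + (k - 1) * gap_extend."""
    n, m = len(a), len(b)
    # M: a[i-1] is paired with b[j-1]
    # X: a[i-1] is paired with a gap
    # Y: b[j-1] is paired with a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Pair the two tokens after any of the three previous states.
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) \
                + similarity(a[i - 1], b[j - 1])
            # Open a new gap (from M or Y) or extend a running one (from X).
            X[i][j] = max(M[i - 1][j] + gap_open,
                          Y[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend)
            # Same for gaps on the other side.
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          X[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


def exact(x: str, y: str) -> float:
    """Toy similarity for the demo: +1 for equal tokens, -1 otherwise."""
    return 1.0 if x == y else -1.0


print(gotoh_score(["a", "b", "c", "d"], ["a", "d"], exact))
```

Because extending a gap is cheaper than opening one, a single run of two skipped tokens is preferred over two separate one-token gaps. That is the property that distinguishes Gotoh's variant from plain Needleman-Wunsch with a linear gap cost.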
$ pip install super-collator
>>> from super_collator.aligner import Aligner
>>> from super_collator.ngrams import NGrams
>>> from super_collator.super_collator import to_table
>>> aligner = Aligner(-0.5, -0.5, -0.5)
>>> a = "Lorem ipsum dollar amat adipiscing elit"
>>> b = "qui dolorem ipsum quia dolor sit amet consectetur adipisci velit"
>>>
>>> a = [NGrams(s).load(s, 3) for s in a.split()]
>>> b = [NGrams(s).load(s, 3) for s in b.split()]
>>>
>>> a, b, score = aligner.align(a, b, NGrams.similarity, lambda: NGrams("-"))
>>> print(to_table(list(map(str, a)), list(map(str, b)))) # doctest: +NORMALIZE_WHITESPACE
-   Lorem   ipsum -    dollar -   amat -           adipiscing elit
qui dolorem ipsum quia dolor  sit amet consectetur adipisci   velit
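The example pairs tokens such as "amat" and "amet" even though they are spelled differently. One common way to obtain such a tolerant similarity is to compare sets of character trigrams. The sketch below assumes a Dice coefficient over space-padded trigram sets; the formula actually used by `NGrams.similarity` may differ, and the helper names `trigrams` and `trigram_similarity` are hypothetical.

```python
def trigrams(word: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a word, padded with
    spaces so that the first and last letters carry extra weight."""
    padded = " " * (n - 1) + word.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def trigram_similarity(a: str, b: str) -> float:
    """Dice coefficient of the two trigram sets, in the range 0.0 to 1.0.
    (Assumed formula for illustration, not taken from the library.)"""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


for pair in [("ipsum", "ipsum"), ("amat", "amet"), ("amat", "sit")]:
    print(pair, trigram_similarity(*pair))
```

A score like this can serve as the similarity function of an aligner: identical words score 1.0, near matches score in between, and unrelated words score close to 0.0.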
Documentation: https://cceh.github.io/super-collator/
Download files
Source Distribution
super_collator-0.0.5.tar.gz (37.1 kB)
Built Distribution
File details
Details for the file super_collator-0.0.5.tar.gz.
File metadata
- Download URL: super_collator-0.0.5.tar.gz
- Upload date:
- Size: 37.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | e0306f48131d70ca7e26dff1b022a2737c0769352b0024e51a8e5c89d333b651
MD5 | b0f2e0d8a278f374fb67453da7320291
BLAKE2b-256 | 817963e2dc885651154f9ecf165400655e749a3d9a208bd5fea006acdf74b0a3
File details
Details for the file super_collator-0.0.5-py3-none-any.whl.
File metadata
- Download URL: super_collator-0.0.5-py3-none-any.whl
- Upload date:
- Size: 19.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 50b476f0c7980078c5bdc78491b3bf0d9573fc24d57fc2b30e309e1435bdc334
MD5 | 089c12a8887931d66d06990fd723ecb5
BLAKE2b-256 | 9681ea6abace7b271a04a5c47561f62006cbd2c952f8703e3ade00290bdd9143