Collate textual sources with relaxed spelling.
Project description
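Collates textual sources with relaxed spelling: tokens that are spelled differently but similarly can still be aligned. Uses Gotoh’s variant of the Needleman-Wunsch sequence alignment algorithm, which scores gaps with affine penalties.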
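To give a feel for what Gotoh's variant computes, here is a minimal, self-contained sketch of global alignment scoring with affine gap penalties. It is not super-collator's implementation: the function name `gotoh_score`, the `gap_open` and `gap_extend` values, and the exact-match similarity used in the usage lines are all assumptions made for illustration.

```python
from typing import Callable, Sequence

NEG_INF = float("-inf")


def gotoh_score(
    a: Sequence[str],
    b: Sequence[str],
    similarity: Callable[[str, str], float],
    gap_open: float = -1.0,    # assumed cost of the first position of a gap
    gap_extend: float = -0.5,  # assumed cost of each further gap position
) -> float:
    """Return the best global alignment score under affine gap penalties."""
    n, m = len(a), len(b)
    # Three score matrices, one per "state" of the last aligned column:
    #   M: a[i-1] is aligned with b[j-1]
    #   X: a[i-1] is aligned with a gap
    #   Y: b[j-1] is aligned with a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = similarity(a[i - 1], b[j - 1])
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Extending an existing gap is cheaper than opening a new one.
            X[i][j] = max(
                M[i - 1][j] + gap_open,
                X[i - 1][j] + gap_extend,
                Y[i - 1][j] + gap_open,
            )
            Y[i][j] = max(
                M[i][j - 1] + gap_open,
                Y[i][j - 1] + gap_extend,
                X[i][j - 1] + gap_open,
            )
    return max(M[n][m], X[n][m], Y[n][m])


def exact(x: str, y: str) -> float:
    """Toy similarity: reward identical tokens, penalize everything else."""
    return 1.0 if x == y else -1.0


score = gotoh_score(["a", "d"], ["a", "b", "c", "d"], exact)
```

Because a run of consecutive gaps pays the opening cost only once, one long insertion (such as a phrase present in only one source) is penalized less than several scattered single-token gaps. A real collator would also keep back-pointers to recover the alignment itself, which this sketch omits.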
$ pip install super-collator
>>> from super_collator.strategy import CommonNgramsStrategy
>>> from super_collator.token import SingleToken
>>> from super_collator.super_collator import align, to_table
>>> a = "Lorem ipsum dollar amat adipiscing elit"
>>> b = "qui dolorem ipsum quia dolor sit amet consectetur adipisci velit"
>>>
>>> a = [SingleToken(s) for s in a.split()]
>>> b = [SingleToken(s) for s in b.split()]
>>>
>>> c, score = align(a, b, CommonNgramsStrategy(2))
>>> print(to_table(c)) # doctest: +NORMALIZE_WHITESPACE
-   Lorem   ipsum -    dollar -   amat -           adipiscing elit
qui dolorem ipsum quia dolor  sit amet consectetur adipisci   velit
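The alignment above pairs "dollar" with "dolor" and "amat" with "amet" even though the spellings differ. One plausible way to score such pairs is to count shared character bigrams. The sketch below uses the Dice coefficient over bigram multisets; the helper names `ngrams` and `dice_similarity` are hypothetical, and the formula is an assumption, not necessarily what `CommonNgramsStrategy(2)` computes internally.

```python
from collections import Counter


def ngrams(word: str, n: int = 2) -> Counter:
    """Return the multiset of character n-grams in word."""
    return Counter(word[i : i + n] for i in range(len(word) - n + 1))


def dice_similarity(x: str, y: str, n: int = 2) -> float:
    """Dice coefficient over n-gram multisets: 2 * |common| / (|x| + |y|)."""
    gx, gy = ngrams(x, n), ngrams(y, n)
    total = sum(gx.values()) + sum(gy.values())
    if total == 0:
        # Both words are shorter than n, so there is nothing to compare.
        return 0.0
    common = sum((gx & gy).values())  # multiset intersection
    return 2.0 * common / total


for left, right in [("ipsum", "ipsum"), ("dollar", "dolor"), ("dollar", "sit")]:
    print(left, right, round(dice_similarity(left, right), 3))
```

Under this measure identical words score 1.0, words with no bigram in common score 0.0, and near-spellings land in between, which is the kind of graded similarity an aligner needs in order to tolerate spelling variation.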
Documentation: https://cceh.github.io/super-collator/
Download files
Source Distribution
super_collator-0.0.2.tar.gz (3.0 MB)
Built Distribution
super_collator-0.0.2-py3-none-any.whl (20.0 kB)
File details
Details for the file super_collator-0.0.2.tar.gz.
File metadata
- Download URL: super_collator-0.0.2.tar.gz
- Size: 3.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | ebab6189dfc57bb13a73d06e628dce048f3a9213bb0ed7ecc22732655e22510b
MD5 | bc48ba9ad19e105c6196430481b3fe1b
BLAKE2b-256 | 68d18b29dc7cc1189851976b1b78edea980a8691fcb0fb5238752a404ff5cb95
File details
Details for the file super_collator-0.0.2-py3-none-any.whl.
File metadata
- Download URL: super_collator-0.0.2-py3-none-any.whl
- Size: 20.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8aa8b3507fee47b0ac0c080beff824c0ebe3a50d3219e8759ecc2d32418fc485
MD5 | bfeb6aaf1a7a2435ad5ef2a5237b3937
BLAKE2b-256 | e1834d7f9d0ea3e99cdd47203f3d20108435c39bea5c3071089c18f9716768a4