A simple, deterministic, and extensible approach to inverse text normalization for numbers
A simple, deterministic, and extensible approach to inverse text normalization (ITN) for numbers.
Overview
This package converts raw spoken-form text (e.g. speech recognition output) into user-friendly written-form text. It works best for converting spoken numbers into digits, and for other translation tasks that do not change word order. A CSV file defines the basic rules for transforming spoken tokens into written tokens, and extra pre- and post-processing can be applied for more specific formatting requirements, e.g. dates, measurements, and money.
These examples were produced by running this script.
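The rule-driven approach described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not itnpy2's actual API or rule file: the inline CSV with `spoken`, `written`, and `kind` columns and the functions `load_rules`, `continues`, `to_int`, and `inverse_normalize` are assumptions made for this example. It loads token rules from CSV, then scans left to right, folding each run of number words into one integer while passing every other token through unchanged, so word order is preserved.

```python
import csv
import io

# Hypothetical rule table in the spirit of the package's CSV file.
# The columns (spoken, written, kind) are assumptions, not itnpy2's schema.
RULES_CSV = """spoken,written,kind
zero,0,unit
one,1,unit
two,2,unit
three,3,unit
four,4,unit
five,5,unit
six,6,unit
seven,7,unit
eight,8,unit
nine,9,unit
ten,10,unit
eleven,11,unit
twelve,12,unit
thirteen,13,unit
fourteen,14,unit
fifteen,15,unit
sixteen,16,unit
seventeen,17,unit
eighteen,18,unit
nineteen,19,unit
twenty,20,tens
thirty,30,tens
forty,40,tens
fifty,50,tens
sixty,60,tens
seventy,70,tens
eighty,80,tens
ninety,90,tens
hundred,100,scale
thousand,1000,scale
million,1000000,scale
billion,1000000000,scale
"""


def load_rules(text):
    """Map each spoken token to a (kind, numeric value) pair."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["spoken"]: (row["kind"], int(row["written"])) for row in reader}


def continues(prev, cur):
    """Decide whether token `cur` extends the number that ended with `prev`."""
    (prev_kind, _), (kind, value) = prev, cur
    if kind == "scale" or prev_kind == "scale":
        return True
    if prev_kind == "tens":
        return kind == "unit" and 1 <= value <= 9  # "twenty one" -> 21
    return False  # a unit followed by a unit or tens word starts a new number


def to_int(group):
    """Fold (kind, value) pairs into an integer, e.g. 'two hundred five' -> 205."""
    total, current = 0, 0
    for kind, value in group:
        if kind == "scale" and value == 100:
            current = max(current, 1) * value
        elif kind == "scale":
            total += max(current, 1) * value  # close off a thousand/million block
            current = 0
        else:
            current += value
    return total + current


def inverse_normalize(text, rules):
    """Replace runs of spoken number words with digits; keep other tokens in order."""
    out, group = [], []
    for token in text.split():
        entry = rules.get(token.lower())
        if entry is not None and (not group or continues(group[-1], entry)):
            group.append(entry)
            continue
        if group:  # the current number ended; emit it as digits
            out.append(str(to_int(group)))
            group = []
        if entry is not None:
            group.append(entry)  # a number word that starts a new number
        else:
            out.append(token)
    if group:
        out.append(str(to_int(group)))
    return " ".join(out)
```

With these rules, `inverse_normalize("i paid two hundred five dollars", load_rules(RULES_CSV))` returns `"i paid 205 dollars"`, while "nineteen eighty four" becomes "19 84" because a unit followed by a tens word starts a new number. Turning such output into "$205" or "1984" is the kind of work left to a pre/post-processing stage; this sketch also does not handle punctuation attached to tokens, "and" inside numbers, or ordinals.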
Installation
This package supports Python 3.7 and later.
To install from PyPI:
pip install itnpy2
To install locally:
pip install -e .
Tests
To run the tests, run pytest from the root folder of this repository:
pytest
Issues
This package has been verified on a limited set of test cases. If you find a translation mistake, feel free to open a pull request that adds the input, the expected output, and the mistake to failing.csv. Thanks!
Citation
If you find this work useful, please consider citing it.
@misc{hsu2022itn,
title = {A simple, deterministic, and extensible approach to inverse text normalization for numbers},
author = {Brandhsu},
howpublished = {https://github.com/barseghyanartur/itnpy},
year = {2022}
}
Download files
File details
Details for the file itnpy2-0.0.7.tar.gz.
File metadata
- Download URL: itnpy2-0.0.7.tar.gz
- Upload date:
- Size: 9.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 67466fe9bd00c9e11ca6250e6f39dc84ee86f2005a2c679251e65e1ee5c5a116
MD5 | e0b99aae67dbf081ca4f4f6c91ea9ebb
BLAKE2b-256 | 7728e3fccdc8d5747faf82b4d85dbb43472e446eefee98fa574baa0e4a2b94de
File details
Details for the file itnpy2-0.0.7-py3-none-any.whl.
File metadata
- Download URL: itnpy2-0.0.7-py3-none-any.whl
- Upload date:
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | a1b8fd82edc98be9ed99e527bbacf72febdd65a81f4a3d926723de02b03b0c0d
MD5 | cd4e9c9879f3c7a045fae5b787568915
BLAKE2b-256 | 281928e2c85e7f1fcb61c0960cf8a96d2781f484db23be42ae96cd8d2adba187
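A downloaded file can be checked against the SHA256 digests listed above using only the standard library. This is a minimal sketch; the helper names `sha256_of` and `verify` and the `EXPECTED` mapping are hypothetical, introduced for illustration only.

```python
import hashlib

# SHA256 digests as listed in the file hash tables for version 0.0.7.
EXPECTED = {
    "itnpy2-0.0.7.tar.gz": "67466fe9bd00c9e11ca6250e6f39dc84ee86f2005a2c679251e65e1ee5c5a116",
    "itnpy2-0.0.7-py3-none-any.whl": "a1b8fd82edc98be9ed99e527bbacf72febdd65a81f4a3d926723de02b03b0c0d",
}


def sha256_of(path, chunk_size=65536):
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("itnpy2-0.0.7.tar.gz", EXPECTED["itnpy2-0.0.7.tar.gz"])` should return True for an untampered download. pip can also enforce this itself in hash-checking mode, by listing `--hash=sha256:...` entries in a requirements file and installing with `--require-hashes`.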