A simple, deterministic, and extensible approach to inverse text normalization for numbers

A simple, deterministic, and extensible approach to inverse text normalization (ITN) for numbers.

Overview

This package converts raw spoken-form text (speech recognition output) into user-friendly written-form text. It works best for converting spoken numbers into digits, and for other translation tasks that do not change word order. A CSV file defines the basic rules for transforming spoken tokens into written tokens, and extra pre- and post-processing can be applied for more specific formatting requirements, e.g. dates, measurements, and money.
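The rule-table idea described above can be illustrated with a minimal sketch. It does not use or reproduce this package's API or its actual CSV file: the column names (`spoken`, `value`, `kind`), the sample rows, and the functions `load_rules` and `normalize` are all hypothetical, chosen only to show how a CSV of token rules can drive a deterministic, order-preserving conversion.

```python
import csv
import io

# Hypothetical rule table. The columns and rows are illustrative only and
# are NOT the CSV file shipped with the package.
RULES_CSV = """spoken,value,kind
one,1,unit
two,2,unit
three,3,unit
five,5,unit
twenty,20,ten
hundred,100,scale
thousand,1000,scale
"""


def load_rules(text):
    """Parse the CSV text into a dict: spoken token -> (value, kind)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["spoken"]: (int(row["value"]), row["kind"]) for row in reader}


def normalize(sentence, rules):
    """Replace runs of spoken number tokens with digits, keeping word order."""
    out = []
    total = 0        # value of completed groups (thousands and above)
    current = 0      # value of the group being built
    in_number = False

    def flush():
        # Emit the pending number, if any, and reset the accumulators.
        nonlocal total, current, in_number
        if in_number:
            out.append(str(total + current))
        total = current = 0
        in_number = False

    for token in sentence.split():
        rule = rules.get(token.lower())
        if rule is None:
            # Not a number word: close any pending number, copy the token.
            flush()
            out.append(token)
            continue
        value, kind = rule
        in_number = True
        if kind == "scale":
            # "hundred"/"thousand" multiply the current group.
            current = max(current, 1) * value
            if value >= 1000:
                total += current
                current = 0
        else:
            current += value
    flush()
    return " ".join(out)


rules = load_rules(RULES_CSV)
print(normalize("I have twenty three apples", rules))  # I have 23 apples
```

This toy version only handles cardinal numbers built from the listed tokens; it does not treat digit sequences such as "one two three" specially, and it leaves formatting of dates, measurements, or money to a separate post-processing step.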


[Image: terminal output showing example conversions]

These examples were produced by running this script.

Installation

This package supports Python 3.7 and later.

To install from PyPI:

pip install itnpy2

To install locally in editable mode, from the root folder of this repository:

pip install -e .

Tests

To run tests, use pytest in the root folder of this repository:

pytest

Issues

This package has been verified on a limited set of test cases. If you find a translation mistake, feel free to open a pull request that updates failing.csv with the input, the expected output, and the mistaken output. Thanks!

Citation

If you find this work useful, please consider citing it.

@misc{hsu2022itn,
  title        = {A simple, deterministic, and extensible approach to inverse text normalization for numbers},
  author       = {Brandhsu},
  howpublished = {https://github.com/barseghyanartur/itnpy},
  year         = {2022}
}
